# 🫁 Pneumothorax AI: Global Pre-training
**TÜBİTAK 2209-A | Ahmet Demir | Dokuz Eylül Üniversitesi**

This notebook:
1. Clones the project from GitHub
2. Downloads the NIH ChestX-ray14 dataset from Kaggle
3. Trains a U-Net++ model for 50 epochs
4. Saves the best checkpoint to Google Drive

**Estimated time:** ~12 hours on a T4 GPU (50 epochs, 112k images)

## 0. GPU Check

In [None]:
import torch
print('GPU:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NOT AVAILABLE: select a GPU under Runtime!')
print('CUDA:', torch.version.cuda)
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
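
The `nvidia-smi` query above prints one CSV line per GPU. As a minimal sketch, that line can be parsed to check free memory before starting a long run. The helper name `parse_gpu_line`, the sample line, and the 10 GiB threshold are illustrative assumptions, not part of the project.

```python
def parse_gpu_line(line: str) -> dict:
    """Parse one line of
    `nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader`."""
    name, total, free = (field.strip() for field in line.split(','))
    return {
        'name': name,
        'total_mib': int(total.split()[0]),  # "15360 MiB" -> 15360
        'free_mib': int(free.split()[0]),
    }

# Illustrative sample line; real values depend on the runtime.
sample = 'Tesla T4, 15360 MiB, 15101 MiB'
gpu = parse_gpu_line(sample)
enough_memory = gpu['free_mib'] >= 10 * 1024  # hypothetical 10 GiB threshold
print(gpu['name'], gpu['free_mib'], 'MiB free; enough:', enough_memory)
```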

## 1. Clone the Project

In [None]:
import os

REPO = 'https://github.com/Ahmet-demir-ai/pneumothorax-ai-detection'
PROJECT_DIR = '/content/pneumothorax-ai-detection'

if os.path.exists(PROJECT_DIR):
    !cd {PROJECT_DIR} && git pull
else:
    # Clone into an explicit path so the result does not depend on the current directory
    !git clone {REPO} {PROJECT_DIR}

os.chdir(PROJECT_DIR)
print('Working directory:', os.getcwd())
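
The clone-or-pull logic above relies on IPython `!` commands. Outside a notebook, the same decision could be sketched with `subprocess`. The function names `git_sync_command` and `sync_repo` are hypothetical, and using `--ff-only` is a design choice of this sketch rather than something the project prescribes.

```python
import os
import subprocess

def git_sync_command(repo_url: str, dest: str) -> list:
    """Return the git command that brings `dest` up to date with `repo_url`."""
    if os.path.isdir(os.path.join(dest, '.git')):
        # Existing checkout: fast-forward only, so local edits are never merged silently
        return ['git', '-C', dest, 'pull', '--ff-only']
    return ['git', 'clone', repo_url, dest]

def sync_repo(repo_url: str, dest: str) -> None:
    # check=True raises CalledProcessError if git exits with a non-zero status
    subprocess.run(git_sync_command(repo_url, dest), check=True)
```

In this notebook the call would be `sync_repo(REPO, PROJECT_DIR)`.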

## 2. Install Dependencies

In [None]:
!pip install -q \
    segmentation-models-pytorch \
    albumentations \
    pydicom \
    pynrrd \
    wandb \
    tqdm \
    fpdf2 \
    plotly

print('✓ Installation complete')

## 3. Set Up the Kaggle API Token

In [None]:
import json
import os
from getpass import getpass

# Enter the Kaggle token at the prompt instead of hard-coding it in the notebook
KAGGLE_TOKEN = getpass('Kaggle API token: ')
KAGGLE_USER  = 'salihekmen'  # Kaggle account name

os.environ['KAGGLE_API_TOKEN'] = KAGGLE_TOKEN
os.environ['KAGGLE_USERNAME']  = KAGGLE_USER

# Create kaggle.json (required by some tools)
kaggle_dir = os.path.expanduser('~/.kaggle')
os.makedirs(kaggle_dir, exist_ok=True)
with open(f'{kaggle_dir}/kaggle.json', 'w') as f:
    json.dump({'username': KAGGLE_USER, 'key': KAGGLE_TOKEN}, f)
os.chmod(f'{kaggle_dir}/kaggle.json', 0o600)

print('✓ Kaggle token configured')
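
The cell above writes `kaggle.json` first and restricts its permissions afterwards, so the file is briefly readable with default permissions. A minimal sketch of creating it with mode `0o600` from the start is shown below. The helper name `write_kaggle_json` is hypothetical, and the permission handling assumes a POSIX filesystem such as the one on Colab.

```python
import json
import os
import stat

def write_kaggle_json(username: str, key: str, directory: str) -> str:
    """Write kaggle.json so that only the owner can ever read it."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, 'kaggle.json')
    # Create (or truncate) the file with mode 0o600 from the start
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, 'w') as f:
        json.dump({'username': username, 'key': key}, f)
    # A pre-existing file keeps its old mode, so tighten it explicitly as well
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

A call such as `write_kaggle_json(KAGGLE_USER, KAGGLE_TOKEN, os.path.expanduser('~/.kaggle'))` would stand in for the `open` and `chmod` lines.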

## 4. Download NIH ChestX-ray14

In [None]:
NIH_DIR = 'data/raw/global/nih'
os.makedirs(NIH_DIR, exist_ok=True)

# Skip if already downloaded
nih_images = !find {NIH_DIR} -name '*.png' 2>/dev/null | wc -l
n_images = int(nih_images[0].strip())

if n_images > 100000:
    print(f'✓ NIH already present: {n_images:,} images')
else:
    print('Downloading NIH (~42 GB)...')
    !kaggle datasets download \
        -d nih-chest-xrays/data \
        -p {NIH_DIR} \
        --unzip
    nih_images = !find {NIH_DIR} -name '*.png' | wc -l
    print(f'✓ Download complete: {int(nih_images[0].strip()):,} images')
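
The image count above shells out to `find`, and a ~42 GB download can fail late if the disk fills up. As a hedged sketch, both checks can be done in pure Python. The helper names `count_pngs` and `has_free_space` are hypothetical, and how much headroom to require beyond the archive size is an assumption left to the caller.

```python
import shutil
from pathlib import Path

def count_pngs(root: str) -> int:
    """Count PNG files under `root` recursively; a missing directory counts as zero."""
    base = Path(root)
    return sum(1 for _ in base.rglob('*.png')) if base.is_dir() else 0

def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1e9
```

For example, `has_free_space(NIH_DIR, 2 * 42)` would require room for the archive plus an extracted copy of similar size, which is a rough guess rather than a measured figure.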

## 5. Build the Manifest

In [None]:
!python scripts/data_manager.py --build_manifest
!python scripts/unify_annotations.py

## 6. W&B Login

In [None]:
import wandb

# Log in with your wandb.ai/ahmet-ai-t-bi-tak account
wandb.login()

## 7. Mount Google Drive (Save Checkpoints)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

DRIVE_CKPT = '/content/drive/MyDrive/tubitak_pneumothorax/checkpoints'
os.makedirs(DRIVE_CKPT, exist_ok=True)
print(f'✓ Drive mounted: {DRIVE_CKPT}')

## 8. Start Pre-training

**Estimated time on a T4 GPU:**
- 1 epoch (112k images, batch=32): ~14 minutes
- 50 epochs: ~12 hours

> ⚠️ A Colab session may shut down after ~12 hours. A checkpoint is saved after every fold.
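
The estimates above follow from simple arithmetic, sketched here so they can be recomputed for other settings. The function names are hypothetical. The `folds=5` line is only a rough projection that applies if `--epochs` is counted per fold, which is not confirmed by anything shown in this notebook.

```python
import math

def steps_per_epoch(n_images: int, batch_size: int) -> int:
    """Number of batches needed to see every image once."""
    return math.ceil(n_images / batch_size)

def estimate_hours(minutes_per_epoch: float, epochs: int, folds: int = 1) -> float:
    """Total wall-clock hours, assuming every epoch takes the same time."""
    return minutes_per_epoch * epochs * folds / 60

steps = steps_per_epoch(112_000, 32)          # 112k images, batch=32
seconds_per_step = 14 * 60 / steps            # implied by ~14 minutes per epoch
single_run = estimate_hours(14, 50)           # 50 epochs, one pass
all_folds = estimate_hours(14, 50, folds=5)   # only if --epochs applies to each fold

print(steps, round(seconds_per_step, 2), round(single_run, 1), round(all_folds, 1))
```

With these inputs a single 50-epoch pass comes to about 11.7 hours, which sits very close to the session limit mentioned above.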

In [None]:
!python scripts/train_global.py \
    --sources NIH \
    --encoder efficientnet-b0 \
    --img_size 512 \
    --epochs 50 \
    --batch_size 32 \
    --num_folds 5 \
    --lr 1e-4
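
The `--num_folds 5` flag implies k-fold cross-validation. The sketch below shows what a 5-fold split means in general: every sample lands in the validation set exactly once. It is a generic illustration with a hypothetical helper `kfold_indices`, not the splitting logic of `train_global.py`, which is not shown here. For chest X-rays a patient-level split is usually preferable, since one patient can contribute several images.

```python
import random

def kfold_indices(n_samples: int, num_folds: int, seed: int = 0):
    """Yield (train_idx, val_idx) pairs; each sample is in validation exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride gives folds whose sizes differ by at most one
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for k in range(num_folds):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx

splits = list(kfold_indices(n_samples=10, num_folds=5))
for train_idx, val_idx in splits:
    print(len(train_idx), 'train /', len(val_idx), 'val')
```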

## 9. Copy Checkpoints to Drive

In [None]:
import shutil, glob

# Base model
base_model = 'checkpoints/global_base_model.pth'
base_meta = 'checkpoints/global_base_model_meta.json'
if os.path.exists(base_model):
    shutil.copy(base_model, f'{DRIVE_CKPT}/global_base_model.pth')
    # The metadata file may be missing; copy it only when present
    if os.path.exists(base_meta):
        shutil.copy(base_meta, f'{DRIVE_CKPT}/global_base_model_meta.json')
    print('✓ Base model copied to Drive')

# All fold checkpoints
for ckpt in glob.glob('checkpoints/global_folds/*.pth'):
    shutil.copy(ckpt, DRIVE_CKPT)
    print(f'  → {os.path.basename(ckpt)}')

# Results CSV
results_csv = 'results/global_kfold_results.csv'
if os.path.exists(results_csv):
    shutil.copy(results_csv, DRIVE_CKPT)
    print('✓ Results CSV copied')

print('\n✓ All files saved to Drive:', DRIVE_CKPT)
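
Because these checkpoints are the only output of a long run, it can be worth confirming that each copy matches its source. The sketch below compares SHA-256 hashes after copying. The helper names `sha256_of` and `copy_verified` are hypothetical, and this verification step is a suggestion rather than part of the project.

```python
import hashlib
import os
import shutil

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def copy_verified(src: str, dst_dir: str) -> str:
    """Copy `src` into `dst_dir` and raise if the copy's hash differs from the source."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = shutil.copy2(src, dst_dir)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f'Checksum mismatch after copying {src}')
    return dst
```

In the loop above, `copy_verified(ckpt, DRIVE_CKPT)` could take the place of `shutil.copy(ckpt, DRIVE_CKPT)`.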

## 10. Results

In [None]:
import pandas as pd
import json

# K-fold results
try:
    df = pd.read_csv('results/global_kfold_results.csv')
    print('=== K-FOLD RESULTS ===')
    print(df.to_string(index=False))
    print(f'\nMean Dice : {df["best_dice"].mean():.4f} ± {df["best_dice"].std():.4f}')
    print(f'Mean AUC  : {df["best_auc"].mean():.4f} ± {df["best_auc"].std():.4f}')
except FileNotFoundError:
    print('Results CSV not found; training may not have finished yet')

# Base model metadata
try:
    with open('checkpoints/global_base_model_meta.json') as f:
        meta = json.load(f)
    print('\n=== BASE MODEL ===')
    for k, v in meta.items():
        print(f'  {k}: {v}')
except FileNotFoundError:
    pass
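
The "mean ± std" summary printed above uses the pandas default, which is the sample standard deviation (divisor n - 1). The sketch below reproduces that calculation with the standard library. The scores in it are placeholder values chosen for illustration only and are not results from this project.

```python
import statistics

# Placeholder per-fold scores for illustration only; NOT results from this project
best_dice = [0.80, 0.82, 0.84]

mean_dice = statistics.mean(best_dice)
# statistics.stdev is the sample standard deviation (n - 1 in the denominator),
# the same convention pandas uses by default for Series.std()
std_dice = statistics.stdev(best_dice)

print(f'Mean Dice : {mean_dice:.4f} ± {std_dice:.4f}')
```

With only five folds the spread is estimated from very few points, so it is better read as a rough indication of fold-to-fold variation than as a confidence interval.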

---
## ✅ Next Step: Fine-tuning

Once the DEU DICOM + NRRD data arrives:

```bash
# Fetch the checkpoint from Drive
cp /content/drive/MyDrive/tubitak_pneumothorax/checkpoints/global_base_model.pth checkpoints/

# Place the DEU data
# data/local/dicom/*.dcm
# data/local/nrrd/*.nrrd

# Fine-tune (~1-2 hours)
python scripts/fine_tune_local.py --freeze_encoder --epochs 20
```