# 🫁 Pneumothorax AI: Global Pre-training (Kaggle)
**TÜBİTAK 2209-A | Ahmet Demir | Dokuz Eylül Üniversitesi**

**Advantages:**
- The NIH Chest X-rays dataset is already hosted on Kaggle, so nothing has to be downloaded (the usual ~42 GB download is skipped)
- 30 hours of free GPU time per week (P100 or T4)
- Checkpoints are saved to Kaggle Output and can be copied to Drive/GitHub afterwards

**Requirements:**
- Kaggle Notebook → Settings → Accelerator: select **GPU P100**
- Internet: **ON** (needed for `git clone` and `pip install`)
- Before running this notebook, add the NIH dataset as an **Input**:
  - `+ Add Data` → Search: `nih-chest-xrays` → `NIH Chest X-rays` (NIH Clinical Center)
  - Data path: `/kaggle/input/nih-chest-xrays/`

## 0. GPU Check

In [None]:
import torch
print('GPU :', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'NONE: set Accelerator > GPU!')
print('CUDA:', torch.version.cuda)
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
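To stop early when the GPU is missing or nearly full, the `nvidia-smi` line above can also be parsed in Python. A minimal sketch, assuming the `--format=csv,noheader` output shape `name, <total> MiB, <free> MiB`; `parse_gpu_csv` and `has_free_memory` are hypothetical helpers, and the sample line is illustrative, not a measured value.

```python
def parse_gpu_csv(line: str) -> dict:
    """Parse one line of
    nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader
    e.g. 'Tesla P100-PCIE-16GB, 16280 MiB, 16000 MiB'."""
    name, total, free = (field.strip() for field in line.split(","))
    # Memory fields look like '16280 MiB'; keep only the integer part.
    return {
        "name": name,
        "total_mib": int(total.split()[0]),
        "free_mib": int(free.split()[0]),
    }


def has_free_memory(info: dict, min_free_mib: int) -> bool:
    """True if the GPU reports at least `min_free_mib` MiB free (hypothetical threshold)."""
    return info["free_mib"] >= min_free_mib


# Illustrative input only; in the notebook, pass the real nvidia-smi lines instead.
sample = "Tesla P100-PCIE-16GB, 16280 MiB, 16000 MiB"
info = parse_gpu_csv(sample)
print(info["name"], info["free_mib"], has_free_memory(info, 8000))
```

In the notebook, `lines = !nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader` captures the output as a list of lines that can be passed to `parse_gpu_csv` one by one.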

## 1. Check the NIH Data Path
> If `nih-chest-xrays` was added as an Input, it should appear under `/kaggle/input/nih-chest-xrays/`

In [None]:
import os, subprocess

NIH_INPUT = '/kaggle/input/nih-chest-xrays'
if os.path.exists(NIH_INPUT):
    count = int(subprocess.check_output(
        f'find {NIH_INPUT} -name "*.png" | wc -l', shell=True
    ).decode().strip())
    print(f'✓ NIH dataset found: {count:,} PNG images')
    print('  Folders:', os.listdir(NIH_INPUT)[:5])
else:
    print('[!] NIH dataset not found!')
    print('    → Add it from the right panel: "+ Add Data" → "nih-chest-xrays"')
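The `find | wc -l` pipeline works, but it depends on a shell. A pure-Python equivalent gives the same kind of count on any OS. A sketch; `count_files` is a hypothetical helper, and unlike `find -name "*.png"` it matches the extension case-insensitively and counts only files.

```python
import os


def count_files(root: str, suffix: str = ".png") -> int:
    """Count files under `root` (recursively) whose name ends with `suffix`,
    ignoring case. Returns 0 if `root` does not exist."""
    suffix = suffix.lower()
    total = 0
    # os.walk yields (dirpath, dirnames, filenames) for every directory;
    # like `find` without -L, it does not descend into symlinked subdirectories.
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.lower().endswith(suffix))
    return total


# In the notebook: count = count_files('/kaggle/input/nih-chest-xrays')
```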

## 2. Clone the Project

In [None]:
REPO = 'https://github.com/ahmetai-cell/pneumothorax-ai-detection'
PROJECT_DIR = '/kaggle/working/pneumothorax-ai-detection'

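if os.path.exists(PROJECT_DIR):
    !cd {PROJECT_DIR} && git pull
else:
    # Clone into the explicit path so it matches PROJECT_DIR regardless of the current directory
    !git clone {REPO} {PROJECT_DIR}

os.chdir(PROJECT_DIR)
print('Working directory:', os.getcwd())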
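The same clone-or-update logic can be written with `subprocess`, so that a failed `git` call stops the cell instead of only printing an error. A sketch; the helper names are hypothetical, and `--ff-only` is a choice made here (it refuses to create a merge commit if the Kaggle copy has diverged from the remote).

```python
import os
import subprocess


def repo_sync_command(repo_url: str, dest: str) -> list:
    """Return the git argv that brings `dest` up to date with `repo_url`."""
    if os.path.isdir(os.path.join(dest, ".git")):
        # Existing checkout: update it in place.
        return ["git", "-C", dest, "pull", "--ff-only"]
    # No checkout yet: clone into the explicit destination path.
    return ["git", "clone", repo_url, dest]


def sync_repo(repo_url: str, dest: str) -> None:
    # check=True raises CalledProcessError if git fails (e.g. Internet is OFF).
    subprocess.run(repo_sync_command(repo_url, dest), check=True)


# In the notebook: sync_repo(REPO, PROJECT_DIR); os.chdir(PROJECT_DIR)
```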

## 3. Install Dependencies

In [None]:
!pip install -q \
    segmentation-models-pytorch \
    albumentations \
    pydicom \
    pynrrd \
    wandb \
    tqdm \
    fpdf2 \
    plotly

print('✓ Installation complete')
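`pip install -q` hides most of its output, so a quick import check confirms the environment before training. A sketch using `importlib.util.find_spec`, which only locates a module without importing it; note that some pip names differ from their import names (`pynrrd` imports as `nrrd`, `fpdf2` as `fpdf`, `segmentation-models-pytorch` as `segmentation_models_pytorch`). `missing_modules` is a hypothetical helper.

```python
import importlib.util

# pip distribution name -> top-level import name
PACKAGES = {
    "segmentation-models-pytorch": "segmentation_models_pytorch",
    "albumentations": "albumentations",
    "pydicom": "pydicom",
    "pynrrd": "nrrd",
    "wandb": "wandb",
    "tqdm": "tqdm",
    "fpdf2": "fpdf",
    "plotly": "plotly",
}


def missing_modules(packages: dict) -> list:
    """Return the pip names whose import module cannot be located."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


print(missing_modules(PACKAGES) or "all packages importable")
```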

## 4. Link the NIH Data into the Project Directory
Symbolic link: `/kaggle/input/nih-chest-xrays` → `data/raw/global/nih`

In [None]:
import os, subprocess

NIH_LOCAL = 'data/raw/global/nih'
os.makedirs('data/raw/global', exist_ok=True)
os.makedirs('data/processed', exist_ok=True)
os.makedirs('checkpoints', exist_ok=True)
os.makedirs('results', exist_ok=True)

NIH_INPUT = '/kaggle/input/nih-chest-xrays'

if os.path.exists(NIH_INPUT):
    # lexists also detects a dangling symlink, so os.symlink never raises FileExistsError here
    if not os.path.lexists(NIH_LOCAL):
        os.symlink(NIH_INPUT, NIH_LOCAL)
        print(f'✓ Symlink created: {NIH_LOCAL} → {NIH_INPUT}')
    else:
        print(f'✓ Already exists: {NIH_LOCAL}')
    # Verify the image count
    count = int(subprocess.check_output(
        f'find {NIH_INPUT} -name "*.png" | wc -l', shell=True
    ).decode().strip())
    print(f'  Total PNGs: {count:,}')
else:
    print('[!] NIH Input has not been added. If NIH_LOCAL already exists, you can continue.')
    # Fallback: look for the dataset under other common Input paths
    for alt in [
        '/kaggle/input/nih-chest-xrays/images',
        '/kaggle/input/chest-xray-nihcc/images',
        '/kaggle/input/nih-chest-xrays',
    ]:
        if os.path.exists(alt):
            print(f'  Alternative found: {alt}')
            break
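Re-running the linking step should be idempotent and should cope with a dangling symlink (one whose target no longer exists): `os.path.exists` reports such a link as missing, yet `os.symlink` still refuses to overwrite it. A sketch; `ensure_symlink` is a hypothetical helper, and it does not check where an existing, working link points.

```python
import os


def ensure_symlink(target: str, link: str) -> str:
    """Create `link` -> `target` unless a working entry already exists at `link`.

    Returns 'created', 'exists' or 'replaced'. 'replaced' means `link` was a
    dangling symlink: os.path.exists() is False for it, but os.symlink()
    would still raise FileExistsError.
    """
    if os.path.islink(link) and not os.path.exists(link):
        os.remove(link)  # removes the link itself, not its (missing) target
        status = "replaced"
    elif os.path.lexists(link):
        return "exists"
    else:
        status = "created"
    os.makedirs(os.path.dirname(link) or ".", exist_ok=True)
    os.symlink(target, link)
    return status


# In the notebook: ensure_symlink('/kaggle/input/nih-chest-xrays', 'data/raw/global/nih')
```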

## 5. Build the Manifest

In [None]:
# Build the manifest for NIH (NIH only if SIIM data is not present)
!python scripts/data_manager.py --build_manifest
!python scripts/unify_annotations.py

import pandas as pd
try:
    df = pd.read_csv('data/processed/master_manifest.csv')
    print(f'\n✓ Manifest: {len(df):,} records')
    print(f'  Positive: {(df["is_pneumo"]==1).sum():,}')
    print(f'  Negative: {(df["is_pneumo"]==0).sum():,}')
    print(f'  Sources: {df["source"].value_counts().to_dict()}')
except Exception as e:
    print(f'[!] {e}')
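Pneumothorax is a minority class, so it is worth checking the positive/negative balance per source, not only overall. A standard-library sketch that reads the manifest written above; it assumes only the `is_pneumo` (0/1) and `source` columns the cell already uses, and `summarize_manifest` is a hypothetical helper.

```python
import csv
from collections import Counter


def summarize_manifest(path: str) -> dict:
    """Count positives/negatives overall and per `source` in the manifest CSV."""
    overall, per_source = Counter(), Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            # float() first tolerates labels written as '1.0' by pandas.
            label = "positive" if int(float(row["is_pneumo"])) == 1 else "negative"
            overall[label] += 1
            per_source[(row["source"], label)] += 1
    return {"overall": dict(overall), "per_source": dict(per_source)}


# In the notebook: summarize_manifest('data/processed/master_manifest.csv')
```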

## 6. W&B Login (Optional)

In [None]:
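# To use W&B, store your API key as a Kaggle Secret (Add-ons → Secrets) named
# WANDB_API_KEY; otherwise skip this cell. Never hard-code the key in the notebook.
import os
try:
    from kaggle_secrets import UserSecretsClient
    WANDB_KEY = UserSecretsClient().get_secret('WANDB_API_KEY')
except Exception:
    # Outside Kaggle, or no such secret: fall back to the environment variable
    WANDB_KEY = os.environ.get('WANDB_API_KEY', '')

if WANDB_KEY:
    !pip install -q --upgrade wandb
    import wandb
    wandb.login(key=WANDB_KEY, relogin=True)
    USE_WANDB = True
    print('✓ W&B connection OK')
else:
    USE_WANDB = False
    print('W&B skipped')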

## 7. Start Pre-training

**Estimated time on a Kaggle P100:**
- 1 epoch (112k images, batch=32): ~10 minutes  
- 50 epochs: ~8-9 hours  
- 30 epochs: ~5 hours (recommended for a first run)

> ⚠️ A Kaggle session may shut down after 9 hours. A checkpoint is saved after every fold.
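The estimates above describe one training run. Whether `train_global.py` trains every one of the `--num_folds` folds for the full `--epochs` is an assumption to check in the script; if it does, simple arithmetic shows how fast the total grows. A rough sketch that ignores validation and data-loading overhead; `estimate_hours` is a hypothetical helper, and `train_fraction` reflects that each of k folds trains on about (k - 1)/k of the images.

```python
def estimate_hours(minutes_per_epoch: float, epochs: int, folds: int = 1,
                   train_fraction: float = 1.0) -> float:
    """Rough wall-clock hours for `folds` runs of `epochs` epochs each.

    With k-fold CV each fold trains on about (k - 1) / k of the data, so the
    per-epoch cost shrinks by `train_fraction` but is paid once per fold.
    """
    return minutes_per_epoch * train_fraction * epochs * folds / 60.0


print(estimate_hours(10, 30))                                # single 30-epoch run
print(estimate_hours(10, 30, folds=5, train_fraction=0.8))   # 5 folds x 30 epochs
```

With the ~10 min/epoch figure, one 30-epoch run is about 5 h, while five folds of 30 epochs on 80% of the data come to about 20 h, more than two 9-hour sessions. Under that assumption, lowering `--epochs` or `--num_folds` is what keeps a run inside one session.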

In [None]:
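# Default to False if the W&B cell (section 6) was skipped and USE_WANDB is undefined
WANDB_FLAG = '' if globals().get('USE_WANDB', False) else '--no_wandb'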
CKPT_DIR   = '/kaggle/working/checkpoints'

!python scripts/train_global.py \
    --sources NIH \
    --encoder efficientnet-b0 \
    --img_size 512 \
    --epochs 30 \
    --batch_size 32 \
    --num_folds 5 \
    --lr 1e-4 \
    {WANDB_FLAG} \
    --checkpoint_dir {CKPT_DIR}
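Interpolating `{WANDB_FLAG}` into a shell line works, but building the command as a list makes each flag explicit and lets `subprocess.run(..., check=True)` raise if training exits with an error. A sketch; the flags are copied verbatim from the cell above, `build_train_args` is a hypothetical helper, and `sys.executable` runs the script with the notebook's own Python interpreter.

```python
import sys


def build_train_args(ckpt_dir: str, use_wandb: bool, epochs: int = 30,
                     batch_size: int = 32, num_folds: int = 5) -> list:
    """Return the argv for scripts/train_global.py (flags as in the cell above)."""
    args = [
        sys.executable, "scripts/train_global.py",
        "--sources", "NIH",
        "--encoder", "efficientnet-b0",
        "--img_size", "512",
        "--epochs", str(epochs),
        "--batch_size", str(batch_size),
        "--num_folds", str(num_folds),
        "--lr", "1e-4",
        "--checkpoint_dir", ckpt_dir,
    ]
    if not use_wandb:
        args.append("--no_wandb")
    return args


# In the notebook (blocks until training finishes):
# import subprocess
# subprocess.run(build_train_args(CKPT_DIR, USE_WANDB), check=True)
```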

## 8. Results

In [None]:
import pandas as pd, json, glob

results_csv = f'{CKPT_DIR}/../results/global_kfold_results.csv'
# Try each candidate path; the for-else branch runs only if none of them exists
for p in [results_csv, 'results/global_kfold_results.csv',
          f'{CKPT_DIR}/global_kfold_results.csv']:
    if os.path.exists(p):
        df = pd.read_csv(p)
        print('=== K-FOLD RESULTS ===')
        print(df.to_string(index=False))
        print(f'\nMean Dice : {df["best_dice"].mean():.4f} ± {df["best_dice"].std():.4f}')
        print(f'Mean AUC  : {df["best_auc"].mean():.4f} ± {df["best_auc"].std():.4f}')
        break
else:
    print('Results CSV not found; training may not have finished yet')

# Base model metadata
for mp in [f'{CKPT_DIR}/global_base_model_meta.json',
           'checkpoints/global_base_model_meta.json']:
    if os.path.exists(mp):
        with open(mp) as f:
            meta = json.load(f)
        print('\n=== BASE MODEL ===')
        for k, v in meta.items():
            print(f'  {k}: {v}')
        break

# List the saved checkpoints
ckpts = glob.glob(f'{CKPT_DIR}/**/*.pth', recursive=True)
print(f'\n  Saved checkpoints: {len(ckpts)}')
for c in sorted(ckpts):
    size_mb = os.path.getsize(c) / 1e6
    print(f'  {os.path.basename(c):40s}  {size_mb:.1f} MB')
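The `mean ± std` lines above use pandas' `Series.std()`, which by default is the sample standard deviation (`ddof=1`). The same numbers can be reproduced without pandas, for example in a report script, with the standard `statistics` module, whose `stdev` also divides by n - 1. A sketch; `fold_summary` is a hypothetical helper, and `best_dice`/`best_auc` are the column names the cell already reads.

```python
import csv
import statistics


def fold_summary(path: str, column: str) -> tuple:
    """Return (mean, sample std) of one numeric column of the k-fold results CSV.

    statistics.stdev needs at least two folds; with a single row it raises
    StatisticsError, whereas pandas would return NaN.
    """
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    return statistics.mean(values), statistics.stdev(values)


# In the notebook:
# mean, sd = fold_summary('results/global_kfold_results.csv', 'best_dice')
# print(f'Mean Dice: {mean:.4f} ± {sd:.4f}')
```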

## 9. Download / Upload the Checkpoint

Files saved to Output can be downloaded from the **Output** tab in the right panel.  
Alternatively, save them as a Kaggle Dataset and use that dataset from Colab or a local environment.

In [None]:
import glob, os, shutil

# Copy the key files to the top level of /kaggle/working/ (shown in the Output tab)
OUTPUT = '/kaggle/working'

files_to_copy = [
    (f'{CKPT_DIR}/global_base_model.pth',       f'{OUTPUT}/global_base_model.pth'),
    (f'{CKPT_DIR}/global_base_model_meta.json',  f'{OUTPUT}/global_base_model_meta.json'),
    ('results/global_kfold_results.csv',          f'{OUTPUT}/global_kfold_results.csv'),
]

for src, dst in files_to_copy:
    if os.path.exists(src):
        shutil.copy(src, dst)
        print(f'✓ {os.path.basename(src)} → {dst}')
    else:
        print(f'  - {src} not found')

# Copy the per-fold checkpoints as well
for ckpt in glob.glob(f'{CKPT_DIR}/global_folds/*.pth'):
    dst = f'{OUTPUT}/{os.path.basename(ckpt)}'
    shutil.copy(ckpt, dst)
    print(f'✓ {os.path.basename(ckpt)}')

print('\n✓ All files are ready to download from the Output tab')
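Copying `.pth` files one by one leaves many separate downloads in the Output tab. Packing the checkpoint folder into a single zip is mainly a convenience, since model weights usually compress little. A sketch with `shutil.make_archive`; `archive_checkpoints` and the output name are hypothetical.

```python
import os
import shutil


def archive_checkpoints(src_dir: str, out_base: str) -> str:
    """Zip the contents of `src_dir` into `<out_base>.zip` and return its path.

    Keep `out_base` outside `src_dir`, otherwise the archive may try to
    include itself while it is being written.
    """
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"checkpoint directory not found: {src_dir}")
    return shutil.make_archive(out_base, "zip", root_dir=src_dir)


# In the notebook:
# archive_checkpoints(CKPT_DIR, '/kaggle/working/checkpoints_export')
```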

---
## ✅ Next Step: Fine-tuning

```bash
# Download the checkpoint to your local machine, then:
cp ~/Downloads/global_base_model.pth checkpoints/

# Put the DEU data in place
# data/local/dicom/*.dcm
# data/local/nrrd/*.nrrd

# Fine-tune (~1-2 hours)
python scripts/fine_tune_local.py --freeze_encoder --epochs 20
```
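Large downloads occasionally arrive truncated, so before fine-tuning it can be worth comparing a checksum of `global_base_model.pth` computed on Kaggle with one computed locally after the `cp`. A sketch with `hashlib`; `sha256_of` is a hypothetical helper (on Linux, `sha256sum global_base_model.pth` prints the same hex digest).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks so a large checkpoint
    does not have to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until f.read returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Run on both machines and compare:
# print(sha256_of('checkpoints/global_base_model.pth'))
```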