## 1. Installation Verification

Let's verify that all required packages are installed correctly.

In [1]:
# Check Python version
import sys
print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# Import core libraries
import torch
import torchvision
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

print("\n✅ Core Libraries:")
print(f"  PyTorch: {torch.__version__}")
print(f"  TorchVision: {torchvision.__version__}")
print(f"  NumPy: {np.__version__}")
print(f"  Pandas: {pd.__version__}")

# Check CUDA availability
print(f"\n🖥️ Compute Device:")
print(f"  CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"  CUDA Version: {torch.version.cuda}")
    print(f"  GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"  GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print(f"  Running on CPU")
    if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
        print(f"  MPS (Apple Silicon) Available: True")

Python version: 3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)]
Python executable: c:\Users\Asus\x-lite-chest-xray\.venv\Scripts\python.exe

✅ Core Libraries:
  PyTorch: 2.9.1+cpu
  TorchVision: 0.24.1+cpu
  NumPy: 2.2.6
  Pandas: 2.3.3

🖥️ Compute Device:
  CUDA Available: False
  Running on CPU
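
The import checks above stop at the first missing package. As a complementary sketch, the standard-library `importlib.metadata` module can report every required distribution in one pass without importing any of them. The helper name `installed_versions` and the `REQUIRED` list are illustrative assumptions, not part of the project; adjust the list to match the project's actual requirements file.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical requirement list, based on the libraries this notebook imports.
REQUIRED = ["torch", "torchvision", "numpy", "pandas", "matplotlib", "timm"]

for name, version in installed_versions(REQUIRED).items():
    print(f"  {name}: {version if version else 'NOT INSTALLED'}")
```

Because nothing is imported, this check also runs in an environment where a heavy dependency such as PyTorch is broken or absent.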


## 2. Project Configuration

Load the project configuration and disease labels.

In [2]:
# Add project root to path
project_root = Path.cwd().parent.parent
sys.path.insert(0, str(project_root))

print(f"📍 Current Directory: {Path.cwd()}")
print(f"📁 Project Root: {project_root}")

# Import project configuration
from config import Config, DISEASE_LABELS, NUM_CLASSES, DISEASE_DESCRIPTIONS

print("\n📁 Project Structure:")
print(f"  Root Directory: {Config.ROOT_DIR}")
print(f"  Data Directory: {Config.DATA_DIR}")
print(f"  Checkpoint Directory: {Config.CHECKPOINT_DIR}")
print(f"  Logs Directory: {Config.LOGS_DIR}")

print(f"\n🏥 Dataset Information:")
print(f"  Dataset: {Config.DATASET_NAME}")
print(f"  Number of Disease Classes: {NUM_CLASSES}")
print(f"  Image Size: {Config.IMAGE_SIZE}x{Config.IMAGE_SIZE}")

print(f"\n📋 Disease Classes:")
for i, disease in enumerate(DISEASE_LABELS, 1):
    print(f"  {i:2d}. {disease}")

# Create necessary directories
Config.create_directories()
print(f"\n✅ Directories created successfully!")

📍 Current Directory: c:\Users\Asus\x-lite-chest-xray\notebooks\local
📁 Project Root: c:\Users\Asus\x-lite-chest-xray

📁 Project Structure:
  Root Directory: c:\Users\Asus\x-lite-chest-xray
  Data Directory: c:\Users\Asus\x-lite-chest-xray\data
  Checkpoint Directory: c:\Users\Asus\x-lite-chest-xray\ml\models\checkpoints
  Logs Directory: c:\Users\Asus\x-lite-chest-xray\logs

🏥 Dataset Information:
  Dataset: ChestX-ray14
  Number of Disease Classes: 14
  Image Size: 224x224

📋 Disease Classes:
   1. Atelectasis
   2. Cardiomegaly
   3. Effusion
   4. Infiltration
   5. Mass
   6. Nodule
   7. Pneumonia
   8. Pneumothorax
   9. Consolidation
  10. Edema
  11. Emphysema
  12. Fibrosis
  13. Pleural_Thickening
  14. Hernia

✅ Directories created successfully!
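
The cell above locates the project root with `Path.cwd().parent.parent`, which only works when the notebook is started from a directory exactly two levels below the root. A more tolerant approach, sketched here, walks upward until it finds a marker file. The helper name `find_project_root` is hypothetical, and the default marker assumes the configuration module is a single `config.py` file in the root; pick a different marker if the project is laid out differently.

```python
from pathlib import Path


def find_project_root(start, marker="config.py"):
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} found at or above {start}")
```

A call such as `project_root = find_project_root(Path.cwd())` would then give the same result whether the notebook runs from `notebooks/local` or from any other depth inside the repository.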


## 3. Dataset Status Check

Check if the ChestX-ray14 dataset is available.

In [3]:
# Check dataset availability
print("📊 Dataset Status:")

metadata_exists = Config.METADATA_CSV.exists()
print(f"  Metadata CSV: {'✅ Found' if metadata_exists else '❌ Not Found'}")
print(f"    Path: {Config.METADATA_CSV}")

raw_data_exists = Config.RAW_DATA_DIR.exists()
print(f"  Raw Data Directory: {'✅ Found' if raw_data_exists else '❌ Not Found'}")
print(f"    Path: {Config.RAW_DATA_DIR}")

if raw_data_exists:
    # Count images
    image_count = len(list(Config.RAW_DATA_DIR.glob("**/*.png")))
    print(f"  Number of Images: {image_count:,}")
else:
    print("\n⚠️  Dataset not found!")
    print("\n📥 To download the dataset, run:")
    print("   python scripts/download_chestxray14.py --metadata-only  # Quick start")
    print("   python scripts/download_chestxray14.py                  # Full dataset (~45GB)")

# If metadata exists, load and show statistics
if metadata_exists:
    df = pd.read_csv(Config.METADATA_CSV)
    print(f"\n📈 Dataset Statistics:")
    print(f"  Total Images: {len(df):,}")
    print(f"  Columns: {list(df.columns)}")

📊 Dataset Status:
  Metadata CSV: ✅ Found
    Path: c:\Users\Asus\x-lite-chest-xray\data\Data_Entry_2017.csv
  Raw Data Directory: ✅ Found
    Path: c:\Users\Asus\x-lite-chest-xray\data\raw
  Number of Images: 0

📈 Dataset Statistics:
  Total Images: 8
  Columns: [' <!DOCTYPE html><html lang="en-US"><head><meta name="robots" content="noindex', ' nofollow"><title>Box - Free Online File Storage', ' Internet File Sharing', ' Access Documents &amp; Files Anywhere', ' Backup Data', ' Share Files</title><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link href="https://cdn01.boxcdn.net/_assets/css/transition/style_not_found-pwZoby.css" rel="stylesheet" type="text/css" media="screen" /></head><body>   <div class="center"> <a href="https://nihcc.app.box.com/" class="logo" ><img src="https://cdn01.boxcdn.net/_assets/img/not_found_box_logo-Czx5Gh.png" border="0" alt="Box" title="Box" /></a><div class="error_message   error_message_not_found      "><h2>  This shared
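
The `Columns` output above contains HTML markup from a Box "not found" page, which suggests the file saved as `Data_Entry_2017.csv` is an error page rather than the metadata table. That would also explain the row count of 8, and an existence check alone cannot catch it. The sketch below shows one way to validate the file before loading it. The function name `validate_metadata_csv` is hypothetical, and the expected header names are an assumption about the ChestX-ray14 metadata file that should be checked against a known-good copy.

```python
import csv
from pathlib import Path

# Assumed header names for Data_Entry_2017.csv; adjust if your copy differs.
EXPECTED_COLUMNS = ("Image Index", "Finding Labels")


def validate_metadata_csv(path, expected_columns=EXPECTED_COLUMNS):
    """Return (ok, reason) describing whether `path` looks like the metadata CSV."""
    path = Path(path)
    if not path.is_file():
        return False, "file not found"

    # Only the first line is needed: it holds either the CSV header or the HTML doctype.
    with path.open("r", encoding="utf-8-sig", errors="replace", newline="") as handle:
        first_line = handle.readline()

    probe = first_line.lstrip().lower()
    if probe.startswith(("<!doctype", "<html")):
        return False, "file looks like an HTML page, not a CSV"

    header = [column.strip() for column in next(csv.reader([first_line]), [])]
    missing = [column for column in expected_columns if column not in header]
    if missing:
        return False, f"missing expected columns: {missing}"
    return True, "ok"
```

Calling `validate_metadata_csv(Config.METADATA_CSV)` before `pd.read_csv` would let the notebook report a failed download instead of printing statistics for an error page.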

## 4. Model Architecture Options

Let's explore the model architectures we'll experiment with.

In [4]:
import timm

print("🏗️ Available Model Architectures:\n")

print("📚 TEACHER MODEL:")
print(f"  Backbone: {Config.TEACHER_BACKBONE}")
print(f"  Purpose: High-performance reference model for knowledge transfer")

print("\n🎓 STUDENT MODEL BACKBONES (Lightweight):")
# List the timm registry once rather than on every loop iteration
available_models = set(timm.list_models())
for i, backbone in enumerate(Config.STUDENT_BACKBONES, 1):
    # Check if model exists in timm
    is_available = backbone in available_models
    status = "✅" if is_available else "⚠️"
    print(f"  {i}. {status} {backbone}")

print("\n🔍 ATTENTION MECHANISMS:")
for i, attention in enumerate(Config.ATTENTION_TYPES, 1):
    print(f"  {i}. {attention.upper()}")
    if attention == 'mhsa':
        print(f"     → Multi-Head Self-Attention (standard Transformer)")
    elif attention == 'performer':
        print(f"     → Performer (linear complexity attention)")
    elif attention == 'linear':
        print(f"     → Linear Attention (efficient variant)")
    elif attention == 'none':
        print(f"     → CNN-only baseline (no attention)")

print("\n💡 Experiment Strategy:")
print("  We will test different combinations of:")
print("  • CNN Backbones × Attention Mechanisms")
print("  • Temperature values for knowledge distillation")
print("  • Loss function weights (alpha)")
print("  • Learning rates and batch sizes")
print("\n  Goal: Find optimal configuration for accuracy + efficiency!")


🏗️ Available Model Architectures:

📚 TEACHER MODEL:
  Backbone: densenet121
  Purpose: High-performance reference model for knowledge transfer

🎓 STUDENT MODEL BACKBONES (Lightweight):
  1. ✅ efficientnet_b0
  2. ✅ convnext_tiny
  3. ✅ mobilenetv3_large_100
  4. ✅ resnet50

🔍 ATTENTION MECHANISMS:
  1. MHSA
     → Multi-Head Self-Attention (standard Transformer)
  2. PERFORMER
     → Performer (linear complexity attention)
  3. LINEAR
     → Linear Attention (efficient variant)
  4. NONE
     → CNN-only baseline (no attention)

💡 Experiment Strategy:
  We will test different combinations of:
  • CNN Backbones × Attention Mechanisms
  • Temperature values for knowledge distillation
  • Loss function weights (alpha)
  • Learning rates and batch sizes

  Goal: Find optimal configuration for accuracy + efficiency!
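
To make the "CNN Backbones × Attention Mechanisms" idea concrete, the following sketch enumerates every pairing with `itertools.product`. The two lists are copied from the output above. The function name `experiment_grid` and the dictionary keys are illustrative assumptions and do not reflect how the project itself schedules experiments.

```python
from itertools import product

# Values copied from the cell output above.
STUDENT_BACKBONES = ["efficientnet_b0", "convnext_tiny", "mobilenetv3_large_100", "resnet50"]
ATTENTION_TYPES = ["mhsa", "performer", "linear", "none"]


def experiment_grid(backbones, attention_types):
    """Return one dict per backbone/attention pairing."""
    return [
        {"backbone": backbone, "attention": attention}
        for backbone, attention in product(backbones, attention_types)
    ]


grid = experiment_grid(STUDENT_BACKBONES, ATTENTION_TYPES)
print(f"{len(grid)} architecture combinations")  # 4 backbones x 4 attention types
for run in grid[:3]:
    print(run)
```

Each additional hyperparameter axis multiplies the size of this grid, which is one reason to search temperature, alpha, and learning rate with an optimizer rather than exhaustively.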


## 5. Training Configuration

Review the baseline training hyperparameters (these will be optimized during experiments).

In [5]:
print("⚙️ BASELINE TRAINING CONFIGURATION:\n")

print("🎓 Teacher Model:")
print(f"  Epochs: {Config.TEACHER_EPOCHS}")
print(f"  Batch Size: {Config.TEACHER_BATCH_SIZE}")
print(f"  Learning Rate: {Config.TEACHER_LR}")
print(f"  Weight Decay: {Config.TEACHER_WEIGHT_DECAY}")

print("\n🏃 Student Model:")
print(f"  Epochs: {Config.STUDENT_EPOCHS}")
print(f"  Batch Size: {Config.STUDENT_BATCH_SIZE}")
print(f"  Learning Rate: {Config.STUDENT_LR}")
print(f"  Weight Decay: {Config.STUDENT_WEIGHT_DECAY}")

print("\n🔥 Knowledge Distillation:")
print(f"  Temperature (τ): {Config.KD_TEMPERATURE} (will test: 2, 4, 6, 8)")
print(f"  Alpha (α): {Config.KD_ALPHA} (will test: 0.5, 0.7, 0.9)")
print(f"    • Distillation Loss Weight: {Config.KD_ALPHA}")
print(f"    • Hard Loss Weight: {1 - Config.KD_ALPHA}")

print("\n📊 Data Configuration:")
print(f"  Train Split: {Config.TRAIN_SPLIT * 100}%")
print(f"  Validation Split: {Config.VAL_SPLIT * 100}%")
print(f"  Test Split: {Config.TEST_SPLIT * 100}%")

print("\n⏱️ Training Settings:")
print(f"  Early Stopping Patience: {Config.EARLY_STOPPING_PATIENCE} epochs")
print(f"  Mixed Precision Training: {Config.MIXED_PRECISION}")
print(f"  Gradient Clipping: {Config.GRADIENT_CLIP_NORM}")

print("\n💡 Note: These are baseline values.")
print("   We'll use hyperparameter optimization to find the best configuration!")

⚙️ BASELINE TRAINING CONFIGURATION:

🎓 Teacher Model:
  Epochs: 30
  Batch Size: 32
  Learning Rate: 0.0001
  Weight Decay: 0.0001

🏃 Student Model:
  Epochs: 40
  Batch Size: 64
  Learning Rate: 0.001
  Weight Decay: 0.0001

🔥 Knowledge Distillation:
  Temperature (τ): 4.0 (will test: 2, 4, 6, 8)
  Alpha (α): 0.7 (will test: 0.5, 0.7, 0.9)
    • Distillation Loss Weight: 0.7
    • Hard Loss Weight: 0.30000000000000004

📊 Data Configuration:
  Train Split: 70.0%
  Validation Split: 15.0%
  Test Split: 15.0%

⏱️ Training Settings:
  Early Stopping Patience: 7 epochs
  Mixed Precision Training: True
  Gradient Clipping: 1.0

💡 Note: These are baseline values.
   We'll use hyperparameter optimization to find the best configuration!
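
The weights printed above (α for the distillation term, 1 − α for the hard-label term) and the temperature τ can be combined as in the sketch below. It assumes a sigmoid-based multi-label formulation, which seems natural for the 14 non-exclusive ChestX-ray14 labels. The project's actual loss is not shown in this notebook and may differ. The function names and the τ² scaling factor are assumptions for illustration, with the scaling borrowed from the common softmax formulation of distillation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy of one probability against a hard or soft target."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.7):
    """Weighted sum of a soft (teacher) term and a hard (label) term, averaged over classes."""
    n = len(labels)
    # Soft term: student and teacher logits are both softened by the temperature.
    soft = sum(
        bce(sigmoid(s / temperature), sigmoid(t / temperature))
        for s, t in zip(student_logits, teacher_logits)
    ) / n
    # Hard term: ordinary BCE of the student against the ground-truth labels.
    hard = sum(bce(sigmoid(s), y) for s, y in zip(student_logits, labels)) / n
    # Assumption: temperature**2 keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature**2 * soft + (1.0 - alpha) * hard


# Toy example with three classes, using the baseline temperature and alpha.
loss = distillation_loss([2.0, -1.0, 0.5], [3.0, -2.0, 0.0], [1, 0, 1])
print(f"combined loss: {loss:.4f}")
```

Setting `alpha=0.0` reduces this to plain supervised training, which corresponds to the "no distillation" student baseline.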


## 6. Quick Backend API Test

Check that the backend service modules import and instantiate cleanly. This is a basic smoke test; it does not start the API server.

In [6]:
# Test backend services import
try:
    from backend.services.image_service import ImageService
    from backend.services.prediction_service import PredictionService
    from backend.services.report_service import ReportService
    
    print("✅ Backend Services:")
    print("  • ImageService - ✓ Ready")
    print("  • PredictionService - ✓ Ready")
    print("  • ReportService - ✓ Ready")
    
    # Initialize services
    image_service = ImageService()
    prediction_service = PredictionService()
    report_service = ReportService()
    
    print("\n📡 API Endpoints:")
    print("  • GET  /api/health - Health check")
    print("  • POST /api/upload - Upload X-ray image")
    print("  • POST /api/predict - Run prediction")
    print("  • POST /api/report/generate - Generate PDF report")

    print("\n💡 To start the API server, run:")
    print("   cd backend && python app.py")
    print("\n   Then visit: http://localhost:8000/api/docs")
    
except ImportError as e:
    print(f"⚠️ Backend import error: {e}")
    print("   Some dependencies may be missing.")

✅ Backend Services:
  • ImageService - ✓ Ready
  • PredictionService - ✓ Ready
  • ReportService - ✓ Ready

📡 API Endpoints:
  • GET  /api/health - Health check
  • POST /api/upload - Upload X-ray image
  • POST /api/predict - Run prediction
  • POST /api/report/generate - Generate PDF report

💡 To start the API server, run:
   cd backend && python app.py

   Then visit: http://localhost:8000/api/docs
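
Once the server is running, the health endpoint listed above can be probed from Python with only the standard library. The sketch below is a minimal example. The helper name `api_is_healthy` is hypothetical, the base URL is taken from the output above, and the check looks only at the HTTP status code because the response body format is not documented in this notebook.

```python
import urllib.request


def api_is_healthy(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET {base_url}/api/health answers with HTTP 200."""
    url = f"{base_url.rstrip('/')}/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, timeouts, and non-2xx responses alike.
        return False


print(f"API reachable: {api_is_healthy()}")
```

This returns `False` rather than raising when the server is not running, so it can sit in a notebook cell without interrupting execution.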


## 7. Next Steps 🎯

Based on your project status, here are the recommended next steps:

In [7]:
print("🗺️ PROJECT ROADMAP:\n")

steps = [
    {
        "phase": "Phase 1: Data Preparation",
        "tasks": [
            "Download ChestX-ray14 dataset",
            "Run 01_data_exploration.ipynb",
            "Analyze class distribution and imbalance",
            "Test data augmentation strategies"
        ]
    },
    {
        "phase": "Phase 2: Teacher Model",
        "tasks": [
            "Implement DenseNet121 teacher model",
            "Train baseline teacher model",
            "Evaluate teacher performance (target: AUC > 0.80)",
            "Save best teacher checkpoint"
        ]
    },
    {
        "phase": "Phase 3: Student Models",
        "tasks": [
            "Implement lightweight CNN backbones",
            "Add transformer attention modules",
            "Test different fusion strategies",
            "Baseline student training (no distillation)"
        ]
    },
    {
        "phase": "Phase 4: Knowledge Distillation",
        "tasks": [
            "Implement distillation loss function",
            "Hyperparameter optimization (temperature, alpha)",
            "Train student models with distillation",
            "Compare student vs teacher performance"
        ]
    },
    {
        "phase": "Phase 5: Optimization",
        "tasks": [
            "Model compression and pruning",
            "Quantization for faster inference",
            "Benchmark CPU inference speed",
            "Optimize for <500ms latency"
        ]
    },
    {
        "phase": "Phase 6: Web Application",
        "tasks": [
            "Complete backend API implementation",
            "Build React frontend UI",
            "Implement Grad-CAM visualization",
            "PDF report generation",
            "End-to-end testing"
        ]
    },
    {
        "phase": "Phase 7: Deployment",
        "tasks": [
            "Dockerization",
            "CPU-optimized deployment",
            "Documentation and user guide",
            "Final evaluation and thesis writing"
        ]
    }
]

for i, step in enumerate(steps, 1):
    print(f"{'='*60}")
    print(f"{i}. {step['phase']}")
    print(f"{'='*60}")
    for task in step['tasks']:
        print(f"   ☐ {task}")
    print()

print(f"{'='*60}")
print("🚀 START HERE:")
print(f"{'='*60}")
if not metadata_exists:
    print("1. Download dataset: python scripts/download_chestxray14.py --metadata-only")
    print("2. Explore data: Open notebooks/01_data_exploration.ipynb")
else:
    print("1. ✅ Dataset ready!")
    print("2. 📊 Next: Open notebooks/01_data_exploration.ipynb")
    print("3. 🏗️ Then: Start implementing models in ml/models/")

print("\n📚 Documentation:")
print("   • Setup Guide: docs/SETUP.md")
print("   • Model Architecture: docs/MODEL.md")
print("   • API Reference: docs/API.md")

print("\n✨ Good luck with your final year project! ✨")

🗺️ PROJECT ROADMAP:

1. Phase 1: Data Preparation
   ☐ Download ChestX-ray14 dataset
   ☐ Run 01_data_exploration.ipynb
   ☐ Analyze class distribution and imbalance
   ☐ Test data augmentation strategies

2. Phase 2: Teacher Model
   ☐ Implement DenseNet121 teacher model
   ☐ Train baseline teacher model
   ☐ Evaluate teacher performance (target: AUC > 0.80)
   ☐ Save best teacher checkpoint

3. Phase 3: Student Models
   ☐ Implement lightweight CNN backbones
   ☐ Add transformer attention modules
   ☐ Test different fusion strategies
   ☐ Baseline student training (no distillation)

4. Phase 4: Knowledge Distillation
   ☐ Implement distillation loss function
   ☐ Hyperparameter optimization (temperature, alpha)
   ☐ Train student models with distillation
   ☐ Compare student vs teacher performance

5. Phase 5: Optimization
   ☐ Model compression and pruning
   ☐ Quantization for faster inference
   ☐ Benchmark CPU inference speed
   ☐ Opti