# Pneumonia Detection - Model Training (Simplified)

This notebook trains a lightweight CNN model for pneumonia detection.

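**Note**: Because of TensorFlow environment issues, this notebook demonstrates the training pipeline; the performance figures at the end are expected ranges, not measured results. For full training, use a clean virtual environment or Google Colab (see the *To Run This Notebook* section).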

## Setup

```python
# Required packages
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models
```
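The note above mentions TensorFlow environment issues, so it can help to confirm the required packages are installed before running the imports. This is a minimal standard-library sketch. The helper `missing_packages` is hypothetical, and `scipy` is included on the assumption that `ImageDataGenerator`'s rotation, shear and zoom transforms depend on it. The check only confirms that each package can be found. It does not confirm that TensorFlow imports cleanly.

```python
import importlib.util

# Import names (not pip names) of the packages this notebook relies on.
# "PIL" is provided by the pillow package; scipy is assumed to be needed
# by ImageDataGenerator's random rotation/shear/zoom transforms.
REQUIRED = ["numpy", "matplotlib", "tensorflow", "PIL", "scipy"]


def missing_packages(names):
    """Return the top-level module names that cannot be found (without importing them)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```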

## Data Preparation

```python
# Data paths
train_dir = '../data/train'
test_dir = '../data/test'

# Image parameters
img_size = (224, 224)
batch_size = 32

# Data augmentation for training images only
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2
)

# Validation images: rescale only (no augmentation), same 20% split of train_dir
val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

# Test images: rescale only
test_datagen = ImageDataGenerator(rescale=1./255)

# Create generators
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='binary',
    subset='training'
)

val_generator = val_datagen.flow_from_directory(
    train_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='binary',
    subset='validation',
    shuffle=False
)

# shuffle=False keeps predictions aligned with test_generator.classes
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=img_size,
    batch_size=batch_size,
    class_mode='binary',
    shuffle=False
)

print(f"Training samples: {train_generator.samples}")
print(f"Validation samples: {val_generator.samples}")
print(f"Test samples: {test_generator.samples}")
```
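With `validation_split=0.2`, `flow_from_directory` divides each class folder separately. The two subset sizes therefore depend on per-class rounding, and the number of batches per epoch follows from `batch_size`. Below is a plain-Python sketch of both calculations. It assumes the split floors `0.2 × n` per class, which is how Keras's `DirectoryIterator` is understood to behave. The function names and the class counts in the usage line are hypothetical.

```python
import math


def split_counts(class_counts, validation_split):
    """Per-class train/validation sizes for a directory-based split.

    Assumption: the first int(validation_split * n) files of each class
    (in sorted filename order) go to validation, the rest to training.
    """
    val = {name: int(validation_split * n) for name, n in class_counts.items()}
    train = {name: n - val[name] for name, n in class_counts.items()}
    return train, val


def steps_per_epoch(num_samples, batch_size):
    """Batches needed to see every sample once (the last batch may be partial)."""
    return math.ceil(num_samples / batch_size)


# Hypothetical class counts, for illustration only
train_counts, val_counts = split_counts({"class_a": 10, "class_b": 7}, 0.2)
print(train_counts, val_counts)  # → {'class_a': 8, 'class_b': 6} {'class_a': 2, 'class_b': 1}
print(steps_per_epoch(4173, 32), steps_per_epoch(1043, 32))  # → 131 33
```

The Results Summary reports 4,173 training and 1,043 validation images. With `batch_size=32`, these totals give 131 training steps and 33 validation steps per epoch, and the final batch in each epoch holds fewer than 32 images.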

## Model Architecture

```python
# Build a simple CNN
model = models.Sequential([
    # Block 1
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    
    # Block 2
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Block 3
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Dense layers
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

model.summary()
```
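The layer sizes above determine the parameter count that `model.summary()` reports. A hand calculation in plain Python (no TensorFlow needed) shows where those parameters sit. Each 3×3 convolution with Keras's default `'valid'` padding shrinks the feature map by 2 pixels, and each 2×2 pooling layer halves it, so `Flatten` emits 26 × 26 × 128 values. The helper names are hypothetical, and the total should match the summary.

```python
def conv_params(kernel, in_ch, out_ch):
    """Conv2D weights (kernel x kernel x in_ch x out_ch) plus one bias per filter."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Dense weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units


def count_params(input_size=224, channels=3, filters=(32, 64, 128), dense_units=128):
    size, in_ch, total = input_size, channels, 0
    for out_ch in filters:
        total += conv_params(3, in_ch, out_ch)
        size -= 2        # 3x3 convolution with 'valid' padding (Keras default)
        size //= 2       # 2x2 max pooling with stride 2
        in_ch = out_ch
    flat = size * size * in_ch                 # Flatten output length
    total += dense_params(flat, dense_units)   # Dense(128)
    total += dense_params(dense_units, 1)      # Dense(1); Dropout has no weights
    return total, flat


total, flat = count_params()
print(f"Flatten size: {flat:,}  Total parameters: {total:,}")
# → Flatten size: 86,528  Total parameters: 11,169,089
```

About 99% of the weights (11,075,712) belong to the first `Dense(128)` layer.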

## Training

```python
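# Create the checkpoint directory if it does not exist yet
os.makedirs('../models', exist_ok=True)

# Train the model (both callbacks monitor val_loss by default)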
history = model.fit(
    train_generator,
    epochs=10,
    validation_data=val_generator,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
        tf.keras.callbacks.ModelCheckpoint('../models/simple_cnn.h5', save_best_only=True)
    ]
)

# Save model
model.save('../models/simple_cnn_final.h5')
```
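`EarlyStopping(patience=3, restore_best_weights=True)` monitors `val_loss` by default. It stops training once that value has failed to improve for three consecutive epochs, then rolls the model back to the weights from the best epoch. `ModelCheckpoint(save_best_only=True)` also monitors `val_loss` by default, so `simple_cnn.h5` should hold that same best epoch.

The plain-Python sketch below mimics the stopping rule on a list of per-epoch validation losses, such as `history.history['val_loss']`. It is a simplified model that ignores `min_delta`, `baseline` and `start_from_epoch`. The function name is hypothetical.

```python
def simulate_early_stopping(val_losses, patience=3):
    """Simplified model of EarlyStopping(monitor='val_loss', patience=...).

    Returns (best_epoch, stopped_epoch), both 0-indexed; stopped_epoch is
    None if training would run through every epoch.
    """
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:              # any decrease counts as an improvement
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1                # one more epoch without improvement
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Epoch 5 (val_loss 0.40) is never reached: training stops after epoch 4
print(simulate_early_stopping([0.60, 0.50, 0.55, 0.52, 0.51, 0.40]))  # → (1, 4)
```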

## Results Visualization

```python
# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Accuracy
ax1.plot(history.history['accuracy'], label='Train Accuracy')
ax1.plot(history.history['val_accuracy'], label='Val Accuracy')
ax1.set_title('Model Accuracy')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Accuracy')
ax1.legend()
ax1.grid(True)

# Loss
ax2.plot(history.history['loss'], label='Train Loss')
ax2.plot(history.history['val_loss'], label='Val Loss')
ax2.set_title('Model Loss')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Loss')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
os.makedirs('../results', exist_ok=True)  # savefig fails if the folder is missing
plt.savefig('../results/training_history.png')
plt.show()
```

## Evaluation

```python
# Evaluate on test set
test_loss, test_acc = model.evaluate(test_generator)
print(f"\nTest Accuracy: {test_acc:.4f}")
print(f"Test Loss: {test_loss:.4f}")
```
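Because the classes are imbalanced, test accuracy alone says little about how many NORMAL images are misclassified. The minimal NumPy sketch below computes confusion-matrix counts, sensitivity and specificity at a 0.5 threshold. `flow_from_directory` assigns class indices in alphabetical order of the subfolder names, so with folders named NORMAL and PNEUMONIA, PNEUMONIA is label 1; `test_generator.class_indices` shows the actual mapping. The sketch also assumes the test generator was built with `shuffle=False`, so that `test_generator.classes` lines up with `model.predict(test_generator)`. The function name `binary_metrics` is hypothetical.

```python
import numpy as np


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix counts and derived rates; label 1 is the positive class."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(y_true),
        # Recall on the positive class (PNEUMONIA cases detected)
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Recall on the negative class (NORMAL cases correctly cleared)
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Toy example with made-up values (not results from this model)
print(binary_metrics([1, 1, 1, 0, 0], [0.9, 0.7, 0.4, 0.6, 0.1]))

# With the trained model (requires TensorFlow and shuffle=False on the test generator):
# y_prob = model.predict(test_generator).ravel()
# print(binary_metrics(test_generator.classes, y_prob))
```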

## Results Summary

**Dataset**:
- Training directory: 5,216 images, split 80/20 into 4,173 training and 1,043 validation images
- Test: 624 images
- Classes in the training directory: NORMAL (26%) vs PNEUMONIA (74%)
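With roughly three PNEUMONIA images for every NORMAL one, a model that always predicts PNEUMONIA would already score about 74% on data with this class ratio. That baseline is worth keeping in mind when reading the accuracy ranges below.

One common mitigation, not used in this notebook, is to pass per-class weights to `model.fit(..., class_weight=...)`. The sketch below computes "balanced" weights, w_c = 1 / (K · p_c) for K classes with class fraction p_c, from the rounded proportions above, so the values are approximate: about 1.92 for NORMAL and 0.68 for PNEUMONIA. The function name is hypothetical.

```python
def balanced_class_weights(class_fractions):
    """Weights inversely proportional to class frequency: 1 / (K * p_c).

    With these weights each class contributes the same total weight to the
    loss. Keys should be the integer labels used by the generator.
    """
    k = len(class_fractions)
    return {label: 1.0 / (k * p) for label, p in class_fractions.items()}


# Fractions from the summary above: NORMAL = 0 (26%), PNEUMONIA = 1 (74%)
weights = balanced_class_weights({0: 0.26, 1: 0.74})
majority_baseline = max(0.26, 0.74)  # accuracy of always predicting PNEUMONIA

print({label: round(w, 3) for label, w in weights.items()})
print(f"Majority-class baseline accuracy: {majority_baseline:.0%}")

# Optional use during training (requires TensorFlow):
# model.fit(train_generator, epochs=10, validation_data=val_generator, class_weight=weights)
```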

**Model**: Simple 3-block CNN
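- Parameters: ~11.2M (about 11.1M of them in the first Dense layer, which receives the 26 × 26 × 128 = 86,528-value flattened feature map)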
- Training time: ~10-15 minutes (GPU)

**Expected Performance**:
- Training Accuracy: 85-90%
- Validation Accuracy: 80-85%
- Test Accuracy: 80-85%

**Note**: Transfer learning with VGG16 or ResNet50 is likely to give better results (see `src/models.py`).

## To Run This Notebook

1. Create a clean virtual environment and install the dependencies (`scipy` is needed for the `ImageDataGenerator` augmentations):
   ```bash
   python -m venv venv
   source venv/bin/activate
   pip install tensorflow numpy matplotlib pillow scipy
   ```

2. Or use Google Colab:
   - Upload this notebook to Colab
   - Upload the `data` folder or mount Google Drive
   - Run all cells with the free GPU runtime enabled
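The notebook reads images from `../data/train` and `../data/test`, which will not exist in a fresh Colab session. The small sketch below chooses the data root based on where the notebook runs. The Drive folder `/content/drive/MyDrive/pneumonia_data` and the helper names are hypothetical. On Colab, Drive must first be mounted with `from google.colab import drive; drive.mount('/content/drive')`.

```python
import os
from typing import Optional

# Hypothetical Drive location: change to wherever the data folder was uploaded.
COLAB_DATA_ROOT = "/content/drive/MyDrive/pneumonia_data"
LOCAL_DATA_ROOT = "../data"


def running_in_colab() -> bool:
    """True when google.colab can be imported, which in practice means running in Colab."""
    try:
        import google.colab  # noqa: F401
        return True
    except ImportError:
        return False


def data_dirs(in_colab: Optional[bool] = None):
    """Return (train_dir, test_dir) for the current environment."""
    if in_colab is None:
        in_colab = running_in_colab()
    root = COLAB_DATA_ROOT if in_colab else LOCAL_DATA_ROOT
    return os.path.join(root, "train"), os.path.join(root, "test")


train_dir, test_dir = data_dirs()
print(train_dir, test_dir)
```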