# Baseline Model

## Table of Contents
1. [Model Choice](#model-choice)
2. [Feature Selection](#feature-selection)
3. [Implementation](#implementation)
4. [Evaluation](#evaluation)


In [None]:
import os
import tqdm
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import cv2
from glob import glob
import seaborn as sns
import random
from keras.preprocessing import image
import tensorflow as tf

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D, GlobalAvgPool2D, GlobalMaxPooling2D
from keras.optimizers import Adam, RMSprop
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split


## Model Choice

For the baseline model, we use a simple Convolutional Neural Network (CNN) with one convolutional block (a convolution followed by max pooling), a small dense hidden layer, and a single-unit sigmoid output layer.

This model was chosen because:
- CNNs are well-suited for image classification tasks like detecting Tuberculosis from chest X-rays.
- A small architecture allows fast training and testing.
- It provides a solid benchmark against which more complex architectures (like transfer learning with Inception or ResNet) can be compared.


## Feature Selection

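The baseline uses the raw pixel values of the chest X-ray images as its only input features; no hand-crafted features are extracted. Each image is resized to 224×224, loaded with three channels (the `flow_from_directory` default), and rescaled from [0, 255] to [0, 1]. The binary labels are inferred from the class subdirectory names inside the `train` and `validation` folders.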


In [None]:
# Set dataset directories
train_dir = '/content/drive/MyDrive/Tuberculosis_final - Copy/TB_Chest_Radiography_Database/train'
val_dir = '/content/drive/MyDrive/Tuberculosis_final - Copy/TB_Chest_Radiography_Database/validation'

# Create ImageDataGenerators for normalization
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)

# Load images from directories
train_data = train_datagen.flow_from_directory(
    train_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'
)

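# shuffle=False keeps the batch order fixed so that predictions can be
# compared element by element with val_data.classes
val_data = val_datagen.flow_from_directory(
    val_dir,
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    shuffle=False
)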
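
Because `flow_from_directory` derives the labels from the subdirectory names, it can be useful to check how many images each class contains before training. The helper below is a minimal sketch of such a check; the name `count_images_per_class` and the set of accepted file extensions are assumptions made for illustration and are not part of the original notebook.

```python
from pathlib import Path

# Assumed set of image extensions; adjust it to match the dataset on disk.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def count_images_per_class(root_dir):
    """Return {class_name: image_count} for each immediate subdirectory of root_dir."""
    counts = {}
    for class_dir in sorted(Path(root_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

Calling `count_images_per_class(train_dir)` and `count_images_per_class(val_dir)` would show whether the two classes are balanced, which affects how much weight plain accuracy deserves as a metric.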


## Implementation

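The baseline is a Keras `Sequential` model: one convolutional layer with 32 filters of size 3×3 and ReLU activation, a 2×2 max-pooling layer, a flatten step, a dense layer with 64 ReLU units, and a single sigmoid output unit. It is compiled with the Adam optimizer and binary cross-entropy loss, and trained for 5 epochs.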



In [None]:
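# Load image data (the validation generator is not shuffled, so its
# predictions stay aligned with val_data.classes during evaluation)
train_data = train_datagen.flow_from_directory(train_dir, target_size=(224, 224), batch_size=32, class_mode='binary')
val_data = val_datagen.flow_from_directory(val_dir, target_size=(224, 224), batch_size=32, class_mode='binary', shuffle=False)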

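# Baseline CNN model
model = Sequential([
    # 32 filters of size 3x3 applied to 224x224 three-channel images
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    # MaxPool2D is the name imported from keras.layers in the first cell
    MaxPool2D(pool_size=(2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    # Single sigmoid unit: predicted probability of the class with label 1
    Dense(1, activation='sigmoid')
])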

model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(train_data, epochs=5, validation_data=val_data)
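
To get a feel for the size of this baseline, the parameter count can be worked out by hand. The sketch below assumes the Keras defaults for the layers used above (`'valid'` padding and stride 1 for the convolution, pooling stride equal to the pool size); the helper name `valid_output_size` is hypothetical. `model.summary()` should report the same totals.

```python
def valid_output_size(size, kernel, stride=1):
    """Spatial output size of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1


# Conv2D(32, (3, 3)) on a 224x224x3 input
conv_side = valid_output_size(224, kernel=3)                   # 222
conv_params = (3 * 3 * 3 + 1) * 32                             # weights + bias per filter

# 2x2 max pooling with stride 2 (no trainable parameters)
pool_side = valid_output_size(conv_side, kernel=2, stride=2)   # 111

# Flatten, then Dense(64) and Dense(1)
flat_units = pool_side * pool_side * 32
dense_params = (flat_units + 1) * 64
output_params = (64 + 1) * 1

total_params = conv_params + dense_params + output_params
print(f"Flattened units: {flat_units:,}")
print(f"Total trainable parameters: {total_params:,}")
```

Under these assumptions, almost all of the parameters sit in the first dense layer, because the flattened feature map is large. The model is shallow, but it is not small in terms of weights.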


## Evaluation

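The model is evaluated on the validation set with a confusion matrix and a classification report (per-class precision, recall, and F1-score), using a decision threshold of 0.5 on the sigmoid output. Accuracy is also tracked per epoch during training. These metrics serve as the reference point for evaluating more complex models later on.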



In [None]:
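from sklearn.metrics import confusion_matrix, classification_report

# Restart the validation generator from its first batch
val_data.reset()

# Predicted probabilities, thresholded at 0.5 to obtain class labels
y_pred_probs = model.predict(val_data)
y_pred = (y_pred_probs > 0.5).astype(int).ravel()
# True labels in directory order; they line up with the predictions only
# if val_data was created with shuffle=False
y_true = val_data.classes

# Metrics
print("Confusion Matrix:")
print(confusion_matrix(y_true, y_pred))
print("\nClassification Report:")
print(classification_report(y_true, y_pred))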
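
For a screening task, sensitivity and specificity are often easier to interpret than overall accuracy. The sketch below shows how they follow from the four confusion-matrix counts. The function name `binary_metrics` is hypothetical, and it assumes that label 1 is the positive (Tuberculosis) class, which should be confirmed with `val_data.class_indices`.

```python
def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, precision and F1 from 0/1 labels.

    Assumes label 1 is the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a class has no samples
        return num / den if den else 0.0

    sensitivity = safe_div(tp, tp + fn)   # recall for the positive class
    specificity = safe_div(tn, tn + fp)   # recall for the negative class
    precision = safe_div(tp, tp + fp)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }
```

With the variables from the cell above, `binary_metrics(list(y_true), list(y_pred))` would give values that match the positive-class row of the classification report, plus specificity, which the report shows as the recall of the negative class.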
