# Breast Cancer Detection Model Development

### Step 1: Install Required Libraries
The required libraries for the project are installed using the `pip` package manager. These include:

- **TensorFlow**: For deep learning and neural network implementation.
- **Keras**: High-level API for building and training deep learning models (this notebook imports it through `tensorflow.keras`).
- **NumPy**: For numerical operations and array handling.
- **Pandas**: For data manipulation and analysis.
- **Scikit-learn**: For machine learning and evaluation metrics.
- **Matplotlib**: For data visualization.
- **OpenCV**: For image processing tasks.

In [None]:
%pip install tensorflow keras numpy pandas scikit-learn matplotlib opencv-python

### Step 2: Import Libraries
Essential libraries are imported to support data preprocessing, model creation, and visualization. 
- `EfficientNetB0` is used as the base model for transfer learning.
- `ImageDataGenerator` helps with real-time data augmentation.
- `Dense`, `Flatten`, and `Dropout` layers are used to create a custom classification head.

In [4]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.applications import EfficientNetB0 # type: ignore
from tensorflow.keras.preprocessing.image import ImageDataGenerator # type: ignore
from tensorflow.keras.layers import Dense, Flatten, Dropout # type: ignore
from tensorflow.keras.models import Model # type: ignore
from tensorflow.keras.optimizers import Adam # type: ignore

### Step 3: Load and Configure the Pretrained Model

1. **EfficientNetB0 Pretrained Model**: Loaded with weights from the `ImageNet` dataset and configured to exclude the top layers for transfer learning.

2. **Freezing Layers**: All layers in the base model are frozen to prevent updating their weights during training.

3. **Custom Classification Head**: A new fully connected head is added to adapt the model for binary classification (breast cancer detection).
   - `Flatten`: Flattens the output of the base model.
   - `Dense(128)`: A dense layer with 128 neurons and ReLU activation.
   - `Dropout(0.5)`: Dropout layer to prevent overfitting.
   - `Dense(1, activation='sigmoid')`: Output layer with sigmoid activation for binary classification.
   
4. **Compilation**: The model is compiled with:
   - **Adam optimizer**: Adapts the learning rate for each parameter during training.
   - **Binary cross-entropy loss**: The standard loss for two-class problems with a sigmoid output.
   - **Metrics**: Accuracy and AUC (area under the ROC curve).
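
To make the loss concrete, here is a minimal pure-Python sketch of binary cross-entropy applied to sigmoid outputs. It is an illustration, not the Keras implementation: the function names and the clipping constant `eps` are assumptions chosen for this example.

```python
import math

def sigmoid(z):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities.

    eps is an illustrative clipping constant that keeps log() finite.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Confident and correct predictions give a small loss ...
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
# ... while confident and wrong predictions are penalized heavily.
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])

print(f"{confident_right:.4f}")  # 0.1054
print(f"{confident_wrong:.4f}")  # 2.3026
```

The asymmetry between the two values is why this loss pushes the sigmoid output toward well-calibrated probabilities rather than merely toward the right side of 0.5.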

In [None]:
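# Load EfficientNetB0 pretrained on ImageNet, without its original classification layers
base_model = EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the pretrained layers so that only the new head is trained
for layer in base_model.layers:
    layer.trainable = False

# Custom classification head for binary classification
x = base_model.output
x = Flatten()(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(1, activation='sigmoid')(x)

model = Model(inputs=base_model.input, outputs=output)

# Name the AUC metric explicitly so the history keys stay 'auc' and 'val_auc',
# even if this cell is re-run in the same session
from tensorflow.keras.metrics import AUC # type: ignore
model.compile(optimizer=Adam(), loss='binary_crossentropy', metrics=['accuracy', AUC(name='auc')])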
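
As a rough check on the size of the custom head, the arithmetic below counts its trainable parameters. It assumes the frozen base outputs a `7 x 7 x 1280` feature map for `224 x 224 x 3` inputs, which is worth confirming with `model.summary()` in your own environment.

```python
# Assumed output shape of EfficientNetB0(include_top=False) for 224x224x3 inputs.
feature_map_shape = (7, 7, 1280)

# Flatten turns the feature map into one long vector.
flattened = 1
for dim in feature_map_shape:
    flattened *= dim

# Dense layers have (inputs * units) weights plus one bias per unit.
dense_128_params = flattened * 128 + 128
output_params = 128 * 1 + 1

# Flatten and Dropout have no trainable parameters.
head_params = dense_128_params + output_params

print(flattened)         # 62720
print(dense_128_params)  # 8028288
print(head_params)       # 8028417
```

Almost all of the trainable weights sit in the first dense layer, which is one reason the `Dropout(0.5)` layer after it matters for limiting overfitting.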

### Step 4: Prepare the Dataset

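1. **ImageDataGenerator**: Used to augment the training data with random transformations:
   - Rotation, width/height shift, zoom, and horizontal flip.
   - Pixel values are left in the range `[0, 255]`. The Keras `EfficientNetB0` model normalizes its inputs internally, so rescaling to `[0, 1]` beforehand would feed it values outside the range it expects.

2. **Training and Validation Split**: Data is split into training (80%) and validation (20%) subsets using the `validation_split` parameter. The validation generator uses the same split but no augmentation, so validation metrics are measured on unmodified images.

3. **Directory Configuration**: `flow_from_directory` is used to load images from the CBIS-DDSM dataset directory, with:
   - **Target size**: Resizing images to `224x224`.
   - **Batch size**: Defining batches of 32 images.
   - **Class mode**: Binary classification for cancer detection.

In [None]:
# Training generator: random augmentation only. No rescale argument, because
# tf.keras EfficientNetB0 expects raw [0, 255] pixels and normalizes them internally.
train_datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2
)

# Validation generator: same 80/20 split, but no augmentation
val_datagen = ImageDataGenerator(validation_split=0.2)

train_data = train_datagen.flow_from_directory(
    'path_to_cbis_ddsm',  # Replace with actual path
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    subset='training'
)

val_data = val_datagen.flow_from_directory(
    'path_to_cbis_ddsm',  # Replace with actual path
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    subset='validation'
)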
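
`flow_from_directory` infers the classes from the immediate subdirectories of the dataset folder and assigns label indices in alphanumeric order of their names (the mapping is available as `train_data.class_indices`). The sketch below is one way to check that layout and the class balance before training. The helper `count_images_per_class`, the class names `benign` and `malignant`, and the placeholder files are all hypothetical, created in a temporary directory purely for the demonstration.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def count_images_per_class(root):
    """Return {class_name: image_count} for each immediate subdirectory of root."""
    counts = {}
    class_dirs = sorted(p for p in Path(root).iterdir() if p.is_dir())
    for class_dir in class_dirs:
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts

# Demo on a throwaway directory with hypothetical class names and empty files.
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_files in [("benign", 3), ("malignant", 2)]:
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for i in range(n_files):
            (class_dir / f"img_{i}.png").touch()
    demo_counts = count_images_per_class(tmp)

print(demo_counts)  # {'benign': 3, 'malignant': 2}
```

With a real dataset, call `count_images_per_class('path_to_cbis_ddsm')` instead. A strongly skewed count is a hint that accuracy alone will be a misleading metric.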

### Step 5: Train the Model

1. **Model Fitting**: The model is trained using the `fit` function with:
   - Training and validation datasets.
   - 10 epochs (passes over the training data).
   - `steps_per_epoch` and `validation_steps` computed as the number of samples divided by the batch size, rounded down.

2. **Output**: `fit` returns a `History` object whose `history` attribute records loss, accuracy, and AUC per epoch for both training and validation.

In [None]:
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=10,
    steps_per_epoch=train_data.samples // train_data.batch_size,
    validation_steps=val_data.samples // val_data.batch_size
)
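
Because the step counts use integer division, the final partial batch is not counted in each epoch, and a subset smaller than one batch yields zero steps. The short sketch below shows the arithmetic. The sample counts are hypothetical, and `step_counts` is an illustrative helper rather than part of Keras.

```python
import math

def step_counts(samples, batch_size):
    """Return (floor_steps, ceil_steps, samples_left_over_by_floor)."""
    floor_steps = samples // batch_size
    ceil_steps = math.ceil(samples / batch_size)
    left_over = samples - floor_steps * batch_size
    return floor_steps, ceil_steps, left_over

print(step_counts(1000, 32))  # (31, 32, 8)
print(step_counts(20, 32))    # (0, 1, 20)
```

If every sample should be seen each epoch, rounding up with `math.ceil` is one option. Another is to omit both arguments and let `fit` use the length of the generator.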

### Step 6: Evaluate and Save the Model

1. **Model Evaluation**: The trained model is evaluated on the validation dataset to measure:
   - **Validation Loss**: Binary cross-entropy on the validation images.
   - **Validation Accuracy**: Proportion of correct predictions.
   - **Validation AUC**: Indicates the model's ability to rank positive cases above negative ones.
   
2. **Model Saving**: The trained model is saved as `oncodetect_model.h5` for future use or deployment.

In [None]:
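# evaluate returns the loss followed by the compiled metrics, in order
loss, accuracy, auc = model.evaluate(val_data)
print(f'Validation Loss: {loss:.4f}, Accuracy: {accuracy:.4f}, AUC: {auc:.4f}')

# Save the trained model in HDF5 format for future use or deployment
model.save('oncodetect_model.h5')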
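
To illustrate what the AUC value measures, the sketch below computes it exactly as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. This is not how Keras computes the metric: Keras approximates the area using a fixed set of thresholds, so its value can differ slightly. The labels and scores are made-up toy values.

```python
def pairwise_auc(y_true, y_score):
    """Exact ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities (illustrative only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(pairwise_auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, independent of the 0.5 decision threshold that accuracy relies on.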

### Step 7: Visualize Results

1. **Accuracy Visualization**:
   - Training and validation accuracy are plotted over epochs to monitor performance trends.
   
2. **AUC Visualization**:
   - Training and validation AUC are plotted to evaluate classification effectiveness.

The visualization helps identify potential issues such as overfitting or underfitting during training.

In [None]:
# Accuracy per epoch for the training and validation sets
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# AUC per epoch; the keys follow the name of the AUC metric used at compile time
plt.plot(history.history['auc'], label='Training AUC')
plt.plot(history.history['val_auc'], label='Validation AUC')
plt.xlabel('Epochs')
plt.ylabel('AUC')
plt.legend()
plt.show()
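
The same curves can also be summarized numerically. The sketch below finds the epoch with the best validation accuracy and the final gap between training and validation accuracy; a large positive gap is a common sign of overfitting. The helper `best_epoch_and_gap` is hypothetical, and the values in `example_history` are made-up numbers for illustration, not results from this notebook.

```python
def best_epoch_and_gap(history_dict, train_key="accuracy", val_key="val_accuracy"):
    """Return (1-based epoch with the best validation value, final train-minus-val gap)."""
    train = history_dict[train_key]
    val = history_dict[val_key]
    best_index = max(range(len(val)), key=lambda i: val[i])
    final_gap = train[-1] - val[-1]
    return best_index + 1, final_gap

# Made-up curves: training keeps improving while validation peaks and then drops.
example_history = {
    "accuracy": [0.60, 0.70, 0.80, 0.90],
    "val_accuracy": [0.58, 0.66, 0.68, 0.65],
}

best_epoch, gap = best_epoch_and_gap(example_history)
print(best_epoch)     # 3
print(f"{gap:.2f}")   # 0.25
```

With the real training run, pass `history.history` instead of `example_history`.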