# 6.2 Project 2: Convolutional Neural Networks (CNNs) in Medical Image Analysis

Business understanding – A medical research institute wants to develop an automated system to detect lung cancer in CT scans.

Data understanding – The institute has a dataset of CT scans labeled as either cancerous or non-cancerous.

Data preparation – We will preprocess the data by normalizing the pixel values and resizing the images to a uniform size. We will also split the data into training, validation, and test sets.

Modeling – We will apply a CNN to the image dataset to classify the CT scans as cancerous or non-cancerous. We will experiment with different CNN architectures and hyperparameters to optimize the model's performance.

Evaluation – We will evaluate the CNN model's performance using metrics such as accuracy, precision, recall, and F1-score. We will also compare the CNN model's performance to other image classification algorithms such as SVMs and decision trees.

## Data Preparation:

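The code below rescales pixel values to the [0, 1] range and resizes every image to 64 x 64 pixels. It assumes the dataset has already been split into `train`, `val`, and `test` directories, each containing one subfolder per class. Random shear, zoom, and horizontal flips are applied to the training images only, so the validation and test sets reflect unmodified scans.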

In [None]:
# Load and pre-process the data
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define the data directories
train_dir = 'lung_cancer/train'
val_dir = 'lung_cancer/val'
test_dir = 'lung_cancer/test'

# Set the target image size and batch size
img_size = (64, 64)
batch_size = 32

# Define the data generators with data augmentation for training data
train_datagen = ImageDataGenerator(rescale=1./255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True)
train_generator = train_datagen.flow_from_directory(train_dir, target_size=img_size, batch_size=batch_size, class_mode='binary')

# Define the data generator without data augmentation for validation and test data
val_datagen = ImageDataGenerator(rescale=1./255)
val_generator = val_datagen.flow_from_directory(val_dir, target_size=img_size, batch_size=batch_size, class_mode='binary')
# shuffle=False keeps the prediction order aligned with test_generator.classes
test_generator = val_datagen.flow_from_directory(test_dir, target_size=img_size, batch_size=batch_size, class_mode='binary', shuffle=False)
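
To make the two preprocessing steps concrete, here is a minimal NumPy sketch of what rescaling and resizing do to a single image. It assumes the scans have been exported as 8-bit images (raw CT data is normally stored in Hounsfield units and would need windowing first). The helpers `rescale_pixels` and `resize_nearest` and the random `fake_scan` are illustrative names, not part of Keras, and the resize only approximates the nearest-neighbor interpolation the generator applies.

```python
import numpy as np

def rescale_pixels(image, scale=1.0 / 255):
    """Map 8-bit intensities (0..255) to floats in [0, 1]."""
    return image.astype(np.float32) * scale

def resize_nearest(image, target_size):
    """Nearest-neighbor resize of an (H, W, C) array to target_size = (height, width)."""
    height, width = image.shape[:2]
    target_h, target_w = target_size
    # Pick the source row/column that each target row/column maps to
    rows = (np.arange(target_h) * height / target_h).astype(int)
    cols = (np.arange(target_w) * width / target_w).astype(int)
    return image[rows][:, cols]

# Hypothetical stand-in for one 512 x 512 RGB-encoded CT slice
rng = np.random.default_rng(0)
fake_scan = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)

prepared = rescale_pixels(resize_nearest(fake_scan, (64, 64)))
print(prepared.shape, prepared.dtype)
print(float(prepared.min()), float(prepared.max()))
```

Shrinking a scan to 64 x 64 discards a lot of detail, so the target size is itself a hyperparameter worth revisiting if small nodules are being missed.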


## Modeling:

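Here we define a small CNN with three convolution and max-pooling blocks, followed by a dense layer with dropout and a single sigmoid output for the cancerous / non-cancerous decision. The model is trained for 20 epochs with the Adam optimizer and binary cross-entropy loss. This architecture is a starting point: the number of filters, the dropout rate, the image size, and the number of epochs are the hyperparameters to vary when optimizing performance.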

In [None]:
# Define the CNN model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(img_size[0], img_size[1], 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(train_generator, steps_per_epoch=len(train_generator), epochs=20, validation_data=val_generator, validation_steps=len(val_generator))
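
When experimenting with architectures, it helps to know how large each feature map is and where the parameters are concentrated. The sketch below redoes that arithmetic in plain Python for the layers defined above, assuming the Keras defaults of `'valid'` padding and stride 1 for `Conv2D` and a non-overlapping 2 x 2 window for `MaxPooling2D`. The helper names are illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool

def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output filter."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

size, channels, total_params = 64, 3, 0
for filters in (32, 64, 128):
    total_params += conv_params(3, channels, filters)
    size = pool_out(conv_out(size))
    channels = filters
    print(f"after block with {filters} filters: {size} x {size} x {channels}")

flat_features = size * size * channels
total_params += dense_params(flat_features, 128) + dense_params(128, 1)

print("flattened features:", flat_features)   # 6 * 6 * 128 = 4608
print("trainable parameters:", total_params)  # 683329
```

Most of the parameters sit in the first dense layer (4608 x 128 weights), so changing the input size or the last convolution's filter count moves the model size far more than changing the early layers. `model.summary()` should report the same totals.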


## Evaluation:

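We evaluate the trained CNN on the held-out test set using accuracy, precision, recall, and F1-score. In a screening setting recall typically deserves the most attention, because a missed cancer (a false negative) is usually costlier than a false alarm. The same metrics can then be used to compare the CNN against other image classification algorithms such as SVMs and decision trees.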
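
All four metrics follow from the confusion-matrix counts. As a small worked example, the sketch below computes them by hand with NumPy; the labels and predicted probabilities are made up for illustration and are not results from the model.

```python
import numpy as np

# Made-up ground truth (1 = cancerous) and sigmoid outputs for eight scans
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3])
y_pred = (y_prob >= 0.5).astype(int)  # threshold at 0.5

tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # cancers correctly flagged
fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarms
fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # missed cancers
tn = int(np.sum((y_pred == 0) & (y_true == 0)))  # healthy scans correctly cleared

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn)                   # 3 1 1 3
print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```

Lowering the 0.5 threshold trades precision for recall, which is one way to bias the classifier toward catching more cancers.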

In [None]:
# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_generator, steps=len(test_generator))
print('Test accuracy:', test_acc)

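# Make predictions on the test set
# Note: comparing against test_generator.classes is only valid if the
# test generator was created with shuffle=False
predictions = model.predict(test_generator)
# Threshold the sigmoid outputs at 0.5 and flatten (N, 1) to (N,)
predictions = (predictions > 0.5).astype(int).ravel()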

# Calculate evaluation metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

acc = accuracy_score(test_generator.classes, predictions)
prec = precision_score(test_generator.classes, predictions)
rec = recall_score(test_generator.classes, predictions)
f1 = f1_score(test_generator.classes, predictions)

print('Accuracy:', acc)
print('Precision:', prec)
print('Recall:', rec)
print('F1-score:', f1)
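
The comparison with SVMs and decision trees mentioned above could be set up along the following lines. This is only a sketch under stated assumptions: it trains scikit-learn baselines on flattened pixel vectors, and it uses synthetic images (`make_fake_scans` is a hypothetical helper) so that it runs without the dataset. With real data, the arrays would instead come from the same train and test images the CNN sees.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

def make_fake_scans(n_images):
    """Synthetic 64 x 64 grayscale 'scans': positives get a brighter central patch."""
    labels = rng.integers(0, 2, size=n_images)
    images = rng.random((n_images, 64, 64)).astype(np.float32) * 0.5
    images[labels == 1, 24:40, 24:40] += 0.4
    # Classical models expect one flat feature vector per image
    return images.reshape(n_images, -1), labels

X_train, y_train = make_fake_scans(200)
X_test, y_test = make_fake_scans(80)

baselines = {
    "SVM (RBF kernel)": SVC(kernel="rbf", gamma="scale"),
    "Decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

baseline_f1 = {}
for name, clf in baselines.items():
    clf.fit(X_train, y_train)
    baseline_f1[name] = f1_score(y_test, clf.predict(X_test))
    print(f"{name}: F1 = {baseline_f1[name]:.3f}")
```

Scores on this synthetic data say nothing about real CT scans; the point is only that the baselines should be evaluated on the same test set and with the same metrics as the CNN.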
