
# **Application of Deep Convolutional Neural Network in Breast Cancer Prediction Using Digital Mammograms**

**Authors:** Rafsan Al Mamun, Md. Al Imran Sefat, Gazi Abu Rafin, Adnan

**Project GitHub Link:** [https://github.com/Rafsan7238/BracU_Thesis_P2](https://github.com/Rafsan7238/BracU_Thesis_P2)



---



***Abstract:*** *Cancer is a diagnosis so dreaded that the fear of it alone can strike even the
strongest of souls. The disease is often thought of as untreatable and unbearably painful,
usually with no cure available. Among all cancers, breast cancer is the second deadliest,
especially among women. Early diagnosis often decides a patient's fate, since it facilitates
subsequent clinical management. Mammography plays a vital role in breast cancer screening
because it can detect breast masses and calcifications early. However, extremely dense breast
tissue makes cancer masses difficult to detect, which encourages the use of machine learning
(ML) techniques and artificial neural networks (ANN) to help radiologists diagnose cancer
faster. This paper explores the MIAS database, which contains 322 digital mammograms from
women. The mammograms were preprocessed, augmented, and fed into different convolutional
neural network (CNN) models, with the aim of distinguishing healthy tissue from cancerous
tissue with high accuracy. Along with a newly proposed CNN model for better identification of
breast cancer, the paper highlights the overall significance of computer-aided detection (CAD)
models in the early diagnosis of breast cancer. While a diagnosis of breast cancer may still
fill patients with dread, we believe our research can be a symbol of hope for all.*

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## **Data Collection and Preprocessing**

In [None]:
%pip install -U albumentations

Collecting albumentations
  Downloading albumentations-1.0.3-py3-none-any.whl (98 kB)
Collecting opencv-python-headless>=4.1.1
  Downloading opencv_python_headless-4.5.3.56-cp37-cp37m-manylinux2014_x86_64.whl (37.1 MB)
Installing collected packages: opencv-python-headless, albumentations
  Attempting uninstall: albumentations
    Found existing installation: albumentations 0.1.12
    Uninstalling albumentations-0.1.12:
      Successfully uninstalled albumentations-0.1.12
Successfully installed albumentations-1.0.3 opencv-python-headless-4.5.3.56


In [None]:
import os

import cv2
from google.colab.patches import cv2_imshow
import albumentations as A

DATASET_DIR = "/content/drive/MyDrive/Thesis/Dataset"

# 4. Augmentations applied to every CLAHE-enhanced image, in this order.
# p=1 makes every transform fire: Albumentations defaults Rotate, GaussNoise
# and Blur to p=0.5, which would leave about half of those copies unchanged.
AUGMENTATIONS = [
  A.Rotate(limit=10, p=1),  # 4.a) random rotation within [-10, 10] degrees
  A.Rotate(limit=20, p=1),  # 4.b) random rotation within [-20, 20] degrees
  A.HorizontalFlip(p=1),    # 4.c) horizontal flip
  A.VerticalFlip(p=1),      # 4.d) vertical flip
  A.RandomToneCurve(p=1),   # 4.e) random tone curve
  A.GaussNoise(p=1),        # 4.f) Gaussian noise
  A.Blur(p=1),              # 4.g) blur
]


def preprocess_and_augment(img):
  """Return the 9 variants of one BGR mammogram: resized, CLAHE, and 7 augmentations."""
  # 1. Resize to 224 x 224
  img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_AREA)

  # 2. Convert to LAB so that CLAHE only changes the lightness (L) channel.
  #    list() is needed because newer OpenCV versions return a tuple.
  lab_planes = list(cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2LAB)))

  # 3. CLAHE on the L channel; clipLimit is the threshold for contrast limiting
  clahe = cv2.createCLAHE(clipLimit=5)
  lab_planes[0] = clahe.apply(lab_planes[0])

  # Merge the LAB planes and convert back to BGR (OpenCV's channel order)
  final_img = cv2.cvtColor(cv2.merge(lab_planes), cv2.COLOR_LAB2BGR)

  # Keep the resized original and the CLAHE image, then one copy per augmentation
  variants = [img, final_img]
  for transform in AUGMENTATIONS:
    variants.append(transform(image=final_img)["image"])
  return variants


def load_data():
  """
  Load image data from directory '/content/drive/MyDrive/Thesis/Dataset'.

  Load each .jpg file from the Healthy and Cancer subdirectories, resize it,
  enhance it with CLAHE and augment it. Every resulting image is appended to
  the images list, and its label to the labels list.

  Return tuple `(images, labels)`. `images` is a list of all of the images in
  the data directory, each a numpy ndarray with dimensions 224 x 224 x 3 (BGR).
  `labels` is a list of labels (0 for healthy, 1 for cancer), one for each of
  the corresponding `images`.
  """
  images = []
  labels = []

  # (subdirectory, label) pairs: 0 for healthy, 1 for cancer
  for class_name, label in (("Healthy", 0), ("Cancer", 1)):
    directory_path = os.path.join(DATASET_DIR, class_name)
    files = sorted(f for f in os.listdir(directory_path) if f.endswith(".jpg"))

    for count, file in enumerate(files, start=1):
      print(f"Working with {count} {class_name.lower()} images out of {len(files)}")
      img = cv2.imread(os.path.join(directory_path, file))

      # Each source mammogram contributes 9 consecutive samples
      for variant in preprocess_and_augment(img):
        images.append(variant)
        labels.append(label)

  # Return a tuple of (images, labels)
  return (images, labels)

## **Image Visualisation**
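A minimal sketch for inspecting the output of `load_data()`, assuming (as the loader does) that each mammogram contributes nine consecutive entries in the order resized, CLAHE, and the seven augmentations. `show_variants` and `VARIANT_NAMES` are hypothetical names, not part of the original notebook. Because OpenCV loads images in BGR order, the channels are reversed before plotting with matplotlib, which expects RGB.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical helper: order in which load_data() appends the 9 variants of one mammogram
VARIANT_NAMES = ["Resized", "CLAHE", "Rotate ±10°", "Rotate ±20°", "Horizontal flip",
                 "Vertical flip", "Tone curve", "Gaussian noise", "Blur"]


def show_variants(images, labels, source_index):
  """Plot all variants of one source mammogram in a 3 x 3 grid and return the figure."""
  start = source_index * len(VARIANT_NAMES)
  fig, axes = plt.subplots(3, 3, figsize=(9, 9))
  for offset, ax in enumerate(axes.ravel()):
    img = np.asarray(images[start + offset])
    ax.imshow(img[..., ::-1])  # reverse BGR -> RGB for matplotlib
    ax.set_title(f"{VARIANT_NAMES[offset]} (label {labels[start + offset]})")
    ax.axis("off")
  fig.tight_layout()
  return fig
```

Once `images, labels = load_data()` has run, `show_variants(images, labels, 0)` would display the first healthy mammogram and its eight derived images.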

In [None]:
images, labels = load_data()

Working with 1  healthy images out of 272

In [None]:
print(len(images))
print(len(labels))

2898
2898
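The count follows from the loader: each mammogram yields 9 samples, so 322 × 9 = 2898, of which 272 × 9 = 2448 are healthy and 50 × 9 = 450 are cancer. About 84.5% of the samples are therefore healthy, so a classifier that always predicts "healthy" would already score roughly 84.5% accuracy on a split with the same proportions. One common mitigation, not used in this notebook, is to weight the loss by inverse class frequency. The sketch below uses scikit-learn's `compute_class_weight`; the helper name `balanced_class_weights` is hypothetical.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight


def balanced_class_weights(labels):
  """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
  y = np.asarray(labels)
  classes = np.unique(y)
  weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
  return {int(c): float(w) for c, w in zip(classes, weights)}


# Label counts implied by the dataset: 272 healthy and 50 cancer mammograms, 9 samples each
example_labels = [0] * (272 * 9) + [1] * (50 * 9)
weights = balanced_class_weights(example_labels)
print(weights)
print(f"Majority-class baseline accuracy: {example_labels.count(0) / len(example_labels):.4f}")
```

For these counts the weights come out at about 0.59 for healthy and 3.22 for cancer. They could be passed to Keras as `model.fit(..., class_weight=balanced_class_weights(y_train))`.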


## **Necessary Imports for Models**

In [None]:
import tensorflow as tf

from tensorflow.keras.applications import ResNet50, EfficientNetB0, MobileNetV3Small, VGG19
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.metrics import Accuracy, Precision, Recall

from sklearn.model_selection import train_test_split

import numpy as np
import matplotlib.pyplot as plt

## **Transfer Learning Models**

### **Pre-trained ResNet50 Model**

In [None]:
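def get_resnet_pretrained():
  # ImageNet-pretrained ResNet50 without its classification head, used as a frozen feature extractor.
  # Note: Keras expects ResNet50 inputs to go through
  # tf.keras.applications.resnet50.preprocess_input (which assumes RGB order);
  # here the raw BGR pixel values from cv2 are fed in unchanged.
  base_model = ResNet50(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

  # Freeze the convolutional base so that only the new head is trained
  for layer in base_model.layers:
    layer.trainable = False

  # New binary classification head: 512-unit dense layer, dropout, sigmoid output
  x = Flatten()(base_model.output)
  x = Dense(512, activation='relu')(x)
  x = Dropout(0.5)(x)
  x = Dense(1, activation='sigmoid')(x)  # probability of class 1 (cancer)

  model = Model(base_model.input, x)

  model.compile(
      optimizer="adam",
      loss="binary_crossentropy",
      metrics=['accuracy']
  )

  return model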

### **Pre-trained EfficientNet B0**

In [None]:
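def get_efficientnet_pretrained():
  # ImageNet-pretrained EfficientNet-B0 without its classification head.
  # Keras' EfficientNet includes its own input rescaling layer, so raw
  # pixel values in [0, 255] can be passed in directly.
  base_model = EfficientNetB0(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

  # Freeze the convolutional base so that only the new head is trained
  for layer in base_model.layers:
    layer.trainable = False

  # New binary classification head (this model uses a lighter dropout of 0.2)
  x = Flatten()(base_model.output)
  x = Dense(512, activation='relu')(x)
  x = Dropout(0.2)(x)
  x = Dense(1, activation='sigmoid')(x)  # probability of class 1 (cancer)

  model = Model(base_model.input, x)

  model.compile(
      optimizer="adam",
      loss="binary_crossentropy",
      metrics=['accuracy']
  )

  return model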

### **Pre-trained MobileNet V3 Small Model**

In [None]:
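def get_mobilenet_pretrained():
  # ImageNet-pretrained MobileNetV3-Small without its classification head.
  # By default Keras' MobileNetV3 includes input rescaling in the model,
  # so raw pixel values in [0, 255] can be passed in directly.
  base_model = MobileNetV3Small(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

  # Freeze the convolutional base so that only the new head is trained
  for layer in base_model.layers:
    layer.trainable = False

  # New binary classification head: 512-unit dense layer, dropout, sigmoid output
  x = Flatten()(base_model.output)
  x = Dense(512, activation='relu')(x)
  x = Dropout(0.5)(x)
  x = Dense(1, activation='sigmoid')(x)  # probability of class 1 (cancer)

  model = Model(base_model.input, x)

  model.compile(
      optimizer="adam",
      loss="binary_crossentropy",
      metrics=["accuracy"]
  )

  return model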

### **Pre-trained VGG19 Model**

In [None]:
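def get_vgg_pretrained():
  # ImageNet-pretrained VGG19 without its classification head, used as a frozen feature extractor.
  # Note: Keras expects VGG19 inputs to go through
  # tf.keras.applications.vgg19.preprocess_input (which assumes RGB order);
  # here the raw BGR pixel values from cv2 are fed in unchanged.
  base_model = VGG19(input_shape=(224, 224, 3), include_top=False, weights="imagenet")

  # Freeze the convolutional base so that only the new head is trained
  for layer in base_model.layers:
    layer.trainable = False

  # New binary classification head: 512-unit dense layer, dropout, sigmoid output
  x = Flatten()(base_model.output)
  x = Dense(512, activation='relu')(x)
  x = Dropout(0.5)(x)
  x = Dense(1, activation='sigmoid')(x)  # probability of class 1 (cancer)

  model = Model(base_model.input, x)

  model.compile(
      optimizer="adam",
      loss="binary_crossentropy",
      metrics=["accuracy"]
  )

  return model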

## **Custom-Made CNN Model**

### **Model Design**

## **Main Method**

**Split the Data into Train and Test Sets**

In [None]:
# 80/20 split; stratify keeps the healthy/cancer ratio the same in both sets
x_train, x_test, y_train, y_test = train_test_split(np.array(images), np.array(labels), test_size=0.2, shuffle=True, stratify=labels)
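This split is made after augmentation, and `load_data()` appends the nine variants of each mammogram consecutively. A purely random split can therefore place a rotated or flipped copy of a test mammogram in the training set, which may inflate the reported test accuracy. A minimal sketch of a group-aware alternative, using scikit-learn's `GroupShuffleSplit` and assuming group ids derived from that nine-per-image ordering; `group_split_indices` is a hypothetical helper. Note that `GroupShuffleSplit` does not stratify by class.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit


def group_split_indices(n_samples, per_image=9, test_size=0.2, random_state=0):
  """Return (train_idx, test_idx) so all variants of one mammogram share a side."""
  # Assumes per_image consecutive samples were generated from each source mammogram
  groups = np.arange(n_samples) // per_image
  splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=random_state)
  train_idx, test_idx = next(splitter.split(np.zeros(n_samples), groups=groups))
  return train_idx, test_idx


# Example with the notebook's sample count (2898 = 322 mammograms x 9 variants)
train_idx, test_idx = group_split_indices(2898)
print(len(train_idx), len(test_idx))
print("No mammogram in both sets:", set(train_idx // 9).isdisjoint(set(test_idx // 9)))
```

The arrays could then be indexed as `np.array(images)[train_idx]`, `np.array(labels)[test_idx]`, and so on. Here `test_size` refers to the fraction of mammograms rather than of augmented samples.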

### **Pre-trained ResNet50**

**Train and Test**

In [None]:
pretrained_resnet_model = get_resnet_pretrained()
pretrain_resnet50_history = pretrained_resnet_model.fit(x_train, y_train, validation_split=0.1, epochs=8, steps_per_epoch=100)
pretrained_resnet_model.evaluate(x_test, y_test, verbose=2)
pretrained_resnet_y_pred = pretrained_resnet_model.predict(x_test).ravel()

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
19/19 - 4s - loss: 0.2058 - accuracy: 0.9397


**Evaluation**
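The cell above stores the sigmoid outputs in `pretrained_resnet_y_pred`, but the accuracy it logs says little about the minority cancer class. A minimal sketch of a fuller evaluation with `sklearn.metrics`, assuming a decision threshold of 0.5 and treating class 1 (cancer) as positive; `summarise_predictions` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)


def summarise_predictions(y_true, y_prob, threshold=0.5):
  """Binary metrics for sigmoid outputs; class 1 (cancer) is the positive class."""
  y_true = np.asarray(y_true).astype(int)
  y_prob = np.asarray(y_prob, dtype=float).ravel()
  y_pred = (y_prob >= threshold).astype(int)
  # labels=[0, 1] keeps the matrix 2 x 2 even if one class is never predicted
  tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
  return {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred, zero_division=0),
    "recall": recall_score(y_true, y_pred, zero_division=0),  # sensitivity
    "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    "f1": f1_score(y_true, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_true, y_prob),  # threshold-independent
    "confusion_matrix": [[int(tn), int(fp)], [int(fn), int(tp)]],
  }
```

Calling `summarise_predictions(y_test, pretrained_resnet_y_pred)` would report these metrics for ResNet50; the same call works for the other `*_y_pred` arrays.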

### **Pre-trained EfficientNet-B0**

**Train and Test**

In [None]:
pretrained_efficientnet_model = get_efficientnet_pretrained()
pretrain_efficientnet_history = pretrained_efficientnet_model.fit(x_train, y_train, validation_split=0.1, epochs=8, steps_per_epoch=100)
pretrained_efficientnet_model.evaluate(x_test, y_test, verbose=2)
pretrained_efficientnet_y_pred = pretrained_efficientnet_model.predict(x_test).ravel()

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb0_notop.h5
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
19/19 - 3s - loss: 0.1632 - acc: 0.9552


**Evaluation**
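`fit()` returns a Keras `History` object whose `.history` dict maps each metric name to its per-epoch values, with `val_`-prefixed entries because `validation_split=0.1` was used. The EfficientNet log above reports `acc` instead of `accuracy`, so this minimal sketch accepts either key. `plot_history` is a hypothetical helper, not part of the original notebook.

```python
import matplotlib.pyplot as plt


def plot_history(history, title=""):
  """Plot per-epoch training and validation accuracy and loss from a Keras History."""
  h = history.history
  acc_key = "accuracy" if "accuracy" in h else "acc"  # some Keras versions log "acc"
  fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
  for ax, key in ((ax_acc, acc_key), (ax_loss, "loss")):
    epochs = range(1, len(h[key]) + 1)
    ax.plot(epochs, h[key], label="train")
    if f"val_{key}" in h:  # present when fit() is given validation data
      ax.plot(epochs, h[f"val_{key}"], label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel(key)
    ax.legend()
  fig.suptitle(title)
  fig.tight_layout()
  return fig
```

For example, `plot_history(pretrain_efficientnet_history, "EfficientNet-B0")`. A widening gap between the training and validation curves over the eight epochs would suggest overfitting.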

### **Pre-trained MobileNet V3 Small**

**Train and Test**

In [None]:
pretrained_mobilenet_model = get_mobilenet_pretrained()
pretrain_mobilenet_history = pretrained_mobilenet_model.fit(x_train, y_train, validation_split=0.1, epochs=8)
pretrained_mobilenet_model.evaluate(x_test, y_test, verbose=2)
pretrained_mobilenet_y_pred = pretrained_mobilenet_model.predict(x_test).ravel()

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
19/19 - 1s - loss: 0.1144 - accuracy: 0.9534


**Evaluation**

### **Pre-trained VGG19**

**Train and Test**

In [None]:
pretrained_vgg_model = get_vgg_pretrained()
pretrain_vgg_history = pretrained_vgg_model.fit(x_train, y_train, validation_split=0.1, epochs=8)
pretrained_vgg_model.evaluate(x_test, y_test, verbose=2)
pretrained_vgg_y_pred = pretrained_vgg_model.predict(x_test).ravel()

Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
19/19 - 3s - loss: 0.1767 - accuracy: 0.9483


**Evaluation**
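With predictions from all four pre-trained models available, one way to compare them beyond the single 0.5-threshold accuracy is to overlay their ROC curves, which trace sensitivity against the false positive rate across all thresholds. A minimal sketch using `sklearn.metrics.roc_curve`; `plot_roc_curves` is a hypothetical helper, not part of the original notebook.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve


def plot_roc_curves(y_true, probabilities):
  """Overlay ROC curves; `probabilities` maps a model name to its sigmoid outputs."""
  fig, ax = plt.subplots(figsize=(6, 6))
  for name, y_prob in probabilities.items():
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {roc_auc_score(y_true, y_prob):.3f})")
  # Diagonal reference line for a random classifier
  ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="chance")
  ax.set_xlabel("False positive rate (1 - specificity)")
  ax.set_ylabel("True positive rate (sensitivity)")
  ax.legend(loc="lower right")
  return fig
```

For example: `plot_roc_curves(y_test, {"ResNet50": pretrained_resnet_y_pred, "EfficientNet-B0": pretrained_efficientnet_y_pred, "MobileNetV3-Small": pretrained_mobilenet_y_pred, "VGG19": pretrained_vgg_y_pred})`.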

### **Custom Model**

**Train and Test**

**Evaluation**