<a href="https://colab.research.google.com/github/Rafsan7238/BracU_Thesis_P2/blob/main/Pre_Thesis_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Application of Deep Convolutional Neural Network in Breast Cancer Prediction Using Digital Mammograms**

**Authors:** Rafsan Al Mamun, Md. Al Imran Sefat, Gazi Abu Rafin, Adnan

**Project GitHub Link:** [https://github.com/Rafsan7238/BracU_Thesis_P2](https://github.com/Rafsan7238/BracU_Thesis_P2)



---



***Abstract:*** *A cancer diagnosis is so dreaded that the fear of it alone can shake even the
strongest of souls. The disease is often thought of as untreatable and unbearably painful, with
usually no cure available. Among all cancers, breast cancer is the second deadliest,
especially among women. What decides the patient's fate is early diagnosis of the cancer,
which facilitates subsequent clinical management. Mammography plays a vital role in the
screening of breast cancers as it can detect breast masses or calcifications early. However,
extremely dense breast tissue makes cancerous masses difficult to detect, which
encourages the use of machine learning (ML) techniques and artificial neural networks
(ANN) to assist radiologists in faster cancer diagnosis. This paper explores the MIAS
database, containing 322 digital mammograms from women, which were augmented and
preprocessed, and fed into different convolutional neural network (CNN) models, with the
aim of differentiating healthy tissues from cancerous ones with high accuracy. The paper,
along with a newly proposed CNN model for better identification of breast cancer, focuses on
the significance of computer-aided detection (CAD) models overall in the early diagnosis of
breast cancer. While a diagnosis of breast cancer may still fill patients with dread, we believe
our research can be a symbol of hope for all.*

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## **Data Collection and Preprocessing**

The data used for this project comes from the MIAS database. It consists of raw breast mammograms of 161 patients with healthy, benign or malignant breast tissue.

These raw mammograms have been divided into 2 groups: healthy, containing the mammograms of patients with healthy or benign tissue, and cancer, containing the mammograms of patients with malignant tissue.
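As a small illustration of this grouping, the sketch below maps a per-image tissue class to one of the two groups. The class names and the helper `to_group` are hypothetical and are not taken from the MIAS metadata format; only the two group names come from the text above.

```python
# Hypothetical tissue class names mapped to the two groups used in this project.
# Healthy and benign tissue both fall into the "healthy" group.
GROUP_OF_CLASS = {
    "healthy": "healthy",
    "benign": "healthy",
    "malignant": "cancer",
}


def to_group(tissue_class):
    """Return the group ('healthy' or 'cancer') for a tissue class name."""
    return GROUP_OF_CLASS[tissue_class.strip().lower()]


print([to_group(c) for c in ["Healthy", "benign", "MALIGNANT"]])
```

Keeping the mapping in one dictionary makes the decision to treat benign cases as healthy explicit and easy to revisit.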

However, the dataset is too small to train CNNs on. Hence, the mammograms had to be augmented using the following processes in order to increase the number of images available.

In [None]:
import cv2
from google.colab.patches import cv2_imshow
import os

def load_data():
  """
    Load image data from directory '/content/drive/MyDrive/Thesis/Dataset'.

    Load each image file from the subdirectories of Dataset, resize it, turn it
    into B/W and enhance its contrast with CLAHE. Append each processed image to
    the images list and its label to the labels list. Augmentation and the
    cancer subdirectory are still TODO.

    Return tuple `(images, labels)`. `images` is a list of all
    of the images in the data directory, where each image is formatted as a
    single-channel uint8 numpy ndarray with dimensions 224 x 224. `labels` is
    a list of labels (healthy or cancer), representing the categories for each of the
    corresponding `images`.
    """

  images = []
  labels = []

  # The CLAHE object is created once and reused for every image
  # clipLimit -> Threshold for contrast limiting
  clahe = cv2.createCLAHE(clipLimit = 5)

  # Loop through the healthy dataset
  directory_path = "/content/drive/MyDrive/Thesis/Dataset/Healthy"

  for file in sorted(os.listdir(directory_path)):
    if file.endswith(".jpg"):
      file_path = os.path.join(directory_path, file)
      img = cv2.imread(file_path)

      # cv2.imread returns None for unreadable files; skip those
      if img is None:
        continue

      # 1. Resize to 224*224
      resized = cv2.resize(img, (224,224), interpolation = cv2.INTER_AREA)

      # 2. Turn B/W (convert the resized image, not the original)
      image_bw = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)

      # 3. CLAHE, then brighten by 30
      # cv2.add saturates at 255, whereas `+ 30` on uint8 wraps around
      final_img = cv2.add(clahe.apply(image_bw), 30)

      # Preview the first image only; cv2.imshow is not usable in Colab,
      # so the cv2_imshow patch is used instead
      if not images:
        cv2_imshow(resized)
        cv2_imshow(image_bw)
        cv2_imshow(final_img)

      images.append(final_img)
      labels.append("healthy")

      # TODO: augmentation and append
      # rotate 10 and append
      # rotate 20 append
      # other rotates, shear, flip, contrast, brightness, noise etc

  # TODO: Same for cancer dataset

  # Return a tuple of (images, labels)
  return (images, labels)


# main_method to check function
image, label = load_data()
# cv2_imshow(image[0])
# print(label[0])