<a href="https://colab.research.google.com/github/Ayo-Cyber/Deep-Learning/blob/main/Copy_of_BREAST_CANCER_IMAGE_CLASSIFICATION_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Importing Libraries

#### Importing the necessary libraries is an essential first step in solving deep learning problems, because having the required libraries and modules installed and imported up front lets the rest of the program run smoothly.


#### Libraries needed
*   The pandas and NumPy libraries
*   The Python os and glob modules
*   The TensorFlow framework and the Keras library
*   Visualization libraries such as Seaborn and Matplotlib
*   The scikit-learn and scikit-image libraries



In [None]:
import pandas as pd #pandas is used to read and manipulate tabular data
import numpy as np #numpy is used to convert lists into arrays and for linear algebra

import os #os is used to list the files and folders inside a directory
import glob #glob is used to match file paths against a wildcard pattern

#for data visualization
import matplotlib.pyplot as plt #matplotlib.pyplot is used for plotting
import seaborn as sns #seaborn is a plotting library built on top of matplotlib

#deep learning tools
import tensorflow as tf #tensorflow is a deep learning framework used for building neural networks
from tensorflow import keras #keras is the high-level API of tensorflow
from tensorflow.keras.preprocessing import image, image_dataset_from_directory #utilities for loading images in tensorflow
from tensorflow.keras import layers, activations, optimizers, losses, metrics, initializers
from tensorflow.keras.applications import EfficientNetB7 #pretrained EfficientNetB7, used as the base model

#scikit-image and scikit-learn libraries
from skimage import data, io, filters #skimage (scikit-image) provides tools for reading and processing images

from sklearn.preprocessing import LabelEncoder #LabelEncoder converts text labels into integers
from sklearn.model_selection import train_test_split #train_test_split splits the dataset into a training set and a test set

## Reading Data

#### In this section we read the image data from the Google Drive folder where we stored it for easy access.


*   First, we store the path of the folder containing our images in the main_directory variable.
*   Then we use os.listdir to list the entries inside that folder, which are the three class folders (<i>benign</i>, <i>normal</i>, <i>malignant</i>).
*   Next, we create two lists, image_files and image_labels, to store the image paths and their respective labels.
*   A for loop then goes through each folder in image_directory.
*   Inside the loop, Python's glob function uses a wildcard pattern to match every file inside the current folder.
*   The folder name is added to image_labels once for every matched file, and the matched file paths are added to image_files.
*   Finally, the lengths of image_files and image_labels are printed.

In [None]:
#path to the main folder in our google drive (the drive must already be mounted in colab)
main_directory = '/content/drive/MyDrive/Breast_Cancer_Images_Folder'

#os.listdir returns the names of the entries (here, the class folders) inside main_directory
image_directory = os.listdir(main_directory)

#two empty lists: one for the image file paths and one for their labels
image_files = []
image_labels = []

#loop through each class folder in image_directory
# ---> glob collects every file path inside the folder into file_list
# ---> the folder name is added to image_labels once for each file found
# ---> the file paths are added to image_files with extend
for folders in image_directory:
  file_list = glob.glob(os.path.join(main_directory, folders, '*'))
  image_labels.extend([folders] * len(file_list))
  image_files.extend(file_list)

#printing out the length of both lists
len(image_files), len(image_labels)

(2030, 2030)
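The folder walk above can be reproduced offline as a minimal sketch, assuming the same layout of one subfolder per class. It builds a throwaway tree with tempfile (the folder contents and file names are hypothetical placeholders), joins paths with os.path.join, and skips entries of the main folder that are not directories, since os.listdir also returns plain files.

```python
import glob
import os
import tempfile

def collect_images(main_directory):
    """Return (files, labels), labelling each file with its parent folder name."""
    image_files, image_labels = [], []
    for folder in sorted(os.listdir(main_directory)):
        folder_path = os.path.join(main_directory, folder)
        if not os.path.isdir(folder_path):
            continue  # ignore stray files such as README.txt
        file_list = sorted(glob.glob(os.path.join(folder_path, '*')))
        image_files.extend(file_list)
        image_labels.extend([folder] * len(file_list))
    return image_files, image_labels

# Hypothetical dataset: 2 benign files, 1 normal file and 1 stray top-level file.
with tempfile.TemporaryDirectory() as root:
    for folder, names in {'benign': ['b1.png', 'b2.png'], 'normal': ['n1.png']}.items():
        os.makedirs(os.path.join(root, folder))
        for name in names:
            open(os.path.join(root, folder, name), 'w').close()
    open(os.path.join(root, 'README.txt'), 'w').close()

    files, labels = collect_images(root)
    print(len(files), len(labels))  # 3 3
    print(labels)                   # ['benign', 'benign', 'normal']
```

Sorting the folder and file names is a design choice that makes the order reproducible; os.listdir and glob make no ordering guarantee.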

*  This subsection of the data-reading section removes the mask images (files with 'mask' in their name) from the dataset, so that the model is trained only on the original images.

#### Steps Taken


*   First, we create two new lists to store the filtered files and labels.
*   Then we loop through the image files and labels together and skip every file that has 'mask' in its name.
*   Finally, we print the new length of the dataset.



In [None]:
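#keep only the original images: skip any file whose name contains 'mask'
image_files_edit = []
image_labels_edit = []

for file, label in zip(image_files, image_labels):
    #check only the file name, not the whole path
    if 'mask' not in os.path.basename(file):
        image_files_edit.append(file)
        image_labels_edit.append(label)

len(image_files_edit), len(image_labels_edit)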

(792, 792)
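A minimal sketch of the same filter, assuming mask files carry 'mask' in their file name (all paths below are hypothetical). Checking only os.path.basename(file), rather than the full path, avoids silently dropping every image when a parent folder name happens to contain 'mask'. The lower() call is an added assumption that mask names might differ in case.

```python
import os

def drop_masks(files, labels):
    """Keep (file, label) pairs whose file name does not contain 'mask'."""
    kept_files, kept_labels = [], []
    for file, label in zip(files, labels):
        if 'mask' not in os.path.basename(file).lower():
            kept_files.append(file)
            kept_labels.append(label)
    return kept_files, kept_labels

# Hypothetical paths: the parent folder "unmasked_scans" itself contains "mask".
files = [
    '/data/unmasked_scans/benign/benign (1).png',
    '/data/unmasked_scans/benign/benign (1)_mask.png',
    '/data/unmasked_scans/normal/normal (1).png',
    '/data/unmasked_scans/normal/normal (1)_mask.png',
]
labels = ['benign', 'benign', 'normal', 'normal']

kept_files, kept_labels = drop_masks(files, labels)
print(len(kept_files), kept_labels)  # 2 ['benign', 'normal']

# A check on the full path would have discarded every file here.
print(sum('mask' not in f for f in files))  # 0
```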

## Image Preprocessing 

#### In this section we use Keras image utilities to load, resize, and preprocess the images into arrays that the model can work with.

#### Steps Taken 
*   I defined the image shape (224 × 224) and created a function, prepare_image, to prepare each image.
*   In prepare_image, each image is loaded and resized to the specified shape in one step, so all images end up with the same dimensions without a separate resizing step.
*   The loaded image is then converted into a NumPy array.
*   After that, the array is passed through EfficientNet's preprocess_input function from Keras, and the result is returned.
*   I then created a dictionary called images with two keys, 'image' and 'target', to store the preprocessed images and their labels.
*   Finally, a for loop iterates over the cleaned image_files_edit and image_labels_edit lists and appends each prepared image and its label to the images dictionary.

In [None]:
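Img_shp = (224, 224) #target (height, width) for every image

def prepare_image(file):
    #load the image as RGB and resize it to Img_shp
    img = image.load_img(file, target_size=Img_shp)
    #convert the PIL image into a float32 array of shape (224, 224, 3)
    img_array = image.img_to_array(img)
    #EfficientNet's preprocess_input is a pass-through in tf.keras (the model rescales inputs itself)
    return tf.keras.applications.efficientnet.preprocess_input(img_array)

#dictionary that stores the preprocessed images and their labels
images = {
    'image': [],
    'target': []
}

for file, label in zip(image_files_edit, image_labels_edit):
    images['image'].append(prepare_image(file))
    images['target'].append(label)

print('Image Preprocessed ....')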

In [None]:
images['image'] = np.array(images['image'])
images['target'] = np.array(images['target'])
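Stacking every preprocessed image into one NumPy array keeps the whole dataset in memory. As a rough worked estimate, assuming the 792 images reported after mask removal and float32 pixels (the default dtype of img_to_array), the array needs N × 224 × 224 × 3 × 4 bytes. The helper name batch_nbytes is hypothetical.

```python
import numpy as np

def batch_nbytes(n_images, height=224, width=224, channels=3, dtype=np.float32):
    """Bytes needed to hold n_images images of the given shape in one array."""
    return n_images * height * width * channels * np.dtype(dtype).itemsize

n_bytes = batch_nbytes(792)
print(n_bytes)                    # 476872704
print(round(n_bytes / 2**20, 1))  # 454.8 (MiB)

# Cross-check the formula against a real (small) array with the same per-image shape.
sample = np.zeros((2, 224, 224, 3), dtype=np.float32)
assert sample.nbytes == batch_nbytes(2)
```

Because np.array copies the list of per-image arrays, both copies exist briefly during the conversion, so peak memory is roughly double this figure.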

## Exploratory Analysis

In [None]:
count = [0]*3
for i in np.arange(len(images['target'])):
  if images['target'][i] == 'normal':
    count[0] = count[0] + 1
  elif images['target'][i] == 'benign':
    count[1] = count[1] + 1
  else:
    count[2] = count[2] + 1


cancer_data = {
    'names' : ['normal', 'benign', 'malignant'],
    'count' : [count[0], count[1], count[2]]
}

cancer_df = pd.DataFrame.from_dict(cancer_data)

In [None]:
cancer_df

|   | names | count |
|---|---|---|
| 0 | normal | 133 |
| 1 | benign | 395 |
| 2 | malignant | 0 |
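The manual counting loop above can be written more compactly. The sketch below, a minimal version using collections.Counter on a hypothetical label array, reports 0 for any expected class that never appears (which is how a missing malignant folder shows up) and, unlike the original else branch, raises an error instead of counting unrecognised labels as malignant. The function name class_counts is hypothetical.

```python
from collections import Counter

import numpy as np

def class_counts(targets, expected=('normal', 'benign', 'malignant')):
    """Count each expected label, reporting 0 for classes that never occur."""
    counts = Counter(np.asarray(targets).tolist())
    unexpected = set(counts) - set(expected)
    if unexpected:
        # The original loop would silently count these as 'malignant'.
        raise ValueError(f'unexpected labels: {sorted(unexpected)}')
    return {name: counts[name] for name in expected}

# Hypothetical labels with no 'malignant' images present.
targets = np.array(['benign', 'normal', 'benign', 'benign'])
print(class_counts(targets))  # {'normal': 1, 'benign': 3, 'malignant': 0}
```

With pandas, a similar table comes from pd.Series(targets).value_counts().reindex(['normal', 'benign', 'malignant'], fill_value=0).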


In [None]:
encoder = LabelEncoder()

images['target'] = encoder.fit_transform(images['target'])


In [None]:
cancer_classes = encoder.classes_

cancer_classes

array(['benign', 'normal'], dtype='<U6')
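LabelEncoder assigns integers in sorted (alphabetical) order of the class names, which is why classes_ lists 'benign' before 'normal' and why benign is encoded as 0. A NumPy-only sketch of the same mapping, using np.unique with return_inverse=True on hypothetical labels:

```python
import numpy as np

labels = np.array(['normal', 'benign', 'benign', 'normal', 'benign'])

# np.unique returns the sorted distinct values and, for each input element,
# the index of its value in that sorted array: the same codes LabelEncoder produces.
classes, codes = np.unique(labels, return_inverse=True)
print(classes)  # ['benign' 'normal']
print(codes)    # [1 0 0 1 0]

# Decoding goes the other way, like encoder.inverse_transform.
print(classes[codes])
```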

## Splitting the Data

In [None]:
from sklearn.model_selection import train_test_split

X_train , X_test , y_train , y_test = train_test_split(images['image'] , images['target'] , test_size = 0.2 , random_state=5)

X_train.shape , X_test.shape , y_train.shape , y_test.shape

((422, 224, 224, 3), (106, 224, 224, 3), (422,), (106,))
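These shapes follow from how train_test_split rounds a fractional test_size: scikit-learn takes the ceiling of test_size × n_samples for the test set and gives the remainder to training. The shapes imply 528 samples (422 + 106), matching the class counts above (133 + 395) rather than the 792 printed after mask removal. With 528 samples, 0.2 × 528 = 105.6 rounds up to 106. The helper split_sizes below is a hypothetical sketch of that rounding rule, not scikit-learn code.

```python
import math

def split_sizes(n_samples, test_size=0.2):
    """Mirror scikit-learn's rounding for a float test_size with train_size=None."""
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test

print(split_sizes(528))  # (422, 106)
print(split_sizes(792))  # (633, 159)
```

Because the classes are imbalanced (133 normal vs 395 benign), passing stratify=images['target'] to train_test_split would keep that ratio in both splits.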

## Building and Training the Model

In [None]:
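base_model = EfficientNetB7(
    include_top=False, #drop the ImageNet classifier so we can add our own head
    weights='imagenet',
    input_shape=(*Img_shp, 3)) #classes is ignored when include_top=False, so it is omitted

#one output unit per class found by the LabelEncoder
sections = len(cancer_classes)

base_model.trainable = False #freeze the pretrained EfficientNet weights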
model_calc = base_model.output

model_calc = layers.Conv2D(256, 3, padding='valid')(model_calc)
model_calc = layers.Activation('relu')(model_calc)
model_calc = layers.Dropout(0.5)(model_calc)

model_calc = layers.Conv2D(128, 3, padding='valid')(model_calc)
model_calc = layers.Activation('relu')(model_calc)
model_calc = layers.Dropout(0.5)(model_calc)

model_calc = layers.Flatten()(model_calc)
model_calc = layers.Dense(64 , activation = 'relu')(model_calc)
model_calc = layers.Dropout(0.5)(model_calc)
model_calc = layers.Dense(64 , activation = 'relu')(model_calc)
model_calc = layers.Dropout(0.5)(model_calc)

model_calc = layers.Dense(sections  , activation='softmax')(model_calc)

model = keras.models.Model(inputs = base_model.inputs , outputs = model_calc)

model.compile(
    optimizer = 'adam',
    loss = 'sparse_categorical_crossentropy',
    metrics = [tf.keras.metrics.SparseCategoricalAccuracy()]
)
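For a 224 × 224 input, EfficientNetB7 without its top produces a 7 × 7 × 2560 feature map (a total stride of 32). Each 3 × 3 'valid' convolution trims 2 pixels from each spatial dimension, so the head shrinks this to 5 × 5 × 256 and then 3 × 3 × 128 before flattening. The sketch below walks through that arithmetic and counts the trainable parameters of the new head; the 2560-channel figure is assumed from the standard EfficientNetB7 architecture, the helper names are hypothetical, and model.summary() is the authoritative check.

```python
def conv_valid(h, w, kernel=3):
    """Spatial size after a stride-1 'valid' convolution."""
    return h - kernel + 1, w - kernel + 1

def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

def head_summary(n_classes, h=7, w=7, channels=2560):
    """Output shape, flattened size and trainable parameters of the custom head.
    Activation, Dropout and Flatten layers add no parameters."""
    total = conv_params(3, channels, 256)   # first Conv2D(256, 3)
    h, w = conv_valid(h, w)                 # 7x7 -> 5x5
    total += conv_params(3, 256, 128)       # second Conv2D(128, 3)
    h, w = conv_valid(h, w)                 # 5x5 -> 3x3
    flat = h * w * 128                      # Flatten
    total += dense_params(flat, 64)         # Dense(64)
    total += dense_params(64, 64)           # Dense(64)
    total += dense_params(64, n_classes)    # softmax output layer
    return (h, w, 128), flat, total

print(head_summary(3))  # ((3, 3, 128), 1152, 6271683)
print(head_summary(2))  # ((3, 3, 128), 1152, 6271618)
```

Nearly all of the roughly 6.3 million trainable weights sit in the first Conv2D layer (5,898,496), which is large relative to the 422 training images.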

In [None]:
model.fit(X_train , y_train , epochs = 10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7fa35e78c150>

In [None]:
model.evaluate(X_test,y_test, batch_size=32, verbose=1)



[0.45908322930336, 0.7452830076217651]
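The second number returned by evaluate is the sparse categorical accuracy. On 106 test images, 0.74528... corresponds to 79 correct predictions (79 / 106 ≈ 0.7453). A useful sanity check is the accuracy of always predicting the most frequent class: in the class counts above, benign makes up 395 of 528 images (≈ 0.748), so it is worth comparing the model against that baseline computed on y_test itself. A minimal sketch, with a hypothetical helper majority_baseline and made-up labels standing in for y_test:

```python
import numpy as np

def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent integer label."""
    counts = np.bincount(np.asarray(y_true))
    return counts.max() / counts.sum()

# Worked numbers from the run above.
print(round(79 / 106, 4))   # 0.7453
print(round(395 / 528, 4))  # 0.7481

# Hypothetical encoded test labels (0 = benign, 1 = normal).
y_example = np.array([0, 0, 0, 1, 0, 1, 0, 0])
print(majority_baseline(y_example))  # 0.75
```

In the notebook, majority_baseline(y_test) gives the baseline for the actual test split.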