# Use A CNN To Process The Image

# Requirements 

### Datasets:
Kaggle Data set
The BreaKHis database contains microscopic biopsy images of benign and malignant breast tumors. In this dataset, I have separated the training data and test data into different folders, and each file is a different slide image. I only took a partial sample at 400x optical zoom. If you are interested in the full dataset, please refer to this paper:

F. A. Spanhol, L. S. Oliveira, C. Petitjean and L. Heutte, "A Dataset for Breast Cancer Histopathological Image Classification," in IEEE Transactions on Biomedical Engineering, vol. 63, no. 7, pp. 1455-1462, July 2016, doi: 10.1109/TBME.2015.2496264.

Given this, the dataset is not under my control. It is intended for people who are learning deep learning.

### Python:
Python3 (3.9.18)

### Modules:
TensorFlow (2.14.0)

Numpy (1.24.3)

Pillow (10.0.1)

The goal of this notebook is to build a CNN that classifies tumors as malignant or benign based on patient slides.

To Do List
1) Load the data and assess the slide images to count the slides and decide how best to proceed.
Two options will be pursued: down-sampling, and up-sampling with affine transformations.
2a) Down-sample to assess the items.
2b) Run augmentations and other transformations, as this gives us some practice.
One question I have is what the difference is between skimage and Keras preprocessing.
It seems that unless we are running real image analysis, such as pixel-based analysis, edge detection, or segmentation, it is better to use Keras because it is more efficient at setting up the augmentation transformations.

Thus I will use Keras image preprocessing and feed the result through the Keras/TensorFlow CNN.

3) Set up the slides so that the training set has an equal number of malignant and benign images.
4) Set up the CNN model.
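
One possible way to approach step 3 is to oversample the smaller class until both classes have the same number of files. The sketch below is only an illustration under that assumption: the function name `balance_by_oversampling` and the file names are hypothetical, and it works on plain lists of file names rather than on the real slide folders.

```python
import random

def balance_by_oversampling(files_by_class, seed=100):
    """Return a copy in which every class has as many files as the largest class.

    Extra items are drawn at random (with replacement) from the class's own files.
    """
    rng = random.Random(seed)
    target = max(len(files) for files in files_by_class.values())
    balanced = {}
    for label, files in files_by_class.items():
        extra = rng.choices(files, k=target - len(files))
        balanced[label] = list(files) + extra
    return balanced

# Hypothetical file names, for illustration only.
example = {
    "benign": ["b1.png", "b2.png"],
    "malignant": ["m1.png", "m2.png", "m3.png", "m4.png", "m5.png"],
}
balanced = balance_by_oversampling(example)
print({label: len(files) for label, files in balanced.items()})
```

Duplicated files add no new information by themselves, which is one reason to pair oversampling with augmentation so that the repeated images are not shown to the model in identical form.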

# Convolutional Neural Network

### Importing the libraries

In [51]:
import tensorflow as tf
import numpy as np
import os
import PIL

In [52]:
from platform import python_version
print(python_version())
tf.__version__

3.9.18


'2.14.0'

## Part 1 - Loading And PreProcessing The Dataset

In [53]:
# Apply transformations to the training set only, so the validation set still reflects unmodified images.
# The transformations are geometric: rotating images, zooming in/out, flipping images, etc. This is called augmentation.
# Augmentation reduces overfitting because it increases the variety of images the model sees during training.
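
As a toy illustration of what a geometric augmentation does to pixel data, the sketch below flips and rotates a tiny image stored as nested lists. It is not the pipeline used in this notebook; in Keras the same idea is usually expressed with random preprocessing layers such as `tf.keras.layers.RandomFlip` and `tf.keras.layers.RandomRotation`. The helper names here are hypothetical.

```python
def flip_horizontal(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def rotate_90_clockwise(image):
    """Rotate an image a quarter turn clockwise."""
    return [list(row) for row in zip(*reversed(image))]

tiny = [[1, 2, 3],
        [4, 5, 6]]
flipped = flip_horizontal(tiny)
rotated = rotate_90_clockwise(tiny)
print(flipped)   # [[3, 2, 1], [6, 5, 4]]
print(rotated)   # [[4, 1], [5, 2], [6, 3]]
```

Each transformed copy shows the same tissue in a different orientation, so the label stays the same while the pixel layout changes.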


In [54]:
dataset_path = 'P:/Portfolio Sets/Benign vs Malignant Slides Classification/BreaKHis 400X/train' # use '/' or '\\' in Windows paths; a single '\' starts an escape sequence in a Python string

### The Training set

In [55]:
train_dataset = tf.keras.utils.image_dataset_from_directory(
    dataset_path, # Directory whose subfolders are the classes
    labels = 'inferred',  # Automatically infer labels from directory structure (folder names)
    label_mode = 'categorical',  # One-hot encoded labels, to match the categorical_crossentropy loss
    image_size = (64, 64), # Target image size for resizing
    batch_size = 32,  # Batch size for training
    seed = 100, # Fixed seed so the training and validation subsets come from the same shuffle and do not overlap
    validation_split = 0.2,  # Fraction of the data reserved for validation
    subset = 'training')  # Return the training subset

Found 1148 files belonging to 2 classes.
Using 919 files for training.


In [56]:
# image_dataset_from_directory infers the classes from the first level of subfolders only,
# so dataset_path must point at the folder that directly contains the class folders.
train_dataset.class_names # lists the inferred class labels

['benign', 'malignant']

In [57]:
# Load the validation set
validation_dataset = tf.keras.utils.image_dataset_from_directory(
    dataset_path, 
    labels = 'inferred',
    label_mode ='categorical',
    image_size = (64, 64),
    batch_size = 32,
    seed = 100,
    validation_split = 0.2,
    subset = 'validation')  # Specify if it's the validation subset

validation_dataset.class_names

Found 1148 files belonging to 2 classes.
Using 229 files for validation.


['benign', 'malignant']
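
The file counts reported above (919 for training, 229 for validation, out of 1148) are consistent with the validation fraction being rounded down to a whole number of files. The sketch below reproduces that arithmetic; the rounding rule is an assumption inferred from the printed counts, and `split_counts` is a hypothetical helper, not a Keras function.

```python
def split_counts(total_files, validation_split):
    """Return (training_count, validation_count) for a fractional split.

    Assumes the validation count is the fraction rounded down.
    """
    validation_count = int(total_files * validation_split)
    return total_files - validation_count, validation_count

train_count, val_count = split_counts(1148, 0.2)
print(train_count, val_count)  # 919 229
```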

### Preprocessing the Validation Set

## Part 2 - Building the CNN

### Initialising the CNN

In [58]:
cnn = tf.keras.models.Sequential()

### Step 1 - Convolution

In [60]:
# First layer
cnn.add(tf.keras.layers.Conv2D( # add() appends a new layer to the model
    filters = 32, # Number of filters (feature detectors), each producing one feature map
    kernel_size = 3, # Size of each feature detector: a single int X means X x X, or pass a pair (X, Y)
    activation = 'relu', # Activation function
    input_shape = (64, 64, 3))) # (height, width, channels) of one image, without the batch size; channels = 3 (RGB) or 1 (grayscale)
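
To make the role of `kernel_size` and the ReLU activation more concrete, here is a minimal pure-Python sketch of a single 3 x 3 filter sliding over a one-channel image without padding. It is a simplification: `conv2d_valid` is a hypothetical helper, the kernel is hand-written rather than learned, and the real layer works on three channels with 32 filters plus biases.

```python
def conv2d_valid(image, kernel):
    """Slide a square `kernel` over `image` without padding and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k)
                for dj in range(k)
            )
            row.append(max(total, 0))  # ReLU: negative responses become 0
        feature_map.append(row)
    return feature_map

# A 4 x 4 image with a vertical edge between the 2nd and 3rd columns.
image = [[0, 0, 1, 1]] * 4
# A hand-written vertical-edge kernel; a CNN learns its kernels during training.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
print(conv2d_valid(image, kernel))  # [[3, 3], [3, 3]]
```

A 3 x 3 kernel on a 4 x 4 image gives a 2 x 2 feature map, which is the same rule that turns the 64 x 64 input into a 62 x 62 map in the layer above.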

### Step 2 - Pooling

In [61]:
# Pooling - down-sampling operation that keeps the largest value in each window, shrinking the feature map
cnn.add(tf.keras.layers.MaxPool2D(
    pool_size = 2, # Size of the pooling window: a single int X means X x X, or pass a pair (X, Y)
    strides = 2, # Number of pixels the window moves at each step: a single int, or a pair (X, Y)
    padding = 'valid')) # 'valid' = no padding; windows that would run past the edge are dropped
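
The following sketch shows what max pooling with `pool_size = 2`, `strides = 2`, and `'valid'` padding does to one small feature map. It is a pure-Python illustration with a hypothetical helper name, not the Keras implementation.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2 x 2 window (stride 2)."""
    pooled = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            window = (
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled

feature_map = [[1, 3, 2, 0],
               [5, 4, 1, 1],
               [0, 2, 9, 6],
               [1, 1, 7, 8]]
print(max_pool_2x2(feature_map))  # [[5, 2], [2, 9]]
```

The output has half the width and height of the input, while the strongest response in each region is preserved.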

### Adding a second convolutional layer

cnn.add(tf.keras.layers.Conv2D(filters = 32, kernel_size = 3, activation='relu'))
# input_shape is omitted here because only the first layer of the model needs it
cnn.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='valid'))

### Step 3 - Flattening

In [62]:
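# Flattening - reshapes the pooled feature maps into a single 1D vector per image for the dense layers
cnn.add(tf.keras.layers.Flatten())
# the output length is computed automatically from the previous layer's shape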
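
Assuming the two convolution and pooling pairs defined above (3 x 3 kernels without padding, 2 x 2 pooling with stride 2), the size of the flattened vector can be worked out by hand. The helper functions below are hypothetical and only encode the standard output-size formulas.

```python
def conv_output_size(size, kernel_size):
    """Output width/height of a convolution with no padding and stride 1."""
    return size - kernel_size + 1

def pool_output_size(size, pool_size, stride):
    """Output width/height of pooling with 'valid' padding."""
    return (size - pool_size) // stride + 1

size = 64
size = conv_output_size(size, 3)      # first Conv2D   -> 62
size = pool_output_size(size, 2, 2)   # first MaxPool  -> 31
size = conv_output_size(size, 3)      # second Conv2D  -> 29
size = pool_output_size(size, 2, 2)   # second MaxPool -> 14
flattened_length = size * size * 32   # 32 feature maps from the second Conv2D
print(size, flattened_length)  # 14 6272
```

Calling `cnn.summary()` after building the model is a quick way to compare these numbers with what Keras reports.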


### Step 4 - Full Connection

In [63]:
# Full connection - a dense layer in which every neuron is connected to every flattened value
cnn.add(tf.keras.layers.Dense(units = 128, activation = 'relu'))
# units = the number of neurons in this layer (more units add capacity, but also raise the risk of overfitting)

### Step 5 - Output Layer

In [64]:
# Output layer - final layer that predicts the class
cnn.add(tf.keras.layers.Dense(units = 2, activation='softmax'))
# units: 1 for a binary output with a sigmoid activation, or the number of categories for a softmax output
# here the labels are one-hot (label_mode = 'categorical'), so 2 units with softmax are used
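
As a rough illustration of what the softmax activation does in the output layer, the sketch below turns two raw scores into probabilities and picks the most likely class. The scores are made up for the example, and `softmax` here is a plain-Python stand-in for the Keras activation.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

class_names = ['benign', 'malignant']  # order reported by train_dataset.class_names
probabilities = softmax([1.0, 3.0])    # hypothetical raw scores for one image
predicted = class_names[probabilities.index(max(probabilities))]
print(probabilities, predicted)
```

Because the two probabilities always sum to 1, a two-unit softmax carries the same information as a single sigmoid unit; the two-unit form is used here to match the one-hot labels.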

### Step 6 - Compile The CNN

In [65]:
cnn.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
# configures training: the Adam optimizer, categorical cross-entropy loss (for one-hot labels), and accuracy as the reported metric
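
For intuition about the loss chosen here, the sketch below computes categorical cross-entropy for a single one-hot label by hand. It is a simplified stand-in with a hypothetical function name: it handles one sample at a time and does not guard against a predicted probability of exactly 0, which a real implementation has to do.

```python
import math

def categorical_crossentropy(one_hot_label, probabilities):
    """Loss for one sample: minus the log of the probability given to the true class."""
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probabilities))

malignant = [0, 1]  # one-hot label, matching label_mode = 'categorical'
confident_right = categorical_crossentropy(malignant, [0.1, 0.9])
confident_wrong = categorical_crossentropy(malignant, [0.9, 0.1])
print(round(confident_right, 3), round(confident_wrong, 3))  # 0.105 2.303
```

The loss is small when the model puts most of the probability on the correct class and grows quickly when it is confidently wrong, which is what the optimizer tries to reduce.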

## Part 3 - Training the CNN

### Training the CNN on the Training set and evaluating it on the Validation Set

In [66]:
cnn.fit(x = train_dataset, validation_data = validation_dataset, epochs = 25)
# trains the CNN for 25 epochs and evaluates it on the validation set after each epoch
# no y argument is passed because train_dataset already yields (images, labels) batches

Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.src.callbacks.History at 0x27261802130>

In [31]:
model = tf.keras.Sequential([
    # Conv2D(filters, kernel_size, ...): 32 filters, each a 3 x 3 feature detector, with ReLU (rectifier) activation;
    # input_shape = (height, width, channels), with channels = 3 (RGB) or 1 (grayscale)
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation = 'relu'),
    tf.keras.layers.Dense(2, activation = 'softmax')  # Two-class output: one probability per class
])

model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Train your model
model.fit(train_dataset, validation_data = validation_dataset, epochs = 25)  # Adjust the number of epochs as needed


Epoch 1/25
Epoch 2/25
Epoch 3/25
Epoch 4/25
Epoch 5/25
Epoch 6/25
Epoch 7/25
Epoch 8/25
Epoch 9/25
Epoch 10/25
Epoch 11/25
Epoch 12/25
Epoch 13/25
Epoch 14/25
Epoch 15/25
Epoch 16/25
Epoch 17/25
Epoch 18/25
Epoch 19/25
Epoch 20/25
Epoch 21/25
Epoch 22/25
Epoch 23/25
Epoch 24/25
Epoch 25/25


<keras.src.callbacks.History at 0x1ec8bf73c70>

## Part 4 - Making a single prediction

import numpy as np
import tensorflow as tf

# NOTE: this path appears to be left over from a cat/dog exercise; point it at a slide image to classify.
image_path = r'P:\Machine-Learning Course\Machine Learning A-Z (Codes and Datasets)\AG Worksheets\dataset\Single Prediction\dogorcat1.jpg'
test_image = tf.keras.utils.load_img(image_path, target_size = (64, 64)) # resize to the size the CNN was trained on
test_image = tf.keras.utils.img_to_array(test_image) # convert to an array so the CNN model can analyze it
test_image = np.expand_dims(test_image, axis = 0) # add the batch dimension as the first axis
# the model expects input shaped (batch, height, width, channels), so a single image becomes a batch of 1

# we can now run the predict method
result = cnn.predict(test_image)

# class_names gives the class for each output index: ['benign', 'malignant']
class_names = train_dataset.class_names
# result[0] is the softmax output for the first (and only) image in the batch;
# argmax picks the most probable class instead of testing for an exact 1
prediction = class_names[np.argmax(result[0])]
print(prediction)