# Notebook 3: CNN Model

The objective of this project is to build an image classification model that determines from an x-ray whether someone has pneumonia. The convolutional neural network in this notebook follows the image classification tutorial from [tensorflow.org](https://www.tensorflow.org/tutorials/images/classification).



In [3]:
import glob
import sys
import os
import shutil
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# For reproducible results, seed both NumPy and TensorFlow:
from numpy.random import seed
seed(1)
tf.random.set_seed(1)
%matplotlib inline

The dataset has the following directory structure:

<pre>
<b>data</b>
|__ <b>train</b>
    |______ <b>PNEUMONIA</b>: [pneumonia_0.jpeg, pneumonia_1.jpeg, pneumonia_2.jpeg ....]
    |______ <b>NORMAL</b>: [normal_0.jpeg, normal_1.jpeg, normal_2.jpeg ...]
|__ <b>test</b>
    |______ <b>PNEUMONIA</b>: [pneumonia_0.jpeg, pneumonia_1.jpeg, pneumonia_2.jpeg ....]
    |______ <b>NORMAL</b>: [normal_0.jpeg, normal_1.jpeg, normal_2.jpeg ...]
</pre>

In [2]:
train_directory = '../data/train/'
test_directory = '../data/test/'

normal_tr = glob.glob('../data/train/NORMAL/*.jpeg')
pneumonia_tr = glob.glob('../data/train/PNEUMONIA/*.jpeg')

normal_test = glob.glob('../data/test/NORMAL/*.jpeg')
pneumonia_test = glob.glob('../data/test/PNEUMONIA/*.jpeg')


print(f"Total training normal images: {len(normal_tr)}")
print(f"Total training pneumonia images: {len(pneumonia_tr)}")
print(f"Total test normal images: {len(normal_test)}")
print(f"Total test pneumonia images: {len(pneumonia_test)}")
print("--")
print("Total training images:", len(glob.glob('../data/train/*/*.jpeg')))
print("Total test images:", len(glob.glob('../data/test/*/*.jpeg')))

Total training normal images: 1341
Total training pneumonia images: 3875
Total test normal images: 242
Total test pneumonia images: 398
--
Total training images: 5216
Total test images: 640


Loading `train_data` and `test_data`, which hold the images transformed into floating point tensors.

In [5]:
%run '../assets/tensor_data.py'

Found 5216 images belonging to 2 classes.
Found 640 images belonging to 2 classes.


In [13]:
# These are the usual ipython objects, including this one you are creating
ipython_vars = ['In', 'Out', 'exit', 'quit', 'get_ipython', 'ipython_vars']
# list of objects
variables = [x for x in dir() if not x.startswith('_') and x not in sys.modules and x not in ipython_vars]

if 'train_data' in variables and 'test_data' in variables:
    print('train_data and test_data have successfully been imported.')
else:
    print('train_data and test_data have not been imported.')

train_data and test_data have successfully been imported.


## Convolutional Neural Network

The images must now be processed. As mentioned in Notebook 1, the images differ in resolution, pixel width and height, zoom, and angle, so they have to be normalized using Keras' `ImageDataGenerator`.

`ImageDataGenerator` transforms images into floating point tensors, which can then be fed into the neural network. The following steps will be performed:
1. Images will be read in
2. Images will be resized to [224x224 pixels](https://datascience.stackexchange.com/questions/16601/reason-for-square-images-in-deep-learning), a square input size used by many models such as VGG and ResNet.<font color = 'red'>***</font>
3. Images will then be converted to floating point tensors
4. Tensors will be rescaled from values between 0 and 255 to values between 0 and 1 because neural networks train better on small input values.
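
Steps 3 and 4 above can be sketched with plain NumPy. This is only an illustration, not the code in `tensor_data.py`: the random array stands in for a decoded and resized x-ray, and the single grayscale channel is an assumption.

```python
import numpy as np

# Stand-in for a decoded, resized x-ray: 8-bit pixel values in [0, 255].
# The shape (224, 224, 1) assumes one grayscale channel (an assumption).
rng = np.random.default_rng(1)
image_uint8 = rng.integers(0, 256, size=(224, 224, 1), dtype=np.uint8)

# Step 3: convert to a floating point array.
# Step 4: rescale from [0, 255] to [0, 1].
image_float = image_uint8.astype(np.float32) / 255.0

print(image_float.dtype, image_float.shape)
print(image_float.min(), image_float.max())
```

Passing `rescale=1./255` to `ImageDataGenerator` applies the same multiplication to every image it yields.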

Data augmentation is used to create more observations for training the model. Additional training data is generated by randomly transforming the existing training images. The following augmentations will be applied:

1. `horizontal_flip = True` - randomly flip images horizontally
2. `rotation_range = 30` - randomly rotate an image by up to 30 degrees in either direction
3. `zoom_range=0.3` - randomly zoom in or out of an image by up to 30%
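
A minimal sketch of how these settings might be passed to `ImageDataGenerator` is shown below. It is not the contents of `tensor_data.py`, which is not shown in this notebook. The names `train_gen`, `test_gen`, `load_split`, and `BATCH_SIZE`, the batch size of 32, and `class_mode="binary"` are all assumptions made for illustration.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = (224, 224)  # target size from the preprocessing steps
BATCH_SIZE = 32        # assumed value; not stated in this notebook

# Training generator: rescaling plus the three augmentations listed above.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    rotation_range=30,
    zoom_range=0.3,
)

# Test generator: rescaling only, so evaluation sees unmodified images.
test_gen = ImageDataGenerator(rescale=1.0 / 255)


def load_split(generator, directory, shuffle):
    """Stream images from class subfolders as batches of float tensors.

    Hypothetical helper; class_mode="binary" assumes two class folders.
    """
    return generator.flow_from_directory(
        directory,
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        class_mode="binary",
        shuffle=shuffle,
    )
```

With the directory layout shown earlier, `load_split(train_gen, '../data/train/', shuffle=True)` would yield augmented training batches, and `load_split(test_gen, '../data/test/', shuffle=False)` would yield test batches in a fixed order. Augmentation is applied only to the training generator so that the test images stay unchanged.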


<font color = 'red'>***</font> More in-depth explanation: Increasing the input image size increases noise and variance, which requires the network to do more processing, such as more pooling or more layers.
    
Documentation source for pre-processing - [keras - preprocessing](https://keras.io/preprocessing/image/)