# Convolutional Neural Network


#### Dense Network looks at entire image (global scale)
- it learns patterns tied to specific positions in the entire image, so the subject must be consistently centered, scaled, etc.
- it cannot recognize a local pattern once it has moved to another part of the image

#### Convolutional Neural Network looks at parts of image (local scale)
- can learn local patterns and find them anywhere in image
- CNN scans image to find features and passes those features to a dense classifier

#### CNN Architecture:
- Not densely connected
- multiple layers are used to pick up on increasingly complex patterns
    - first layer may pick up on edges and lines
    - second layer takes this as input and may start forming shapes
    - last layer might look at shapes and determine if they form a pattern

#### Feature Maps:
- A 3D tensor with two spatial axes (width and height) and one depth axis
- CNN layers take a feature map as input and return a new feature map that
represents the presence of specific filters in the previous feature map
- this output is called a response map

#### Layer Parameters - each convolutional layer is defined by two key parameters
- **Filter**: *m* x *n* pattern of pixels that we are looking for in the image
    - the number of filters in a layer represents how many patterns that layer is
    looking for and what the depth of its response map will be
    - each layer of depth in the response map is a matrix of values
    indicating how strongly one filter matched at each location (found by calculating the dot product of the sample and the filter)
    - filters are trainable parameters
- **Sample Size**: each layer examines *m* x *n* blocks of pixels in each image
    - typically 3x3 or 5x5 blocks
    - the sample size is the same as the filter size
    - layers work by sliding filters of *m* x *n* pixels over every possible position in the image
    and populating a new response map indicating how strongly the filter is present at each location
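
The sliding dot product described above can be sketched in plain NumPy. This is only an illustration, assuming a single-channel image, no padding, and a stride of 1; the function name `response_map`, the 5x5 image, and the vertical-line filter are all made up for the example and are not what Keras runs internally.

```python
import numpy as np

def response_map(image, kernel):
    """Slide `kernel` over every valid position of a 2D `image` (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            out[row, col] = np.sum(patch * kernel)  # dot product of sample and filter
    return out

# Hypothetical 5x5 image with a vertical line in the middle column
image = np.zeros((5, 5))
image[:, 2] = 1.0

# Hypothetical 3x3 filter that looks for a vertical line
vertical_line = np.array([[0.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0],
                          [0.0, 1.0, 0.0]])

result = response_map(image, vertical_line)
print(result.shape)  # (3, 3): a 5x5 image shrinks by 2 in each direction
print(result)        # the middle column is 3.0, everything else is 0.0
```

The response is largest where the sample lines up with the filter, which is what lets the same filter find its pattern anywhere in the image.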

#### Pooling
- Simplifies processing by reducing the size of feature maps
- takes the average, max, or min value in each 2x2 area of the feature map and makes that whole area into one pixel in the new map
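
As a rough sketch of the max variant, assuming non-overlapping 2x2 blocks on a single 2D feature map (the helper name `max_pool_2x2` and the sample values are invented for illustration):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block of a 2D feature map."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2          # drop a trailing odd row/column
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 0, 5, 6],
                        [1, 2, 7, 8]])

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4 2]
               #  [2 8]]
```

Swapping `max` for `mean` or `min` in the last line of the helper gives the average and min variants.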

## Image Data
Three Dimensions:
- Image Height
- Image Width
- Color Channels

Color Channels:
- An image is made of several layers, one for the values of each color
- For RGB, red, green, and blue each have their own layer, with pixel values from 0-255
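
A small sketch of this layout as a NumPy array, assuming the height x width x channels ordering that the CIFAR images used later follow. The blank image and the single red pixel are made up for the example.

```python
import numpy as np

image = np.zeros((32, 32, 3), dtype=np.uint8)  # height, width, color channels
image[0, 0] = [255, 0, 0]                      # top-left pixel is pure red

red_channel = image[:, :, 0]                   # one 2D layer per color
print(image.shape)        # (32, 32, 3)
print(red_channel.shape)  # (32, 32)
print(image[0, 0])        # [255   0   0]
```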

### Dataset

Problem: Classify 10 different everyday objects using the CIFAR-10 image dataset in TensorFlow

It contains 60,000 32x32 color images, with 6,000 images in each class

It has the following labels:
- Airplane
- Automobile
- Bird
- Cat
- Deer
- Dog
- Frog
- Horse
- Ship
- Truck

In [2]:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np


In [3]:


# Class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Load and split dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images = train_images / 255.0
test_images = test_images / 255.0

## Build Model
### CNN Architecture
A common architecture is a stack of Conv2D and MaxPooling2D layers followed by a few dense layers
- the stack of convolutional and max pooling layers extracts the features from the image
    - a max pooling layer after each convolutional layer reduces the map size by keeping the max pixel value
- the features are flattened and fed to densely connected layers that determine the class of an image based on those features

In [None]:
model = models.Sequential()

# ---------------- Convolutional Base ----------------

# Layer 1: input shape is 32x32x3 - applies 32 filters of size 3x3 to the input, with a ReLU activation function
#          output map of this layer will be 30x30x32 - 30x30 instead of 32x32 because there is no padding - depth of 32 because of the 32 filters
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32,32,3)))


# Layer 2: Perform max pooling using 2x2 samples and a stride of 2 (shrinks the feature map by a factor of 2)
#          output map will be 15x15x32 - width and height of each depth layer are halved
model.add(layers.MaxPooling2D((2,2)))

# Layer 3: Same as Layer 1, but the input feature map is the output of layer 2 (layer 1 after max pooling)
#          Also increases the number of filters from 32 to 64 (can afford this since the feature map size is shrinking from pooling)
#          Output map will be 13x13x64 - lose two pixels because there is no padding - 64 filters
model.add(layers.Conv2D(64, (3,3), activation='relu'))

# Layer 4: Same as layer 2
#          Output shape will be 6x6x64 - 13 // 2 = 6, the leftover odd row and column are dropped
model.add(layers.MaxPooling2D((2,2)))

# Layer 5: Same as layer 3
#          Output shape will be 4x4x64 - loses two pixels with no padding, as in layers 1 and 3
model.add(layers.Conv2D(64, (3,3), activation='relu'))

# ------------------- Dense Layers -------------------

# Layer 6: Flatten the matrices of feature maps to one dimension - Output shape is 1x1024 (4 * 4 * 64)
model.add(layers.Flatten()) 

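# Layer 7: 64 neuron dense layer to predict based on identified features - output shape is 1x64 (one output for each neuron)
#          ReLU activation keeps this layer non-linear (without it, layers 7 and 8 collapse into one linear map)
model.add(layers.Dense(64, activation='relu'))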

# Layer 8: 10 neuron output layer for 10 classes - output shape is 1x10 (one raw score, or logit, per class; the loss applies softmax via from_logits=True)
model.add(layers.Dense(10))

model.summary()
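
The output shapes quoted in the layer comments can be checked with a little arithmetic. The sketch below assumes 3x3 unpadded convolutions with stride 1 and 2x2 pooling with stride 2, as in the model above. The helper names `conv_output` and `pool_output` are made up for the example.

```python
def conv_output(size, kernel=3):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Output width/height of non-overlapping pooling (any remainder is dropped)."""
    return size // pool

size = 32
sizes = []
for layer in ["conv", "pool", "conv", "pool", "conv"]:
    size = conv_output(size) if layer == "conv" else pool_output(size)
    sizes.append(size)

flattened = size * size * 64   # 64 filters in the last convolutional layer
print(sizes)      # [30, 15, 13, 6, 4]
print(flattened)  # 1024
```

These values should match the `Output Shape` column printed by `model.summary()`.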

### Compile and Train Model
Define the loss function, the optimizer, the metrics to track, and the number of epochs

In [None]:
# Compile model
model.compile(
    optimizer='adam',       # Choose the adam algorithm to perform gradient descent
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # Function to calculate the loss
    metrics=['accuracy']    # Keep track of accuracy during training
)

# Train Model
history = model.fit(
    train_images,           # Train Images
    train_labels,           # Train labels
    epochs=8,               # Train for 8 epochs
    validation_data=(test_images, test_labels)  # Evaluate on the test data after each epoch
)

In [None]:
# Evaluate Model
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(test_acc)

In [None]:
# Predict Entries
predictions = model.predict(test_images)    # raw scores (logits) of each class for each image - not yet probabilities

entry = 1   # entry of images to look at

# Print expected label
print(f"Expected: {class_names[test_labels[entry][0]]}")

# show image of entry
plt.figure()
plt.imshow(test_images[entry])
plt.colorbar()
plt.grid(False)
plt.show()

# get the maximum value in the list to get the predicted class - argmax returns index of largest value
print(f"Predicted: {class_names[np.argmax(predictions[entry])]}")
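
Because the loss was created with `from_logits=True`, the model outputs raw scores rather than probabilities. A minimal sketch of turning one row of scores into a probability distribution with softmax, using made-up logits for three classes:

```python
import numpy as np

def softmax(logits):
    """Convert a 1D array of logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

example_logits = np.array([2.0, 1.0, 0.1])   # hypothetical scores for 3 classes
probabilities = softmax(example_logits)

print(probabilities)
print(probabilities.sum())
print(np.argmax(example_logits) == np.argmax(probabilities))  # True
```

Softmax preserves the ordering of the scores, so taking `np.argmax` of the logits directly gives the same predicted class.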

# Working with Small Datasets
It is difficult to train a CNN from scratch in situations where you don't have millions of images

## Data Augmentation
Create a larger dataset from a smaller one - avoid overfitting

Performs random transformations on images so that the model can generalize better
- transformations like compressions, rotations, stretches, color changes

Can be done with keras

In [None]:
# from keras.preprocessing import image
import keras.utils as image
from keras.preprocessing.image import ImageDataGenerator

# Creates a data generator object that transforms images
datagen = ImageDataGenerator (
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest' 
)

# pick an image to transform
test_img = train_images[20]
img = image.img_to_array(test_img)
img = img.reshape((1,) + img.shape) # add a leading batch axis: (32, 32, 3) -> (1, 32, 32, 3)

# Generate and show 5 augmented images (pass save_to_dir to flow() to also save them to disk)
for i, batch in enumerate(datagen.flow(img, save_prefix='test', save_format='jpeg')):
    plt.figure(i)           # one figure per augmented image
    plot = plt.imshow(image.img_to_array(batch[0]))
    if i >= 4:
        break

plt.show()
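
To make one of these transformations concrete, a horizontal flip amounts to reversing the width axis of the image array. This is a NumPy sketch on a made-up 2x2 RGB image, not the code that `ImageDataGenerator` runs internally.

```python
import numpy as np

image = np.arange(12).reshape(2, 2, 3)   # hypothetical 2x2 RGB image
flipped = image[:, ::-1, :]              # reverse the width axis, keep rows and channels

print(image[0, 0], image[0, 1])      # [0 1 2] [3 4 5]
print(flipped[0, 0], flipped[0, 1])  # [3 4 5] [0 1 2]
```

The label of the image is unchanged by the flip, which is why the transformed copy can be used as an extra training example.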

# Pretrained Models
Can incorporate a pretrained model into our own model and fine-tune only the last few layers
- base layers are able to pick up very well on features that are common to all images
- we can add layers to fine-tune the model for our problem

### Dataset

Use the cats vs dogs dataset from TensorFlow Datasets

Dataset contains (image, label) pairs where images have different dimensions and 3 color channels

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

In [None]:
# Load Data Set
import tensorflow_datasets as tfds

# Split dataset into 80% training, 10% validation, and 10% testing

(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'cats_vs_dogs',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,
)

### Preprocessing
Need to make all images the same size

In [None]:
IMG_SIZE = 160  # all images will be resized to 160x160

def format_image(image, label):
    image = tf.cast(image, tf.float32)  # cast every pixel to float
    image = (image/127.5) - 1           # scale pixels from [0, 255] to [-1, 1] (127.5 is half of 255)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))    # resize image
    return image, label

# Apply to images
train = raw_train.map(format_image)
validation = raw_validation.map(format_image)
test = raw_test.map(format_image)
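
A quick check of the scaling arithmetic used in `format_image`, written as a plain Python helper (the name `scale_pixel` is made up for the example):

```python
def scale_pixel(value):
    """Map a pixel value from [0, 255] to [-1, 1], as format_image does."""
    return value / 127.5 - 1

print(scale_pixel(0))      # -1.0
print(scale_pixel(127.5))  # 0.0
print(scale_pixel(255))    # 1.0
```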

In [None]:
# Shuffle and batch

BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000

train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)

### Pick a Pretrained Model
Using MobileNet V2
- trained on 1.4 million images and has 1000 different classes

Only want to include the convolutional base of this model

Will load the architecture and the weights of this model

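- Input is a batch of 160x160x3 images (1x160x160x3 for a single image)
- Final output of the base CNN for a batch of 32 images will be 32x5x5x1280, with a ReLU activation function
    - 32 is the batch size
    - 5x5 is the spatial size of each feature map
    - 1280 is the number of feature maps (one per filter in the last layer)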

In [None]:
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

# Create the Base Model from pretrained MobileNet V2
base_model = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SHAPE,      # Give the input shape for this application
    include_top=False,          # Don't include the top layers that classify to the 1000 classes
    weights='imagenet'          # Use the weights pretrained on the ImageNet dataset
)

# Freeze the base - don't want to change the parameters of the base when training the classifier
base_model.trainable = False

base_model.summary()

### Adding the Classifier
- Instead of flattening, use a global average pooling layer that averages the entire 5x5 area of each 2D feature map and returns a single 1280 element vector per image
- add a single dense neuron as the prediction layer and output - one output is enough since there are only 2 classes
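
The difference between global average pooling and flattening can be sketched in NumPy. The random array below only stands in for the base model output of one batch of 32 images; it is not real MobileNet V2 output.

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.random((32, 5, 5, 1280))  # hypothetical base-model output for a batch of 32

pooled = features.mean(axis=(1, 2))      # average over the two spatial axes
flattened = features.reshape(32, -1)     # what Flatten would produce instead

print(pooled.shape)     # (32, 1280)
print(flattened.shape)  # (32, 32000)
```

Pooling leaves the dense layer with 1280 inputs per image instead of 32000, so the classifier on top has far fewer weights to train.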

In [None]:
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()

prediction_layer = tf.keras.layers.Dense(1)

# Combine prediction layer with base
model = tf.keras.Sequential([
    base_model,             # CNN base
    global_average_layer,   # Global average pooling (used instead of flattening)
    prediction_layer        # Output Layer
])

model.summary()

### Training the Model
Train just the top layers

In [None]:
base_learning_rate = 0.0001     # how much we are allowed to modify the network while training - set low
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),   # optimizer
    loss = tf.keras.losses.BinaryCrossentropy(from_logits=True),   # Only two classes, so use binary
    metrics=['accuracy']
)

history = model.fit(
    train_batches,
    epochs=3,
    validation_data=validation_batches
)

In [None]:
# Save model
model.save("dogs_vs_cats.h5")

# Load model
new_model = tf.keras.models.load_model('dogs_vs_cats.h5')

In [None]:
# Predict
predictions = model.predict(test_batches, verbose=2)    # one logit per image; test_batches is already batched, so batch_size is not passed

In [None]:

entry = 0   # entry of images to look at

# Print expected label
image, label = list(raw_test.take(entry+1))[-1]
expected = metadata.features['label'].int2str(label)
print(f"Expected: {expected}")


# show image of entry
plt.figure()
plt.imshow(image)
plt.colorbar()
plt.grid(False)
plt.show()

# the model outputs one logit per image - apply a sigmoid to get the probability of class 1, then threshold at 0.5
probability = tf.nn.sigmoid(predictions[entry][0]).numpy()
prediction = 0 if probability <= .5 else 1
print(f"Predicted: {metadata.features['label'].int2str(prediction)}")
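
Since the loss uses `from_logits=True`, the single output neuron produces a logit rather than a probability. A minimal sketch of mapping a logit to a class with a sigmoid, using plain Python and made-up logit values (the helper names are for illustration only):

```python
import math

def sigmoid(logit):
    """Squash a logit into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))

def predicted_class(logit):
    """Class 1 if the probability is above 0.5, otherwise class 0."""
    return 1 if sigmoid(logit) > 0.5 else 0

for logit in (-2.0, 0.3):   # hypothetical model outputs
    print(logit, predicted_class(logit))
```

A probability of 0.5 corresponds to a logit of 0, so thresholding the raw logit at 0 gives the same answer as thresholding the sigmoid output at 0.5.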