# Intro / preparing data

We train a basic model on several variants of the CIFAR-10 dataset, each produced with data augmentation.
We add augmentation steps one at a time and observe how the results evolve.

Metrics can be visualized at the end of this notebook using TensorBoard.

If needed, you can find CIFAR 10 images at :

In [1]:
%load_ext tensorboard

In [2]:
###### Local imports
from src.model_utilities import train_model, create_model, retrieve_data

###### Standard library imports
import os
import datetime

###### Tensorflow imports
import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, AveragePooling2D, Activation, Dropout, Flatten, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator

###### Sklearn imports
from sklearn.model_selection import train_test_split


###### Setting global variables
TRAIN_BASE_DIRECTORY = "./data/train"
VAL_BASE_DIRECTORY = "./data/test"
IMAGE_SIZE = 32
BATCH_SIZE = 64

# Initializing model

We set up a basic model architecture that will be used for every training run.

In [3]:
model = Sequential()

model.add(Conv2D(32, kernel_size=3, activation='relu', padding='same', input_shape=(32,32,3))) 
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dense(64, activation='relu'))
# model.add(Dropout(0.2))

model.add(Conv2D(64, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.2))

# Required final layers
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 32, 32, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 16, 16, 32)        0         
_________________________________________________________________
dense (Dense)                (None, 16, 16, 64)        2112      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
flatten (Flatten)            (None, 4096)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                40970     
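
The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer definitions; the helper names `conv2d_params` and `dense_params` are illustrative and not part of Keras. Note that the first `Dense` layer acts on the last axis of a 4D tensor, so it only sees the 32 channels, and that the same arithmetic gives 40970 parameters for the final `Dense` layer.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights (k * k * in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(in_features, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return in_features * units + units


layer_params = {
    "conv2d": conv2d_params(3, 3, 32),        # 3x3 kernel over the RGB input
    "dense": dense_params(32, 64),            # applied to the last axis (32 channels)
    "conv2d_1": conv2d_params(3, 64, 64),     # 3x3 kernel over 64 channels
    "dense_1": dense_params(8 * 8 * 64, 10),  # flattened 8x8x64 feature map
}

for name, count in layer_params.items():
    print(f"{name:10s} {count}")
print("total", sum(layer_params.values()))
```

The pooling and flatten layers have no trainable parameters, which is why they show 0 in the summary.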

# First dataset: vanilla

We read the CIFAR-10 dataset and train the model on it as is, without any augmentation.

In [4]:
####### Setting the data generator
datagen = ImageDataGenerator(validation_split=0.2)

In [5]:
####### Retrieving data using the ImageDataGenerator
train_generator, val_generator = retrieve_data(
    datagen,
    TRAIN_BASE_DIRECTORY,
    VAL_BASE_DIRECTORY,
    BATCH_SIZE,
)

Found 40000 images belonging to 10 classes.
Found 8000 images belonging to 10 classes.
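
The two counts above are consistent with CIFAR-10's 50,000 training and 10,000 test images, each reduced by `validation_split=0.2`. How `retrieve_data` selects its subsets is not shown in this notebook, so the sketch below is only an assumption that reproduces the arithmetic; `kept_images` and `batches_per_epoch` are hypothetical helper names.

```python
import math

VALIDATION_SPLIT = 0.2
BATCH_SIZE = 64

# CIFAR-10 ships 50,000 training and 10,000 test images.
CIFAR10_TRAIN_IMAGES = 50_000
CIFAR10_TEST_IMAGES = 10_000


def kept_images(total, validation_split):
    """Images left after setting aside the validation_split fraction."""
    return total - round(total * validation_split)


def batches_per_epoch(n_images, batch_size):
    """Batches needed to see every image once (the last one may be partial)."""
    return math.ceil(n_images / batch_size)


train_count = kept_images(CIFAR10_TRAIN_IMAGES, VALIDATION_SPLIT)
val_count = kept_images(CIFAR10_TEST_IMAGES, VALIDATION_SPLIT)

print(train_count, batches_per_epoch(train_count, BATCH_SIZE))  # 40000 625
print(val_count, batches_per_epoch(val_count, BATCH_SIZE))      # 8000 125
```

With a batch size of 64, one epoch therefore corresponds to 625 training batches under this assumption.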


In [6]:
train_model(model, train_generator, val_generator, name="Base_Model", n_epochs=20)

Instructions for updating:
Please use Model.fit, which supports generators.
Epoch 1/20
Instructions for updating:
use `tf.profiler.experimental.stop` instead.
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


# Adding random rotation on image

We add a random rotation of up to 20 degrees in either direction (`rotation_range=20` draws an angle between -20 and +20 degrees).

In [7]:
####### Setting the data generator
datagen20 = ImageDataGenerator(validation_split=0.2,
                               rotation_range=20)
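
To make the effect of `rotation_range=20` concrete, here is a minimal pure-Python sketch of the sampling step: an angle is drawn uniformly between -20 and +20 degrees for each image. The functions `sample_rotation_angle` and `rotate_point` are illustrative names, not Keras functions, and the sketch rotates a single pixel coordinate rather than a full image.

```python
import math
import random

ROTATION_RANGE = 20  # same value as rotation_range in the generator
IMAGE_SIZE = 32


def sample_rotation_angle(rotation_range, rng=random):
    """Draw an angle in degrees, uniformly from [-rotation_range, +rotation_range]."""
    return rng.uniform(-rotation_range, rotation_range)


def rotate_point(x, y, angle_deg, center):
    """Rotate the pixel coordinate (x, y) by angle_deg around (center, center)."""
    theta = math.radians(angle_deg)
    dx, dy = x - center, y - center
    return (center + dx * math.cos(theta) - dy * math.sin(theta),
            center + dx * math.sin(theta) + dy * math.cos(theta))


rng = random.Random(0)  # seeded so the sketch is repeatable
angles = [sample_rotation_angle(ROTATION_RANGE, rng) for _ in range(1000)]
print(min(angles), max(angles))

center = (IMAGE_SIZE - 1) / 2
print(rotate_point(0, 0, 20, center))  # where the top-left pixel would land
```

Corner pixels move the most, which is why rotated images need some fill strategy for the areas that rotate in from outside the frame.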

In [8]:
####### Retrieving data using the ImageDataGenerator
train_generator20, val_generator20 = retrieve_data(
    datagen20,
    TRAIN_BASE_DIRECTORY,
    VAL_BASE_DIRECTORY,
    BATCH_SIZE,
)

Found 40000 images belonging to 10 classes.
Found 8000 images belonging to 10 classes.


In [9]:
model20 = create_model()

train_model(model20, train_generator20, val_generator20, name="20percent_Rotation_Model", n_epochs=20)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dense_2 (Dense)              (None, 16, 16, 64)        2112      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                40970     

# Adding random horizontal flip

Keeping the 20-degree rotation, we add a random horizontal flip.

In [4]:
####### Setting the data generator
datagen20hflip = ImageDataGenerator(validation_split=0.2,
                                    rotation_range=20,
                                    horizontal_flip=True)
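
With `horizontal_flip=True`, each image is mirrored left to right with probability 0.5. The sketch below shows the operation on a tiny image stored as a list of rows; `horizontal_flip` and `maybe_horizontal_flip` are illustrative helpers, not the Keras implementation.

```python
import random


def horizontal_flip(image):
    """Mirror an image (list of rows) left to right by reversing each row."""
    return [row[::-1] for row in image]


def maybe_horizontal_flip(image, rng=random):
    """Flip with probability 0.5, otherwise return the image unchanged."""
    return horizontal_flip(image) if rng.random() < 0.5 else image


tiny_image = [[1, 2, 3],
              [4, 5, 6]]
print(horizontal_flip(tiny_image))  # [[3, 2, 1], [6, 5, 4]]
```

A horizontal flip preserves the label for CIFAR-10 classes such as cars or horses, which is what makes it a safe augmentation here.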

In [5]:
####### Retrieving data using the ImageDataGenerator
train_generator20hflip, val_generator20hflip = retrieve_data(
    datagen20hflip,
    TRAIN_BASE_DIRECTORY,
    VAL_BASE_DIRECTORY,
    BATCH_SIZE,
)

Found 40000 images belonging to 10 classes.
Found 8000 images belonging to 10 classes.


In [6]:
model20hflip = create_model()

train_model(model20hflip, train_generator20hflip, val_generator20hflip, name="20percent_Rotation+Horizontal_Flip_Model", n_epochs=20)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dense_2 (Dense)              (None, 16, 16, 64)        2112      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                40970     

# Adding random vertical flip and brightness range

Even though the previous model is no longer overfitting, we try one last round of data augmentation to see if we can get a better result.

In [7]:
####### Setting the data generator
datagen20bothflips = ImageDataGenerator(validation_split=0.2,
                                    rotation_range=20,
                                    horizontal_flip=True,
                                    vertical_flip=True,
                                    brightness_range=(0.2, 0.8))
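
The two new options can be sketched on a tiny image as follows. This is a simplified illustration, assuming brightness is applied as a multiplicative factor drawn uniformly from the given range (a factor of 1.0 would leave the image unchanged); `vertical_flip` and `scale_brightness` are hypothetical helpers, not Keras functions. Because both bounds of `(0.2, 0.8)` are below 1, every augmented image ends up darker than the original under this assumption.

```python
import random

BRIGHTNESS_RANGE = (0.2, 0.8)  # same tuple as in the generator


def vertical_flip(image):
    """Mirror an image (list of rows) top to bottom by reversing the row order."""
    return image[::-1]


def scale_brightness(image, factor):
    """Multiply every pixel by factor and clip the result to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]


rng = random.Random(0)  # seeded so the sketch is repeatable
factor = rng.uniform(*BRIGHTNESS_RANGE)

tiny_image = [[0, 100, 200],
              [50, 150, 250]]
flipped = vertical_flip(tiny_image)
darker = scale_brightness(flipped, factor)
print(factor, darker)
```

A range that straddles 1.0, such as `(0.8, 1.2)`, would both darken and brighten images; whether that suits this experiment is a design choice left to the author.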

In [8]:
####### Retrieving data using the ImageDataGenerator
train_generator20bothflips, val_generator20bothflips = retrieve_data(
    datagen20bothflips,
    TRAIN_BASE_DIRECTORY,
    VAL_BASE_DIRECTORY,
    BATCH_SIZE,
)

Found 40000 images belonging to 10 classes.
Found 8000 images belonging to 10 classes.


In [9]:
model20bothflips = create_model()

train_model(model20bothflips, train_generator20bothflips, val_generator20bothflips, name="20percent_Rotation+Horizontal_Vertical_Flip_Brightness_Model", n_epochs=20)

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dense_4 (Dense)              (None, 16, 16, 64)        2112      
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 8, 8, 64)          0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 4096)              0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                40970     

# Data viz

Check and uncheck individual runs in the TensorBoard run selector to make the graphs more readable.

In [13]:
%tensorboard --logdir logs --host=localhost --port=8005

Reusing TensorBoard on port 8005 (pid 11640), started 0:58:57 ago. (Use '!kill 11640' to kill it.)
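
TensorBoard lists every run it finds under the directory passed to `--logdir`, so each trained model needs its own sub-folder in `logs`. How `train_model` names those folders is not shown in this notebook; the sketch below is one plausible scheme, assuming a timestamp suffix built with the `os` and `datetime` modules imported earlier. The function `make_log_dir` is hypothetical.

```python
import datetime
import os


def make_log_dir(name, base_dir="logs", now=None):
    """Build a per-run log path such as logs/<name>_<YYYYmmdd-HHMMSS>."""
    now = now or datetime.datetime.now()
    return os.path.join(base_dir, f"{name}_{now.strftime('%Y%m%d-%H%M%S')}")


print(make_log_dir("Base_Model"))
```

A timestamp in the folder name keeps repeated runs of the same model from overwriting each other, at the cost of more entries in the run selector.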

# Conclusion

As we went through the successive data augmentation steps, the loss improved. At the start, the model was clearly overfitting.

Combining all of the data augmentation techniques used above resulted in a model whose validation loss is almost identical to its training loss.
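
The comparison between training and validation loss can be made explicit by computing the gap per epoch. The sketch below assumes a dictionary shaped like the `history` attribute returned by Keras `Model.fit`, with `loss` and `val_loss` lists; whether `train_model` exposes it is not shown here, and the numbers are placeholders for illustration, not results from this notebook.

```python
def generalization_gap(history):
    """Return val_loss - loss per epoch; large positive values suggest overfitting."""
    return [val - train for train, val in zip(history["loss"], history["val_loss"])]


# Placeholder values for illustration only, not measurements from this notebook.
example_history = {
    "loss": [1.5, 1.0, 0.5],
    "val_loss": [1.5, 1.25, 1.5],
}
print(generalization_gap(example_history))  # [0.0, 0.25, 1.0]
```

A gap that stays close to zero across epochs corresponds to the behaviour described for the fully augmented model.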

