<!--NOTEBOOK_INFORMATION-->
<img id="r-1060983" data-claire-element-id="1061343" src="http://www.siteduzero.com/favicon.ico" alt="User image">
    <p>
        **<font color='#D2691E'size="6">Image classification (7/9)</font>**.
    </p>
    <p>
         This notebook covers the classification of the project images with convolutional neural networks (CNNs).
     It applies <b> transfer learning </b> to the <b> VGG16 </b> network: <b> features are extracted from the frozen convolutional layers </b> and only the new <b> fully connected layers are trained </b>.
     </p>
    
<p>
     N.B.: Its hyperparameters were chosen based on the work done in the appendix notebooks
     <b> Annex_1_CNN_approach_transfer_learning_over_parameters </b> and <b> Annex_2_CNN_approach_results_analyzis </b>.
</p>

<p>
    <center>
        **<font color='#D2691E'size="6">ROADMAP</font>**
    </center>
</p>
<img align="left" style="padding-right:10px;" src="./images/part_7.jpg">

<p>
    <center>
        **<font color='#D2691E'size="6">PLAN</font>**
    </center>
</p>

<p>
        **<font color='#D2691E'size="4">0) Libraries and functions import</font>**
</p>
<p>
        **<font color='#D2691E'size="4">I) CNN parameters calibration</font>**
</p>
<p>
        **<font color='#D2691E'size="4">II) Looping over the CNN parameters</font>**
</p>

<p>
        **<font color='#D2691E'size="4">0) Libraries and functions import</font>**
</p>

In [1]:
import pickle 
import pandas as pd
import numpy as np
import random
import time

from keras.applications.vgg16 import VGG16
from keras.layers import Flatten,Dense,Dropout
from keras.models import Model
from keras.callbacks import EarlyStopping
from keras.optimizers import SGD, Adam

Using TensorFlow backend.


In [2]:
from context import datasources_path, pickles_path, temp_files_path

In [3]:
from functions_tailored import select_N_random_races
from functions_tailored import build_train_validation_and_test_datasets

In [4]:
L_evaluation_cols = ['races',
                     'training_len',
                     'testing_len',
                     'n_races',
                     'batch_size',
                     'learning_rate',
                     'fitting_time',
                     'prediction_time',
                     'epochs_losses',
                     'epochs_accuracies',
                     'epochs_val_losses',
                     'epochs_val_accuracies',
                     'test_loss',
                     'test_accuracy',
                     'optimizer'
                    ]

In [5]:
RELOAD_EVALUATION = True

In [6]:
if RELOAD_EVALUATION:
    # Reload the results of previous runs; 'Unnamed: 0' is the index column written by to_csv
    df_evaluation_final = pd.read_csv('df_evaluation_final.csv')
    df_evaluation_final.drop('Unnamed: 0', axis=1, inplace=True)
else:
    # Start from an empty results table
    df_evaluation_final = pd.DataFrame(columns = L_evaluation_cols)
df_evaluation_final

Unnamed: 0,races,training_len,testing_len,n_races,batch_size,learning_rate,fitting_time,prediction_time,epochs_losses,epochs_accuracies,epochs_val_losses,epochs_val_accuracies,test_loss,test_accuracy,optimizer
0,"['beagle', 'bernese_mountain_dog', 'dhole', 'e...",982,441,10,200,8e-06,7584,109,"[11.640229279543377, 8.675667757900571, 6.2707...","[0.13645621251064502, 0.320773934152607, 0.486...","[8.413873997266078, 6.248528911602387, 4.20287...","[0.29411764899643583, 0.43343653213867095, 0.5...",1.335279,0.868481,adam
1,"['beagle', 'bernese_mountain_dog', 'dhole', 'e...",982,441,10,200,8e-06,21151,97,"[13.164204929608191, 12.914754133360448, 12.32...","[0.10081466296479076, 0.11099796265727876, 0.1...","[12.512846996909694, 11.754830443084055, 10.95...","[0.10216718466935143, 0.12074303737734862, 0.1...",1.216085,0.854875,sgd
2,"['beagle', 'bernese_mountain_dog', 'dhole', 'e...",982,441,10,200,1.9e-05,5452,97,"[11.432109793917467, 6.4637340460192645, 3.881...","[0.16598777837156037, 0.4755600845012548, 0.67...","[8.568536076383324, 4.294536177218883, 3.03362...","[0.3188854513614908, 0.5975232375295538, 0.684...",1.159407,0.884354,adam
3,"['beagle', 'bernese_mountain_dog', 'dhole', 'e...",982,441,10,200,1.9e-05,11787,98,"[12.859470243123786, 11.982571741714011, 10.75...","[0.11608961356572367, 0.13849287499842478, 0.1...","[11.430217060880395, 9.780132278938412, 7.9990...","[0.12074303419412843, 0.18885448999449195, 0.2...",1.033548,0.877551,sgd
4,"['beagle', 'bernese_mountain_dog', 'dhole', 'e...",982,441,10,200,7.6e-05,3344,97,"[10.06785662256773, 4.840170306732116, 3.07574...","[0.2983706677275617, 0.6598778002859376, 0.790...","[5.666034890402212, 3.9021638406688584, 2.7525...","[0.5789473896425206, 0.7151702664584937, 0.798...",2.807766,0.807256,adam
5,"['beagle', 'bernese_mountain_dog', 'dhole', 'e...",982,441,10,200,7.6e-05,7370,102,"[11.965150679687376, 9.44600970196384, 6.06742...","[0.1395112039661699, 0.29633401464784703, 0.52...","[9.814261461559095, 6.352773416153049, 4.72292...","[0.24767801368568704, 0.49535603161566766, 0.6...",1.040372,0.897959,sgd
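The history columns (`epochs_losses`, `epochs_val_losses`, and so on) are Python lists when a row is appended, so `to_csv` stores them as text and they come back as strings on reload. Below is a minimal sketch of turning them back into lists and selecting the best run. It assumes a two-row excerpt whose values are rounded and truncated from the table above; the `csv_text` sample and the `n_epochs` column are illustrative and not part of the notebook.

```python
import ast
import io

import pandas as pd

# Illustrative excerpt shaped like df_evaluation_final.csv (values rounded, lists truncated)
csv_text = (
    ",optimizer,learning_rate,epochs_val_losses,test_accuracy\n"
    '0,adam,8e-06,"[8.41, 6.25, 4.20]",0.868\n'
    '1,sgd,7.6e-05,"[9.81, 6.35, 4.72]",0.898\n'
)

# index_col=0 reads the unnamed first column as the index instead of a data column
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# The list column is read back as a string; literal_eval parses it safely into a list
df["epochs_val_losses"] = df["epochs_val_losses"].apply(ast.literal_eval)

# Number of epochs recorded for each run (hypothetical helper column)
df["n_epochs"] = df["epochs_val_losses"].apply(len)

# Row with the highest test accuracy
best = df.loc[df["test_accuracy"].idxmax()]
print(best["optimizer"], best["learning_rate"], best["test_accuracy"])
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for this kind of parsing.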


<p>
        **<font color='#D2691E'size="4">I) CNN parameters calibration</font>**
</p>

In [7]:
L_batch_sizes = [300,200]
dict_lr_ranges = {
    0:[5e-6,1e-5],
    1:[1e-5,5e-5],
    2:[5e-5,1e-4]
}
EPOCHS = 100 
NN_CALLBACKS = EarlyStopping(monitor='val_loss', min_delta=0, patience=3, verbose=0, mode='auto')
# Load the list of selected breeds; the context manager closes the file after reading
with open(pickles_path+"L_10_filtered_races.p", "rb") as f:
    L_filtered_races = pickle.load(f)
RACE_NUMBER = len(L_filtered_races)
L_filtered_races

['beagle',
 'bernese_mountain_dog',
 'dhole',
 'english_setter',
 'japanese_spaniel',
 'kelpie',
 'labrador_retriever',
 'rottweiler',
 'siberian_husky',
 'west_highland_white_terrier']
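The `EarlyStopping` callback above monitors `val_loss` with `patience=3` and `min_delta=0`: training stops once the validation loss has failed to improve for three consecutive epochs. The following pure-Python sketch illustrates that rule under simplifying assumptions. The `early_stopping_epoch` function and the sample losses are made up for illustration, and the sketch does not reproduce every detail of the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Improvement: remember the new best loss and reset the counter
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value is reached at epoch 3
losses = [2.0, 1.5, 1.2, 1.3, 1.25, 1.21, 1.1]
print(early_stopping_epoch(losses))  # 6
```

With these values, epochs 4 to 6 all fail to beat the loss of epoch 3, so training stops at epoch 6 and the better loss at epoch 7 is never reached. A low patience shortens the long fits but can stop a slowly converging run too early.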

<p>
        **<font color='#D2691E'size="4">II) Looping over the CNN parameters</font>**
</p>

In [8]:
iter_start_time = time.time()

print('filtered_races : %s'%L_filtered_races)
print()
print('building the train and test datasets ...')
print()
dict_data = build_train_validation_and_test_datasets(L_filtered_races,'label_encoder_iterations','label_encoder_iterations')

X_train = dict_data['X_train']
X_val = dict_data['X_val']
X_test = dict_data['X_test']
    
y_train = dict_data['y_train']
y_val = dict_data['y_val']
y_test = dict_data['y_test']
            
training_len = X_train.shape[0]
testing_len = X_test.shape[0]
            
print('looping over the vgg16 parameters ...')
print()
for lr_iteration in range(0,20):            
    for BATCH_SIZE in L_batch_sizes:
        for key in dict_lr_ranges.keys():
            lr_interval = dict_lr_ranges[key]
            LEARNING_RATE = random.uniform(lr_interval[0], lr_interval[1])
            init_training_time = time.time()
            print()
            print()
            print("#############################################################################################")
            print("training the model with params :")
            print("Batch size = %s | Learning rate = %s"%(BATCH_SIZE,LEARNING_RATE))

            for optimizer in ['adam','sgd']:
                # Load VGG-16 pre-trained on ImageNet, without its fully connected layers
                model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

                # Freeze all VGG16 layers so that only the new fully connected layers are trained
                for layer in model.layers:
                    layer.trainable = False

                # Take the output of this network
                x = model.output
                # Flatten the convolutional features to feed the new fully connected classification layers
                predictions = Flatten()(x)

                # Define the new model
                new_model = Model(inputs=model.input, outputs=predictions)

                # Add a fully connected layer of 4096 neurons with a "relu" activation
                x = new_model.output
                predictions = Dense(4096, activation='relu')(x)
                new_model = Model(inputs=new_model.input, outputs=predictions)

                # Add a Dropout layer
                x = new_model.output
                dropout = Dropout(0.2)(x)
                new_model = Model(inputs=new_model.input, outputs=dropout)


                # Add a second fully connected layer of 4096 neurons with a "relu" activation
                x = new_model.output
                predictions = Dense(4096, activation='relu')(x)
                new_model = Model(inputs=new_model.input, outputs=predictions)

                # Add a Dropout layer
                x = new_model.output
                dropout = Dropout(0.2)(x)
                new_model = Model(inputs=new_model.input, outputs=dropout)

                # Add the output layer: fully connected, RACE_NUMBER neurons (one per breed), "softmax" activation
                x = new_model.output
                predictions = Dense(RACE_NUMBER, activation='softmax')(x)
                new_model = Model(inputs=new_model.input, outputs=predictions)


                # Compile the model
                if optimizer == 'adam':
                    nn_optimizer = Adam(lr=LEARNING_RATE, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
                if optimizer == 'sgd':
                    nn_optimizer = SGD(lr=LEARNING_RATE, momentum=0.9)
                new_model.compile(loss="categorical_crossentropy", optimizer=nn_optimizer, metrics=["accuracy"])

                init_fitting_time = time.time()
                print('fitting the model ...')
                print('optimizer : %s'%optimizer)
                print()
                # Train on the training data (X_train, y_train)
                history = new_model.fit(X_train,
                                        y_train,
                                        validation_data=(X_val,y_val),
                                        epochs=EPOCHS,
                                        callbacks = [NN_CALLBACKS],
                                        batch_size=BATCH_SIZE,
                                        verbose=1)

                fitting_time = int(time.time() - init_fitting_time)
                print('fitting_time : %s seconds'%fitting_time)
                print()

                init_prediction_time = time.time()
                print('computing the accuracy on the test data ...')
                print()
                performances = new_model.evaluate(X_test, y_test, verbose=0)
                test_loss = performances[0]
                test_accuracy = performances[1]
                print("test_loss : %s | test_accuracy : %s"%(test_loss,test_accuracy))
                prediction_time = int(time.time() - init_prediction_time)
                print('prediction_time : %s seconds'%prediction_time)
                print()

                dict_evaluation = {
                    'training_len':training_len,
                    'testing_len':testing_len,
                    'n_races':RACE_NUMBER,
                    'races':L_filtered_races,
                    'batch_size':BATCH_SIZE,
                    'learning_rate':LEARNING_RATE,
                    'fitting_time':fitting_time,
                    'prediction_time':prediction_time,
                    'epochs_losses':history.history['loss'],
                    'epochs_accuracies':history.history['acc'],
                    'epochs_val_losses':history.history['val_loss'],
                    'epochs_val_accuracies':history.history['val_acc'],
                    'test_loss':test_loss,
                    'test_accuracy':test_accuracy,
                    'optimizer':optimizer
                    }

                df_evaluation_final = df_evaluation_final.append(dict_evaluation, ignore_index = True)
                df_evaluation_final.to_csv('df_evaluation_final.csv', header=True)

                print('CSV written !')
                print()
                print('training_time : %s seconds'% (time.time() - init_training_time))
                print()

filtered_races : ['beagle', 'bernese_mountain_dog', 'dhole', 'english_setter', 'japanese_spaniel', 'kelpie', 'labrador_retriever', 'rottweiler', 'siberian_husky', 'west_highland_white_terrier']

building the train and test datasets ...

looping over the vgg16 parameters ...



#############################################################################################
training the model with params :
Batch size = 300 | Learning rate = 8.170208154307469e-06
fitting the model ...
optimizer : adam

Train on 982 samples, validate on 323 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
fitting_time : 10754 seconds

computing the accuracy on the test data ...

test_loss : 1.5964049271132603 | test_accuracy : 0.8299319727891157
prediction_time : 98 seconds

CSV written !

training_time

Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
fitting_time : 21204 seconds

computing the accuracy on the test data ...

test_loss : 0.8384133109413634 | test_accuracy : 0.9002267561531931
prediction_time : 98 seconds

CSV written !

training_time : 27311.47902393341 seconds



#############################################################################################
training the model with params :
Batch size = 300 | Learning rate = 5.890056624677413e-05
fitting the model ...
optimizer : adam

Train on 982 samples, validate on 323 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100

Epoch 25/100
Epoch 26/100


KeyboardInterrupt: 
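The nested loops of cell In [8] amount to a random search: each pass draws one learning rate uniformly from each interval, for each batch size, and trains once per optimizer with that same rate. A minimal sketch of the same schedule as a flat list of configurations is given below, assuming the values of cell In [7]. The names `iter_configs`, `N_ITERATIONS` and the `seed` argument are introduced here for illustration only.

```python
import itertools
import random

# Values copied from the calibration cell
L_batch_sizes = [300, 200]
dict_lr_ranges = {0: [5e-6, 1e-5], 1: [1e-5, 5e-5], 2: [5e-5, 1e-4]}
optimizers = ["adam", "sgd"]
N_ITERATIONS = 20  # hypothetical name for the range(0, 20) outer loop


def iter_configs(seed=None):
    """Yield one dict per training run, in the same order as the nested loops."""
    rng = random.Random(seed)
    grid = itertools.product(range(N_ITERATIONS), L_batch_sizes, dict_lr_ranges.values())
    for _, batch_size, (low, high) in grid:
        # One learning rate per (iteration, batch size, interval), shared by both optimizers
        learning_rate = rng.uniform(low, high)
        for optimizer in optimizers:
            yield {
                "batch_size": batch_size,
                "learning_rate": learning_rate,
                "optimizer": optimizer,
            }


configs = list(iter_configs(seed=0))
print(len(configs))  # 240
```

A full pass therefore schedules 20 × 2 × 3 × 2 = 240 training runs. Given that the fits logged above take several thousand seconds each, listing the configurations first makes it easier to estimate the total cost and to resume after an interruption.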