## Loading the signature dataset

*    **class_mode="categorical"**: the labels come back as a 2D array of one-hot vectors, one vector of length N (the number of classes) per image. For example, [0, 0, 1, 0] has a single 1 (the "hot" entry) marking the third class, here the donkey. Use this mode for mutually exclusive labels: a dog cannot be a cat, and a human is not a dog.
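
As a minimal sketch of what such labels look like, the snippet below builds one-hot vectors by hand. The class names and the `one_hot` helper are hypothetical and only serve the illustration; `flow_from_directory` infers the real class names from the subfolder names and builds the vectors itself.

```python
# Hypothetical class names, used only for this illustration.
class_names = ["cat", "dog", "donkey", "human"]


def one_hot(label, classes):
    """Return a list with 1.0 at the index of `label` and 0.0 elsewhere."""
    vector = [0.0] * len(classes)
    vector[classes.index(label)] = 1.0
    return vector


print(one_hot("donkey", class_names))  # [0.0, 0.0, 1.0, 0.0]
```

Exactly one entry is 1, which is why this encoding only fits labels that exclude each other.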

In [1]:
from keras.preprocessing.image import ImageDataGenerator


# load train data
train_path = r"C:\Users\ADEM\Desktop\ESPRIT_Education\4er\PI DS\image preprocessing\Data Augmentation\signature dataset\train" 
val_path = r"C:\Users\ADEM\Desktop\ESPRIT_Education\4er\PI DS\image preprocessing\Data Augmentation\signature dataset\val" 
# create a new generator
datagen = ImageDataGenerator(rotation_range = 3, # random rotation between -3 and +3 degrees
                            width_shift_range=0.05,
                            height_shift_range=0.05,
                            shear_range = 0.01,
                            zoom_range = 0.01,        
                            horizontal_flip = False,
                            vertical_flip=False,
                            fill_mode="constant",cval=255)

train = datagen.flow_from_directory(train_path, 
                                    class_mode="categorical", 
                                    shuffle=True,  # shuffle so each training batch mixes all classes
                                    target_size=(256, 256))
# load val data
val = datagen.flow_from_directory(val_path, 
                                  class_mode="categorical", 
                                  shuffle=False,  
                                  target_size=(256, 256))

Found 1212 images belonging to 4 classes.
Found 212 images belonging to 4 classes.
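
To make the augmentation ranges above more concrete, here is a rough pure-Python sketch of a horizontal shift with `fill_mode="constant"` and `cval=255`. The helpers `max_shift_pixels` and `shift_right` are hypothetical and only handle whole-pixel shifts to the right. Keras itself applies a random affine transform, so this illustrates the idea and is not its implementation.

```python
def max_shift_pixels(width, shift_range):
    """A fractional shift range (< 1) is read as a fraction of the image width."""
    return width * shift_range


def shift_right(image, dx, cval=255):
    """Shift every row `dx` pixels to the right and fill the gap with `cval`."""
    if dx <= 0:
        return [list(row) for row in image]
    return [
        [cval] * min(dx, len(row)) + list(row[:max(len(row) - dx, 0)])
        for row in image
    ]


# Largest horizontal shift for a 256-pixel-wide image with width_shift_range=0.05
print(max_shift_pixels(256, 0.05))

# A tiny 2x4 "image": the uncovered left column is filled with 255 (white)
tiny = [[0, 10, 20, 30],
        [40, 50, 60, 70]]
print(shift_right(tiny, 1))  # [[255, 0, 10, 20], [255, 40, 50, 60]]
```

With a range of 0.05, a 256-pixel image moves by at most about 13 pixels, so the augmented images stay close to the originals.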


## Model


    
   *  starting with an **input layer** for 256×256 RGB images

   *  **block 1** :

       *  a single **convolutional layer** with a small ***filter size (5,5)*** and a modest number of filters ***(10)***
       *  each filter produces a two-dimensional array of output values, a filtered view of the input that records where a feature was detected; this is what we call a ***"feature map"***
       *  ***Multiple Filters*** <==> learning multiple features in parallel for a given input.
       *  ***Multiple Layers*** <==> stacking conv layers allows a hierarchical decomposition of the input
       *  followed by a **max pooling layer**
       *  pooling operates on each feature map separately to **create a new set** of the same number of pooled feature maps
       *  pooling works much like a filter applied to the feature maps: the window is ***almost always 2×2*** pixels with a stride of 2 pixels, which halves the height and width of each feature map (each map keeps a quarter of its values)

   *  **block 2** :

       *  a single **convolutional layer** with a small ***filter size (5,5)***, a stride of 2 and a modest number of filters ***(20)***
       *  followed by a **max pooling layer** and **batch normalization**
       *  pooling operates on each feature map separately to **create a new set** of the same number of pooled feature maps

   *  **block 3** :

       *  a single **convolutional layer** with a small ***filter size (3,3)***, a stride of 2 and a modest number of filters ***(28)***
       *  followed by a **max pooling layer** and **batch normalization**
       *  pooling operates on each feature map separately to **create a new set** of the same number of pooled feature maps

   *  the resulting feature maps are then **flattened** to provide features to the classifier.
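
The effect of the three blocks on the feature map size can be traced by hand. The sketch below assumes the strides and padding modes used in the `define_model` code cell of this notebook (28 filters in the last block, `padding='same'` everywhere except the last pooling layer). The helper names `same_out` and `valid_out` are ours.

```python
import math


def same_out(size, stride):
    """Output size with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)


def valid_out(size, window, stride):
    """Output size with padding='valid': floor((size - window) / stride) + 1."""
    return (size - window) // stride + 1


size = 256                     # input images are 256x256
size = same_out(size, 1)       # block 1 conv, stride 1       -> 256
size = same_out(size, 2)       # block 1 max pool 2x2, same   -> 128
size = same_out(size, 2)       # block 2 conv, stride 2       -> 64
size = same_out(size, 2)       # block 2 max pool 2x2, same   -> 32
size = same_out(size, 2)       # block 3 conv, stride 2       -> 16
size = valid_out(size, 2, 2)   # block 3 max pool 2x2, valid  -> 8

flattened = size * size * 28   # 28 feature maps of 8x8 values each
print(size, flattened)         # 8 1792
```

Under these assumptions the classifier receives a vector of 1792 features per image.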
     
     


####  The Back-End (Classifier):  

  
   *  ***multi-class classification task***
       *  requires an **output layer with 4 nodes** to predict the probability distribution of an image over the 4 classes.
       *  requires a ***softmax activation function*** on the output layer, which turns the 4 outputs into probabilities that sum to 1
       *  between the flattened features and the output layer we add ***two dense layers of 40 nodes each*** to interpret the features, followed by a dropout of 0.25.
       *  All convolutional and hidden dense layers use the ***ReLU activation function***. The input images hold values from 0 to 255, but the output of a convolution can be negative, so ReLU ***replaces all negative values with 0 and keeps all the remaining values as they are***. The He weight initialization scheme is a common companion to ReLU; the code below does not set `kernel_initializer`, so the layers use the Keras default (Glorot uniform).
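
A minimal sketch of the two activation functions mentioned above, written in plain Python for readability. The input values are made up for the illustration; Keras applies the same formulas to whole tensors.

```python
import math


def relu(values):
    """Replace negative values with 0 and keep the others unchanged."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


print(relu([-2.0, 0.0, 3.5]))                    # [0.0, 0.0, 3.5]

probs = softmax([1.0, 2.0, 0.5, 0.1])            # one raw score per class
print(probs)
print(probs.index(max(probs)))                   # index of the most likely class: 1
```

The class with the largest raw score always gets the largest probability, so the predicted class is simply the index of the maximum.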


#### Optimization :

*  stochastic gradient descent optimizer 
      *  an optimization algorithm 
      *  The job of the algorithm is to find a set of internal model parameters that perform well against some performance measure such as logarithmic loss or mean squared error.
*  learning rate of 0.01 
*  a momentum of 0.9
*  and "categorical cross-entropy" for the  loss function
*  to accelerate learning: ***Batch normalization***, placed after the pooling layer of the second and third convolutional blocks. It standardizes the inputs to the next layer: the layer keeps track of the mean and variance of each input channel and uses them to standardize the data, which often **speeds up training** noticeably.
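
The update rule and the loss listed above can be sketched in a few lines of plain Python. The toy problem (minimising f(w) = w² for a single weight) and the function names are assumptions made for the illustration. The update itself follows the momentum rule documented for the Keras `SGD` optimizer: `velocity = momentum * velocity - learning_rate * gradient`, then `w = w + velocity`.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def sgd_momentum_step(weight, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One SGD step with momentum; returns the new weight and velocity."""
    velocity = momentum * velocity - learning_rate * grad
    return weight + velocity, velocity


# Toy problem: minimise f(w) = w**2, whose gradient is 2 * w
w, v = 1.0, 0.0
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad=2.0 * w)
print(abs(w) < 1e-6)   # True: w has moved very close to the minimum at 0

# Loss when the true class is the third one and the model gives it 0.7
loss = categorical_crossentropy([0, 0, 1, 0], [0.1, 0.1, 0.7, 0.1])
print(loss)            # equals -log(0.7)
```

Only the probability assigned to the true class enters the loss, so the loss shrinks toward 0 as that probability approaches 1.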

In [2]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense, InputLayer, BatchNormalization, Dropout
from tensorflow.keras.optimizers import SGD
from numpy import mean
from numpy import std
from matplotlib import pyplot as plt
import h5py
def define_model():
    # build a sequential model
    model = Sequential()
    model.add(InputLayer(input_shape=(256, 256, 3)))
    
    # 1st conv block
    # 10 filters = 0.25 * 40 for our 4 classes (the rule gives 25 for 10 classes ==> (10*10)/4)
    model.add(Conv2D(10, (5, 5), activation='relu', strides=(1, 1), padding='same'))
    model.add(MaxPool2D(pool_size=(2, 2), padding='same'))
    
    # 2nd conv block
    # 20 filters = 0.5 * 40 for our 4 classes (the rule gives 50 for 10 classes ==> (10*10)/2)
    model.add(Conv2D(20, (5, 5), activation='relu', strides=(2, 2), padding='same'))
    model.add(MaxPool2D(pool_size=(2, 2), padding='same'))
    model.add(BatchNormalization())
    
    # 3rd conv block
    # 28 filters = 0.7 * 40 for our 4 classes (the rule gives 70 for 10 classes ==> (10*10)*0.7)
    model.add(Conv2D(28, (3, 3), activation='relu', strides=(2, 2), padding='same'))
    model.add(MaxPool2D(pool_size=(2, 2), padding='valid'))
    model.add(BatchNormalization())
    
    # ANN block
    # 40 units per dense layer = 10 * 4 classes (the rule gives 100 for 10 classes ==> (10*10))
    model.add(Flatten())
    model.add(Dense(units=40, activation='relu'))
    model.add(Dense(units=40, activation='relu'))
    model.add(Dropout(0.25))
    # output layer
    model.add(Dense(units=4, activation='softmax'))
    
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    return model
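
As a sanity check on the size of this network, the parameter count can be worked out by hand. The sketch below is a hand calculation, not output from Keras. It assumes the final feature maps are 8×8 for a 256×256 input (which follows from the strides and padding above) and counts 4 values per channel for each batch normalization layer (scale, offset, moving mean, moving variance). The helper names are ours.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input plus one bias, for every unit."""
    return (inputs + 1) * units


def batchnorm_params(channels):
    """Scale, offset, moving mean and moving variance per channel."""
    return 4 * channels


layers = {
    "conv1": conv_params(5, 5, 3, 10),       # 760
    "conv2": conv_params(5, 5, 10, 20),      # 5020
    "bn2": batchnorm_params(20),             # 80
    "conv3": conv_params(3, 3, 20, 28),      # 5068
    "bn3": batchnorm_params(28),             # 112
    "dense1": dense_params(8 * 8 * 28, 40),  # 71720
    "dense2": dense_params(40, 40),          # 1640
    "output": dense_params(40, 4),           # 164
}
total = sum(layers.values())
print(total)  # 84564
```

Most of the parameters sit in the first dense layer, because it connects all 1792 flattened features to 40 units. Calling `define_model().summary()` should report the same total if the assumptions hold.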

## Model Running Function

In [3]:
# train the model on the training set, validate it on the validation set, then save it
def run_model(train, val):
    # define model
    model = define_model()
    # fit model; the generators already yield batches (32 images by default),
    # so batch_size is not passed to fit()
    run = model.fit(train, epochs=10, validation_data=val, verbose=1)
    # verbose: 0 for no logging to stdout, 1 for progress bar logging, 2 for one log line per epoch.
    # save model
    model.save('final_signature_model.h5')
    return True

## Testing

**The cell below** is dedicated to testing.  
Running it takes about **18 minutes**, so I turned it into a Markdown cell to avoid long runs.

In [5]:
#run the test harness for evaluating a model
def run_test_signature_model():
    # train and validate the model
    runs = run_model(train, val)

#entry point, run the test harness
run_test_signature_model()

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
