## Abstract

Safran Electronics & Defense is developing a multi-target detection, classification, tracking and localization video chain for land, sea and airborne defence applications to assist operators in their missions.

This video chain can be deployed in the context of directional sights, handheld cameras, panoramic surveillance sights, or distributed wide field sensors. 

Safran Electronics & Defense wishes to make better use of its image and video database, which is heterogeneous both in content (type of campaign: air, land or sea; type of sensor; associated navigation metadata; image content; etc.) and in form (file format).

## Introduction

As part of our annual INFONUM project, we are partnering with the company SAFRAN.
 
Within the framework of this project, we have to set up an architecture that extracts metadata each time a new file is added to the storage server. This architecture includes several building blocks (extraction module, database, GUI, etc.), notably AI components. Our partner gave us the freedom to choose what to implement, and we chose to work on a Deep Learning module that classifies the context to which an imported image belongs (aerial, rural, maritime, desert or mountainous).
 
Since this Deep Learning module could also serve as the project for another course that requires a Deep Learning component, we chose to link the two projects.
 
The team is composed of two students: LEVY Daniel and PUJOL Corentin. As stated above, we want to implement and compare different Deep Learning approaches to extract the context of images. The objective is to implement several models (ResNet, MobileNet and VGG, for example) and to study their hyperparameters in order to obtain the best possible results on context extraction. The final data will be those provided by the partner. As the partner has not yet provided them, we started working with a dataset obtained from the Kaggle platform.
 
Our approach is to load different pre-trained models, then to perform Transfer Learning (with and without fine-tuning) so that these models perform well on our data.

## Dataset presentation

This is a landscape classification dataset. It consists of 5 classes, each representing a kind of landscape:

- Coast: images of coastal areas, or simply beaches.
- Desert: images of desert areas such as the Sahara, the Thar, etc.
- Forest: images of forest areas such as the Amazon.
- Glacier: predominantly white images of glaciers, for example in the Antarctic.
- Mountains: images of mountain areas such as the Himalayas, showing the world from the top.
    
The data is first divided into 3 subdirectories: training, validation and testing. Another directory of TensorFlow records is also provided, itself divided into training, validation and testing directories containing the TensorFlow records of these images. This allows the data to be loaded either with an image data generator or from the TensorFlow records.

For any model to perform well on this dataset, it should capture both the colors and the geometry of the image, because together they make up a landscape.

## Our approach to the problem

To determine the best performing model, we will load the architectures of four well-known models that already perform well on larger, more general datasets. The objective is to re-train these models on the data of our problem.

We will divide the training phase into several steps in order to study each hyperparameter, understand how they work and finally choose the ones that will make the models perform best with respect to our initial problem.

After loading the pre-trained architectures, we freeze the parameters of the first layers of the network (by disabling their gradients) so that they are not fine-tuned at this stage. Then, to perform transfer learning, we replace the last classification layer with one suited to our problem. Our prediction layer is thus trained on our new data, using the features computed by the pre-trained layers. Once this step is completed, we proceed to a fine-tuning step in order to adjust the parameters of all the layers of our models.

To do so, we will define a cost function suited to the problem and first apply gradient descent only to the parameters of the newly defined fully connected layer, with the different hyperparameters chosen. We will compare the performance of the model according to the chosen hyperparameters in order to determine which ones are the most suitable for training our models.

After training the new fully connected layer, we will unfreeze the upper layers of the model for the fine-tuning stage. We will follow the same approach as in the transfer learning step in order to make our models as effective as possible.
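As a rough sketch of this fine-tuning stage, the example below unfreezes all parameters and gives the previously frozen layers a smaller learning rate than the head. The tiny `backbone`/`head` network is a hypothetical stand-in for a pretrained model, and the learning rates are placeholder values, not results from our experiments.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network: frozen backbone + trained head
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 5)
model = nn.Sequential(backbone, head)
for p in backbone.parameters():
    p.requires_grad = False  # state at the end of the transfer learning step

# Fine-tuning: make every layer trainable again
for p in model.parameters():
    p.requires_grad = True

# One parameter group per part, so the backbone moves more slowly than the head
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},
        {"params": head.parameters(), "lr": 1e-3},
    ],
    lr=1e-3,
    momentum=0.9,
)

# One training step on random data to show that gradients now reach the backbone
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 5, (4,))
loss = nn.CrossEntropyLoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Using a lower learning rate for the pretrained layers is one common way to avoid destroying the features they already encode; whether it helps here remains to be checked experimentally.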

We assume that a given hyperparameter does not have the same effect on every architecture, so we expect to obtain good results with the different models, but with different hyperparameter choices for each.

### Selected pre-trained models

ResNet18, MobileNetV2, AlexNet, VGG16

### Hyperparameters studied

Hyperparameters are adjustable parameters that let you control the model optimization process. Different hyperparameter values can affect model training and convergence rates.

We define the following hyperparameters for training:

- Number of Epochs - the number of times to iterate over the dataset

    We use early stopping so that the best model, not the last one, is kept: the model is saved only when the loss decreases.

- Batch Size - the number of data samples propagated through the network before the parameters are updated

    We try [16, 32, 64, 128, 256], although smaller batch sizes should perform better according to some research (https://reader.elsevier.com/reader/sd/pii/S2405959519303455?token=C37354B4B0DEA9791B9005F04612342318223AC16CAF73061D563AC0CF8937F435433C40BB033FEBDFA6AF09F4C5A4FB&originRegion=eu-west-1&originCreation=20230216143607)
    
- Learning Rate - how much to update the model's parameters at each batch/epoch. Smaller values yield slow learning speed, while large values may result in unpredictable behavior during training.

    How to adjust the learning rate

    torch.optim.lr_scheduler provides several methods to adjust the learning rate based on the number of epochs. torch.optim.lr_scheduler.ReduceLROnPlateau allows the learning rate to be reduced dynamically based on a validation measurement.

    Learning rate scheduling should be applied after the optimizer's update; e.g., you should write your code this way:

    Example:

        import torch
        from torch import nn
        from torch.optim import SGD
        from torch.optim.lr_scheduler import ExponentialLR

        # Toy model, loss and data, used only to make the example runnable
        model = nn.Linear(2, 2)
        loss_fn = nn.MSELoss()
        dataset = [(torch.randn(4, 2), torch.randn(4, 2)) for _ in range(8)]

        optimizer = SGD(model.parameters(), lr=0.1)
        scheduler = ExponentialLR(optimizer, gamma=0.9)

        for epoch in range(20):
            for inputs, target in dataset:
                optimizer.zero_grad()
                output = model(inputs)
                loss = loss_fn(output, target)
                loss.backward()
                optimizer.step()
            # Decay the learning rate once per epoch, after the optimizer updates
            scheduler.step()
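The early stopping and `ReduceLROnPlateau` ideas from the list above can be combined in a single loop, sketched below. The toy regression data, the linear model and the patience values are assumptions made only to keep the example small and runnable; they stand in for the image datasets and pretrained networks.

```python
import copy

import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

torch.manual_seed(0)

# Toy data and model (stand-ins for the real datasets and networks)
x_train, y_train = torch.randn(64, 4), torch.randn(64, 1)
x_val, y_val = torch.randn(32, 4), torch.randn(32, 1)
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=2)

best_loss, best_state = float("inf"), None
patience, epochs_without_improvement = 5, 0

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    loss_fn(model(x_train), y_train).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = loss_fn(model(x_val), y_val).item()
    scheduler.step(val_loss)  # lower the learning rate when validation loss plateaus

    if val_loss < best_loss:
        # Record the model only when the validation loss decreases
        best_loss, epochs_without_improvement = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break  # early stopping

# Restore the best weights rather than the last ones
model.load_state_dict(best_state)
```

Copying the `state_dict` with `copy.deepcopy` matters here: without it, the saved tensors would keep changing as training continues.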

### Loss function

We have to choose one loss function for all our experiments and justify this choice.
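The choice is still open at this point. As one plausible candidate, cross-entropy is commonly used for single-label classification with several classes, such as our 5 landscape classes. The sketch below, on made-up logits and labels, shows that `nn.CrossEntropyLoss` expects raw scores and integer class indices, and that it is equivalent to a log-softmax followed by a negative log-likelihood.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Made-up model outputs: raw scores (logits) for 4 images and 5 classes
logits = torch.randn(4, 5)
# Made-up ground truth: one class index per image
targets = torch.tensor([0, 3, 1, 4])

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)

# Same quantity computed in two explicit steps
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(loss.item(), manual.item())
```

Because the softmax is already included in the loss, the network's last layer should output raw scores rather than probabilities when this loss is used.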

### Optimizer 

Optimization is the process of adjusting model parameters to reduce model error in each training step. Optimization algorithms define how this process is performed. All optimization logic is encapsulated in the optimizer object. Here, we use the SGD optimizer; PyTorch also provides many other optimizers, such as Adam and RMSprop, that work better for different kinds of models and data.

We initialize the optimizer by registering the model’s parameters that need to be trained, and passing in the learning rate hyperparameter.

Example with SGD algorithm: 

    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

All the available algorithms (SGD, Adagrad, RMSprop, Adam, NAdam, etc.) are listed at: https://pytorch.org/docs/stable/optim.html

The different optimizers can be divided into two families: gradient descent optimizers and adaptive optimizers. This division is based purely on an operational aspect: with gradient descent algorithms the learning rate must be tuned manually, whereas adaptive algorithms adapt it automatically, hence their name.

- Gradient descent:
    - Batch gradient descent
    - Stochastic gradient descent
    - Mini-batch gradient descent
- Adaptive:
    - Adagrad
    - RMSprop
    - Adam
    
https://towardsdatascience.com/7-tips-to-choose-the-best-optimizer-47bb9c1219e
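To compare the two families in practice, one option is to build each optimizer from the same list of trainable parameters. The sketch below assumes a small stand-in network with a frozen first layer; the learning rates are placeholder values, not tuned results.

```python
import torch
from torch import nn

# Stand-in network: the first layer plays the role of a frozen pretrained part
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 5))
for p in model[0].parameters():
    p.requires_grad = False

# Register only the parameters that still need to be trained
trainable = [p for p in model.parameters() if p.requires_grad]

# One factory per optimizer to compare (hyperparameter values are placeholders)
optimizer_factories = {
    "sgd": lambda params: torch.optim.SGD(params, lr=1e-2, momentum=0.9),
    "rmsprop": lambda params: torch.optim.RMSprop(params, lr=1e-3),
    "adam": lambda params: torch.optim.Adam(params, lr=1e-3),
}
optimizers = {name: make(trainable) for name, make in optimizer_factories.items()}

for name, opt in optimizers.items():
    print(name, type(opt).__name__, opt.param_groups[0]["lr"])
```

In a real comparison, each optimizer would be paired with a fresh copy of the model so that the runs do not influence each other.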

### Regularization

Regularization techniques are used to prevent overfitting by adding a penalty to the model's loss function. Some common regularization techniques include weight decay (L2 regularization), dropout, and batch normalization.

We can also try data augmentation, if it helps us reach better results.
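Two of these techniques are straightforward to express in PyTorch, as sketched below: weight decay is an argument of the optimizer, and dropout is a layer placed before the final classifier. The 512-dimensional input and the values of `p` and `weight_decay` are assumptions chosen for illustration.

```python
import torch
from torch import nn

NUM_CLASSES = 5

# Classification head with dropout in front of the final linear layer
head = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(512, NUM_CLASSES),
)

# weight_decay adds an L2 penalty on the weights at each update
optimizer = torch.optim.SGD(
    head.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4
)

features = torch.randn(8, 512)  # made-up features for a batch of 8 images

head.train()  # dropout active: random units are zeroed
train_out = head(features)

head.eval()   # dropout disabled: the output is deterministic
eval_a, eval_b = head(features), head(features)
print(torch.equal(eval_a, eval_b))
```

Switching between `train()` and `eval()` is what turns dropout on and off, so forgetting `eval()` at validation time would make the reported metrics noisy.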

### Bonus: Activation function

(ReLU, softmax, sigmoid, tanh)
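As a quick reminder of how these four functions behave, the following sketch applies each of them to the same small, arbitrary tensor.

```python
import torch

x = torch.tensor([-2.0, 0.0, 3.0])

relu_out = torch.relu(x)               # negative values are clipped to 0
sigmoid_out = torch.sigmoid(x)         # each value squashed into (0, 1)
tanh_out = torch.tanh(x)               # each value squashed into (-1, 1)
softmax_out = torch.softmax(x, dim=0)  # values turned into a distribution

print(relu_out, sigmoid_out, tanh_out, softmax_out, sep="\n")
```

ReLU, sigmoid and tanh act element by element, whereas softmax normalizes across a whole dimension, which is why it is typically reserved for the output layer of a classifier.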

## Experiments

In [2]:
from module import *



Image normalization for PyTorch pretrained models

See: https://pytorch.org/docs/stable/torchvision/models.html and, for the "explanations" of the exact values: https://github.com/pytorch/vision/issues/1439

In [3]:
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

data_transforms = transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std)
])
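To make the effect of `transforms.Normalize` concrete, the sketch below reproduces it by hand on a random tensor standing in for an image: each channel has its mean subtracted and is divided by its standard deviation. The inverse operation is useful for displaying normalized images.

```python
import torch

# Same per-channel statistics as above, reshaped to broadcast over (C, H, W)
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Random stand-in for an image after ToTensor(): values in [0, 1]
image = torch.rand(3, 224, 224)

normalized = (image - mean) / std   # what Normalize computes
restored = normalized * std + mean  # inverse, e.g. before plotting

print(torch.allclose(restored, image, atol=1e-6))
```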

We define the image directory for our personal data and the test_size:

In [6]:
image_directory = "C:/Users/coco8/Documents/CentraleSupelec/Projet_année_local/dataset"
test_size = 0.2

Here is the dataset function defined in module.py. It loads all our data from the directory into three variables: dataset_train, dataset_val and dataset_test. These variables let us handle our images easily and pass them to our different pretrained models.

In [7]:
dataset_train, dataset_val, dataset_test = dataset(data_transforms, image_directory, test_size)
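Datasets like these are usually wrapped in a `DataLoader` before training, and this is where the batch size comes into play. The sketch below uses a fake dataset of 100 random tensors as a stand-in for `dataset_train` (an assumption, since the real data is not needed to show the mechanism) and counts the number of batches per epoch for each candidate batch size.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for dataset_train: 100 fake RGB images with labels in [0, 5)
fake_train = TensorDataset(
    torch.randn(100, 3, 32, 32), torch.randint(0, 5, (100,))
)

batches_per_epoch = {}
for batch_size in [16, 32, 64, 128, 256]:
    loader = DataLoader(fake_train, batch_size=batch_size, shuffle=True)
    batches_per_epoch[batch_size] = len(loader)

print(batches_per_epoch)  # {16: 7, 32: 4, 64: 2, 128: 1, 256: 1}
```

A smaller batch size means more parameter updates per epoch, which is one reason the learning rate and the batch size are often tuned together.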

In [8]:
# use the GPU (much faster) if available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
#device = torch.device("cpu") # force the CPU if there are GPU memory problems (and be patient...)
print("GPU available : ",torch.cuda.is_available())
if torch.cuda.is_available():
    print('__CUDNN VERSION:', torch.backends.cudnn.version())
    print('__Number CUDA Devices:', torch.cuda.device_count())
    print('__CUDA Device Name:',torch.cuda.get_device_name(0))
    print('__CUDA Device Total Memory [GB]:',torch.cuda.get_device_properties(0).total_memory/1e9)

GPU available :  True
__CUDNN VERSION: 8302
__Number CUDA Devices: 1
__CUDA Device Name: NVIDIA GeForce RTX 3060 Laptop GPU
__CUDA Device Total Memory [GB]: 6.441926656


Loading of the pretrained models:

In [9]:
resnet18 = models.resnet18(pretrained=True)
mobilenet = models.mobilenet_v2(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
alexnet = models.alexnet(pretrained=True)

my_models = [resnet18, mobilenet, vgg16, alexnet]

Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to C:\Users\coco8/.cache\torch\hub\checkpoints\mobilenet_v2-b0353104.pth
100.0%
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to C:\Users\coco8/.cache\torch\hub\checkpoints\vgg16-397923af.pth
100.0%


Transfer learning with fine-tuning:

https://towardsdatascience.com/transfer-learning-with-convolutional-neural-networks-in-pytorch-dd09190245ce

In [None]:
# for my_net in my_models:  
