# Template VISION

**Prerequisites:**

- This notebook must have been generated using Gabarit's vision template.


- **Launch this notebook with a kernel using your project virtual environment**. To create a kernel linked to your virtual environment, activate it, then run `pip install ipykernel` followed by `python -m ipykernel install --user --name=your_venv_name`. The project must also be installed in this virtual environment.

---
---
---

## 1. How this template works

### Why use Gabarit's vision template ?

The vision template automatically generates a project folder and Python code that contain mainstream models and make it easier to industrialize them.

The generated project can be used for **image classification** and **object detection** tasks. Of course, you have to adapt it to your particular use case. 

### Structure of the generated project

<div style="font-family: monospace; display: grid; grid-template-columns: 1fr 2fr;">
  <div>.                                </div>  <div style="color: green;"></div>
  <div>.                                </div>  <div style="color: green;"></div>
  <div>├── {{package_name}}             </div>  <div style="color: green;"># The package</div>
  <div>│ ├── models_training            </div>  <div style="color: green;"># Folder containing all the modules related to the models</div>
  <div>│ ├── monitoring                 </div>  <div style="color: green;"># Folder containing all the modules related to the explainers and MLflow</div>
  <div>│ └── preprocessing              </div>  <div style="color: green;"># Folder containing all the modules related to the preprocessing</div>
  <div>├── {{package_name}}-data        </div>  <div style="color: green;"># Folder containing all the data (datasets, embeddings, etc.)</div>
  <div>├── {{package_name}}-exploration </div>  <div style="color: green;"># Folder where all your experiments and explorations must go</div>
  <div>├── {{package_name}}-models      </div>  <div style="color: green;"># Folder containing all the generated models</div>
  <div>├── {{package_name}}-ressources  </div>  <div style="color: green;"># Folder containing some resources such as the instructions to upload a model</div>
  <div>├── {{package_name}}-scripts     </div>  <div style="color: green;"># Folder containing example scripts to preprocess data, train models, predict and use a demonstrator</div>
  <div>│ └── utils                      </div>  <div style="color: green;"># Folder containing utils scripts (such as split train/test, sampling, etc...)</div>
  <div>├── {{package_name}}-transformers </div>  <div style="color: green;"># Folder containing pytorch transformers</div>
  <div>├── {{package_name}}-tutorials    </div>  <div style="color: green;"># Folder containing notebook tutorials, including this one</div>
  <div>├── tests                        </div>  <div style="color: green;"># Folder containing all the unit tests</div>
  <div>├── .gitignore                   </div>  <div style="color: green;"></div>
  <div>├── .coveragerc                  </div>  <div style="color: green;"></div>
  <div>├── Makefile                     </div>  <div style="color: green;"></div>
  <div>├── nose_setup_coverage.cfg      </div>  <div style="color: green;"></div>
  <div>├── README.md                    </div>  <div style="color: green;"></div>
  <div>├── requirements.txt             </div>  <div style="color: green;"></div>
  <div>├── setup.py                     </div>  <div style="color: green;"></div>
  <div>└── version.txt                  </div>  <div style="color: green;"></div>
</div>


**General principles on the generated packages**

- Data must be saved in the `{{package_name}}-data` folder<br>
<br>
- Trained models will automatically be saved in the `{{package_name}}-models` folder<br>
<br>
- Be aware that all the functions/methods for writing/reading files use these two folders as their base. Thus, when a script takes the path of a file or model as an argument, the given path should be **relative** to the `{{package_name}}-data` / `{{package_name}}-models` folders.<br>
<br>
- The provided scripts in `{{package_name}}-scripts` are given as example. You can use them as accelerators, but their use is not required.<br>
<br>
- You can use this package for image classification and object detection<br>
<br>
- The modelling part is structured as follows :
    - `ModelClass`: main class taking care of saving data and metrics (among other things)
    - `ModelKeras`: child class of ModelClass managing all models using Keras
<br>
<br>
- Each task (image classification and object detection) has a mixin class (`ModelClassifierMixin` and `ModelObjectDetectorMixin`) and specific models located in corresponding subfolders.
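
To make this layering more concrete, here is a minimal and purely illustrative sketch of how a task mixin can be combined with the base classes. The class names come from the list above, but the method names (`save`, `fit`, `get_metrics`), their bodies and the inheritance order are assumptions made for this example; the generated classes contain far more logic.

```python
class ModelClass:
    """Base class: bookkeeping shared by every model (saving, metrics, ...)."""

    def __init__(self, model_name="model"):
        self.model_name = model_name
        self.trained = False

    def save(self):
        # Placeholder: a real implementation would write the model to disk
        return f"saved {self.model_name}"


class ModelKeras(ModelClass):
    """Framework layer: everything specific to Keras models."""

    def fit(self, data):
        # Placeholder standing in for a Keras training loop
        self.trained = True
        return self


class ModelClassifierMixin:
    """Task layer: behaviour specific to image classification."""

    def get_metrics(self, y_true, y_pred):
        # Accuracy, as an example of a task-specific metric
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return {"accuracy": correct / len(y_true)}


class ModelCnnClassifier(ModelClassifierMixin, ModelKeras):
    """Concrete model: task mixin + framework class (assumed order)."""


model = ModelCnnClassifier(model_name="cnn").fit(data=[])
print([cls.__name__ for cls in ModelCnnClassifier.__mro__])
print(model.get_metrics(["birman", "shiba"], ["birman", "bombay"]))
```

The point of the mixin is that task-specific behaviour (metrics, label handling) stays separate from framework-specific behaviour (training, saving), so each concrete model only picks the two pieces it needs.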

---
---
---

### Load utility functions

Please run the following cell to load needed utility functions. These functions are only needed in this notebook.

In [None]:
%load_ext autoreload
%autoreload 2

# Import utility functions
import os
import sys
sys.path.append(os.path.abspath(''))
from tutorial_exercices import answers, verify, utils

---

## 2. Use the template to train your first model

### Dataset

We are going to use a [small dataset](https://github.com/OSS-Pole-Emploi/gabarit/blob/main/gabarit/template_vision/vision_data/dataset_v3) containing 144 images of three categories : 

| Birman cat                                                                                                                                             | Bombay cat                                                                                                                                             | Shiba dog                                                                                                                                               |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------|
| ![A picture of a birman cat](https://github.com/OSS-Pole-Emploi/gabarit/blob/main/gabarit/template_vision/vision_data/dataset_v3/birman/Birman_22.jpg?raw=true) | ![A picture of a bombay cat](https://github.com/OSS-Pole-Emploi/gabarit/blob/main/gabarit/template_vision/vision_data/dataset_v3/bombay/Bombay_45.jpg?raw=true) | ![A picture of a shiba dog](https://github.com/OSS-Pole-Emploi/gabarit/blob/main/gabarit/template_vision/vision_data/dataset_v3/shiba/shiba_inu_15.jpg?raw=true) |

In [None]:
# This function downloads all dataset images into {{package_name}}-data/dataset_v3
utils.github_download_classification_dataset()

You can verify that a new folder called `dataset_v3` is present in your `{{package_name}}-data` directory. It contains three subfolders, one for each category : `birman`, `bombay` and `shiba`.
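
If you prefer to check this from Python, the sketch below counts the image files in each category subfolder. The helper name `count_images_per_class` and the set of image extensions are assumptions made for this example; it only relies on the standard library.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def count_images_per_class(dataset_dir):
    """Return {subfolder name: number of image files} for a dataset directory."""
    counts = {}
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

For instance, `count_images_per_class(Path(get_data_path()) / "dataset_v3")`, with `get_data_path` imported from `{{package_name}}.utils`, should list the `birman`, `bombay` and `shiba` folders.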

---
<span style="color:red">**Exercise 1**</span> : **train / valid / test split**

**Goal:**

- Split the main dataset in train / valid / test sets

**TODO:**
- Use the script `utils/0_split_train_valid_test.py` on `dataset_v3`
- Set the train / valid / test ratios to 0.6 / 0.2 / 0.2 (which are the default ratios)

**Help:**
- The file `utils/0_split_train_valid_test.py` splits a folder in 3 :
    - `{dataset_folder}_train`: the training dataset
    - `{dataset_folder}_valid`: the validation dataset
    - `{dataset_folder}_test`: the test dataset
- You can specify the type of split : random or stratified (here, use random)
- Reminder: the paths are relative to `{{package_name}}-data`
- See the script helper : `python {{package_name}}-scripts/utils/0_split_train_valid_test.py --help`
- Don't forget to activate your virtual environment ...
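
To see what a random split with these ratios amounts to, here is a minimal standard-library sketch. It is not the code of `0_split_train_valid_test.py`: the function name, the seed handling and the way sizes are rounded are assumptions, so the script may produce slightly different set sizes.

```python
import random


def split_train_valid_test(items, ratios=(0.6, 0.2, 0.2), seed=42):
    """Randomly split `items` into train / valid / test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder goes to the test set
    return train, valid, test


files = [f"image_{i}.jpg" for i in range(144)]
train, valid, test = split_train_valid_test(files)
print(len(train), len(valid), len(test))
```

A stratified split would apply the same logic separately inside each category so that every set keeps the class proportions.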

#### Exercise 1 : Verify your answer ✔

In [None]:
verify.verify_exercice_1()

#### Exercise 1 : Solution 💡

In [None]:
answers.answer_exercice_1()

---
<span style="color:red">**Exercise 2**</span> : **random sample**

**Goal:**

- Get a random sample of train and test sets (n=3) (we won't use it, this exercise is just here to show what can be done)

**TODO:**
- Use the script `utils/0_create_samples.py` on the directories `dataset_v3_train` and `dataset_v3_test`
- We want samples of 3 images

**Help:**
- Use the script : `utils/0_create_samples.py`
- To get the possible arguments of the script: `python 0_create_samples.py --help`
- Don't forget to activate your virtual environment ...
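
Conceptually, sampling a dataset boils down to picking a few files at random and copying them elsewhere. The sketch below illustrates this with the standard library only; the function name `create_sample` and the destination layout are assumptions and do not describe how `0_create_samples.py` names or stores its output.

```python
import random
import shutil
from pathlib import Path


def create_sample(src_dir, dst_dir, n=3, seed=None):
    """Copy `n` randomly chosen files from `src_dir` (searched recursively) into `dst_dir`."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    files = sorted(p for p in src_dir.rglob("*") if p.is_file())
    chosen = random.Random(seed).sample(files, min(n, len(files)))
    for path in chosen:
        target = dst_dir / path.relative_to(src_dir)  # keep the category subfolder
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
    return chosen
```

Keeping the relative path preserves the one-folder-per-category layout, so the sample can be used wherever the full dataset could.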

#### Exercise 2 : Verify your answer ✔

In [None]:
verify.verify_exercice_2()

#### Exercise 2 : Solution 💡

In [None]:
answers.answer_exercice_2()

---
<span style="color:red">**Exercise 3**</span> : **pre-processing**

- The script `1_preprocess_data.py` applies a preprocessing pipeline **to all images of given directories**
- The argument `--preprocessing` (or simply `-p`) is used to specify which preprocessing pipeline should be used. 
- It works as follows:
    - In `{{package_name}}/preprocessing/preprocess.py`: 
        - There is a dictionary of functions (`pipelines_dict`): key: str -> function 
            - /!\ Don't remove the default element 'no_preprocess': lambda x: x /!\ 
        - There are preprocessing functions
    - In `1_preprocess_data.py` :
        - We retrieve the dictionary of functions from `preprocessing/preprocess.py` 
        - If a `preprocessing` argument is specified, we keep only the corresponding key from the dictionary 
        - Otherwise, we keep all keys (except `no_preprocess`) 
        - For each entry of the dictionary, we:
            - Get the associated preprocessing function
            - Load images
            - apply the preprocessing function
            - Save the result in a new folder
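
The selection logic described above can be sketched as follows. This is a toy illustration, not the code of `1_preprocess_data.py`: the function names `select_pipelines` and `run_pipelines` are hypothetical, and strings stand in for images so that the example runs without any image library.

```python
def select_pipelines(pipelines_dict, preprocessing=None):
    """Return the {name: function} pipelines to run, following the rules above."""
    if preprocessing is not None:
        if preprocessing not in pipelines_dict:
            raise ValueError(f"Unknown preprocessing pipeline: {preprocessing}")
        return {preprocessing: pipelines_dict[preprocessing]}
    # No pipeline requested: keep everything except the identity pipeline
    return {k: f for k, f in pipelines_dict.items() if k != "no_preprocess"}


def run_pipelines(directory, items, pipelines):
    """Apply each pipeline to `items`; results are keyed by output folder name."""
    return {f"{directory}_{name}": func(items) for name, func in pipelines.items()}


# Toy dictionary: strings stand in for images, str methods for image transforms
pipelines_dict = {
    "no_preprocess": lambda x: x,
    "preprocess_upper": lambda items: [s.upper() for s in items],
    "preprocess_strip": lambda items: [s.strip() for s in items],
}

selected = select_pipelines(pipelines_dict)
print(run_pipelines("dataset_v3_train", [" cat ", " dog "], selected))
```

Keeping `no_preprocess` in the dictionary matters because it gives a valid pipeline name to models trained on raw images.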

In [None]:
from {{package_name}}.preprocessing.preprocess import get_preprocessors_dict
utils.display_source(get_preprocessors_dict, strip_comments=False)

As you can see, two pipelines are given as examples : `preprocess_convert_rgb` and `preprocess_docs`. 

So if `--preprocessing` is omitted, `1_preprocess_data.py` will preprocess images with each pipeline and store the results in a `<directory>_preprocess_convert_rgb` directory and a `<directory>_preprocess_docs` directory respectively.


**Goal:**
- Use `preprocess_convert_rgb` on train and validation data
    - This will simply convert all images to RGB mode.
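
For reference, converting images to RGB with Pillow can be as short as the sketch below. It assumes Pillow is installed; the helper `convert_rgb` is written for this example and the template's own `preprocess_convert_rgb` may be implemented differently.

```python
from PIL import Image


def convert_rgb(images):
    """Convert every PIL image in the list to RGB mode."""
    return [img.convert("RGB") for img in images]


# Toy inputs: a grayscale image and an image with an alpha channel
images = [Image.new("L", (4, 4)), Image.new("RGBA", (4, 4))]
print([img.mode for img in convert_rgb(images)])
```

Forcing a single colour mode guarantees that every image reaches the model with the same number of channels.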

**TODO:**
- Use the script `1_preprocess_data.py` to preprocess `dataset_v3_train` and `dataset_v3_valid` with `preprocess_convert_rgb`.

**Help:**
- To get the possible arguments of the script: `python 1_preprocess_data.py --help`
- Don't forget to activate your virtual environment ...

**Important:**

- Do not worry about applying the pipeline to the test dataset. Our models will store the preprocessing pipelines and :
    - The prediction script `3_predict.py` will preprocess the test dataset with the preprocessing pipeline before sending the data to the model's predict function. This is the **batch mode**.
    - We also expose an agnostic `predict` function (in `utils_models`) to handle new data on the fly. It will preprocess it with the preprocessing pipeline before sending the data to the model's predict function. This is the **API mode**.

#### Exercise 3 : Verify your answer ✔

In [None]:
verify.verify_exercice_3()

#### Exercise 3 : Solution 💡

In [None]:
answers.answer_exercice_3()

---
<span style="color:red">**Exercise 4**</span> : **Train a classifier**

**Goal:**

- Train a classification model on preprocessed data.
- Use default model `ModelCnnClassifier`.

**TODO:**
- Use the script `2_training_classifier.py` to train a classifier on `dataset_v3_train_preprocess_convert_rgb`
- Use `dataset_v3_valid_preprocess_convert_rgb` as validation data (use `--directory_valid` argument)

**Help:**
- You can reduce the number of epochs in `2_training_classifier.py` to reduce training time
- To get the possible arguments of the script: `python 2_training_classifier.py --help`
- Don't forget to activate your virtual environment ...

#### Exercise 4 : Verify your answer ✔

In [None]:
verify.verify_exercice_4()

#### Exercise 4 : Solution 💡

In [None]:
answers.answer_exercice_4()

---
<span style="color:red">**Exercise 5**</span> : **Use transfer learning to get a better model**

Our previous classifier performs poorly on validation data. This is partly due to the fact that we do not have enough data to train our model. 

Here we are going to use transfer learning to train a better classifier : we will use a model composed of pretrained layers for feature extraction and a classification layer on top.
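
To illustrate the idea, here is a minimal Keras sketch of a frozen pretrained base with a small classification head. It assumes TensorFlow is installed and is not the implementation of `ModelTransferLearningClassifier`: the function name, the `EfficientNetB0` variant, the optimizer and the loss are all choices made for this example.

```python
def build_transfer_classifier(n_classes, input_shape=(224, 224, 3), weights="imagenet"):
    """Build a frozen EfficientNetB0 feature extractor topped with a softmax head."""
    import tensorflow as tf  # imported lazily because TensorFlow is slow to load

    # Pretrained convolutional base, without its original ImageNet classifier
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # freeze the pretrained layers: only the head is trained

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Calling `build_transfer_classifier(3)` would give a model for our three categories. Fine-tuning then consists of unfreezing part of the base and training again with a much lower learning rate, which is why it needs more memory.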

**Goal:**

- Train a `ModelTransferLearningClassifier` on preprocessed data.

**TODO:**
- Use the script `2_training_classifier.py` to train a `ModelTransferLearningClassifier` on `dataset_v3_train_preprocess_convert_rgb`
- Use `dataset_v3_valid_preprocess_convert_rgb` as validation data (use `--directory_valid` argument)
- **You may want to reduce the number of epochs to speed up training (you can use 5 epochs)**
- **You should probably turn off fine-tuning (`with_fine_tune=False`), since it requires a lot of memory**. Otherwise, the training may crash during fine-tuning.

**Help:**

- If you look at `2_training_classifier.py` you will see that `ModelTransferLearningClassifier` is commented out :

```python
if model is None:
    model = model_cnn_classifier.ModelCnnClassifier(
        batch_size=64, epochs=100, validation_split=0.2, patience=10,
        width=224, height=224, depth=3, color_mode='rgb',
        in_memory=False, data_augmentation_params={}, level_save=level_save
    )
    # model = model_transfer_learning_classifier.ModelTransferLearningClassifier(
    #     batch_size=64, epochs=100, validation_split=0.2, patience=10,
    #     width=224, height=224, depth=3, color_mode='rgb', in_memory=False, 
    #     data_augmentation_params={}, with_fine_tune=True, second_epochs=99, 
    #     second_lr=1e-5, second_patience=5, level_save=level_save
    # )

```
- You can simply comment out the `ModelCnnClassifier` lines and uncomment the `ModelTransferLearningClassifier` lines
- This model uses pretrained [EfficientNet](https://arxiv.org/abs/1905.11946) base layers + a classification head. You can easily modify the model class to use another base model if you want.
- The fine tune part can be costly, hence we advise using `with_fine_tune=False` for this exercise.
- To get the possible arguments of the script: `python 2_training_classifier.py --help`
- Don't forget to activate your virtual environment ...

#### Exercise 5 : Verify your answer ✔

In [None]:
verify.verify_exercice_5()

#### Exercise 5 : Solution 💡

In [None]:
answers.answer_exercice_5()

In [None]:
answers.answer_exercice_5_training()

As you can see this model outperforms the previous one thanks to the Transfer Learning approach.

---
<span style="color:red">**Exercise 6**</span> : **Test your model on the test dataset**

**Goal:**

- Use your `ModelTransferLearningClassifier` model to predict on the test dataset

**TODO:**

- Use the script `3_predict.py` to make predictions on `dataset_v3_test`

**Help:**

- Use `3_predict.py -h` to see the CLI helper.
- You **DO NOT** need to preprocess the test data ! As we said above, the preprocessing pipeline is saved alongside the model, and the script will preprocess the test data before sending it to the model's predict function.

#### Exercise 6 : Verify your answer ✔

In [None]:
verify.verify_exercice_6()

#### Exercise 6 : Solution 💡

In [None]:
answers.answer_exercice_6()

---
---
---


## 3. Use a saved model in python

In this section, we will see how to load a saved model in Python and use it on new data.

### Load a saved model

First choose one of your saved models :

In [None]:
import os
import pandas as pd

from pathlib import Path
from {{package_name}}.utils import get_models_path, get_data_path
from {{package_name}}.models_training import utils_models

DATA_PATH = Path(get_data_path())
MODELS_PATH = Path(get_models_path())

# This line lists the saved models in {{package_name}}-models
saved_model_names = sorted([model.name for model in MODELS_PATH.glob("*/model_*")])
print("\n".join(saved_model_names))

Then load it with `utils_models.load_model` :

In [None]:
model_name = saved_model_names[-1]
print(model_name)

model, model_conf = utils_models.load_model(model_name)

### Make predictions on new data

First we download examples from [wikimedia](https://commons.wikimedia.org/w/index.php?search=&title=Special:MediaSearch&go=Go&type=image) :

In [None]:
birman_example = "https://upload.wikimedia.org/wikipedia/commons/6/68/Birmanstrofe_2005_%28cropped%29.jpg"
bombay_example = "https://upload.wikimedia.org/wikipedia/commons/5/55/Sable_Bombay_Cat_Rosie.jpg"
shiba_example = "https://upload.wikimedia.org/wikipedia/commons/b/b1/Shiba-inu.jpg"

df_examples = pd.DataFrame(
    {
        "file_name": ["birman_example.jpg", "bombay_example.jpg", "shiba_example.jpg"],
        "file_class": ["birman", "bombay", "shiba"],
        "file_url": [birman_example, bombay_example, shiba_example], 
    }
)

df_examples["file_path"] = [
    (DATA_PATH / "examples" / file_name).as_posix() 
    for file_name in df_examples["file_name"]
]

(DATA_PATH / "examples").mkdir(exist_ok=True)

for example_url, example_name in zip(df_examples["file_url"], df_examples["file_path"]):
    utils.download_file(example_url, example_name, overwrite=True)

df_examples

Then we use our model to predict classes : 

In [None]:
import tempfile
from PIL import Image
from {{package_name}}.preprocessing import preprocess

# Load images
file_paths = list(df_examples['file_path'].values)
images = [Image.open(_) for _ in file_paths]

# Get preprocessing pipeline
preprocess_str = model_conf['preprocess_str']
preprocessor = preprocess.get_preprocessor(preprocess_str)

# Preprocess images
images_preprocessed = preprocessor(images)

# We'll create a temporary folder to save preprocessed images (models need a directory as input)
with tempfile.TemporaryDirectory(dir=DATA_PATH) as tmp_folder:
    # Save images
    images_path = []
    for i, img in enumerate(images_preprocessed):
        img_path = os.path.join(tmp_folder, f"image_{i}.png")
        img.save(img_path, format='PNG')
        images_path.append(img_path)

    # Get predictions
    df = pd.DataFrame({'file_path': images_path})
    predictions1 = model.predict(df)

accuracy = (predictions1 == df_examples["file_class"]).sum() / predictions1.shape[0]

print(predictions1)
print(f"Accuracy {accuracy:.0%}")

The result is pretty good ! :)  
But it's quite annoying to manage the preprocessing, a temporary folder, etc...   
Fortunately, we have a function that manages all of this : `utils_models.predict`

### Make predictions on new data - using the `utils_models.predict` function

An alternative is to use the provided (model agnostic) `utils_models.predict` function.

This function does not need the data to be preprocessed. Everything is managed inside the function, you just have to provide the dataset and the model.

In [None]:
predictions2 = utils_models.predict(df_examples, model, model_conf)  # Returns a list here

# Verifying accuracy :
accuracy2 = (predictions2 == df_examples["file_class"]).sum() / len(predictions2)
print(f"Accuracy v2 : {accuracy2:.2%}")

You can try the model on images from [Google Images](https://www.google.fr/search?q=shiba&source=lnms&tbm=isch)

---
---
---


## 4. Use the template for object detection 

In the previous sections, we saw how to train a model to solve a classification problem with the script `2_training_classifier.py`. Here we are going to see how to use the `2_training_object_detector.py` script for object detection.


### Dataset

We are going to use a [small dataset](https://github.com/OSS-Pole-Emploi/gabarit/tree/main/gabarit/template_vision/vision_data/dataset_object_detection) containing 30 pictures of fruits :

| Apples                                                                                                                                                          | Bananas                                                                                                                                                           | Oranges                                                                                                                                                           |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| ![A picture of apples](https://github.com/OSS-Pole-Emploi/gabarit/blob/main/gabarit/template_vision/vision_data/dataset_object_detection/apple_21.jpg?raw=true) | ![A picture of a banana](https://github.com/OSS-Pole-Emploi/gabarit/blob/main/gabarit/template_vision/vision_data/dataset_object_detection/banana_10.jpg?raw=true) | ![A picture of oranges](https://github.com/OSS-Pole-Emploi/gabarit/blob/main/gabarit/template_vision/vision_data/dataset_object_detection/orange_44.jpg?raw=true) |

In [None]:
# This function downloads all dataset images into {{package_name}}-data/dataset_object_detection
utils.github_download_object_detection_dataset()

You can verify that a new folder called `dataset_object_detection` is present in your `{{package_name}}-data` directory. It contains pictures of fruits and a file called `metadata_bboxes.csv` that contains the fruits' bounding boxes.

<span style="color:red">**Exercise 7**</span> : **Preprocess, train and predict**

**Goal:**
- Use everything we have learned in the previous exercises to create an object detection model that is capable of spotting apples, bananas and oranges in a picture.

**TODO:**

- Split `dataset_object_detection` into train / valid / test datasets with `utils/0_split_train_valid_test.py`
- Use `2_training_object_detector.py` to train a `ModelDetectronFasterRcnnObjectDetector` on the train data.
    - We skip the preprocessing here.
- Make predictions on the test data with `3_predict.py`

**Help:**

- Each script has a CLI helper.
- **You may want to reduce the number of epochs in `2_training_object_detector.py` to reduce training time**
- Don't forget to activate your virtual environment ...

#### Exercise 7 : Verify your answer ✔

In [None]:
verify.verify_exercice_7()

#### Exercise 7 : Solution 💡

In [None]:
answers.answer_exercice_7()

See next section to try your model in a web demonstrator ! 

---
---
---


## 5. BONUS : Start up a small web app to introduce your models 🚀 

You are now ready to demonstrate how well your models work. We implemented a default ***Streamlit*** app. Let's try it !

```bash
# do not forget to activate your virtual environment
# source your_venv_name/bin/activate

streamlit run {{package_name}}-scripts/4_demonstrator.py
```

It will start a Streamlit app on the default port (8501)

Visit [http://localhost:8501](http://localhost:8501) to see your demonstrator