<h1> <FONT COLOR="#273B5F"> Structured Pruning of Image Classification Keras Models Using NetsPresso </FONT></h1>

NetsPresso is a platform that provides tools and infrastructure for building and deploying machine learning models. One of its key features is NetsPresso Compressor, which optimizes machine learning models by reducing their size and computational requirements. It does this with techniques such as pruning, based on criteria like the L2 norm or the geometric median (GM), and filter decomposition, such as Tucker decomposition, Singular Value Decomposition (SVD), or Canonical Polyadic (CP) decomposition.
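As a rough illustration of the two pruning criteria named above, the sketch below scores the filters of a single layer in pure Python. It is a minimal sketch under stated assumptions, not NetsPresso's implementation: each filter is assumed to be flattened into a list of floats, the GM score follows the idea behind filter pruning via geometric median (FPGM), and the function names are hypothetical.

```python
import math
from typing import List

# Hypothetical representation: one convolution filter flattened to a vector.
Filter = List[float]


def l2_scores(filters: List[Filter]) -> List[float]:
    """L2-norm criterion: a filter with a small norm is treated as less important."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]


def gm_scores(filters: List[Filter]) -> List[float]:
    """Geometric-median-style criterion: score each filter by its summed
    Euclidean distance to every filter in the layer. Filters with the smallest
    sum lie closest to the geometric median and are treated as most redundant."""
    return [sum(math.dist(f, g) for g in filters) for f in filters]


def filters_to_prune(scores: List[float], ratio: float) -> List[int]:
    """Return the indices of the round(ratio * n) lowest-scoring filters."""
    n_prune = int(round(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:n_prune])


filters = [[3.0, 4.0], [0.0, 1.0], [2.0, 0.0], [0.3, 0.4]]
print(filters_to_prune(l2_scores(filters), 0.5))  # -> [1, 3]
```

With the L2 criterion, low-norm filters are removed first; with the GM criterion, filters that are well approximated by the rest of the layer are removed first, even if their norm is large.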

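Structured pruning is a model compression technique that removes entire groups of weights, such as convolution filters or channels, rather than individual weights. The pruned network is therefore still a regular, smaller dense network that standard hardware and inference libraries can run without support for sparse computation. This can significantly reduce the model's size and make it faster to run on edge devices, usually at the cost of a small accuracy drop that fine-tuning can largely recover.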
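The sketch below works through the parameter arithmetic for two consecutive standard convolution layers (Keras-style `Conv2D` with bias) to show why removing whole filters shrinks the dense model: each filter removed from one layer also removes one input channel from the next. The layer sizes are hypothetical and the helper names are illustrative only.

```python
from typing import NamedTuple, Tuple


class Conv2DSpec(NamedTuple):
    """Shape of a standard 2-D convolution layer (hypothetical helper type)."""
    kernel_h: int
    kernel_w: int
    in_channels: int
    out_channels: int


def conv2d_params(layer: Conv2DSpec, bias: bool = True) -> int:
    """Weights (kh * kw * in * out) plus one bias per output filter."""
    weights = layer.kernel_h * layer.kernel_w * layer.in_channels * layer.out_channels
    return weights + (layer.out_channels if bias else 0)


def prune_filters(first: Conv2DSpec, second: Conv2DSpec, n_removed: int) -> Tuple[Conv2DSpec, Conv2DSpec]:
    """Remove n_removed filters from `first`; `second` loses the matching input channels."""
    if first.out_channels != second.in_channels:
        raise ValueError("layers are not directly connected")
    if not 0 <= n_removed < first.out_channels:
        raise ValueError("n_removed must leave at least one filter")
    kept = first.out_channels - n_removed
    return first._replace(out_channels=kept), second._replace(in_channels=kept)


conv_a = Conv2DSpec(3, 3, 32, 64)
conv_b = Conv2DSpec(3, 3, 64, 128)
before = conv2d_params(conv_a) + conv2d_params(conv_b)
pruned_a, pruned_b = prune_filters(conv_a, conv_b, n_removed=32)
after = conv2d_params(pruned_a) + conv2d_params(pruned_b)
print(before, after)  # -> 92352 46240
```

Halving the filters of the first layer roughly halves the parameters of both layers here. In MobileNetV2, depthwise convolutions and residual additions tie channel counts across several layers, so a pruning tool has to remove channels consistently across those groups.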

**This notebook demonstrates the process of static structured pruning for deep learning models using NetsPresso. It covers the process of training a classification model, pruning, fine-tuning, and quantizing it using the STM32 model zoo. The STM32Cube.AI Developer Cloud is then used to benchmark the models.**


## License of the Jupyter Notebook

Copyright (c) 2022 STMicroelectronics.

All rights reserved.

This software is licensed under terms that can be found in the LICENSE file in
the root directory of this software component.

If no LICENSE file comes with this software, it is provided AS-IS.

Copyright (c) 2024 Nota Inc. All rights reserved.

Any code statements that execute the NetsPresso optimization process belong to Nota Inc.



<div style="border-bottom: 3px solid #273B5F">
<h2>Table of contents</h2>
<ul style="list-style-type: none">
  <li><a href="#Setup">1. Setup Instructions</a></li>
  <li><a href="#Prep">2. Preparing the Baseline Model</a>
    <ul style="list-style-type: none">
      <li><a href="#Training">2.1 Training, Quantization, and Evaluation</a></li>
      <li><a href="#Benchmarking">2.2 Benchmarking the Baseline Model</a></li>
    </ul>
  </li>
  <li><a href="#Pruning">3. Pruning the Model with NetsPresso Compressor</a></li>
  <li><a href="#Fine-Tuning">4. Fine-Tuning the Pruned Model</a></li>
  <li><a href="#Quantizing">5. Quantizing the Pruned Model</a></li>
  <li><a href="#Evaluating">6. Evaluating the Pruned Model</a></li>
  <li><a href="#Benchmarking-the-Pruned">7. Benchmarking the Pruned Model on STM32Cube.AI Developer Cloud</a></li>
  <li><a href="#Comparing">8. Comparing NetsPresso Pruned Model with Baseline Model</a></li>
</ul>
</div>

<div id="Setup">
    <h2>1. Setup Instructions</h2>
</div>

In this notebook, we apply structured pruning to an image classification model using [NetsPresso](https://netspresso.ai/), and train, quantize, and benchmark the models using the [STM32 model zoo](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main).

The STM32 model zoo, available on GitHub, is a valuable resource covering use cases such as image classification, object detection, audio event detection, hand posture recognition, and human activity recognition. It provides numerous services, including training, evaluation, prediction, deployment, quantization, and benchmarking, as well as chained services such as chain_tbqeb, chain_tqe, chain_eqe, chain_qb, chain_eqeb, and chain_qd, which are explained in detail in their respective READMEs.
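The chained service names appear to be built from one letter per service, run in order. Under that assumption (the READMEs remain the authoritative reference), a small hypothetical helper can expand a chain name into its steps:

```python
from typing import List

# Assumed meaning of each letter in a chained service name.
STEP_BY_LETTER = {
    "t": "training",
    "b": "benchmarking",
    "q": "quantization",
    "e": "evaluation",
    "d": "deployment",
}


def decode_chain(name: str) -> List[str]:
    """Expand a name such as 'chain_tqe' into the ordered list of services."""
    prefix = "chain_"
    if not name.startswith(prefix):
        raise ValueError(f"not a chained service: {name!r}")
    return [STEP_BY_LETTER[letter] for letter in name[len(prefix):]]


print(decode_chain("chain_tqe"))  # -> ['training', 'quantization', 'evaluation']
```

Read this way, the `chain_tqe_config.yaml` file used for the baseline model trains, quantizes, and evaluates the model in a single run.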

To get started, clone the STM32 model zoo repository by running the code below:

In [None]:
import os
os.chdir('..')
!git clone https://github.com/STMicroelectronics/stm32ai-modelzoo.git

After running the code above, navigate to the stm32ai-modelzoo repository and install the required libraries by running the code below:

In [None]:
os.chdir('stm32ai-modelzoo')
# Install the model zoo requirements and pin the NetsPresso package version
!pip install -r requirements.txt netspresso==1.3.1
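Optionally, you can confirm that the pinned NetsPresso version was installed before continuing. The sketch below uses only the standard library (`importlib.metadata`); `installed_version` and `check_pin` are hypothetical helpers, not part of the model zoo or NetsPresso.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package: str, expected: str) -> bool:
    """Print a short status line and return True if the expected version is installed."""
    found = installed_version(package)
    if found is None:
        print(f"{package} is not installed")
    elif found != expected:
        print(f"{package} {found} is installed, expected {expected}")
    else:
        print(f"{package} {found} OK")
    return found == expected


check_pin("netspresso", "1.3.1")
```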

In this notebook, we use several services of the image classification use case. Run the cell below to navigate to the image classification source directory; the following sections run the `stm32ai_main.py` script with a YAML configuration file.

In [None]:
os.chdir('image_classification/src')
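Because every `%run` cell below passes a relative `--config-path`, a quick check from this directory can catch a wrong working directory or a missing file before a long run. This is a minimal sketch; the file names are the ones used later in this notebook, and `missing_configs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Relative to image_classification/src, as in the %run cells of this notebook.
CONFIG_DIR = Path("../../../image_classification/config_files")
CONFIG_NAMES = [
    "chain_tqe_config.yaml",
    "baseline_benchmarking_config.yaml",
    "fine_tuning_config.yaml",
    "quantization_config.yaml",
    "evaluation_config.yaml",
    "pruned_benchmarking_config.yaml",
]


def missing_configs(config_dir: Union[str, Path], names: Iterable[str]) -> List[str]:
    """Return the configuration files that do not exist in config_dir."""
    config_dir = Path(config_dir)
    return [name for name in names if not (config_dir / name).is_file()]


missing = missing_configs(CONFIG_DIR, CONFIG_NAMES)
print("All configuration files found" if not missing else f"Missing: {missing}")
```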

<div id="Prep">
    <h2>2. Preparing the Baseline Model</h2>
</div>

<div id="Training">
    <h3>2.1 Training, Quantization, and Evaluation</h3>
</div>

In this section, we train, quantize, and evaluate a classification model that serves as the baseline. As an example, we use the MobileNetV2 model from the model zoo and the [tf_flowers](https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz) classification dataset. The `chain_tqe_config.yaml` file specifies the service and its configuration parameters, such as the model, the dataset, the number of epochs, and the preprocessing parameters. Feel free to review and adjust the training parameters as needed. If you have already trained your own model, you can skip this section and proceed to the next one.

In [None]:
%run stm32ai_main.py --config-path  ../../../image_classification/config_files  --config-name chain_tqe_config.yaml

All training and evaluation artifacts are saved in the output directory **experiments_outputs/baseline_training**. The trained model, `best_model.h5`, is saved in the **experiments_outputs/baseline_training/saved_models** directory and is used in the next steps of the notebook. The confusion matrices generated by evaluating the float and quantized models on the test set can also be found under this output directory.
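To locate these artifacts without browsing the folders by hand, you can list them by extension. A minimal sketch, assuming the output path used by the pruning cell of this notebook, that models are `.h5`/`.tflite` files, and that figures such as confusion matrices are `.png` images; `list_artifacts` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Union

# Output folder of the baseline run, relative to image_classification/src (assumed).
RUN_DIR = Path("../../../image_classification/experiments_outputs/baseline_training")


def list_artifacts(
    run_dir: Union[str, Path],
    patterns: Sequence[str] = ("*.h5", "*.tflite", "*.png", "*.log"),
) -> Dict[str, List[str]]:
    """Map each glob pattern to the matching files, searched recursively."""
    run_dir = Path(run_dir)
    if not run_dir.is_dir():
        raise FileNotFoundError(f"run directory not found: {run_dir}")
    return {
        pattern: sorted(p.relative_to(run_dir).as_posix() for p in run_dir.rglob(pattern))
        for pattern in patterns
    }


if RUN_DIR.is_dir():
    for pattern, files in list_artifacts(RUN_DIR).items():
        print(pattern, files)
else:
    print(f"Run directory not found: {RUN_DIR}")
```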

<div id="Benchmarking">
    <h3>2.2 Benchmarking the Baseline Model</h3>
</div>

In this section we use the [STM32Cube.AI Developer Cloud](https://stm32ai-cs.st.com/home) to benchmark the baseline model on the **STM32H747I-DISCO** board.

If you are behind a proxy, uncomment and fill in the following proxy settings.

**NOTE**: If the password contains special characters such as `@` or `:`, they must be URL-encoded (percent-encoded); for example, `@` becomes `%40`.
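Rather than encoding the password by hand, you can build the proxy URL with `urllib.parse.quote` from the standard library. A minimal sketch; `proxy_url` is a hypothetical helper and the placeholder values must be replaced with your own.

```python
from urllib.parse import quote


def proxy_url(user: str, password: str, host: str, port: int, scheme: str = "http") -> str:
    """Build a proxy URL with the user name and password percent-encoded."""
    # safe="" makes quote() encode every reserved character, including "/" and ":"
    return f"{scheme}://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"


print(proxy_url("user", "p@ss:word", "ip_address", 8080))
# -> http://user:p%40ss%3Aword@ip_address:8080
```

The resulting string can be assigned to `os.environ['http_proxy']`, and a second call with `scheme="https"` to `os.environ['https_proxy']`.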

In [None]:
# os.environ['http_proxy'] = "http://user:passwd@ip_address:port"
# os.environ['https_proxy'] = "https://user:passwd@ip_address:port"
# Optionally, disable SSL verification
# os.environ['NO_SSL_VERIFY'] = "1"


To access the STM32Cube.AI Developer Cloud, set the environment variables below with your credentials. If you don't have an account yet, go to https://stm32ai-cs.st.com/home and click **Sign in** to create one.

In [None]:
import getpass

email = 'xxx.yyy@st.com'
os.environ['stmai_username'] = email
password = getpass.getpass('Enter your password: ')
os.environ['stmai_password'] = password
# Disable SSL verification, as in the optional proxy settings above
os.environ['NO_SSL_VERIFY'] = "1"

We use the `baseline_benchmarking_config.yaml` file to measure the performance of the baseline model on the **STM32H747I-DISCO** board.

In [None]:
%run stm32ai_main.py --config-path  ../../../image_classification/config_files  --config-name baseline_benchmarking_config.yaml

<div id="Pruning">
    <h2>3. Pruning the Model with NetsPresso Compressor</h2>
</div>

To use the NetsPresso Python package, first sign up on the [NetsPresso website](https://netspresso.ai). Then enter the email and password you registered with in the cell below.


In [None]:
import getpass
from netspresso import NetsPresso

# Log in to the NetsPresso server
email = 'xxx.yyy@st.com'
password = getpass.getpass('Enter your password: ')

# Create a NetsPresso session with the email and password entered by the user
netspresso = NetsPresso(email=email, password=password)

# Create a Compressor instance from the NetsPresso session
compressor = netspresso.compressor()

To load your trained model and perform advanced compression, enter the required parameters. Each parameter is described in the table below.

<table>
<tr>
<th style="text-align: left">Option</th>
<th style="text-align: left">Description</th>
</tr>

<tr>
<td style="text-align: left">compression_method</td>
<td style="text-align: left">The selected compression method, e.g. <strong>'PR_L2'</strong> for L2-norm pruning or <strong>'PR_GM'</strong> for geometric-median (GM) pruning</td>
</tr>

<tr>
<td style="text-align: left">recommendation_method</td>
<td style="text-align: left">The method used to estimate layer-wise importance and recommend how much to compress each layer, e.g. <strong>'SLAMP'</strong> (Structured Layer-adaptive Sparsity for the Magnitude-based Pruning) for structured pruning or <strong>'VBMF'</strong> (Variational Bayesian Matrix Factorization) for filter decomposition</td>
</tr>

<tr>
<td style="text-align: left">recommendation_ratio</td>
<td style="text-align: left">The overall compression ratio given to the recommendation method, which distributes it across layers according to their layer-wise importance; for pruning, it controls how many filters are removed</td>
</tr>

<tr>
<td style="text-align: left">input_model_path</td>
<td style="text-align: left">The file path of the model to compress</td>
</tr>

<tr>
<td style="text-align: left">output_dir</td>
<td style="text-align: left">The local path where the compressed model is saved</td>
</tr>

<tr>
<td style="text-align: left">framework</td>
<td style="text-align: left">The framework of the model</td>
</tr>

<tr>
<td style="text-align: left">input_shapes</td>
<td style="text-align: left">The input shapes of the model</td>
</tr>

<tr>
<td style="text-align: left">options</td>
<td style="text-align: left">Optional settings for the pruning method</td>
</tr>
</table>

In [None]:
import os
from netspresso.enums import Framework, CompressionMethod, RecommendationMethod

# Set the path of the saved model file
baseline_model_path = '../../../image_classification/experiments_outputs/baseline_training/saved_models/best_model.h5'

# Set the path of the compressed model directory path
pruned_model_path = '../../../image_classification/experiments_outputs/pruned_models/PR_L2_0.5'

# Run recommendation compression
compression_result = compressor.recommendation_compression(
    compression_method=CompressionMethod.PR_L2,
    recommendation_method=RecommendationMethod.SLAMP,
    recommendation_ratio=0.5,
    input_model_path=baseline_model_path,
    output_dir=pruned_model_path,
    framework=Framework.TENSORFLOW_KERAS,
    input_shapes=[{"batch": 1, "channel": 3, "dimension": [224, 224]}],
)
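The `input_shapes` argument above describes the model input with `batch`, `channel`, and `dimension` keys, whereas a Keras model reports its input shape channels-last, e.g. `(None, 224, 224, 3)`. If you use a different input resolution, a small hypothetical helper like the one below can derive the argument from the Keras shape; it assumes that `dimension` lists the height and then the width.

```python
from typing import Dict, List, Optional, Sequence, Union

ShapeEntry = Dict[str, Union[int, List[int]]]


def keras_to_netspresso_shape(keras_shape: Sequence[Optional[int]], batch: int = 1) -> List[ShapeEntry]:
    """Convert a channels-last Keras input shape (batch, height, width, channels)."""
    if len(keras_shape) != 4:
        raise ValueError("expected a 4-D shape (batch, height, width, channels)")
    _, height, width, channels = keras_shape
    if None in (height, width, channels):
        raise ValueError("height, width and channels must be fixed")
    return [{"batch": batch, "channel": channels, "dimension": [height, width]}]


print(keras_to_netspresso_shape((None, 224, 224, 3)))
# -> [{'batch': 1, 'channel': 3, 'dimension': [224, 224]}]
```

For a single-input Keras model, the shape can be read from the trained model with `tf.keras.models.load_model(baseline_model_path).input_shape`.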

<div id="Fine-Tuning">
    <h2>4. Fine-Tuning the Pruned Model</h2>
</div>

Fine-tuning is a crucial step to recover the slight accuracy drop that can occur in pruned models. It adjusts the remaining filters by retraining the model on the original or a similar dataset. With fine-tuning, the accuracy of the pruned model can often be brought close to, or even above, that of the original model.

To fine-tune the pruned model generated by NetsPresso Compressor, use `stm32ai_main.py` with the `fine_tuning_config.yaml` file. This configuration file lets you tailor the fine-tuning process by adjusting parameters such as `frozen_layer`, `epochs`, and `learning_rate`. Choose these parameters carefully, as they can significantly influence the model's performance. Also make sure to use the same `global_seed`, `seed`, and `validation_split` values (see the `dataset` section of the YAML file) as you did when training the baseline model; otherwise, samples used to train the baseline could end up in the validation set. Finally, make sure `model_path` points to the compressed model saved in `pruned_model_path`, so that fine-tuning starts from the model produced by NetsPresso Compressor.
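Why the seeds and `validation_split` must match can be seen with a toy example. The sketch below is not the model zoo's splitting code (which may shuffle differently); it shuffles sample indices with a seeded `random.Random` and counts how many samples the baseline trained on end up in the fine-tuning validation set. `split` and `leaked_samples` are hypothetical helpers.

```python
import random
from typing import List, Sequence, Tuple


def split(samples: Sequence[int], validation_split: float, seed: int) -> Tuple[List[int], List[int]]:
    """Shuffle with a fixed seed, then hold out the first fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_split)
    return shuffled[n_val:], shuffled[:n_val]  # (training, validation)


def leaked_samples(samples: Sequence[int], baseline_seed: int, finetune_seed: int,
                   validation_split: float = 0.2) -> int:
    """Count baseline training samples that appear in the fine-tuning validation set."""
    baseline_train, _ = split(samples, validation_split, baseline_seed)
    _, finetune_val = split(samples, validation_split, finetune_seed)
    return len(set(baseline_train) & set(finetune_val))


samples = range(1000)
print(leaked_samples(samples, baseline_seed=127, finetune_seed=127))  # -> 0
# With a different seed, the validation set overlaps the baseline training data.
print(leaked_samples(samples, baseline_seed=127, finetune_seed=42))
```

If the validation set overlaps the data the baseline was trained on, validation accuracy during fine-tuning is optimistically biased, which can also mislead checkpoint selection.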

After fine-tuning, you can find the model saved under the **experiments_outputs/pruned_fine_tuning/best_model** folder, along with the training curves, confusion matrices, and log file. 

In [None]:
%run stm32ai_main.py --config-path  ../../../image_classification/config_files  --config-name fine_tuning_config.yaml

<div id="Quantizing">
    <h2>5. Quantizing the Pruned Model</h2>
</div>

In this section, we will quantize the pruned float32 model to an int8 quantized tflite model. Quantization is a technique used to reduce the memory and computation requirements of a model by converting the weights and activations from float32 to int8.
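To illustrate what converting float32 values to int8 means, the sketch below implements a per-tensor affine (asymmetric) int8 scheme in pure Python: a scale and a zero point map a float range onto [-128, 127]. This is a simplified illustration, not the converter used by the model zoo; TensorFlow Lite calibrates the ranges on the quantization dataset and uses variants of this scheme, such as per-channel scales for weights.

```python
from typing import Tuple

QMIN, QMAX = -128, 127  # int8 range


def affine_params(x_min: float, x_max: float) -> Tuple[float, int]:
    """Compute scale and zero point so that [x_min, x_max] maps onto [QMIN, QMAX]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    if x_max == x_min:  # all-zero tensor
        return 1.0, 0
    scale = (x_max - x_min) / (QMAX - QMIN)
    zero_point = QMIN - round(x_min / scale)
    return scale, zero_point


def quantize(x: float, scale: float, zero_point: int) -> int:
    """float -> int8, rounding to the nearest step and clamping to the int8 range."""
    return max(QMIN, min(QMAX, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """int8 -> approximate float value."""
    return (q - zero_point) * scale


scale, zero_point = affine_params(-0.5, 1.5)
q = quantize(0.3, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

Each int8 value takes one byte instead of four, which is where the memory saving comes from; for values inside the calibrated range, the rounding error is about half a quantization step (`scale / 2`) at most.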

To perform quantization, you will need to use the configuration file `quantization_config.yaml` along with the `stm32ai_main.py` script. The configuration file specifies the `quantization_dataset_path` and the quantization parameters, such as the `quantization_input_type` and `quantization_output_type`. 

After running `stm32ai_main.py` with the `quantization_config.yaml` file, an int8 quantized TFLite model is generated and saved under **experiments_outputs/pruned_quantization/quantized_models**.

In [None]:
%run stm32ai_main.py --config-path  ../../../image_classification/config_files  --config-name quantization_config.yaml

<div id="Evaluating">
    <h2>6. Evaluating the Pruned Model</h2>
</div>

At this stage, the model has been pruned, fine-tuned, and quantized. After updating `evaluation_config.yaml` with the path to the quantized pruned model, run the cell below to evaluate it.

In [None]:
%run stm32ai_main.py --config-path  ../../../image_classification/config_files  --config-name evaluation_config.yaml

<div id="Benchmarking-the-Pruned">
        <h2>7. Benchmarking the Pruned Model on STM32Cube.AI Developer Cloud</h2>
</div>

After analyzing the effect of NetsPresso on accuracy, we use the [STM32Cube.AI Developer Cloud](https://stm32ai-cs.st.com/home) in this section to compare the performance of the compressed network with the original model. As with the baseline model, we benchmark on the **STM32H747I-DISCO** target to measure performance and verify the effectiveness of NetsPresso Compressor on the MobileNetV2 model.

To set the configuration parameters, we need to update the `pruned_benchmarking_config.yaml` file to specify the path of the quantized pruned model and the name of the `board`.

In [None]:
%run stm32ai_main.py --config-path  ../../../image_classification/config_files  --config-name pruned_benchmarking_config.yaml 

<div id="Comparing">
        <h2> 8. Comparing NetsPresso Pruned Model with Baseline Model</h2>
</div>

This last section compares the quantized baseline model with the quantized model pruned by NetsPresso. Running the final two cells displays a table that shows the differences in inference time, validation accuracy, RAM, and flash memory.

In [None]:
import sys
import os

# Move from stm32ai-modelzoo/image_classification/src back to the
# directory that contains the cloned repository
os.chdir('../../../')

sys.path.append(os.path.relpath('utils'))
from utils import parse_logs_and_display_results

In [None]:
# Define file paths for the log files  
base_path = 'image_classification/experiments_outputs/'
baseline_training = os.path.relpath(base_path + 'baseline_training/stm32ai_main.log')
baseline_benchmarking = os.path.relpath(base_path + 'baseline_benchmarking/stm32ai_main.log')
pruned_evaluation = os.path.relpath(base_path + 'pruned_evaluation/stm32ai_main.log')
pruned_benchmarking = os.path.relpath(base_path + 'pruned_benchmarking/stm32ai_main.log')

# Call function to parse logs and display results in a table
parse_logs_and_display_results(baseline_training, baseline_benchmarking, pruned_evaluation, pruned_benchmarking)
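To express the comparison as relative gains, you can compute percentage changes from the figures shown in the table. A minimal sketch; the helper and metric names are hypothetical, and you fill in the values yourself, since the return value of `parse_logs_and_display_results` is not described in this notebook.

```python
from typing import Dict


def relative_change(baseline: float, pruned: float) -> float:
    """Percentage change from baseline to pruned (negative means a reduction)."""
    if baseline == 0:
        raise ValueError("baseline value must be nonzero")
    return 100.0 * (pruned - baseline) / baseline


def compare(baseline: Dict[str, float], pruned: Dict[str, float]) -> Dict[str, float]:
    """Percentage change for every metric present in both dictionaries."""
    return {name: relative_change(baseline[name], pruned[name])
            for name in baseline if name in pruned}


# Example usage (hypothetical metric names; fill in the values from the table):
# changes = compare({"inference_time_ms": ..., "flash_kib": ...},
#                   {"inference_time_ms": ..., "flash_kib": ...})
```

For accuracy, the difference in percentage points is usually more meaningful than a relative change.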