# Intel® Neural Compressor Sample for TensorFlow

## Introduction

This demo shows an end-to-end pipeline that speeds up an AI model with Intel® Neural Compressor:

1. Train an AlexNet CNN model on the MNIST dataset with Keras and Intel® Optimization for TensorFlow.

2. Quantize the frozen FP32 PB model file to an INT8 model with Intel® Neural Compressor.

3. Test and compare the performance of the FP32 and INT8 models with the same script.
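Step 2 rests on some simple arithmetic. The sketch below shows the basic idea behind INT8 quantization: a symmetric, per-tensor scale maps FP32 values onto the signed 8-bit range [-127, 127]. It is a simplified illustration under that assumption and is not Intel® Neural Compressor's implementation. The real scheme (for example, per-channel scales, asymmetric ranges, or the calibration method) depends on the tool's own configuration. The function names are hypothetical.

```python
def quantize_symmetric(values, num_bits=8):
    """Map floats to signed integers using one symmetric per-tensor scale."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero when every value is 0.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clip into [-qmax, qmax].
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_symmetric(weights)
print(q)  # [50, -127, 0, 100]
# Each recovered value is within scale / 2 of the original value.
print(dequantize(q, scale))
```

The 0.003 weight collapses to 0 because it is smaller than half a quantization step. This rounding error is why INT8 accuracy can differ slightly from FP32.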


## Code
Please refer to [README.md](README.md).

## Create/Edit Script (Optional: run_inc_ft_mnist_sample.sh is already in the same folder)

In [None]:
%%writefile run_inc_ft_mnist_sample.sh
#!/bin/bash

echo "Enable Conda Env."
source /glob/development-tools/versions/oneapi/2022.1.1/oneapi/intelpython/python3.9/etc/profile.d/conda.sh
#conda activate user_tensorflow
conda activate /data/oneapi_workshop/INC

echo "Train Model by Keras/Tensorflow with MNIST"
python keras_tf_train_mnist.py

FP32_FILE="fp32_frozen.pb"
if [ ! -f "$FP32_FILE" ]; then
    echo "$FP32_FILE does not exist."
    echo "Training the AlexNet model failed, exiting!"
    exit 1
else
    echo "Training is finished"
fi

echo "Enable Intel Optimization for TensorFlow by exporting TF_ENABLE_MKL_NATIVE_FORMAT=0"
echo "Intel Optimized TensorFlow 2.5.0 and later requires TF_ENABLE_MKL_NATIVE_FORMAT=0 to be set before using Intel® Neural Compressor to quantize an FP32 model or deploying the quantized model."

export TF_ENABLE_MKL_NATIVE_FORMAT=0

echo "Quantize Model by Intel Neural Compressor"
python inc_quantize_model.py

INT8_FILE="alexnet_int8_model.pb"
if [ ! -f "$INT8_FILE" ]; then
    echo "$INT8_FILE does not exist."
    echo "Quantizing the FP32 model failed, exiting!"
    exit 1
else
    echo "Quantization is finished"
fi

echo "Execute the profiling_inc.py with FP32 model file"
python profiling_inc.py --input-graph=./fp32_frozen.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=32
echo "FP32 performance test is finished"

echo "Execute the profiling_inc.py with INT8 model file"
python profiling_inc.py --input-graph=./alexnet_int8_model.pb --omp-num-threads=4 --num-inter-threads=1 --num-intra-threads=4 --index=8
echo "INT8 performance test is finished"

echo "Compare the Performance of FP32 and INT8 Models"
python compare_perf.py
# Save the exit status now; $? after the next echo would always be 0.
STATUS=$?
echo "Please check the PNG files to see the performance!"

if [[ $STATUS -eq 0 ]]
then
  echo "This demo is finished successfully!"
else
  echo "This demo failed!"
fi

echo "Thank you!"

## Check Script

In [None]:
!cat run_inc_ft_mnist_sample.sh
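The script runs each stage in order and stops when an expected output file is missing. You can drive the same pipeline from Python instead; a minimal sketch follows. It checks both each command's exit code and its output file. It assumes the script names, flags, and output files used in run_inc_ft_mnist_sample.sh, and the helper `run_stage` is hypothetical. It also uses the Python interpreter that runs it (`sys.executable`) rather than activating a conda environment.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_stage(command, expected_output=None, extra_env=None):
    """Run one pipeline stage and stop the pipeline if it fails.

    Hypothetical helper mirroring the checks in run_inc_ft_mnist_sample.sh.
    """
    env = dict(os.environ)
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(command, env=env)
    if result.returncode != 0:
        sys.exit(f"{' '.join(command)} failed with exit code {result.returncode}")
    if expected_output is not None and not Path(expected_output).is_file():
        sys.exit(f"{expected_output} does not exist, exiting!")


def main():
    """Run the whole sample; call this from the sample folder."""
    python = sys.executable
    # The shell script exports this after training, so later stages inherit it.
    inc_env = {"TF_ENABLE_MKL_NATIVE_FORMAT": "0"}
    threads = ["--omp-num-threads=4", "--num-inter-threads=1", "--num-intra-threads=4"]

    run_stage([python, "keras_tf_train_mnist.py"], "fp32_frozen.pb")
    run_stage([python, "inc_quantize_model.py"], "alexnet_int8_model.pb", inc_env)
    run_stage([python, "profiling_inc.py", "--input-graph=./fp32_frozen.pb",
               *threads, "--index=32"], extra_env=inc_env)
    run_stage([python, "profiling_inc.py", "--input-graph=./alexnet_int8_model.pb",
               *threads, "--index=8"], extra_env=inc_env)
    run_stage([python, "compare_perf.py"], extra_env=inc_env)
    print("This demo is finished successfully!")
```

To use it, define the functions in a notebook cell in the sample folder and then call `main()`. Because `run_stage` uses `sys.exit`, a failing stage raises `SystemExit` and stops the pipeline at that point.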

## Prepare Running Environment

Please refer to [README.md](README.md).

### Remove all old output files (Optional)

In [None]:
!rm -rf run_inc_ft_mnist_sample.sh.*

## Run in Intel® DevCloud

Submit the job to a compute node with the property 'clx', 'icx', or 'spr'. These nodes support Intel® Deep Learning Boost (avx512_vnni).

In [None]:
!qsub run_inc_ft_mnist_sample.sh -d `pwd` -l nodes=1:icx:ppn=2

Check the job status:

In [None]:
!qstat

### Check Result

#### Check Result in Log File
Check the most recently created log file with the prefix **run_inc_ft_mnist_sample.sh.o**:

In [None]:
!tail -23 `ls -lAtr run_inc_ft_mnist_sample.sh.o* |  tail -1 | awk '{print $9}'`
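The command above finds the newest log by parsing `ls -lAtr` output with `awk`. The Python alternative below selects the newest match by modification time and prints its last 23 lines. It is a sketch that assumes the log files are in the current directory, and the helper names `latest_log` and `tail` are hypothetical.

```python
import glob
import os
from collections import deque


def latest_log(pattern="run_inc_ft_mnist_sample.sh.o*"):
    """Return the most recently modified file matching pattern, or None."""
    matches = glob.glob(pattern)
    return max(matches, key=os.path.getmtime) if matches else None


def tail(path, num_lines=23):
    """Return the last num_lines lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # A bounded deque keeps only the final lines while streaming the file.
        return list(deque(f, maxlen=num_lines))


log_file = latest_log()
if log_file is None:
    print("No log file found yet; check the job with qstat.")
else:
    print("".join(tail(log_file)), end="")
```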

Check any existing log file, for example **run_inc_ft_mnist_sample.sh.o1842343**:

In [None]:
!tail -23 run_inc_ft_mnist_sample.sh.o1842343

#### Check Result in PNG file

In [None]:
from IPython.display import Image, display

listOfImageNames = ['fp32_int8_aboslute.png',
                    'fp32_int8_times.png']

for imageName in listOfImageNames:
    display(Image(filename=imageName))
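If the job is still running, the PNG files may not exist yet and the cell above fails when it tries to load them. The variant below skips missing files and reports them. It is a sketch; the helper names `existing_images` and `show_existing_images` are hypothetical.

```python
import os


def existing_images(names):
    """Keep only the image files that are present on disk."""
    return [name for name in names if os.path.isfile(name)]


def show_existing_images(names):
    """Display the images that exist and report the ones that are missing."""
    # Imported here so existing_images() also works outside IPython/Jupyter.
    from IPython.display import Image, display

    found = existing_images(names)
    for name in names:
        if name not in found:
            print(f"{name} not found; has the job finished?")
    for name in found:
        display(Image(filename=name))
```

In the notebook, call `show_existing_images(['fp32_int8_aboslute.png', 'fp32_int8_times.png'])`.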

## Run on a Customer Server or in the Cloud

Note: 2nd Generation Intel® Xeon® Scalable Processors or newer are recommended for a larger performance improvement, because they support Intel® Deep Learning Boost (AVX-512 VNNI) for INT8 inference.
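On a Linux server, you can check whether the CPU advertises Intel® Deep Learning Boost by looking for the `avx512_vnni` flag in `/proc/cpuinfo`. The sketch below assumes a Linux host, since the file does not exist on other operating systems. The function names are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_vnni(cpuinfo_path="/proc/cpuinfo"):
    """Return True/False for avx512_vnni support, or None if it cannot tell."""
    path = Path(cpuinfo_path)
    if not path.is_file():
        return None  # Not Linux, or /proc is unavailable.
    return "avx512_vnni" in cpu_flags(path.read_text())


print("avx512_vnni supported:", has_vnni())
```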

### Run in Jupyter Notebook

In [None]:
!bash run_inc_ft_mnist_sample.sh

### Check Result

#### Check Result in Screen Output

```
...

Compare the Performance of FP32 and INT8 Models
Model            FP32                     INT8                    
throughput(fps)  572.4982883964987        3218.52236638019        
latency(ms)      2.8339174329018104       1.9863116497896156      
accuracy(%)      0.9799                   0.9796                  

Save to fp32_int8_aboslute.png

Model            FP32                     INT8                    
throughput_times 1                        5.621889936815179       
latency_times    1                        0.7009066766478504      
accuracy_diff(%) 0                        -0.029999999999986926   

Save to fp32_int8_times.png
Please check the PNG files to see the performance!
This demo is finished successfully!
Thank you!

########################################################################
# End of output for job 1842253.v-qsvr-1.aidevcloud
# Date: Thu 27 Jan 2022 07:05:52 PM PST
########################################################################

...

```
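The second table is derived from the first. Throughput and latency are expressed as ratios of INT8 to FP32, and the accuracy difference is the INT8 accuracy minus the FP32 accuracy, multiplied by 100. The sketch below reproduces that arithmetic from the printed values. This derivation is inferred from the output and is not the code of `compare_perf.py`. The sketch also includes a generic timing helper that shows one common way to measure throughput and latency. `measure` and its per-batch latency definition are assumptions, because `profiling_inc.py` is not shown here and may compute these metrics differently.

```python
import time


def measure(run_batch, batch_size, num_batches, warmup=2):
    """Time run_batch(); return (throughput in samples/s, latency in ms per batch).

    Hypothetical helper; profiling_inc.py may define these metrics differently.
    """
    for _ in range(warmup):  # Exclude one-time setup cost from the timing.
        run_batch()
    start = time.perf_counter()
    for _ in range(num_batches):
        run_batch()
    elapsed = time.perf_counter() - start
    throughput = batch_size * num_batches / elapsed
    latency_ms = elapsed / num_batches * 1000
    return throughput, latency_ms


def relative_metrics(fp32, int8):
    """Express INT8 results relative to FP32, as in the second table above."""
    return {
        "throughput_times": int8["throughput"] / fp32["throughput"],
        "latency_times": int8["latency"] / fp32["latency"],
        "accuracy_diff(%)": (int8["accuracy"] - fp32["accuracy"]) * 100,
    }


# Values copied from the first table in the screen output above.
fp32 = {"throughput": 572.4982883964987, "latency": 2.8339174329018104, "accuracy": 0.9799}
int8 = {"throughput": 3218.52236638019, "latency": 1.9863116497896156, "accuracy": 0.9796}
for name, value in relative_metrics(fp32, int8).items():
    print(f"{name:<17}{value:.4f}")  # 5.6219, 0.7009, -0.0300
```

A `latency_times` value below 1 means INT8 inference takes less time than FP32.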
#### Check Result in PNG file

The demo creates two figure files, fp32_int8_aboslute.png and fp32_int8_times.png, which show the performance results as bar charts. They can be used in a report.

In [None]:
from IPython.display import Image, display

listOfImageNames = ['fp32_int8_aboslute.png',
                    'fp32_int8_times.png']

for imageName in listOfImageNames:
    display(Image(filename=imageName))