<h1 style="font-size:30px;">Deploying the ASL Classifier to Vitis-AI</h1>  

This notebook describes how to quantize and compile a TensorFlow2 model with Vitis-AI for deployment.

<img src='./images/VGG16_06_asl_fine_tuning.png' width=1000 align='center'><br/>

## Table of Contents
* [1 System Configuration](#1-System-Configuration)
* [2 Download and Extract the Dataset](#2-Download-and-Extract-the-Dataset)
* [3 Dataset and Training Configuration](#3-Dataset-and-Training-Configuration)
* [4 Quantization](#4-Quantization)
* [5 Compilation](#5-Compilation)
* [6 Conclusion](#6-Conclusion)


In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import random
import numpy as np
import os
import matplotlib.pyplot as plt
import cv2
import zipfile
import requests
import glob

from tensorflow.keras.utils import image_dataset_from_directory

from matplotlib.ticker import (MultipleLocator, FormatStrFormatter)
from dataclasses import dataclass 

block_plot = False
plt.rcParams['figure.figsize'] = (12, 9)
SEED_VALUE = 42 

2023-04-26 20:43:05.782217: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-26 20:43:05.912263: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.


In [2]:
print("tensorflow version : ",tf.__version__)
print("keras version : ",keras.__version__)
print("opencv version : ",cv2.__version__)

tensorflow version :  2.10.0
keras version :  2.10.0
opencv version :  4.6.0


## 1 System Configuration

In [3]:
def system_config():
    
    # Get list of GPUs.
    gpu_devices = tf.config.list_physical_devices('GPU')
    print(gpu_devices)
    
    if len(gpu_devices) > 0:
        print('Using GPU')
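        # Note: CUDA_VISIBLE_DEVICES only takes effect if it is set before
        # TensorFlow initializes the GPUs (i.e. before list_physical_devices above).
        os.environ["CUDA_VISIBLE_DEVICES"] = '0'
        # Request deterministic cuDNN kernels for reproducible results.
        os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
        
        # Restrict TensorFlow to the first GPU.
        tf.config.set_visible_devices(gpu_devices[0], 'GPU')
        
        # Allocate GPU memory on demand instead of reserving it all up front.
        tf.config.experimental.set_memory_growth(gpu_devices[0], True)
        
        # Enable cuDNN.
        os.environ['TF_USE_CUDNN'] = "true"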
    else:
        print('Using CPU')

system_config()

[]
Using CPU


## 2 Download and Extract the Dataset

In [4]:
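def download_file(url, save_name):
    response = requests.get(url)
    # Fail loudly on HTTP errors instead of silently saving an error page.
    response.raise_for_status()

    # Use a context manager so the file is always closed.
    with open(save_name, 'wb') as f:
        f.write(response.content)

In [5]:
def unzip(zip_file=None):
    try:
        with zipfile.ZipFile(zip_file) as z:
            z.extractall("./")
            print("Extracted all")
    except (zipfile.BadZipFile, FileNotFoundError):
        print("Invalid file")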

In [6]:
#download_file(
#    'https://github.com/AlbertaBeef/asl_tutorial/releases/download/vitis_ai_3.0_version2/dataset_ASL_reduced.zip?dl=1', 
#    'dataset_ASL_reduced.zip'
#)
    
#unzip(zip_file='dataset_ASL_reduced.zip')

## 3 Dataset and Training Configuration

In [7]:
@dataclass(frozen=True)
class DatasetConfig:
    NUM_CLASSES: int = 29
    IMG_HEIGHT:  int = 224
    IMG_WIDTH:   int = 224
    CHANNELS:    int = 3
    BATCH_SIZE:  int = 32
    TRAINING_DATA_ROOT:   str = './dataset_ASL_reduced/training'
    VALIDATION_DATA_ROOT:   str = './dataset_ASL_reduced/validation'
        
@dataclass(frozen=True)
class TrainingConfig:
    BATCH_SIZE:     int   = 32
    EPOCHS:         int   = 51
    LEARNING_RATE:  float = 0.0001
    CHECKPOINT_DIR: str   = './saved_models_asl_classifier3'

### 3.1 Prepare the Training and Validation Dataset

In [8]:
train_dataset = image_dataset_from_directory(directory=DatasetConfig.TRAINING_DATA_ROOT,
                                             batch_size=TrainingConfig.BATCH_SIZE,
                                             shuffle=True,
                                             seed=SEED_VALUE,
                                             label_mode='categorical',
                                             # image_size is (height, width).
                                             image_size=(DatasetConfig.IMG_HEIGHT, DatasetConfig.IMG_WIDTH),
                                            )

valid_dataset = image_dataset_from_directory(directory=DatasetConfig.VALIDATION_DATA_ROOT,
                                             batch_size=TrainingConfig.BATCH_SIZE,
                                             shuffle=True,
                                             seed=SEED_VALUE,
                                             label_mode='categorical',
                                             image_size=(DatasetConfig.IMG_HEIGHT, DatasetConfig.IMG_WIDTH),
                                            )

Found 5800 files belonging to 29 classes.
Found 1450 files belonging to 29 classes.
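`image_dataset_from_directory` treats each subdirectory as one class. As a quick sanity check of the extracted dataset, the hypothetical helper below (not part of the original notebook) counts the image files directly inside each class folder, using what we understand to be the same extension list Keras accepts. If the reduced dataset is balanced, which is an assumption, the totals above correspond to 5800 / 29 = 200 training images and 1450 / 29 = 50 validation images per class.

```python
from pathlib import Path

# Extensions assumed to match those accepted by image_dataset_from_directory.
IMAGE_EXTENSIONS = {".bmp", ".gif", ".jpeg", ".jpg", ".png"}


def count_images_per_class(root):
    """Return {class_name: image_count} for each subdirectory of `root`."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # Only files directly inside the class folder are counted (no recursion).
        counts[class_dir.name] = sum(
            1 for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


for split in ("./dataset_ASL_reduced/training", "./dataset_ASL_reduced/validation"):
    if Path(split).is_dir():
        counts = count_images_per_class(split)
        print(split, len(counts), "classes,", sum(counts.values()), "images")
```

A class whose count differs strongly from the others is a hint that the download or extraction was incomplete.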


## 4 Quantization

**Load model**

Load the trained floating-point model (`tf2_asl_classifier13.h5`) with `keras.models.load_model`; it is used for the rest of the tutorial.

In [9]:

model = keras.models.load_model('tf2_asl_classifier13.h5')


Before the trained model can be compiled for a DPU platform, it must be quantized. Here we use the `vitis_quantize` module to convert the floating-point model into an INT8 representation.

In [10]:
from tensorflow_model_optimization.quantization.keras import vitis_quantize

**Quantize model**

By default, `quantize_model` converts the weights, activations and inputs to 8-bit values. Other bit widths and settings can be chosen with `weight_bit`, `activation_bit` and further parameters. The `calib_dataset` argument supplies the images used to calibrate the activation ranges; here we use the validation set.
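To illustrate what the bit-width parameters mean, here is a simplified sketch of symmetric quantization with power-of-two scale factors, the kind of scaling we understand the DPU to use. It is a hypothetical illustration, not the `vitis_quantize` implementation: the helper names, the round-half-up mode and the rule for choosing the fractional-bit position `fix_pos` are our assumptions, and the real quantizer also calibrates activation ranges on `calib_dataset`.

```python
import math


def int_range(bits):
    """Signed integer range for a bit width, e.g. 8 -> (-128, 127)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def choose_fix_pos(max_abs, bits=8):
    """Largest fractional-bit position whose integer range still covers max_abs."""
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    _, qmax = int_range(bits)
    # The scale is 2**-fix_pos; require max_abs * 2**fix_pos <= qmax.
    return math.floor(math.log2(qmax / max_abs))


def quantize(x, fix_pos, bits=8):
    """Float -> signed integer: scale by 2**fix_pos, round half up, clamp to range."""
    qmin, qmax = int_range(bits)
    q = math.floor(x * 2 ** fix_pos + 0.5)
    return max(qmin, min(qmax, q))


def dequantize(q, fix_pos):
    """Signed integer -> float approximation of the original value."""
    return q / 2 ** fix_pos


weights = [0.9, -0.31, 0.05, -1.0]
pos = choose_fix_pos(max(abs(w) for w in weights), bits=8)
q8 = [quantize(w, pos) for w in weights]
print(pos, q8)                           # → 6 [58, -20, 3, -64]
print([dequantize(q, pos) for q in q8])  # → [0.90625, -0.3125, 0.046875, -1.0]
```

With `bits=4` the same weights get `fix_pos = 2`, i.e. a step size of 0.25 instead of 1/64, which is why lower bit widths usually cost more accuracy.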

In [11]:
quantizer = vitis_quantize.VitisQuantizer(model)
quantized_model = quantizer.quantize_model(calib_dataset=valid_dataset, weight_bit=8, activation_bit=8)

[VAI INFO] Update activation_bit: 8
[VAI INFO] Update weight_bit: 8
[VAI INFO] Quantizing without specific `target`.
[VAI INFO] Start CrossLayerEqualization...
[VAI INFO] CrossLayerEqualization Done.
[VAI INFO] Start Quantize Calibration...
[VAI INFO] Quantize Calibration Done.
[VAI INFO] Start Post-Quant Model Refinement...
[VAI INFO] Start Quantize Position Ajustment...
[VAI INFO] Quantize Position Ajustment Done.
[VAI INFO] Post-Quant Model Refninement Done.
[VAI INFO] Start Model Finalization...
[VAI INFO] Model Finalization Done.
[VAI INFO] Quantization Finished.


**Evaluate quantized model**

To evaluate the quantized model, it must first be compiled (in the Keras sense) with a loss and the desired metrics, such as accuracy. With 8-bit quantization the accuracy loss is typically small; on the validation set the quantized model reaches 99.31%.

In [12]:
quantized_model.compile(loss='categorical_crossentropy', metrics=["accuracy"])

In [20]:
print(f"Model evaluation accuracy (training dataset): {quantized_model.evaluate(train_dataset)[1]*100.:.3f}")

Model evaluation accuracy (training dataset): 100.000


In [13]:
print(f"Model evaluation accuracy (validation dataset): {quantized_model.evaluate(valid_dataset)[1]*100.:.3f}")

Model evaluation accuracy: 99.310
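With `label_mode='categorical'` the labels are one-hot vectors, and to our understanding Keras then resolves the string `"accuracy"` to categorical accuracy: a sample counts as correct when the arg-max of the prediction matches the arg-max of the label. The pure-Python sketch below (a hypothetical helper, not a Keras API) mirrors that computation, which can be handy for scoring predictions collected outside Keras, e.g. on the target board.

```python
def categorical_accuracy(predictions, one_hot_labels):
    """Fraction of samples whose highest-scoring class matches the one-hot label."""
    if len(predictions) != len(one_hot_labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("no samples to score")

    def argmax(row):
        # Index of the largest value; ties resolve to the first index.
        return max(range(len(row)), key=row.__getitem__)

    correct = sum(argmax(p) == argmax(y) for p, y in zip(predictions, one_hot_labels))
    return correct / len(predictions)


preds = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.2, 0.3, 0.5]]
labels = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]
print(categorical_accuracy(preds, labels))  # two of three samples are correct
```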


**Save quantized model**

Once the accuracy of the quantized model is acceptable, save it as an `.h5` file with the `save` method.

In [14]:
quantized_model.save('tf2_asl_classifier13_quantized.h5')

## 5 Compilation

For this final step we use the Vitis AI compiler `vai_c_tensorflow2`, passing the quantized model with `--model`.

The target platform (i.e., the specific DPU architecture) is defined by a JSON arch file (e.g. `arch-zcu104.json`) passed with `--arch`.

To support as many platforms as possible, we compile for the following DPU architectures:
- B4096 (ZCU102, ZCU104, UltraZed-EV)
- B3136 (KV260)
- B2304 (Ultra96-V2)
- B1152 (Ultra96-V2+DualCam)
- B512 (ZUBoard)
- B128 (ZUBoard+DualCam)
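Each `--arch` option points at a small JSON file. Judging from the compiler log further down, which names one target `DPUCZDX8G_ISA1_B4096` and another by the fingerprint `0x101000002010208`, such a file identifies the DPU either by target name or by fingerprint. The sketch below reads one; the `target`/`fingerprint` key names are our assumption about the file layout, and `describe_arch` is a hypothetical helper.

```python
import json
from pathlib import Path


def describe_arch(arch_path):
    """Return the DPU identifier stored in an arch JSON file.

    Assumption: the file holds either a "target" name
    (e.g. "DPUCZDX8G_ISA1_B4096") or a hex "fingerprint"
    (e.g. "0x101000002010208").
    """
    arch = json.loads(Path(arch_path).read_text())
    if "target" in arch:
        return arch["target"]
    if "fingerprint" in arch:
        return arch["fingerprint"]
    raise KeyError(f"{arch_path}: no 'target' or 'fingerprint' key")
```

For example, `describe_arch('./arch/B4096/arch-zcu104.json')` should report the B4096 target if the file follows that layout; a `KeyError` suggests the file uses a different format.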

In [15]:
!vai_c_tensorflow2 \
    --model ./tf2_asl_classifier13_quantized.h5 \
    --arch ./arch/B4096/arch-zcu104.json \
    --output_dir ./model_vgg16/B4096/ \
    --net_name asl_classifier

!vai_c_tensorflow2 \
    --model ./tf2_asl_classifier13_quantized.h5 \
    --arch ./arch/B3136/arch-kv260.json \
    --output_dir ./model_vgg16/B3136/ \
    --net_name asl_classifier

!vai_c_tensorflow2 \
    --model ./tf2_asl_classifier13_quantized.h5 \
    --arch ./arch/B2304/arch-b2304-lr.json \
    --output_dir ./model_vgg16/B2304/ \
    --net_name asl_classifier

!vai_c_tensorflow2 \
    --model ./tf2_asl_classifier13_quantized.h5 \
    --arch ./arch/B1152/arch-b1152-hr.json \
    --output_dir ./model_vgg16/B1152/ \
    --net_name asl_classifier

!vai_c_tensorflow2 \
    --model ./tf2_asl_classifier13_quantized.h5 \
    --arch ./arch/B512/arch-b512-lr.json \
    --output_dir ./model_vgg16/B512/ \
    --net_name asl_classifier

!vai_c_tensorflow2 \
    --model ./tf2_asl_classifier13_quantized.h5 \
    --arch ./arch/B128/arch-b128-lr.json \
    --output_dir ./model_vgg16/B128/ \
    --net_name asl_classifier


**************************************************
* VITIS_AI Compilation - Xilinx Inc.
**************************************************
[INFO] Namespace(batchsize=1, inputs_shape=None, layout='NHWC', model_files=['./tf2_asl_classifier_quantized.h5'], model_type='tensorflow2', named_inputs_shape=None, out_filename='/tmp/asl_classifier_DPUCZDX8G_ISA1_B4096_org.xmodel', proto=None)
[INFO] tensorflow2 model: /workspace/tf2_asl_classifier_quantized.h5
[INFO] keras version: 2.10.0
[INFO] Tensorflow Keras model type: functional
[INFO] parse raw model     :100%|█| 39/39 [00:00<00:00, 24867.42it/s]           
[INFO] infer shape (NHWC)  :100%|█| 64/64 [00:00<00:00, 884.61it/s]             
[INFO] perform level-0 opt :100%|█| 1/1 [00:00<00:00, 181.99it/s]               
[INFO] perform level-1 opt :100%|█| 2/2 [00:00<00:00, 760.25it/s]               
[INFO] generate xmodel     :100%|█| 64/64 [00:00<00:00, 255.47it/s]             
[INFO] dump xmodel: /tmp/asl_classifier_DPUCZDX8G_ISA1_B4096_org.

[INFO] Namespace(batchsize=1, inputs_shape=None, layout='NHWC', model_files=['./tf2_asl_classifier_quantized.h5'], model_type='tensorflow2', named_inputs_shape=None, out_filename='/tmp/asl_classifier_0x101000002010208_org.xmodel', proto=None)
[INFO] tensorflow2 model: /workspace/tf2_asl_classifier_quantized.h5
[INFO] keras version: 2.10.0
[INFO] Tensorflow Keras model type: functional
[INFO] parse raw model     :100%|█| 39/39 [00:00<00:00, 24848.53it/s]           
[INFO] infer shape (NHWC)  :100%|█| 64/64 [00:00<00:00, 886.08it/s]             
[INFO] perform level-0 opt :100%|█| 1/1 [00:00<00:00, 190.39it/s]               
[INFO] perform level-1 opt :100%|█| 2/2 [00:00<00:00, 761.77it/s]               
[INFO] generate xmodel     :100%|█| 64/64 [00:00<00:00, 259.20it/s]             
[INFO] dump xmodel: /tmp/asl_classifier_0x101000002010208_org.xmodel
[UNILOG][INFO] Compile mode: dpu
[UNILOG][INFO] Debug mode: null
[UNILOG][INFO] Target architecture: DPUCZDX8G_ISA1_B128_0101000002010208
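The six compiler invocations above differ only in the arch file and the output directory. As a hypothetical alternative (not how the original notebook does it), the same calls can be generated in a Python loop and run with `subprocess`. The flags are exactly those used above; `compile_commands` and `compile_all` are illustrative names, and `dry_run=True` only prints the commands.

```python
import shlex
import subprocess

QUANTIZED_MODEL = "./tf2_asl_classifier13_quantized.h5"
NET_NAME = "asl_classifier"

# (DPU size, arch file) pairs, copied from the compilation cell above.
ARCH_FILES = [
    ("B4096", "./arch/B4096/arch-zcu104.json"),
    ("B3136", "./arch/B3136/arch-kv260.json"),
    ("B2304", "./arch/B2304/arch-b2304-lr.json"),
    ("B1152", "./arch/B1152/arch-b1152-hr.json"),
    ("B512", "./arch/B512/arch-b512-lr.json"),
    ("B128", "./arch/B128/arch-b128-lr.json"),
]


def compile_commands(model=QUANTIZED_MODEL, net_name=NET_NAME):
    """Build one vai_c_tensorflow2 argument list per DPU architecture."""
    return [
        ["vai_c_tensorflow2",
         "--model", model,
         "--arch", arch_file,
         "--output_dir", f"./model_vgg16/{size}/",
         "--net_name", net_name]
        for size, arch_file in ARCH_FILES
    ]


def compile_all(dry_run=True):
    """Print each command; also run it when dry_run is False."""
    for cmd in compile_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first failing compilation.
            subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a path ever contains spaces; `compile_all(dry_run=False)` runs the compiler inside the Vitis AI environment.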


In [16]:
print(train_dataset.class_names)

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'del', 'nothing', 'space']
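The Keras documentation states that, unless `class_names` is passed explicitly, classes are ordered alphanumerically by subdirectory name. Assuming this behaves like a plain Python string sort, uppercase letters (code points 65 to 90) come before lowercase ones (97 to 122), which is why the single letters take indices 0 to 25 and `del`, `nothing` and `space` take 26 to 28, matching the indices in the test-image file names below. A small sketch (`class_index_map` is a hypothetical helper):

```python
import string

# The ASL class directory names printed above, in arbitrary order.
DIR_NAMES = ["space", "nothing", "del"] + list(reversed(string.ascii_uppercase))


def class_index_map(dir_names):
    """Map each class name to the index implied by sorting the directory names."""
    return {name: idx for idx, name in enumerate(sorted(dir_names))}


mapping = class_index_map(DIR_NAMES)
print(mapping["A"], mapping["Z"], mapping["del"], mapping["nothing"], mapping["space"])
# → 0 25 26 27 28
```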


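### Generate Test Images

To test the deployed model on the target board, we export validation images that the floating-point model classifies correctly. Each file name encodes a sequence number, the predicted (and therefore true) class index and the class name, for example `test0001_16_Q.png`.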

In [21]:
output_dir = './test-images'


def generate_test_images(dataset, checkpoint_dir=None, checkpoint_version=0):
    
    if not checkpoint_dir:
        checkpoint_dir = os.path.join(os.getcwd(), TrainingConfig.CHECKPOINT_DIR, f"version_{checkpoint_version}")

    os.makedirs(output_dir, exist_ok=True)
         
    # Load the saved floating-point model.
    model = tf.keras.models.load_model(checkpoint_dir)
    
    num_test_images = 850
    class_names = dataset.class_names
    jdx = 0
    
    # Run inference batch by batch.
    for image_batch, labels_batch in dataset:
        
        # Predictions for the current batch.
        predictions = model.predict(image_batch)
        
        # Loop over all the images in the current batch.
        for idx in range(len(labels_batch)):
            
            pred_idx = int(tf.argmax(predictions[idx]).numpy())
            # Labels are one-hot encoded (label_mode='categorical').
            truth_idx = int(tf.argmax(labels_batch[idx]).numpy())
            
            # Keep only the images with correct predictions.
            if pred_idx != truth_idx:
                continue

            jdx += 1
            if jdx > num_test_images:
                # Stop once enough images have been saved (exits both loops).
                return

            # Note: the dataset yields RGB images, while cv2.imwrite assumes BGR,
            # so the saved files have their red and blue channels swapped.
            image = image_batch[idx].numpy().astype("uint8")
            image_dst = f"{output_dir}/test{jdx:04d}_{pred_idx}_{class_names[pred_idx]}.png"
            if not os.path.exists(image_dst):
                print(image_dst)
                cv2.imwrite(image_dst, image)

In [22]:
generate_test_images(valid_dataset, TrainingConfig.CHECKPOINT_DIR)

./test-images/test0001_16_Q.png
./test-images/test0002_14_O.png
./test-images/test0003_7_H.png
./test-images/test0004_27_nothing.png
./test-images/test0005_12_M.png
./test-images/test0006_17_R.png
./test-images/test0007_3_D.png
./test-images/test0008_0_A.png
./test-images/test0009_21_V.png
./test-images/test0010_19_T.png
./test-images/test0011_18_S.png
./test-images/test0012_11_L.png
./test-images/test0013_4_E.png
./test-images/test0014_20_U.png
./test-images/test0015_28_space.png
./test-images/test0016_20_U.png
./test-images/test0017_20_U.png
./test-images/test0018_8_I.png
./test-images/test0019_0_A.png
./test-images/test0020_16_Q.png
./test-images/test0021_21_V.png
./test-images/test0022_15_P.png
./test-images/test0023_17_R.png
./test-images/test0024_18_S.png
./test-images/test0025_11_L.png
./test-images/test0026_9_J.png
./test-images/test0027_17_R.png
./test-images/test0028_27_nothing.png
./test-images/test0029_1_B.png
./test-images/test0030_15_P.png
./test-images/test0031_19_T.png


./test-images/test0255_26_del.png
./test-images/test0256_14_O.png
./test-images/test0257_14_O.png
./test-images/test0258_15_P.png
./test-images/test0259_5_F.png
./test-images/test0260_16_Q.png
./test-images/test0261_15_P.png
./test-images/test0262_19_T.png
./test-images/test0263_22_W.png
./test-images/test0264_5_F.png
./test-images/test0265_21_V.png
./test-images/test0266_5_F.png
./test-images/test0267_20_U.png
./test-images/test0268_1_B.png
./test-images/test0269_26_del.png
./test-images/test0270_11_L.png
./test-images/test0271_2_C.png
./test-images/test0272_18_S.png
./test-images/test0273_8_I.png
./test-images/test0274_3_D.png
./test-images/test0275_22_W.png
./test-images/test0276_28_space.png
./test-images/test0277_0_A.png
./test-images/test0278_4_E.png
./test-images/test0279_28_space.png
./test-images/test0280_18_S.png
./test-images/test0281_9_J.png
./test-images/test0282_3_D.png
./test-images/test0283_17_R.png
./test-images/test0284_21_V.png
./test-images/test0285_18_S.png
./test-

./test-images/test0509_3_D.png
./test-images/test0510_4_E.png
./test-images/test0511_1_B.png
./test-images/test0512_8_I.png
./test-images/test0513_22_W.png
./test-images/test0514_7_H.png
./test-images/test0515_17_R.png
./test-images/test0516_10_K.png
./test-images/test0517_13_N.png
./test-images/test0518_5_F.png
./test-images/test0519_0_A.png
./test-images/test0520_24_Y.png
./test-images/test0521_8_I.png
./test-images/test0522_14_O.png
./test-images/test0523_28_space.png
./test-images/test0524_18_S.png
./test-images/test0525_23_X.png
./test-images/test0526_25_Z.png
./test-images/test0527_21_V.png
./test-images/test0528_7_H.png
./test-images/test0529_0_A.png
./test-images/test0530_4_E.png
./test-images/test0531_11_L.png
./test-images/test0532_25_Z.png
./test-images/test0533_2_C.png
./test-images/test0534_20_U.png
./test-images/test0535_24_Y.png
./test-images/test0536_26_del.png
./test-images/test0537_26_del.png
./test-images/test0538_10_K.png
./test-images/test0539_4_E.png
./test-images

./test-images/test0762_4_E.png
./test-images/test0763_19_T.png
./test-images/test0764_7_H.png
./test-images/test0765_28_space.png
./test-images/test0766_7_H.png
./test-images/test0767_27_nothing.png
./test-images/test0768_6_G.png
./test-images/test0769_22_W.png
./test-images/test0770_2_C.png
./test-images/test0771_0_A.png
./test-images/test0772_6_G.png
./test-images/test0773_8_I.png
./test-images/test0774_28_space.png
./test-images/test0775_19_T.png
./test-images/test0776_27_nothing.png
./test-images/test0777_25_Z.png
./test-images/test0778_12_M.png
./test-images/test0779_3_D.png
./test-images/test0780_27_nothing.png
./test-images/test0781_27_nothing.png
./test-images/test0782_6_G.png
./test-images/test0783_28_space.png
./test-images/test0784_17_R.png
./test-images/test0785_5_F.png
./test-images/test0786_3_D.png
./test-images/test0787_13_N.png
./test-images/test0788_27_nothing.png
./test-images/test0789_10_K.png
./test-images/test0790_27_nothing.png
./test-images/test0791_21_V.png
./te
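Because every file name embeds the class index and name, an application running on the board can score its own predictions without a separate label file. A minimal sketch, assuming only the naming scheme used by `generate_test_images` above (`parse_test_image_name` and `score` are hypothetical helpers):

```python
import re
from pathlib import Path

# Names produced by generate_test_images: test<NNNN>_<class index>_<class name>.png
TEST_IMAGE_RE = re.compile(r"^test(\d+)_(\d+)_([A-Za-z]+)\.png$")


def parse_test_image_name(path):
    """Return (sequence number, class index, class name) encoded in a file name."""
    match = TEST_IMAGE_RE.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    seq, idx, name = match.groups()
    return int(seq), int(idx), name


def score(predictions):
    """predictions: {file path: predicted class index}; returns the accuracy."""
    if not predictions:
        raise ValueError("no predictions to score")
    correct = sum(parse_test_image_name(p)[1] == pred for p, pred in predictions.items())
    return correct / len(predictions)


print(parse_test_image_name("./test-images/test0004_27_nothing.png"))
# → (4, 27, 'nothing')
```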

## 6 Conclusion

In this notebook, we showed how to quantize and compile a TensorFlow2 model with Vitis-AI for deployment on AMD Zynq UltraScale+ devices, and how to export correctly classified validation images for testing on the target.