### ADLS Proj: TensorRT with MASE for Multiple Precision Inference

This notebook demonstrates the integration of TensorRT passes into MASE as part of the MASERT framework.

Currently, our experiments are conducted on RTX 4060 and RTX 3070 GPUs, as our request for A100 access is still pending.

### Objective
Our goal is to plot trade-off curves showing how accuracy and runtime vary with the following variables:
- **GPU Type** (e.g., RTX 4060, RTX 3070, and A100 when available)
- **Dataset** (e.g., CIFAR-10)
- **Model Type** (e.g., ResNet18, ResNet50, VGG, AlexNet, ...)
- **Precision** (FP32, FP16, INT8), which drives the precision vs. runtime trade-off

At this stage, we have successfully implemented inference using multiple models, such as **ResNet18 and ResNet50**, on the **CIFAR-10 dataset**. Further experiments will explore the precision-runtime trade-off across different GPU architectures.


### Training the Model for Quantization Experiments

In this section, we train an unquantized model of the target architecture. The trained model later serves as the baseline for the FP32, FP16, and INT8 experiments, which lets us evaluate the trade-off between model accuracy and runtime across different GPU architectures.

#### Running the Training Script

To train the model, execute the following command:

```bash
python3 ./ch train --config /workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_INT8_quantization_by_type.toml
```

In [3]:
!python3 ./ch train --config /workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_INT8_quantization_by_type.toml

INFO: Seed set to 0
I0311 00:15:51.512130 140276942296128 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| Name                    |         Default          | Config. File | Manual Override |        Effective         |
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| task                    |      classification      |     cls      |                 |           cls            |
| load_name               |           None           |              |                 |           None           |
| load_type               |            mz            |              |                 |            mz            |
| batch_size              |           128            |      64      |                 |            64            |
| to_debug                |          False           |              |                

### INT8 Quantization with TensorRT

This section explains the process of **INT8 quantization** using TensorRT within the MASE framework. The key steps include **fake quantization, calibration, fine-tuning, and generating a TensorRT engine**.

### Code Execution Flow

1. **Apply TensorRT Passes**
   - **Fake Quantization**: Inserts quantization simulation operations.
   - **Summarization**: Displays which layers were quantized.
   - **Calibration**: Uses calibration algorithms (e.g., histogram-based) to determine optimal quantization parameters.
   - **Fine-Tuning**: Adjusts parameters to recover accuracy loss after quantization.

2. **Generate the TensorRT Engine**
   - Calls `tensorrt_engine_interface_pass` to convert the optimized graph into a **TensorRT engine**.

3. **Benchmarking & Performance Analysis**
   - Runs inference tests with warm-up and batch evaluation to measure efficiency.
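
The calibration and fake-quantization steps above can be illustrated with a minimal pure-Python sketch. It is not the MASE or TensorRT implementation: it swaps the histogram-based calibrator for simple max calibration, assumes symmetric per-tensor INT8 quantization, and the names `calibrate_max` and `fake_quantize` are hypothetical.

```python
def calibrate_max(values):
    """Max calibration: the largest absolute value observed sets the range.

    Assumption: this stands in for the histogram-based calibrators
    mentioned in the text, which pick a tighter range than the raw maximum.
    """
    return max(abs(v) for v in values)


def fake_quantize(values, amax, num_bits=8):
    """Quantize to a signed integer grid, then dequantize back to float.

    The output is still floating point, but it only takes values that an
    INT8 tensor with this scale could represent. Requires amax > 0.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    scale = amax / qmax
    simulated = []
    for v in values:
        q = round(v / scale)          # snap to the integer grid
        q = max(-qmax, min(qmax, q))  # clamp to the representable range
        simulated.append(q * scale)   # map back to float
    return simulated


# Hypothetical activation values collected during calibration.
activations = [-1.27, -0.5, 0.0, 0.013, 0.64, 1.0]
amax = calibrate_max(activations)
simulated = fake_quantize(activations, amax)
max_error = max(abs(a - b) for a, b in zip(activations, simulated))
print(f"amax={amax}, max abs error={max_error:.4f}")
```

For in-range values the rounding error is bounded by half a quantization step (`amax / 127 / 2`), which is the error that fine-tuning then tries to compensate for.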


In [1]:
RES_INT8_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_INT8_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"
!python ch transform --config {RES_INT8_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0316 23:33:16.033470 139949971272768 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |              | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |              | far10_2025-03-08/softwar | far10_2025-03-08/softwar |
|                     

### FP16 Conversion with TensorRT

#### Overview
This section describes the process of converting a model to **FP16 precision** using TensorRT. Unlike **INT8 quantization**, **FP16 does not require calibration, fake quantization, or fine-tuning**. The conversion process is simpler and primarily focuses on **speeding up inference while maintaining high precision**.
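
To see why FP16 needs no calibration, note that it is still a floating-point format: there is no scale factor to estimate, and the only cost is rounding to a shorter mantissa. The sketch below uses Python's standard `struct` module to round values to IEEE 754 half precision; the helper name `to_fp16` is hypothetical and unrelated to the MASE code.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Compare a few values against their half-precision round trip.
for value in (0.1, 1.0, 1000.1):
    half = to_fp16(value)
    print(f"{value!r} -> {half!r} (abs error {abs(value - half):.3e})")
```

The absolute error grows with magnitude because the spacing between representable FP16 values widens as the exponent increases.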

### Code Execution Flow

1. **Apply TensorRT FP16 Pass**
   - **No Fake Quantization**: Since FP16 does not require quantization-aware training, the `quantize` option is set to `false`.
   - **No Calibration**: Unlike INT8, FP16 does not need calibration data, so `num_calibration_batches` is set to `0`.
   - **No Fine-Tuning**: Additional training is unnecessary in FP16 mode.

2. **Generate the TensorRT Engine**
   - Calls `tensorrt_engine_interface_pass` to convert the model to a **TensorRT FP16 engine**.

3. **Benchmarking & Performance Analysis**
   - Runs inference tests with warm-up and batch evaluation to measure efficiency.
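
The benchmarking step can be sketched as follows. This is a generic illustration of warm-up followed by timed batch evaluation, not the MASE benchmarking pass; `benchmark` and `dummy_model` are hypothetical names, and the dummy model runs on the CPU.

```python
import statistics
import time


def benchmark(run_batch, batches, num_warmup=5):
    """Return the mean latency per batch in milliseconds, excluding warm-up."""
    # Warm-up runs absorb one-off costs (lazy initialization, caches)
    # and are deliberately not timed.
    for _ in range(num_warmup):
        run_batch(batches[0])

    latencies_ms = []
    for batch in batches:
        start = time.perf_counter()
        run_batch(batch)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(latencies_ms)


def dummy_model(batch):
    """Hypothetical stand-in for an inference call."""
    return [x * 2.0 for x in batch]


batches = [[float(i)] * 64 for i in range(20)]
mean_latency_ms = benchmark(dummy_model, batches)
print(f"mean latency: {mean_latency_ms:.4f} ms")
```

When timing a real GPU model with a host-side timer like this, the device must be synchronized before each timer read (for example with `torch.cuda.synchronize()` in PyTorch), otherwise only the kernel launch is measured.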


In [2]:
RES_FP16_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_FP16_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"
!python ch transform --config {RES_FP16_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0316 18:46:58.871801 140478726022208 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |              | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |              | far10_2025-03-08/softwar | far10_2025-03-08/softwar |
|                     

### FP32 Conversion with TensorRT

The process for converting a model to **FP32 precision** using TensorRT is quite similar to the **FP16 conversion**, but with even fewer modifications. Since FP32 is the default precision for deep learning models, the main goal here is to **leverage TensorRT optimizations** without changing the numerical format.

In [None]:
RES_FP32_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_FP32_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"
!python ch transform --config {RES_FP32_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0315 01:43:15.535419 140335826768960 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |              | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |              | far10_2025-03-08/softwar | far10_2025-03-08/softwar |
|                     

### Results on RTX 4060 with ResNet18 and CIFAR-10

The following results were obtained while running **ResNet18 on CIFAR-10** using **TensorRT** on an **RTX 4060 GPU**. 

#### **Accuracy Comparison**
| Model Version   | Accuracy |
|----------------|----------|
| **Original (FP32)**  | **0.73**  |
| **After Quantization**  | **0.74**  |

- We attribute the slight accuracy **increase** after quantization to the quantization-aware training (**QAT**) fine-tuning step.

#### **Latency Reduction**
| Precision Mode | Initial Latency | Optimized Latency |
|---------------|----------------|------------------|
| **FP32**      | 8.3ms            | **2.7ms**        |
| **FP16**      | 8.1ms            | **1.0ms**        |
| **INT8**      | **42.8ms**       | **4.5ms**        |

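- Moving from **FP32 to FP16** reduces the optimized latency from **2.7ms** to **1.0ms**.
- **INT8 inference** achieves **4.5ms latency**, which is slower than both the FP32 and FP16 engines. Its initial unoptimized latency of **42.8ms** is unexpectedly high; we suspect the fake-quantize operations are the cause.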
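
The speedups implied by the latency table can be computed directly. The snippet below uses only the numbers reported above; the variable names are illustrative.

```python
# (initial latency, optimized latency) in milliseconds, from the table above.
latency_ms = {
    "FP32": (8.3, 2.7),
    "FP16": (8.1, 1.0),
    "INT8": (42.8, 4.5),
}

fp32_optimized = latency_ms["FP32"][1]
speedups = {}
for precision, (initial, optimized) in latency_ms.items():
    own_speedup = initial / optimized          # gain over its own baseline
    vs_fp32_engine = fp32_optimized / optimized  # gain over the FP32 engine
    speedups[precision] = (own_speedup, vs_fp32_engine)
    print(
        f"{precision}: {own_speedup:.1f}x over its own baseline, "
        f"{vs_fp32_engine:.2f}x relative to the FP32 engine"
    )
```

This gives roughly 3.1x for FP32, 8.1x for FP16, and 9.5x for INT8 over their own initial latencies. Relative to the optimized FP32 engine, FP16 is about 2.7x faster while INT8 comes out at about 0.6x, that is, slower.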


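The multi-precision experiments below are organized as follows:

- **Input (models)**: ResNet18, ResNet50, VGG
- **Variables**: precision (INT8, FP16, FP32 (original)) and batch size
- **Output**: accuracy and runtime for the original, INT8, FP16, and FP32 versions of each (model, batch size) pair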
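
One way to lay out this experiment grid is sketched below. The batch sizes are hypothetical placeholders (the notes do not list them), and the result fields are left empty rather than filled with made-up numbers.

```python
from itertools import product

models = ["resnet18", "resnet50", "vgg"]
precisions = ["int8", "fp16", "fp32"]
batch_sizes = [32, 64, 128]  # hypothetical values, not taken from the notes

# One entry per (model, precision, batch size) run; accuracy and runtime
# are filled in after each transform run completes.
results = {
    (model, precision, batch_size): {"accuracy": None, "runtime_ms": None}
    for model, precision, batch_size in product(models, precisions, batch_sizes)
}

print(f"{len(results)} runs planned")
```

Keying the results by the full tuple makes it straightforward to slice later, for example to plot runtime against precision for a fixed model and batch size.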

In [4]:
MUL_PRECISION_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_Mul_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"
!python ch transform --config {MUL_PRECISION_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0317 17:52:22.237135 140261090690112 seed.py:57] Seed set to 0
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| Name                    |         Default          |     Config. File     |     Manual Override      |        Effective         |
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| task                    |      classification      |         cls          |                          |           cls            |
| load_name               |           None           |                      | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |                      | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |                      | far10_2025-03-08/sof

The same multi-precision transform for ResNet50:

In [2]:
MUL_PRECISION_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet50_Mul_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet50_cls_cifar10_2025-03-15/software/training_ckpts/best.ckpt"
!python ch transform --config {MUL_PRECISION_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0317 18:34:02.077955 140444487648320 seed.py:57] Seed set to 0
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| Name                    |         Default          |     Config. File     |     Manual Override      |        Effective         |
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| task                    |      classification      |         cls          |                          |           cls            |
| load_name               |           None           |                      | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |                      | e_output/resnet50_cls_ci | e_output/resnet50_cls_ci |
|                         |                          |                      | far10_2025-03-15/sof