### ADLS Proj: TensorRT with MASE for Multiple Precision Inference

This notebook demonstrates the integration of TensorRT passes into MASE as part of the MASERT framework.

Currently, our experiments are conducted on RTX 4060 and RTX 3070 GPUs, as our request for A100 access is still pending.

### Objective
Our goal is to plot trade-off curves that analyze the relationship between different variables, including:
- **GPU Type** (e.g., RTX 4060, RTX 3070, and A100 when available)
- **Dataset** (e.g., CIFAR-10)
- **Model Type** (e.g., ResNet18, ResNet50, VGG, AlexNet ...)
- **Precision vs. Runtime Trade-off** (FP32, FP16, INT8)

At this stage, we have successfully implemented inference using multiple models, such as **ResNet18 and ResNet50**, on the **CIFAR-10 dataset**. Further experiments will explore the precision-runtime trade-off across different GPU architectures.
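As a minimal sketch of how measured points could be reduced to a trade-off curve, the helper below keeps only the Pareto-optimal (runtime, accuracy) pairs. The function name `pareto_front` and the sample numbers are hypothetical placeholders for illustration; they are not measurements from this project.

```python
from typing import List, Tuple

Point = Tuple[str, float, float]  # (label, runtime_ms, accuracy)


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep points that no other point beats on both runtime (lower) and accuracy (higher)."""
    front = []
    best_acc = float("-inf")
    # Sort by runtime ascending; on equal runtime, put the more accurate point first.
    for label, runtime, acc in sorted(points, key=lambda p: (p[1], -p[2])):
        if acc > best_acc:  # strictly better accuracy than every faster point
            front.append((label, runtime, acc))
            best_acc = acc
    return front


# Placeholder values for illustration only (not measured results).
samples = [
    ("fp32", 10.0, 0.90),
    ("fp16", 6.0, 0.90),
    ("int8", 4.0, 0.88),
    ("int8-no-finetune", 4.5, 0.80),
]
front = pareto_front(samples)
print(front)
```

Plotting only the points in `front` (one curve per GPU or per model) would give the kind of trade-off curve described above.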


### Training the Model for Quantization Experiments

In this section, we train the original, unquantized model for the target model type. The trained model later serves as the baseline for the FP32, FP16, and INT8 experiments, which lets us evaluate the trade-off between accuracy and runtime across GPU architectures.

#### Running the Training Script

To train the model, execute the following command:

```bash
python3 ./ch train --config /workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_INT8_quantization_by_type.toml
```



In [None]:
!python3 ./ch train --config /workspace/ADLS_Proj/docs/tutorials/proj/resnet18_INT8_quant.toml

INFO: Seed set to 0
I0311 00:15:51.512130 140276942296128 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| Name                    |         Default          | Config. File | Manual Override |        Effective         |
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| task                    |      classification      |     cls      |                 |           cls            |
| load_name               |           None           |              |                 |           None           |
| load_type               |            mz            |              |                 |            mz            |
| batch_size              |           128            |      64      |                 |            64            |
| to_debug                |          False           |              |                

### INT8 Quantization with TensorRT

This section explains the process of **INT8 quantization** using TensorRT within the MASE framework. The key steps include **fake quantization, calibration, fine-tuning, and generating a TensorRT engine**.
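To make the first two steps concrete, here is a minimal sketch of symmetric INT8 calibration followed by fake quantization (quantize, clamp, dequantize). It uses plain max calibration for brevity rather than a histogram-based calibrator, and the function names and sample values are hypothetical; the actual MASE and TensorRT passes may differ.

```python
from typing import List


def calibrate_scale(samples: List[float], num_bits: int = 8) -> float:
    """Max calibration: map the largest observed magnitude to the top of the integer range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    amax = max(abs(x) for x in samples)
    return amax / qmax if amax > 0 else 1.0


def fake_quantize(x: float, scale: float, num_bits: int = 8) -> float:
    """Round to a signed integer grid, clamp to the representable range, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    q = max(-qmax, min(qmax, round(x / scale)))
    return q * scale


calibration_data = [-1.27, -0.5, 0.0, 0.3, 1.0]  # made-up activations
scale = calibrate_scale(calibration_data)
for x in [0.3, 1.0, 5.0]:
    # Values inside the calibrated range survive almost unchanged; 5.0 is clamped.
    print(x, "->", fake_quantize(x, scale))
```

Values outside the calibrated range saturate at the largest representable magnitude, which is why the choice of calibration data and algorithm affects accuracy.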

### Code Execution Flow

1. **Apply TensorRT Passes**
   - **Fake Quantization**: Inserts quantization simulation operations.
   - **Summarization**: Displays which layers were quantized.
   - **Calibration**: Uses calibration algorithms (e.g., histogram-based) to determine optimal quantization parameters.
   - **Fine-Tuning**: Adjusts parameters to recover accuracy loss after quantization.

2. **Generate the TensorRT Engine**
   - Calls `tensorrt_engine_interface_pass` to convert the optimized graph into a **TensorRT engine**.

3. **Benchmarking & Performance Analysis**
   - Runs inference tests with warm-up and batch evaluation to measure efficiency.
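The benchmarking step can be sketched as a small timing harness: the first few batches are executed but not recorded, and the remaining batches contribute to the latency statistics. This is an illustrative helper under assumed names (`benchmark`, `num_warmup`), not the benchmarking code that MASE runs.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def benchmark(infer: Callable[[object], object], batches: Iterable[object],
              num_warmup: int = 3) -> Dict[str, float]:
    """Time `infer` per batch after discarding the first `num_warmup` batches."""
    latencies_ms = []
    for i, batch in enumerate(batches):
        start = time.perf_counter()
        infer(batch)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= num_warmup:  # warm-up runs are executed but not recorded
            latencies_ms.append(elapsed_ms)
    return {
        "num_timed_batches": len(latencies_ms),
        "mean_ms": statistics.mean(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# Stand-in CPU workload; a real run would call the TensorRT engine instead.
result = benchmark(lambda batch: sum(v * v for v in batch), [list(range(1000))] * 10)
print(result)
```

On a GPU, kernel launches are asynchronous, so a real harness would also need to synchronize the device before reading the clock; otherwise the measured time reflects only the launch overhead.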


In [None]:
RES_INT8_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_INT8_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet18/best.ckpt"
!python ch transnew --config {RES_INT8_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0320 00:42:34.917057 139954364924992 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/doc | /workspace/ADLS_Proj/doc |
|                         |                          |              | s/tutorials/proj/model/r | s/tutorials/proj/model/r |
|                         |                          |              |    esnet18/best.ckpt     |    esnet18/best.ckpt     |
| load_type           

### FP16 Conversion with TensorRT

#### Overview
This section describes the process of converting a model to **FP16 precision** using TensorRT. Unlike **INT8 quantization**, **FP16 does not require calibration, fake quantization, or fine-tuning**. The conversion process is simpler and primarily focuses on **speeding up inference while maintaining high precision**.
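FP16 needs no calibration because it is still a floating-point format with its own exponent; what changes is the number of mantissa bits (10 instead of 23) and the representable range. The short sketch below rounds Python floats to half precision with the standard `struct` module to show both effects. The helper name `to_fp16` is hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_fp16(0.1))   # nearest FP16 value to 0.1, slightly below 0.1
print(to_fp16(1.0))   # exactly representable
try:
    to_fp16(70000.0)  # beyond the largest finite FP16 value (65504)
except OverflowError as exc:
    print("overflow:", exc)
```

For a classifier such as ResNet18, rounding errors of this size usually have little effect on the predicted class, which is consistent with FP16 conversion needing no fine-tuning.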

### Code Execution Flow

1. **Apply TensorRT FP16 Pass**
   - **No Fake Quantization**: Since FP16 does not require quantization-aware training, the `quantize` option is set to `false`.
   - **No Calibration**: Unlike INT8, FP16 does not need calibration data, so `num_calibration_batches` is set to `0`.
   - **No Fine-Tuning**: Additional training is unnecessary in FP16 mode.

2. **Generate the TensorRT Engine**
   - Calls `tensorrt_engine_interface_pass` to convert the model to a **TensorRT FP16 engine**.

3. **Benchmarking & Performance Analysis**
   - Runs inference tests with warm-up and batch evaluation to measure efficiency.


In [None]:
RES_FP16_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_FP16_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet18/best.ckpt"
!python ch transnew --config {RES_FP16_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0320 00:49:48.720277 140621743559744 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/doc | /workspace/ADLS_Proj/doc |
|                         |                          |              | s/tutorials/proj/model/r | s/tutorials/proj/model/r |
|                         |                          |              |    esnet18/best.ckpt     |    esnet18/best.ckpt     |
| load_type           

### FP32 Conversion with TensorRT

The process for converting a model to **FP32 precision** using TensorRT is quite similar to the **FP16 conversion**, but with even fewer modifications. Since FP32 is the default precision for deep learning models, the main goal here is to **leverage TensorRT optimizations** without changing the numerical format.

In [None]:
RES_FP32_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_FP32_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet18/best.ckpt"
!python ch transnew --config {RES_FP32_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0315 01:43:15.535419 140335826768960 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |              | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |              | far10_2025-03-08/softwar | far10_2025-03-08/softwar |
|                     

### Multi-Precision, Multi-Batch-Size (Runtime-Accuracy Trade-off)

- **Input:** model (ResNet18, ResNet50, VGG)
- **Variables:** precision (Original, INT8, FP16, FP32) and batch size
- **Output:** accuracy and runtime for each precision, reported per (model, batch size) pair
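One way to organise such a sweep is to iterate over every (model, precision, batch size) combination and collect one record per run. The sketch below assumes a hypothetical `run_one` callback that would wrap a single `transnew` invocation; the batch sizes are placeholders, and `dummy_run` returns no real measurements.

```python
from itertools import product
from typing import Callable, Dict, List


def sweep(models: List[str], precisions: List[str], batch_sizes: List[int],
          run_one: Callable[[str, str, int], Dict[str, float]]) -> List[Dict[str, object]]:
    """Run every (model, precision, batch size) combination and collect one record per run."""
    records = []
    for model, precision, batch_size in product(models, precisions, batch_sizes):
        metrics = run_one(model, precision, batch_size)
        records.append({"model": model, "precision": precision,
                        "batch_size": batch_size, **metrics})
    return records


def dummy_run(model: str, precision: str, batch_size: int) -> Dict[str, float]:
    """Placeholder that returns no real measurements."""
    return {"accuracy": float("nan"), "runtime_ms": float("nan")}


records = sweep(["resnet18", "resnet50"],
                ["original", "int8", "fp16", "fp32"],
                [32, 64],  # placeholder batch sizes
                dummy_run)
print(len(records), records[0])
```

Keeping one flat record per run makes it straightforward to load the results into a table and plot accuracy against runtime for each model.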

In [None]:
MUL_PRECISION_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_Mul_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet18/best.ckpt"
!python ch transnew --config {MUL_PRECISION_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0317 17:52:22.237135 140261090690112 seed.py:57] Seed set to 0
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| Name                    |         Default          |     Config. File     |     Manual Override      |        Effective         |
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| task                    |      classification      |         cls          |                          |           cls            |
| load_name               |           None           |                      | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |                      | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |                      | far10_2025-03-08/sof

#### ResNet50

In [None]:
MUL_PRECISION_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet50_Mul_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet50/best.ckpt"
!python ch transnew --config {MUL_PRECISION_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0317 18:34:02.077955 140444487648320 seed.py:57] Seed set to 0
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| Name                    |         Default          |     Config. File     |     Manual Override      |        Effective         |
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| task                    |      classification      |         cls          |                          |           cls            |
| load_name               |           None           |                      | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |                      | e_output/resnet50_cls_ci | e_output/resnet50_cls_ci |
|                         |                          |                      | far10_2025-03-15/sof

## Explore Sparsity

The following cells run the FP16 and INT8 sparsity configurations for ResNet18.

In [None]:
FP16_SPARSITY_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_FP16_spar.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet18/best.ckpt"
!python ch transnew --config {FP16_SPARSITY_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0319 01:41:30.468708 140167305331776 seed.py:57] Seed set to 0
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| Name                    |         Default          |     Config. File     |     Manual Override      |        Effective         |
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| task                    |      classification      |         cls          |                          |           cls            |
| load_name               |           None           |                      | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |                      | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |                      | far10_2025-03-08/sof

In [26]:
INT8_SPARSITY_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_INT8_spar.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet18/best.ckpt"
!python ch transnew --config {INT8_SPARSITY_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0320 20:11:25.069239 140672379520064 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/doc | /workspace/ADLS_Proj/doc |
|                         |                          |              | s/tutorials/proj/model/r | s/tutorials/proj/model/r |
|                         |                          |              |    esnet18/best.ckpt     |    esnet18/best.ckpt     |
| load_type           

For ResNet50:

## Explore Meta-Learning

First, we collect the dataset.

In [None]:
#VGG7_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/vgg7/best.ckpt"

In [33]:
RES18_META_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet18_meta.toml"
RES18_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet18/best.ckpt"

!python ch meta --config {RES18_META_BY_TYPE_TOML} --load {RES18_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0320 20:55:53.820242 140701426177088 seed.py:57] Seed set to 0
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| Name                    |         Default          |     Config. File     |     Manual Override      |        Effective         |
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| task                    |      classification      |         cls          |                          |           cls            |
| load_name               |           None           |                      | /workspace/ADLS_Proj/doc | /workspace/ADLS_Proj/doc |
|                         |                          |                      | s/tutorials/proj/model/r | s/tutorials/proj/model/r |
|                         |                          |                      |    esnet18/best.ckpt

In [36]:
RES50_META_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/resnet50_meta.toml"
RES50_CHECKPOINT_PATH = "/workspace/ADLS_Proj/docs/tutorials/proj/model/resnet50/best.ckpt"

!python ch meta --config {RES50_META_BY_TYPE_TOML} --load {RES50_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0320 21:58:41.048768 140303353021504 seed.py:57] Seed set to 0
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| Name                    |         Default          |     Config. File     |     Manual Override      |        Effective         |
+-------------------------+--------------------------+----------------------+--------------------------+--------------------------+
| task                    |      classification      |         cls          |                          |           cls            |
| load_name               |           None           |                      | /workspace/ADLS_Proj/doc | /workspace/ADLS_Proj/doc |
|                         |                          |                      | s/tutorials/proj/model/r | s/tutorials/proj/model/r |
|                         |                          |                      |    esnet50/best.ckpt

Next, we train the meta-learning model.

In [42]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim

# Load the CSV dataset
df = pd.read_csv("/workspace/ADLS_Proj/docs/tutorials/proj/meta_data.csv")

# Min-max normalise the numeric features
scaler = MinMaxScaler()
df[["latency", "accuracy", "energy"]] = scaler.fit_transform(df[["latency", "accuracy", "energy"]])

# One-hot encode `quant_method`
quant_encoder = OneHotEncoder(sparse_output=False)
quant_method_encoded = quant_encoder.fit_transform(df[["quant_method"]])
quant_method_labels = quant_encoder.categories_[0]  # quantization method classes

# One-hot encode `model_name`
model_encoder = OneHotEncoder(sparse_output=False)
model_name_encoded = model_encoder.fit_transform(df[["model_name"]])
model_name_labels = model_encoder.categories_[0]  # model classes

# One-hot encode `batch_size`
batch_size_encoder = OneHotEncoder(sparse_output=False)
batch_size_encoded = batch_size_encoder.fit_transform(df[["batch_size"]])
batch_size_labels = batch_size_encoder.categories_[0]  # possible batch_size values


# Assemble the numeric features and the one-hot encoded columns
df_encoded = pd.concat(
    [
        df[["latency", "accuracy", "energy"]],  # input features
        pd.DataFrame(model_name_encoded, columns=model_name_labels),  # model one-hot
        pd.DataFrame(batch_size_encoded, columns=batch_size_labels),  # batch_size one-hot (output)
        pd.DataFrame(quant_method_encoded, columns=quant_method_labels),  # quantization method one-hot (output)
    ],
    axis=1,
)

# Split into training and test sets
train_df, test_df = train_test_split(df_encoded, test_size=0.2, random_state=42)

# Select the input features and target outputs
input_features = ["latency", "accuracy", "energy"] + list(model_name_labels)
output_features = list(batch_size_labels) + list(quant_method_labels)  # predict batch_size + quantization method

X_train = train_df[input_features].values
y_train = train_df[output_features].values

X_test = test_df[input_features].values
y_test = test_df[output_features].values

# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train)
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.FloatTensor(y_test)

# Define the MLP prediction model
class MetaPolicyNetwork(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MetaPolicyNetwork, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        self.softmax = nn.Softmax(dim=1)  # normalise to a probability distribution

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.softmax(self.fc2(x))
        return x

# Initialise the model
input_dim = len(input_features)  # latency, accuracy, energy + model_name_onehot
hidden_dim = 16
output_dim = len(output_features)  # batch_size_onehot + quant_method_onehot

model = MetaPolicyNetwork(input_dim, hidden_dim, output_dim)

# Train the model
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

num_epochs = 500
for epoch in range(num_epochs):
    optimizer.zero_grad()
    output = model(X_train_tensor)
    loss = criterion(output, torch.max(y_train_tensor, 1)[1])  # CrossEntropyLoss expects class indices
    loss.backward()
    optimizer.step()

    if epoch % 50 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

print("Training complete!")

Epoch 0, Loss: 2.3023
Epoch 50, Loss: 2.0734
Epoch 100, Loss: 1.8238
Epoch 150, Loss: 1.6748
Epoch 200, Loss: 1.6117
Epoch 250, Loss: 1.5849
Epoch 300, Loss: 1.5717
Epoch 350, Loss: 1.5639
Epoch 400, Loss: 1.5586
Epoch 450, Loss: 1.5545
Training complete!
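One possible reason the loss levels off near 1.55: `nn.CrossEntropyLoss` applies log-softmax to its input, while `MetaPolicyNetwork.forward` already returns softmax probabilities. Because probabilities lie in [0, 1], the second softmax can never become sharply peaked, so the loss has a floor above zero. The pure-Python sketch below computes that floor under the assumption of 10 output classes, which the initial loss of 2.3023 (about ln 10) suggests but the notebook does not state.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores: List[float], target: int) -> float:
    """Cross-entropy for one sample: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])


num_classes = 10  # assumption inferred from the initial loss, 2.3023 ~ ln(10)

# Best case for a network that already applied softmax: a perfect one-hot output.
perfect_probs = [1.0] + [0.0] * (num_classes - 1)
floor = cross_entropy(perfect_probs, target=0)
print(f"loss floor with softmax outputs: {floor:.4f}")

# The same confident prediction expressed as raw logits can drive the loss towards 0.
confident_logits = [20.0] + [0.0] * (num_classes - 1)
print(f"loss with raw logits: {cross_entropy(confident_logits, target=0):.6f}")
```

If this is the cause, one option would be to return the raw `fc2` output from `forward` during training and apply softmax only when probabilities are needed at inference time.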


In [43]:
# Recommend a quantization strategy
def recommend_quantization(model_name, target_latency, target_accuracy, target_energy):
    """Recommend the most suitable batch_size and quantization method for the given targets."""

    # One-hot encode the model name
    model_onehot = np.zeros(len(model_name_labels))
    model_index = np.where(model_name_labels == model_name)[0]
    if len(model_index) > 0:
        model_onehot[model_index[0]] = 1  # set the matching index to 1

    # Normalise the target requirements
    scaled_input = scaler.transform([[target_latency, target_accuracy, target_energy]])

    # Concatenate with the one-hot vector
    input_data = np.hstack([scaled_input, model_onehot.reshape(1, -1)])

    # Convert to a PyTorch tensor
    input_tensor = torch.FloatTensor(input_data)

    # Predict batch_size and quantization method
    with torch.no_grad():
        prediction = model(input_tensor).numpy()

    # Split the prediction into its two blocks
    batch_size_probs = prediction[:, :len(batch_size_labels)]  # batch_size probabilities
    quant_method_probs = prediction[:, len(batch_size_labels):]  # quantization method probabilities

    # Pick the batch_size and quantization method with the highest probability
    best_batch_size = batch_size_labels[np.argmax(batch_size_probs)]
    best_method = quant_method_labels[np.argmax(quant_method_probs)]

    return {
        "Recommended batch_size": int(best_batch_size),
        "Recommended quantization method": best_method
    }

# Example recommendation
recommendation = recommend_quantization("resnet18", 7.0, 0.72, 0.02)
print("Recommendation:", recommendation)


Recommendation: {'Recommended batch_size': 32, 'Recommended quantization method': 'tensorrt_int8'}
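A caveat when reading this recommendation: the training cell builds its target with `torch.max(y_train_tensor, 1)[1]` over a row that concatenates two one-hot blocks, so each row contains two ones. A single argmax can report only one of them (the first, as far as the PyTorch documentation describes tie handling), which would leave the quantization-method block without a training signal. The sketch below shows the difference on a hypothetical row and how the target could be split into one class index per block; the helper names are illustrative.

```python
from typing import List, Tuple


def argmax(values: List[float]) -> int:
    """Index of the first maximum in the list."""
    return max(range(len(values)), key=lambda i: values[i])


def split_targets(row: List[float], num_batch_sizes: int) -> Tuple[int, int]:
    """Return one class index per block: (batch_size index, quant_method index)."""
    return argmax(row[:num_batch_sizes]), argmax(row[num_batch_sizes:])


# Hypothetical target row: 3 batch-size classes followed by 2 quantization methods.
row = [0.0, 1.0, 0.0, 0.0, 1.0]

print(argmax(row))            # a single index over the whole row sees only the first block
print(split_targets(row, 3))  # one index per block
```

With per-block indices, each block of the network output could be trained with its own cross-entropy term and the two losses summed.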


