# TensorRT Quantization Tutorial

This notebook demonstrates the TensorRT passes integrated into MASE as part of the MASERT framework. The following demonstrations were run on an NVIDIA RTX A2000 GPU with an Intel(R) Xeon(R) E5-2690 v4 CPU @ 2.60GHz.

## Section 1. Show Configuration
First, we will show how to perform int8 quantization of a simple model, `jsc-toy`, and compare the quantized model to the original using the `Machop API`. The quantization process is split into the following stages. Each stage uses its own pass and is explained in depth in its own subsection:

1. [Fake quantization](#section-11-fake-quantization): `tensorrt_fake_quantize_transform_pass`
2. [Calibration](#section-12-calibration): `tensorrt_calibrate_transform_pass`
3. [Quantization-Aware Training](#section-13-quantized-aware-training-qat): `tensorrt_fine_tune_transform_pass`
4. [Quantization](#section-14-tensorrt-quantization): `tensorrt_engine_interface_pass`
5. [Analysis](#section-15-performance-analysis): `tensorrt_analysis_pass`
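
Each stage above is a MASE pass. Judging from the calls later in this notebook (for example `mg, _ = init_metadata_analysis_pass(mg, None)`), a pass takes a graph plus an optional argument dictionary and returns the graph together with an info object. The following is a minimal sketch of chaining passes under that convention. `run_passes`, `tag_pass`, and the plain dictionary standing in for a `MaseGraph` are hypothetical stand-ins, not part of the MASE API.

```python
from typing import Any, Callable, Optional

# A pass takes (graph, pass_args) and returns (graph, info).
Pass = Callable[[Any, Optional[dict]], tuple]


def run_passes(graph: Any, stages: list) -> tuple:
    """Apply each (pass_fn, pass_args) pair in order and collect the info objects."""
    infos = {}
    for pass_fn, pass_args in stages:
        graph, info = pass_fn(graph, pass_args)
        infos[pass_fn.__name__] = info
    return graph, infos


def tag_pass(name: str) -> Pass:
    """Build a hypothetical stand-in pass that only records that it ran."""
    def _pass(graph, pass_args=None):
        graph = {**graph, "history": graph["history"] + [name]}
        return graph, {"args": pass_args}
    _pass.__name__ = name
    return _pass


# Stand-ins named after the stages listed above (not the real implementations).
stages = [
    (tag_pass("fake_quantize"), {"by": "type"}),
    (tag_pass("calibrate"), {"num_calibration_batches": 10}),
    (tag_pass("fine_tune"), None),
]
graph, infos = run_passes({"history": []}, stages)
print(graph["history"])
```

Keeping the stages in a list makes it easy to skip one (for example, fine-tuning) without touching the others.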

We start by importing the libraries and passes required for the notebook and ensuring that the path is set correctly so that machop can be used.

In [1]:
import sys
import os
from pathlib import Path
import toml

# Figure out the correct path
machop_path = Path(".").resolve().parent.parent.parent /"src"
assert machop_path.exists(), "Failed to find machop at: {}".format(machop_path)
sys.path.append(str(machop_path))

# Add directory to the PATH so that chop can be called
new_path = "../../../machop"
full_path = os.path.abspath(new_path)
os.environ['PATH'] += os.pathsep + full_path

from chop.tools.utils import to_numpy_if_tensor
from chop.tools.logger import set_logging_verbosity
from chop.tools import get_cf_args, get_dummy_input
from chop.passes.graph.utils import deepcopy_mase_graph
from chop.tools.get_input import InputGenerator
from chop.tools.checkpoint_load import load_model
from chop.ir import MaseGraph
from chop.models import get_model_info, get_model, get_tokenizer
from chop.dataset import MaseDataModule, get_dataset_info
from chop.passes.graph.transforms import metadata_value_type_cast_transform_pass
from chop.passes.graph import (
    summarize_quantization_analysis_pass,
    add_common_metadata_analysis_pass,
    init_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
    tensorrt_calibrate_transform_pass,
    tensorrt_fake_quantize_transform_pass,
    tensorrt_fine_tune_transform_pass,
    tensorrt_engine_interface_pass,
    runtime_analysis_pass,
    )

set_logging_verbosity("info")

INFO     Set logging level to info


Next, we check the dependencies (the required package `cuda` refers to `cuda-python`).

In [2]:
from chop.tools.check_dependency import check_deps_tensorRT_pass
check_deps_tensorRT_pass(silent=False)

INFO     Extension: All dependencies for TensorRT pass are available.


True

Next, we load the TOML file used for quantization. To view the configuration, click [here](../../../machop/configs/tensorrt/jsc_toy_INT8_quantization_by_type.toml).

In [3]:
import toml
# Path to your TOML file
RES_TOML_PATH = 'resnet18_INT8_quant.toml'

# Reading TOML file and converting it into a Python dictionary
with open(RES_TOML_PATH, 'r') as toml_file:
    pass_args = toml.load(toml_file)

# Extract the 'passes.tensorrt' section and its children
tensorrt_config = pass_args.get('passes', {}).get('tensorrt', {})
print(tensorrt_config)
# Extract the 'passes.tensorrt.runtime_analysis' section and its children
runtime_analysis_config = pass_args.get('passes', {}).get('tensorrt', {}).get('runtime_analysis', {})
print(runtime_analysis_config)

{'by': 'type', 'num_calibration_batches': 10, 'post_calibration_analysis': True, 'default': {'config': {'quantize': True, 'calibrators': ['percentile', 'mse', 'entropy'], 'percentiles': [99.0, 99.9, 99.99], 'precision': 'int8'}, 'input': {'calibrator': 'histogram', 'quantize_axis': False}, 'weight': {'calibrator': 'histogram', 'quantize_axis': False}}, 'fine_tune': {'fine_tune': True}, 'runtime_analysis': {'num_batches': 500, 'num_GPU_warmup_batches': 5, 'test': True}}
{'num_batches': 500, 'num_GPU_warmup_batches': 5, 'test': True}
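
For reference, a TOML fragment that would parse to the `passes.tensorrt` dictionary printed above might look like the sketch below. It is reconstructed from the printed output only, so the actual file may order or group the keys differently, and the top-level keys (`model`, `dataset`, and so on) are omitted. `tomllib` requires Python 3.11+; on older versions the sketch falls back to the `toml` package that this notebook already uses.

```python
try:
    import tomllib as toml_reader  # standard library, Python 3.11+
except ModuleNotFoundError:
    import toml as toml_reader  # third-party package used elsewhere in this notebook

# Reconstructed from the printed dictionary; not copied from the real file.
CONFIG_TEXT = """
[passes.tensorrt]
by = "type"
num_calibration_batches = 10
post_calibration_analysis = true

[passes.tensorrt.default.config]
quantize = true
calibrators = ["percentile", "mse", "entropy"]
percentiles = [99.0, 99.9, 99.99]
precision = "int8"

[passes.tensorrt.default.input]
calibrator = "histogram"
quantize_axis = false

[passes.tensorrt.default.weight]
calibrator = "histogram"
quantize_axis = false

[passes.tensorrt.fine_tune]
fine_tune = true

[passes.tensorrt.runtime_analysis]
num_batches = 500
num_GPU_warmup_batches = 5
test = true
"""

pass_args_sketch = toml_reader.loads(CONFIG_TEXT)
tensorrt_sketch = pass_args_sketch["passes"]["tensorrt"]
runtime_sketch = tensorrt_sketch.get("runtime_analysis", {})
print(tensorrt_sketch["default"]["config"]["precision"])
print(runtime_sketch)
```

Note that `runtime_analysis` is nested under `passes.tensorrt`, which is why the cell above reads it through two chained `.get(...)` calls.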


We then build the model from the model arguments in the TOML configuration and wrap it in a `MaseGraph`. Training is run from the command line in the cell after that.

In [4]:
from chop.dataset import MaseDataModule
from chop.models import get_model_info
from chop.models import get_model
from chop.tools.get_input import InputGenerator

# Load the basics in
model_name = pass_args['model']
dataset_name = pass_args['dataset']
max_epochs = pass_args['max_epochs']
batch_size = pass_args['batch_size']
learning_rate = pass_args['learning_rate']
accelerator = pass_args['accelerator']

data_module = MaseDataModule(
    name=dataset_name,
    batch_size=batch_size,
    model_name=model_name,
    num_workers=0,
)
data_module.prepare_data()
data_module.setup()

# Add the data_module and other necessary information to the configs
configs = [tensorrt_config, runtime_analysis_config]
for config in configs:
    config['task'] = pass_args['task']
    config['dataset'] = pass_args['dataset']
    config['batch_size'] = pass_args['batch_size']
    config['model'] = pass_args['model']
    config['data_module'] = data_module
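    # Map the config value 'gpu' to the device name 'cuda'
    config['accelerator'] = 'cuda' if pass_args['accelerator'] == 'gpu' else pass_args['accelerator']
    # Compare against the mapped value; a comparison with 'gpu' would never match here
    if config['accelerator'] == 'cuda':
        os.environ['CUDA_MODULE_LOADING'] = 'LAZY'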

model_info = get_model_info(model_name)
# quant_modules.initialize()
model = get_model(
    model_name,
    # task="cls",
    dataset_info=data_module.dataset_info,
    pretrained=False)


input_generator = InputGenerator(
    data_module=data_module,
    model_info=model_info,
    task="cls",
    which_dataloader="train",
)

# generate the mase graph and initialize node metadata
mg = MaseGraph(model=model)

model_info is MaseModelInfo(name='resnet', model_source=<ModelSource.TORCHVISION: 'torchvision'>, task_type=<ModelTaskType.VISION: 'vision'>, image_classification=True, physical_data_point_classification=False, sequence_classification=False, seq2seqLM=False, causal_LM=False, is_quantized=False, is_lora=False, is_sparse=False, is_fx_traceable=True)
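
The accelerator handling in the cell above can be factored into a small helper. This is a minimal sketch under the assumption that the configuration uses `gpu` where the device name `cuda` is expected; `resolve_accelerator` is a hypothetical name, not a MASE function. `CUDA_MODULE_LOADING=LAZY` is a CUDA setting that defers loading of GPU modules until they are first used.

```python
import os


def resolve_accelerator(name: str, env=None) -> str:
    """Map the config value 'gpu' to 'cuda' and enable lazy CUDA module loading."""
    env = os.environ if env is None else env
    device = "cuda" if name == "gpu" else name
    # Check the mapped device name, not the raw config value.
    if device == "cuda":
        env["CUDA_MODULE_LOADING"] = "LAZY"
    return device


# A plain dict stands in for os.environ so the example has no side effects.
fake_env = {}
print(resolve_accelerator("gpu", fake_env), fake_env)
print(resolve_accelerator("cpu", {}))
```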


In [3]:
!python3 ./ch train --config /workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_INT8_quantization_by_type.toml

INFO: Seed set to 0
I0311 00:15:51.512130 140276942296128 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| Name                    |         Default          | Config. File | Manual Override |        Effective         |
+-------------------------+--------------------------+--------------+-----------------+--------------------------+
| task                    |      classification      |     cls      |                 |           cls            |
| load_name               |           None           |              |                 |           None           |
| load_type               |            mz            |              |                 |            mz            |
| batch_size              |           128            |      64      |                 |            64            |
| to_debug                |          False           |              |                

Then we load in the checkpoint. You will have to adjust this according to where it has been stored in the mase_output directory.
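
If you do not remember the exact run directory, one option is to search for it. The sketch below assumes the layout seen in the path used in the next cell (`<run>/software/training_ckpts/best.ckpt` under `mase_output`). `find_latest_checkpoint` is a hypothetical helper, not part of MASE, and it simply picks the most recently modified match.

```python
from pathlib import Path
from typing import Optional


def find_latest_checkpoint(output_root: str, run_prefix: str = "") -> Optional[Path]:
    """Return the most recently modified best.ckpt under output_root, or None."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    # Assumed layout: <output_root>/<run>/software/training_ckpts/best.ckpt
    pattern = f"{run_prefix}*/software/training_ckpts/best.ckpt"
    candidates = list(root.glob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Example: look for ResNet18 runs under a local mase_output directory.
ckpt = find_latest_checkpoint("mase_output", run_prefix="resnet18_")
print(ckpt if ckpt is not None else "no checkpoint found")
```

Selecting by modification time is a convenience only; if several runs exist, check that the returned path is the one you intend to quantize.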

In [None]:
# Load in the trained checkpoint - change this accordingly
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"

# Load model directly
# from transformers import AutoImageProcessor, AutoModelForImageClassification

# processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
# model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")

model = load_model(load_name=RES_CHECKPOINT_PATH, load_type="pl", model=model)
print("load model done!")

# Initiate metadata
dummy_in = next(iter(input_generator))
print("dummy in done")
dummy_in_converted = {"pixel_values": dummy_in["x"]}

_ = model(**dummy_in_converted)

print("_ done")

mg, _ = init_metadata_analysis_pass(mg, None)
print("init_metadata_analysis_pass done")

mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
print("add_common_metadata_analysis_pass done")

mg, _ = add_software_metadata_analysis_pass(mg, None)
print("add_software_metadata_analysis_pass done")

mg, _ = metadata_value_type_cast_transform_pass(mg, pass_args={"fn": to_numpy_if_tensor})
print("metadata_value_type_cast_transform_pass done")

# Before we begin, we will copy the original MaseGraph model to use for comparison during quantization analysis
mg_original = deepcopy_mase_graph(mg)
print("deep copy done")

load model done!
dummy in done
_ done
init_metadata_analysis_pass done
add_common_metadata_analysis_pass done
add_software_metadata_analysis_pass done
metadata_value_type_cast_transform_pass done
using safe deepcopy
deep copy done


In [None]:
import os
import toml
from copy import deepcopy
from pathlib import Path
import logging
import torch

# Import MASE tools and passes
from chop.ir.graph.mase_graph import MaseGraph
from chop.passes.graph.analysis import (
    init_metadata_analysis_pass,
    add_common_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
)
from chop.passes.graph.transforms import metadata_value_type_cast_transform_pass
from chop.passes.graph.utils import deepcopy_mase_graph
from chop.tools.get_input import InputGenerator, get_dummy_input, get_cf_args
from chop.tools.utils import to_numpy_if_tensor

# Import dataset and model utilities
from chop.dataset import MaseDataModule
from chop.models import get_model_info, get_model

# Import the HuggingFace model and image processor classes from transformers
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Read the TOML configuration file
RES_TOML_PATH = 'resnet18_INT8_quant.toml'
with open(RES_TOML_PATH, 'r') as toml_file:
    pass_args = toml.load(toml_file)

# Extract the relevant sections from the configuration
tensorrt_config = pass_args.get('passes', {}).get('tensorrt', {})
runtime_analysis_config = pass_args.get('passes', {}).get('tensorrt', {}).get('runtime_analysis', {})

print("tensorrt config:", tensorrt_config)
print("runtime_analysis config:", runtime_analysis_config)

# Basic settings
model_name = pass_args['model']      # expected to be "resnet18"
dataset_name = pass_args['dataset']   # e.g. "cifar10"
max_epochs = pass_args['max_epochs']
batch_size = pass_args['batch_size']
learning_rate = pass_args['learning_rate']
accelerator = pass_args['accelerator']

# Initialize the data module
data_module = MaseDataModule(
    name=dataset_name,
    batch_size=batch_size,
    model_name=model_name,
    num_workers=0,
)
data_module.prepare_data()
data_module.setup()

# Add the extra information to the configs
configs = [tensorrt_config, runtime_analysis_config]
for config in configs:
    config['task'] = pass_args['task']
    config['dataset'] = dataset_name
    config['batch_size'] = batch_size
    config['model'] = model_name
    config['data_module'] = data_module
    # If the configured accelerator is 'gpu', map it to 'cuda'
    config['accelerator'] = 'cuda' if accelerator == 'gpu' else accelerator
    # Compare against the mapped value; a comparison with 'gpu' would never match here
    if config['accelerator'] == 'cuda':
        os.environ['CUDA_MODULE_LOADING'] = 'LAZY'

# Get the model information (e.g. whether the model is a vision model)
model_info = get_model_info(model_name)

# Download the pretrained model and its image processor from HuggingFace
print("Loading HuggingFace model and processor ...")
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
print("HuggingFace model loaded.")

# Use InputGenerator to obtain a dummy input
input_generator = InputGenerator(
    data_module=data_module,
    model_info=model_info,
    task=pass_args['task'],  # e.g. "cls"
    which_dataloader="train",
)
dummy_in = next(iter(input_generator))
print("Dummy input obtained.")

# Assume the dictionary returned by input_generator uses the key "x" (adjust if needed)
# and convert it to the key the model expects, "pixel_values"
dummy_in_converted = {"pixel_values": dummy_in["x"]}
# Check that the model's forward pass works
_ = model(**dummy_in_converted)
print("Model forward pass succeeded.")

# Build the concrete args for FX tracing, making sure they contain only pixel_values
cf_args = {"pixel_values": dummy_in_converted["pixel_values"]}

# Create the MaseGraph object
print("Tracing model into MaseGraph ...")
mg = MaseGraph(model=model, cf_args=cf_args, hf_input_names=["pixel_values"])

# Initialize node metadata
mg, _ = init_metadata_analysis_pass(mg, None)
print("init_metadata_analysis_pass done.")

# Add common metadata (using the converted dummy_in)
mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in_converted})
print("add_common_metadata_analysis_pass done.")

# Add software metadata
mg, _ = add_software_metadata_analysis_pass(mg, None)
print("add_software_metadata_analysis_pass done.")

# Convert tensor values in the node metadata to numpy (for later processing)
mg, _ = metadata_value_type_cast_transform_pass(mg, pass_args={"fn": to_numpy_if_tensor})
print("metadata_value_type_cast_transform_pass done.")

# Keep a backup copy of the original graph
mg_original = deepcopy_mase_graph(mg)
print("MaseGraph deep copy done.")

# At this point, mg is the MaseGraph converted from the model downloaded from HuggingFace


tensorrt config: {'by': 'type', 'num_calibration_batches': 10, 'post_calibration_analysis': True, 'default': {'config': {'quantize': True, 'calibrators': ['percentile', 'mse', 'entropy'], 'percentiles': [99.0, 99.9, 99.99], 'precision': 'int8'}, 'input': {'calibrator': 'histogram', 'quantize_axis': False}, 'weight': {'calibrator': 'histogram', 'quantize_axis': False}}, 'fine_tune': {'fine_tune': True}, 'runtime_analysis': {'num_batches': 500, 'num_GPU_warmup_batches': 5, 'test': True}}
runtime_analysis config: {'num_batches': 500, 'num_GPU_warmup_batches': 5, 'test': True}
Loading HuggingFace model and processor ...
HuggingFace model loaded.
Dummy input obtained.
Model forward pass succeeded.
Tracing model into MaseGraph ...
init_metadata_analysis_pass done.


ValueError: Unknown module: resnet.encoder.stages.0.layers.0.layer.2.activation

In [12]:
import os
import toml
from copy import deepcopy
from pathlib import Path
import logging
import torch

# Import MASE tools and passes
from chop.ir.graph.mase_graph import MaseGraph
from chop.passes.graph.analysis import (
    init_metadata_analysis_pass,
    add_common_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
)
from chop.passes.graph.transforms import metadata_value_type_cast_transform_pass
from chop.passes.graph.utils import deepcopy_mase_graph
from chop.tools.get_input import InputGenerator, get_dummy_input, get_cf_args
from chop.tools.utils import to_numpy_if_tensor

# Import dataset and model utilities
from chop.dataset import MaseDataModule
from chop.models import get_model_info, get_model

# Import the HuggingFace model and image processor classes from transformers
from transformers import AutoImageProcessor, AutoModelForImageClassification

# For the ResNet model, we use the CIFAR-10 dataset as an example
checkpoint = ""              # the image model is loaded directly from HuggingFace, so no checkpoint path is needed
tokenizer_checkpoint = ""    # image models do not need a tokenizer
dataset_name = "cifar10"     # use the CIFAR-10 dataset

# Initialize the data module (assuming MaseDataModule supports loading image datasets)
data_module = MaseDataModule(
    name=dataset_name,
    batch_size=64,
    num_workers=4,
    model_name="resnet18",  # the model name is resnet18
)
data_module.prepare_data()
data_module.setup()

# Download the pretrained model and its image processor from HuggingFace
print("Loading HuggingFace model and processor ...")
processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")
model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50")
print("HuggingFace model loaded.")

# Set the model config to indicate an image classification problem
model.config.problem_type = "image_classification"

# Create the MaseGraph object; for image models the input name is "pixel_values"
print("Tracing model into MaseGraph ...")
mg = MaseGraph(
    model,
    hf_input_names=["pixel_values"],
)

# Initialize node metadata
mg, _ = init_metadata_analysis_pass(mg)
print("init_metadata_analysis_pass done.")

# Add common metadata (a dummy input is used to infer each node's parameter information)
dummy_in = {"pixel_values": torch.randn(1, 3, 224, 224)}
mg, _ = add_common_metadata_analysis_pass(mg, {"dummy_in": dummy_in})
print("add_common_metadata_analysis_pass done.")

# Add software metadata
mg, _ = add_software_metadata_analysis_pass(mg)
print("add_software_metadata_analysis_pass done.")

# Convert tensor values in the node metadata to numpy (for later processing)
mg, _ = metadata_value_type_cast_transform_pass(mg, pass_args={"fn": to_numpy_if_tensor})
print("metadata_value_type_cast_transform_pass done.")

# Keep a backup copy of the original graph for later comparison or recovery
mg_original = deepcopy_mase_graph(mg)
print("MaseGraph deep copy done.")

# At this point, mg is the MaseGraph converted from the ResNet model downloaded from HuggingFace


Loading HuggingFace model and processor ...
HuggingFace model loaded.
Tracing model into MaseGraph ...
init_metadata_analysis_pass done.


ValueError: Unknown module: resnet.encoder.stages.0.layers.0.layer.2.activation

In [None]:
import os
import toml
from copy import deepcopy
from pathlib import Path
import logging
import torch

# Import MASE tools and passes
from chop.ir.graph.mase_graph import MaseGraph
from chop.passes.graph.analysis import (
    init_metadata_analysis_pass,
    add_common_metadata_analysis_pass,
    add_software_metadata_analysis_pass,
)
from chop.passes.graph.transforms import metadata_value_type_cast_transform_pass
from chop.passes.graph.utils import deepcopy_mase_graph
from chop.tools.get_input import InputGenerator, get_dummy_input, get_cf_args
from chop.tools.utils import to_numpy_if_tensor

# Import dataset and model utilities
from chop.dataset import MaseDataModule
from chop.models import get_model_info, get_model

# Import the HuggingFace model and processor classes from transformers
from transformers import AutoImageProcessor, AutoModelForImageClassification
from chop.tools import get_tokenized_dataset, get_trainer
import chop.passes as passes
from transformers import AutoModelForSequenceClassification
# Load the IMDb dataset and get the corresponding tokenizer
checkpoint = "prajjwal1/bert-tiny"  # pretrained model checkpoint
tokenizer_checkpoint = "bert-base-uncased"  # tokenizer checkpoint
dataset_name = "imdb"  # dataset name

dataset, tokenizer = get_tokenized_dataset(
    dataset=dataset_name,
    checkpoint=tokenizer_checkpoint,
    return_tokenizer=True,
)

# Load the pretrained BERT model
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
model.config.problem_type = "single_label_classification"  # set the problem type to single-label classification

# Initialize the MaseGraph (the representation used for model quantization and optimization)
mg = MaseGraph(
    model,
    hf_input_names=["input_ids", "attention_mask", "labels"],  # specify the input tensor names
)

# Initialize node metadata
mg, _ = passes.init_metadata_analysis_pass(mg)
print("init_metadata_analysis_pass done.")

mg, _ = passes.add_common_metadata_analysis_pass(mg)
print("add_common_metadata_analysis_pass done.")

# Keep a backup copy of the original graph for later comparison or recovery
mg_original = deepcopy_mase_graph(mg)
print("MaseGraph deep copy done.")

# At this point, mg is the MaseGraph converted from the BERT model downloaded from HuggingFace


INFO     Tokenizing dataset imdb with AutoTokenizer for bert-base-uncased.
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at prajjwal1/bert-tiny and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
`past_key_values` were not specified as input names, but model.config.use_cache = True. Setting model.config.use_cache = False.
INFO     Getting dummy input for prajjwal1/bert-tiny.


init_metadata_analysis_pass done.
tensor([[ 101, 9932, 2089, 2202, 2058, 1996, 2088, 2028, 2154,  102],
        [ 101, 2023, 2003, 2339, 2017, 2323, 4553, 4748, 4877,  102]])
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
tensor([[ 101, 9932, 2089, 2202, 2058, 1996, 2088, 2028, 2154,  102],
        [ 101, 2023, 2003, 2339, 2017, 2323, 4553, 4748, 4877,  102]])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
tensor([[[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]],


        [[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]]])
tensor([[[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
          [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
          [1, 1

In [23]:
BER_INT8_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/proj/bert_INT8_quant.toml"
BER_CHECKPOINT_PATH = "prajjwal1/bert-tiny"  # model identifier on HuggingFace
!python ch transform --config {BER_INT8_BY_TYPE_TOML} --load {BER_CHECKPOINT_PATH} --load-type hf


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


INFO: Seed set to 0
I0315 17:55:34.734239 140152862438464 seed.py:57] Seed set to 0
+-------------------------+--------------------------+---------------------+---------------------+--------------------------+
| Name                    |         Default          |    Config. File     |   Manual Override   |        Effective         |
+-------------------------+--------------------------+---------------------+---------------------+--------------------------+
| task                    |      classification      |         cls         |                     |           cls            |
| load_name               |           None           | prajjwal1/bert-tiny | prajjwal1/bert-tiny |   prajjwal1/bert-tiny    |
| load_type               |            mz            |         hf          |         hf          |            hf            |
| batch_size              |           128            |         32          |      

## Section 2. ResNet: INT8/FP16/FP32 Quantization Comparison

We now load a new TOML configuration that uses fp16 instead of int8 while keeping all other settings identical for a fair comparison. This time, however, we run chop from the terminal, which executes all the passes showcased in [Section 1](#section-1---int8-quantization).
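
The runs in this section differ mainly in the configuration file they pass to `ch transform`. A minimal sketch of generating such commands is shown here. It only builds and prints the argument lists, using the `--config`, `--load`, and `--load-type` flags that appear in the cells of this notebook. The directory and checkpoint values are placeholders, and `build_transform_command` is a hypothetical helper.

```python
import shlex


def build_transform_command(config_path: str, checkpoint: str, load_type: str = "pl") -> list:
    """Build the argument list for a `ch transform` run (does not execute it)."""
    return [
        "python", "ch", "transform",
        "--config", config_path,
        "--load", checkpoint,
        "--load-type", load_type,
    ]


CONFIG_DIR = "docs/tutorials/tensorrt"  # placeholder directory
CHECKPOINT = "path/to/best.ckpt"        # placeholder checkpoint path

# One command per precision; only the configuration file changes.
commands = {
    precision: build_transform_command(
        f"{CONFIG_DIR}/resnet18_{precision}_quant.toml", CHECKPOINT
    )
    for precision in ("INT8", "FP16", "FP32")
}
for precision, cmd in commands.items():
    print(precision, shlex.join(cmd))
```

Holding the checkpoint and load type fixed across the three commands is what makes the resulting accuracy and latency figures comparable.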

Float quantization does not require calibration and is not supported by `pytorch-quantization`, so the model does not undergo fake quantization. For the time being, this unfortunately means that QAT is unavailable and the model only undergoes Post-Training Quantization (PTQ).
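
To see why int8 needs a calibrated range while fp16 does not, consider the sketch below. It is an illustration in plain Python, not the `pytorch-quantization` or TensorRT implementation. `fake_quantize_int8` applies symmetric quantize-dequantize with a scale derived from a maximum magnitude `amax` (a value that a calibration stage would have to supply), whereas `round_to_fp16` simply rounds to half precision and needs no data-dependent parameter. Both function names are hypothetical.

```python
import struct


def fake_quantize_int8(values, amax):
    """Symmetric int8 quantize-dequantize: snap values to 255 levels in [-amax, amax]."""
    scale = amax / 127.0
    out = []
    for v in values:
        q = max(-127, min(127, round(v / scale)))  # round, then clamp to the int8 range
        out.append(q * scale)                      # map back to float ("fake" quantization)
    return out


def round_to_fp16(values):
    """Round each value to IEEE half precision and back; no calibration is involved."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


activations = [0.02, -0.5, 1.3, 7.9]

# amax that covers the data: small rounding error everywhere.
print(fake_quantize_int8(activations, amax=8.0))
# amax that is too small: large values are clipped to +/- amax.
print(fake_quantize_int8(activations, amax=1.0))
# fp16 keeps its own exponent per value, so no range has to be chosen.
print(round_to_fp16(activations))
```

The second call shows the cost of a badly chosen range, which is the problem the calibrators listed in the configuration (`percentile`, `mse`, `entropy`) are meant to address.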

In [5]:
RES_INT8_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_INT8_quant.toml"
RES_CHECKPOINT_PATH = "microsoft/resnet-50"  # model identifier on HuggingFace
!python ch transform --config {RES_INT8_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type hf


INFO: Seed set to 0
I0315 15:53:53.484075 140467152626752 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+---------------------+--------------------------+
| Name                    |         Default          | Config. File |   Manual Override   |        Effective         |
+-------------------------+--------------------------+--------------+---------------------+--------------------------+
| task                    |      classification      |     cls      |                     |           cls            |
| load_name               |           None           |              | microsoft/resnet-50 |   microsoft/resnet-50    |
| load_type               |            mz            |              |         hf          |            hf            |
| batch_size              |           128            |      64      |                     |            64            |
| to_debug                |    

In [None]:
RES_INT8_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_INT8_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"
!python ch transform --config {RES_INT8_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0315 01:23:45.774668 139703610053696 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |              | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |              | far10_2025-03-08/softwar | far10_2025-03-08/softwar |
|                     

In [None]:
RES_FP16_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_FP16_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"
!python ch transform --config {RES_FP16_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0315 01:36:29.907291 139755539158080 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |              | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |              | far10_2025-03-08/softwar | far10_2025-03-08/softwar |
|                     

In [None]:
RES_FP32_BY_TYPE_TOML = "/workspace/ADLS_Proj/docs/tutorials/tensorrt/resnet18_FP32_quant.toml"
RES_CHECKPOINT_PATH = "/workspace/ADLS_Proj/mase_output/resnet18_cls_cifar10_2025-03-08/software/training_ckpts/best.ckpt"
!python ch transform --config {RES_FP32_BY_TYPE_TOML} --load {RES_CHECKPOINT_PATH} --load-type pl

INFO: Seed set to 0
I0315 01:43:15.535419 140335826768960 seed.py:57] Seed set to 0
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| Name                    |         Default          | Config. File |     Manual Override      |        Effective         |
+-------------------------+--------------------------+--------------+--------------------------+--------------------------+
| task                    |      classification      |     cls      |                          |           cls            |
| load_name               |           None           |              | /workspace/ADLS_Proj/mas | /workspace/ADLS_Proj/mas |
|                         |                          |              | e_output/resnet18_cls_ci | e_output/resnet18_cls_ci |
|                         |                          |              | far10_2025-03-08/softwar | far10_2025-03-08/softwar |
|                     

As you can see, `fp16` achieves a slightly higher test accuracy but a slightly lower latency (~30%) than int8 quantization; it is still ~2.5x faster than the unquantized model. Now let's apply quantization to a more complicated model.
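
For context on how latency figures of this kind can be obtained, here is a minimal timing sketch. It mirrors the `num_batches` and `num_GPU_warmup_batches` settings seen in the configuration, but it is not the `runtime_analysis_pass` implementation. `mean_latency_ms` and `speedup` are hypothetical helpers, and on a GPU you would also need to synchronize the device before reading the clock.

```python
import time


def mean_latency_ms(run_batch, num_batches: int, num_warmup_batches: int = 5) -> float:
    """Average wall-clock time per call of run_batch, in milliseconds, after warm-up."""
    for _ in range(num_warmup_batches):
        run_batch()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(num_batches):
        run_batch()
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / num_batches


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Toy workloads standing in for an unquantized and a quantized model.
slow = mean_latency_ms(lambda: sum(range(20000)), num_batches=50)
fast = mean_latency_ms(lambda: sum(range(2000)), num_batches=50)
print(f"baseline {slow:.3f} ms, candidate {fast:.3f} ms, speedup {speedup(slow, fast):.1f}x")
```

Excluding the warm-up batches matters because the first few calls often include one-off costs such as memory allocation and kernel loading.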
