# Experiment 1: Per-Layer Sensitivity

The idea of this experiment: quantize only one block at a time and keep the others in FP32.

This experiment investigates the layer-wise sensitivity of a deep CNN model to quantization (PTQ and QAT). The goal is to understand how quantizing individual layers affects overall model accuracy, and to identify which layers are robust to quantization-induced degradation and which are sensitive to it.

To achieve this, a modified, modularized variant of the original deep CNN was implemented that allows selective quantization of individual convolutional blocks (L1–L4). For each run, only one block was quantized while the rest remained in FP32. To isolate the effect, `qconfig` was applied only to the target block, and `quant`/`dequant` stubs were placed around it so that activations are quantized on entry to that block and dequantized on exit.
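
The mechanism can be illustrated with PyTorch's eager-mode quantization API. This is a minimal sketch on a toy model, assuming a 4-block 1-D CNN; it is not the `PTQM5_LayerWiseQuant` class used below, whose source is not shown here. `ToyLayerWiseQuant` and `build_layerwise_ptq` are hypothetical names, and the body of `set_qconfig_for_layerwise` is an assumption that only mirrors the method name called in the notebook. It relies on the fact that `prepare()` and `convert()` skip modules whose `qconfig` is `None`, so giving a `qconfig` only to one block and to the stubs around it quantizes that block alone. Conv+ReLU fusion (done by `fuse_model()` in the notebook) is omitted for brevity, and the qconfig backend is taken from `torch.backends.quantized.engine`.

```python
import torch
import torch.nn as nn
import torch.ao.nn.quantized as nnq
from torch.ao.quantization import DeQuantStub, QuantStub, convert, get_default_qconfig, prepare


class ToyLayerWiseQuant(nn.Module):
    """Hypothetical 4-block 1-D CNN in which only block `quantized_block_idx` (1-based) runs in int8."""

    def __init__(self, quantized_block_idx: int, n_channel: int = 8, n_output: int = 2):
        super().__init__()
        self.quantized_block_idx = quantized_block_idx
        in_channels = [1, n_channel, n_channel, n_channel]
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv1d(c, n_channel, kernel_size=3, padding=1), nn.ReLU())
            for c in in_channels
        )
        # Stubs mark where activations enter (float -> quantized) and leave (quantized -> float) the target block
        self.quant = QuantStub()
        self.dequant = DeQuantStub()
        self.fc = nn.Linear(n_channel, n_output)

    def forward(self, x):
        for idx, block in enumerate(self.blocks, start=1):
            if idx == self.quantized_block_idx:
                x = self.dequant(block(self.quant(x)))
            else:
                x = block(x)
        return self.fc(x.mean(dim=-1))  # global average pooling over time, then classify

    def set_qconfig_for_layerwise(self, qconfig):
        # Only the target block and the stubs receive a qconfig; every other module keeps
        # qconfig=None and is therefore left in FP32 by prepare()/convert()
        for module in (self.blocks[self.quantized_block_idx - 1], self.quant, self.dequant):
            module.qconfig = qconfig


def build_layerwise_ptq(quantized_block_idx: int, calib_batches) -> nn.Module:
    model = ToyLayerWiseQuant(quantized_block_idx).eval()
    model.set_qconfig_for_layerwise(get_default_qconfig(torch.backends.quantized.engine))
    prepare(model, inplace=True)  # observers are inserted in the target block only
    with torch.no_grad():
        for x in calib_batches:  # calibration: observers record activation ranges
            model(x)
    return convert(model, inplace=False)  # swap observed modules for quantized kernels


torch.manual_seed(0)
calib = [torch.randn(4, 1, 64) for _ in range(3)]
q_model = build_layerwise_ptq(quantized_block_idx=2, calib_batches=calib)
print(isinstance(q_model.blocks[1][0], nnq.Conv1d))  # target block (L2) now uses a quantized conv
print(type(q_model.blocks[0][0]) is nn.Conv1d)  # other blocks are still FP32 nn.Conv1d
```

With `quantized_block_idx=2`, only the convolution in the second block is swapped for a quantized `Conv1d`; the other blocks and the classifier stay in FP32, and the model still returns float logits.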

## 1. PTQ 

```python
import torch
import yaml
from src.utils import *
from src.data_loader import get_data_loaders
from src.models.cnn_model_LayerWiseQuant import M5Modular, PTQM5Modular, PTQM5_LayerWiseQuant
from src.evaluate import test

# Load all checkpoints onto the CPU, where the quantized models are evaluated
fp_dict = torch.load("../models/cnn_fp32_model.pth", map_location="cpu")
ptq_dict = torch.load("../models/cnn_ptq_model.pth", map_location="cpu")
# Block index -> state dict of the model in which only that block is quantized
LWQ_dict_dicts = {
    i: torch.load(f"../models/cnn_ptq_LayerWiseQuant_q{i}_model.pth", map_location="cpu")
    for i in range(1, 5)
}

data_config = {
    "raw_dir": "../data/raw",
    "processed_dir": "./data/processed",
    "sample_rate": 8000,
    "batch_size": 256,
    "version": "v0.1"
}
train_loader, test_loader, _ = get_data_loaders(data_config)
```



```python
# Load the FP32 model
config_fp = '../configs/cnn_fp32.yaml'
with open(config_fp, 'r') as f:
    config = yaml.safe_load(f)

params_fp = config["model"]["base_cnn"]
model_fp = M5Modular(
    n_input=params_fp["n_input"],
    n_output=params_fp["n_output"],
    stride=params_fp["stride"],
    n_channel=params_fp["n_channel"],
    conv_kernel_sizes=params_fp["conv_kernel_sizes"],
)
model_fp.load_state_dict(fp_dict)
model_fp.to('cpu')

# Evaluate the FP32 model
acc_fp = test(model_fp, test_loader)
print(f"FP32 model accuracy: {acc_fp:.4f}")
```

```
FP32 model accuracy: 83.0713
```


```python
# Load the fully quantized PTQ model
config_PTQ = '../configs/cnn_ptq.yaml'
with open(config_PTQ, 'r') as f:
    config = yaml.safe_load(f)

params_PTQ = config["model"]["base_cnn"]
model_PTQ = PTQM5Modular(
    n_input=params_PTQ["n_input"],
    n_output=params_PTQ["n_output"],
    stride=params_PTQ["stride"],
    n_channel=params_PTQ["n_channel"],
    conv_kernel_sizes=params_PTQ["conv_kernel_sizes"],
)

# Rebuild the quantized module structure (fuse -> prepare -> convert) so the checkpoint keys match.
# PTQ qconfig and prepare(), matching the layer-wise cell below.
model_PTQ.eval()
model_PTQ.fuse_model()
model_PTQ.qconfig = torch.ao.quantization.get_default_qconfig('x86')
torch.ao.quantization.prepare(model_PTQ, inplace=True)

# Convert to a quantized model. No calibration pass is run here because
# the scales and zero-points are restored from the checkpoint below.
model_PTQ = torch.ao.quantization.convert(model_PTQ, inplace=False)

# Load the checkpoint and evaluate on CPU
model_PTQ.load_state_dict(ptq_dict)
model_PTQ.to('cpu')

acc_PTQ = test(model_PTQ, test_loader)
print(f"PTQ model accuracy: {acc_PTQ:.4f}")
```

```
PTQ model accuracy: 75.8473
```


```python
# Layer-wise PTQ: rebuild each single-block quantized model and evaluate it
config_LWQ = '../configs/cnn_ptq_LayerWiseQuant.yaml'
with open(config_LWQ, 'r') as f:
    config = yaml.safe_load(f)

params_LWQ = config["model"]["base_cnn"]
qconfig = torch.ao.quantization.get_default_qconfig('x86')

for i in config["model"]["quantization"]:
    # Block i is quantized; all other blocks stay in FP32
    model_LWQ = PTQM5_LayerWiseQuant(
        quantized_block_idx=i,
        n_input=params_LWQ["n_input"],
        n_output=params_LWQ["n_output"],
        stride=params_LWQ["stride"],
        n_channel=params_LWQ["n_channel"],
        conv_kernel_sizes=params_LWQ["conv_kernel_sizes"],
    )

    # Fuse, attach the qconfig to the target block only, and insert observers
    model_LWQ.eval()
    model_LWQ.fuse_model()
    model_LWQ.set_qconfig_for_layerwise(qconfig)
    torch.ao.quantization.prepare(model_LWQ, inplace=True)

    # Convert to a quantized model; scales and zero-points come from the checkpoint,
    # so no calibration pass is run here
    model_LWQ = torch.ao.quantization.convert(model_LWQ, inplace=False)
    model_LWQ.load_state_dict(LWQ_dict_dicts[i])

    # Evaluate the single-block quantized model
    acc_LWQ = test(model_LWQ, test_loader)
    print(f"Layer-Wise Quantized Model (Layer {i} quantized) accuracy: {acc_LWQ:.4f}")
```

```
Layer-Wise Quantized Model (Layer 1 quantized) accuracy: 76.8287
Layer-Wise Quantized Model (Layer 2 quantized) accuracy: 80.8905
Layer-Wise Quantized Model (Layer 3 quantized) accuracy: 81.8174
Layer-Wise Quantized Model (Layer 4 quantized) accuracy: 82.9714
```


|Model|Accuracy (%)|Accuracy Drop vs. FP32 (percentage points)|
|---|---|---|
|FP32|83.0713|0.00|
|PTQ (only L4 quantized)|82.9714|0.10|
|PTQ (only L3 quantized)|81.8174|1.25|
|PTQ (only L2 quantized)|80.8905|2.18|
|PTQ (only L1 quantized)|76.8287|6.24|
|PTQ (fully quantized)|75.8473|7.22|


> Accuracy drop = FP32 accuracy − quantized-model accuracy, in percentage points; a larger value means more degradation.
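
As a worked instance of this definition, here it is applied to the L1-only and fully quantized models. Accuracies are in percent, so the drop is in percentage points (pp):

$$
\begin{aligned}
\text{Drop}_{\text{L1}} &= 83.0713 - 76.8287 = 6.2426\ \text{pp} \\
\text{Drop}_{\text{full}} &= 83.0713 - 75.8473 = 7.2240\ \text{pp}
\end{aligned}
$$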

Insights:

1. Early blocks, especially L1, are the most sensitive to quantization: quantizing L1 alone costs 6.24 pp, versus 2.18 pp for L2, 1.25 pp for L3, and only 0.10 pp for L4. A plausible explanation is that early blocks process low-level features close to the raw input, which are more sensitive to quantization noise, and any error introduced there propagates through every subsequent block. Later blocks operate on higher-level representations, which are more robust to quantization.

2. The fully quantized model loses a further 0.98 pp compared with L1-only quantization (7.22 vs. 6.24 pp). L1 therefore accounts for most of the total degradation, and quantizing the remaining blocks adds a smaller amount of accumulated quantization noise.

3. The per-layer sensitivity measured with layer-wise PTQ can guide mixed-precision or hybrid quantization strategies. For example, a mixed-precision model could keep the early blocks in FP32 and quantize the later ones, and a QAT effort could prioritize the early blocks.
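
To make insight 3 concrete, the sketch below picks which blocks to quantize under an accuracy-drop budget, using the single-block drops from the table above. It is a hypothetical helper, assuming a greedy least-sensitive-first rule and treating drops as additive. That assumption is only a heuristic here: the single-block drops sum to about 9.8 pp, while the fully quantized model loses 7.22 pp. Any selected mix would still need to be evaluated end to end. The function name and the 3.5 pp budget are illustrative.

```python
def choose_blocks_to_quantize(drops_pp: dict[int, float], budget_pp: float) -> list[int]:
    """Hypothetical helper: greedily pick blocks to quantize, least sensitive first,
    while the summed single-block accuracy drop stays within `budget_pp`.

    Summing per-block drops is only a heuristic (drops are not additive in practice),
    so the chosen mix still has to be evaluated end to end.
    """
    chosen, total = [], 0.0
    for block, drop in sorted(drops_pp.items(), key=lambda kv: kv[1]):
        if total + drop > budget_pp:
            break  # remaining blocks are at least as sensitive, so stop here
        chosen.append(block)
        total += drop
    return sorted(chosen)


# Single-block PTQ accuracy drops vs. FP32 in percentage points (from the table above)
drops = {1: 6.24, 2: 2.18, 3: 1.25, 4: 0.10}
print(choose_blocks_to_quantize(drops, budget_pp=3.5))  # [3, 4]: quantize L3 and L4, keep L1 and L2 in FP32
```

With a 3.5 pp budget, the helper quantizes L3 and L4 and keeps L1 and L2 in FP32, which matches the mixed-precision suggestion in insight 3.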

```python
# Accuracy drop vs. FP32 (percentage points) for each single-block PTQ model
acc_fp32 = 83.0713
acc_single_block = {4: 82.9714, 3: 81.8174, 2: 80.8905, 1: 76.8287}
for block, acc in acc_single_block.items():
    print(f"PTQ model accuracy drop L{block}: {acc_fp32 - acc:.2f}")
```

```
PTQ model accuracy drop L4: 0.10
PTQ model accuracy drop L3: 1.25
PTQ model accuracy drop L2: 2.18
PTQ model accuracy drop L1: 6.24
```


## 2. QAT