# Lab04 Task 2-2: Manual Post-Training Static Quantization

In this notebook, we will try to manually quantize the pretrained model.

<font color="red">**Only add or modify code between `YOUR CODE START` and `YOUR CODE END`. Don’t change anything outside of these markers.**</font>

In [1]:
##### YOUR CODE START #####

# Please fill in your student id here.
student_id = "313510156"

##### YOUR CODE END #####

### Library Import

The libraries you need for this practice are listed below. You can add more if you think they’re necessary. If you’re not sure whether a library is allowed, ask the TA in the FB group.

In [2]:
import os
import tqdm
import torch
from torch import nn
from torch.utils.data import DataLoader
import torchvision
from torchvision import datasets, transforms, models
import copy
from resnet20_int8 import (
    QuantizedTensor,
    QuantizedCifarResNet,
    QuantizeLayer,
    QuantizedConv2d,
    QuantizedConvReLU2d,
    QuantizedReLU,
    QuantizedLinear,
    QuantizedAdaptiveAvgPool2d,
    QuantizedAdd,
    QuantizedFlatten,
)
import matplotlib

##### YOUR CODE START #####

# Do you need any additional libraries? If not, you can leave this block empty.
# For this task, you must attempt to manually quantize the model. Therefore, using any libraries that perform automatic quantization or calculate scale/zero-point values is prohibited.

##### YOUR CODE END #####

### Device

If you have a GPU available, you should see "cuda" in the output of the following cell.

In [3]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Using device: %s" % device)

Using device: cuda


### Dataset

In this lab, we will use the CIFAR-10 dataset. CIFAR-10 is a widely used image classification dataset consisting of 60,000 color images at 32×32 resolution. It has 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck), with 50,000 training images and 10,000 test images. Due to its small size and balanced categories, CIFAR-10 is commonly used for benchmarking machine learning and computer vision models.

CIFAR-10 has both a training set and a test set. Post-training static quantization requires a small subset of the training set for calibration. On the other hand, manually quantizing the convolutional layer weights is a data-free process. The test set is only used at the end to evaluate the final result.

In [4]:
# Load training & test set

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), 
                         (0.2023, 0.1994, 0.2010))
])

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=128,
                                         shuffle=False)
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                       download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128,
                                         shuffle=False)

### Load Model

In this lab, you do not need to train a model from scratch. We will use a pretrained ResNet20 model instead. ResNet20 is a popular deep learning model for image classification. Its key feature is the use of skip (residual) connections, which make training deep networks easier and more stable.
The code below loads the pre-trained model and evaluates its accuracy on the test set, which should be approximately <font color="red">**92.60%**</font> (the exact figure can differ slightly between environments). Please use this model for the subsequent tasks. <font color="red">**Designing and training your own model is not allowed.**</font>

In [5]:
model = torch.hub.load('chenyaofo/pytorch-cifar-models', 
                       'cifar10_resnet20', pretrained=True).to(device)
model.eval()

Using cache found in /home/bschen/.cache/torch/hub/chenyaofo_pytorch-cifar-models_master


CifarResNet(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias

In [6]:
def test_acc(model_test):
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in testloader:
            images = images.to(device, non_blocking=True)
            labels = labels.to(device, non_blocking=True)
            outputs = model_test(images)
            _, predicted = outputs.max(1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    acc = 100 * correct / total
    print(f'Accuracy on CIFAR-10 test set: {acc:.2f}%')
    return acc
    
_ = test_acc(model)

Accuracy on CIFAR-10 test set: 92.59%


### Manually Quantizing the Model

While PyTorch's FX graph mode is very convenient, it abstracts away the details of how quantized weights are actually calculated. Therefore, in this section, you will attempt to manually quantize the model.

To pass this lab, the test accuracy of your manually quantized model must be higher than <font color="red">**90.00%.**</font>

<font color="red">**Please be aware of the following rules. Violating them will result in a score of zero for this section:**</font>

1. Your modifications to the model are strictly limited to populating the parameters of the `QuantizedCifarResNet` model. Any other operations, including but not limited to retraining or changing the model architecture, are forbidden.

2. You must explicitly show your calculation process. The use of any functions that automatically compute scale / zero_point or gather statistics is prohibited. (The pre-defined observers from the previous task are prohibited, but you may use `torch.max` and `torch.min`, or define your own observer.) Also, you must not directly assign numerical values without demonstrating how they were derived.

### Introduction to QuantizedCifarResNet

`QuantizedCifarResNet` is a modified version of the standard `CifarResNet` architecture, specifically adapted for **integer-only inference**. Unlike the original `CifarResNet` which performs computations using 32-bit floating-point (FP32) numbers, this quantized version primarily uses 8-bit integer arithmetic (`int8` for weights, `uint8` for activations) for most of its operations. This significantly reduces model size and can lead to faster inference speeds on hardware with specialized integer instruction support.

The key differences arise from replacing standard PyTorch layers (`nn.Conv2d`, `nn.ReLU`, `nn.Linear`, etc.) with custom-defined quantized layer equivalents. These custom layers require specific **quantization parameters** (scale and zero-point) to map the integer values back to the approximate floating-point range, ensuring the model maintains reasonable accuracy. Data between these layers is passed using a `QuantizedTensor` wrapper object, which bundles the `uint8` tensor data with its corresponding `scale` and `zero_point`.
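To make the role of `scale` and `zero_point` concrete, here is a minimal sketch of the affine mapping between floating-point values and `uint8` codes. The helper names (`affine_params`, `quantize`, `dequantize`) are hypothetical and are not part of `resnet20_int8.py`; the formulas are the standard min/max affine scheme, which may differ in detail from what the provided layers do internally.

```python
import torch

def affine_params(min_val: float, max_val: float, qmin: int = 0, qmax: int = 255):
    """Derive (scale, zero_point) so that [min_val, max_val] maps onto [qmin, qmax]."""
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize(x: torch.Tensor, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255):
    # q = clamp(round(x / scale) + zero_point)
    q = torch.round(x / scale) + zero_point
    return q.clamp(qmin, qmax).to(torch.uint8)

def dequantize(q: torch.Tensor, scale: float, zero_point: int):
    # x_hat = (q - zero_point) * scale
    return (q.to(torch.float32) - zero_point) * scale

x = torch.tensor([-1.0, -0.25, 0.0, 0.5, 3.0])
scale, zp = affine_params(x.min().item(), x.max().item())
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
max_err = (x - x_hat).abs().max().item()  # bounded by about scale / 2

# After a ReLU the observed minimum is 0, so the zero-point collapses to 0.
relu_scale, relu_zp = affine_params(0.0, 3.0)
```

The last two lines illustrate why a fused Conv+ReLU layer only needs an output scale: when the range starts at 0, the unsigned zero-point is 0 by construction.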

Here's a breakdown of the custom quantized layers used in this implementation and the parameters they typically require **after initialization** (usually set via methods like `set_..._params` or `set_..._weight` after calibration):

1.  **`QuantizeLayer`**:
    * **Role**: The entry point, converts the initial `float32` input tensor into a `QuantizedTensor` (`uint8`).
    * **Parameters Needed**:
        * `output_scale (float)`: The scale factor for the output activation.
        * `output_zero_point (int)`: The zero-point for the output activation.

2.  **`QuantizedConv2d`** (Used for `conv2` in BasicBlock and `downsample`):
    * **Role**: Performs 2D convolution using `int8` weights and `uint8` activations, producing a `uint8` output. Uses `fp32` bias.
    * **Parameters Needed**:
        * `weight_int8 (torch.Tensor[int8])`: The quantized weights (typically obtained after fusing BatchNorm).
        * `weight_scale (torch.Tensor[float32])`: The **per-channel** scale factor for the weights.
        * `weight_zero_point (torch.Tensor[int32])`: The **per-channel** zero-point for the weights.
        * `bias_fp32 (torch.Tensor[float32])`: The fused `float32` bias term.
        * `output_scale (float)`: The scale factor for the output activation.
        * `output_zero_point (int)`: The zero-point for the output activation.

3.  **`QuantizedConvReLU2d`** (Used for `conv1` in BasicBlock and the first `conv1` of the network):
    * **Role**: Fuses `Conv2d` and `ReLU` operations. Similar to `QuantizedConv2d` but applies ReLU before the final requantization step. Uses `fp32` bias. **The output zero-point is implicitly 0 due to ReLU.**
    * **Parameters Needed**:
        * `weight_int8 (torch.Tensor[int8])`
        * `weight_scale (torch.Tensor[float32])` (**Per-channel**)
        * `weight_zero_point (torch.Tensor[int32])` (**Per-channel**)
        * `bias_fp32 (torch.Tensor[float32])`
        * `output_scale (float)` (Output zero-point is fixed to 0 internally).

4.  **`QuantizedReLU`**:
    * **Role**: Applies ReLU activation directly on the `uint8` tensor by clamping values below the input `zero_point`.
    * **Parameters Needed**: None (It's stateless and uses the parameters from the input `QuantizedTensor`).

5.  **`QuantizedAdd`**:
    * **Role**: Performs element-wise addition of two `QuantizedTensor` inputs (requiring dequantization, float addition, and requantization). Used for residual connections.
    * **Parameters Needed**:
        * `output_scale (float)`: The scale factor for the resulting summed activation.
        * `output_zero_point (int)`: The zero-point for the resulting summed activation.

6.  **`QuantizedAdaptiveAvgPool2d`**:
    * **Role**: Performs adaptive average pooling on the `uint8` tensor.
    * **Parameters Needed**: None (Stateless, passes through the input scale/zero-point after integer averaging).

7.  **`QuantizedFlatten`**:
    * **Role**: Flattens the `uint8` tensor while preserving scale/zero-point.
    * **Parameters Needed**: None (Stateless).

8.  **`QuantizedLinear`** (Used as the final `fc` layer):
    * **Role**: Performs a linear transformation using `int8` weights and `uint8` input, producing a `float32` output (common for the final classification layer). Uses `fp32` bias.
    * **Parameters Needed**:
        * `weight_int8 (torch.Tensor[int8])`
        * `weight_scale (torch.Tensor[float32])` (**Per-channel/output feature**)
        * `weight_zero_point (torch.Tensor[int32])` (**Per-channel/output feature**)
        * `bias_fp32 (torch.Tensor[float32])`

You can find more details of `QuantizedCifarResNet` in `resnet20_int8.py`
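Several of the layers above take **per-channel** weight parameters. The sketch below shows one way to compute them in a vectorized form, assuming asymmetric `int8` quantization with one `(scale, zero_point)` pair per output channel. The function names are hypothetical, and the toy weight tensor is made up for illustration.

```python
import torch

def quantize_per_channel(w: torch.Tensor, qmin: int = -128, qmax: int = 127):
    """Asymmetric int8 quantization with one (scale, zero_point) per output channel."""
    flat = w.reshape(w.size(0), -1)
    w_min = flat.min(dim=1).values
    w_max = flat.max(dim=1).values
    scale = (w_max - w_min) / (qmax - qmin)
    zero_point = torch.round(qmin - w_min / scale).clamp(qmin, qmax).to(torch.int32)
    # Broadcast the per-channel parameters over the remaining dimensions.
    shape = (-1,) + (1,) * (w.dim() - 1)
    q = torch.round(w / scale.view(shape)) + zero_point.view(shape)
    return q.clamp(qmin, qmax).to(torch.int8), scale, zero_point

def dequantize_per_channel(w_int8: torch.Tensor, scale: torch.Tensor, zero_point: torch.Tensor):
    shape = (-1,) + (1,) * (w_int8.dim() - 1)
    return (w_int8.to(torch.float32) - zero_point.view(shape)) * scale.view(shape)

torch.manual_seed(0)
# Toy weights: channel 0 has a much larger range than channel 1.
w = torch.randn(2, 3, 3, 3) * torch.tensor([1.0, 0.01]).view(2, 1, 1, 1)
w_int8, scale, zero_point = quantize_per_channel(w)
w_hat = dequantize_per_channel(w_int8, scale, zero_point)
err_per_channel = (w - w_hat).abs().reshape(2, -1).max(dim=1).values
```

With a single per-tensor scale, the step size would be dictated by channel 0, and the small weights of channel 1 would collapse onto a handful of integer codes. Giving each channel its own scale keeps the rounding error proportional to that channel's range.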

In [7]:
model_manual = QuantizedCifarResNet().to(device)
model_manual.eval()

QuantizedCifarResNet(
  (quant): QuantizeLayer(output_scale=1.000000, output_zero_point=0)
  (conv1): QuantizedConvReLU2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1, bias=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): QuantizedConvReLU2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1, bias=True)
      (conv2): QuantizedConv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1, bias=True)
      (relu): QuantizedReLU(Quantized ReLU (uint8 clamp at zero_point))
      (add): QuantizedAdd()
    )
    (1): BasicBlock(
      (conv1): QuantizedConvReLU2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1, bias=True)
      (conv2): QuantizedConv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=1, bias=True)
      (relu): QuantizedReLU(Quantized ReLU (uint8 clamp at zero_point))
      (add): QuantizedAdd()
    )
    (2): BasicBlock(
      (conv1): QuantizedConvReLU2d(16, 16, kerne

#### 1. Prepare the Model

Let's manually quantize the model step-by-step. We'll start with the preparation phase. Although PyTorch's FX graph mode offers `prepare_fx` to automatically insert observers and fuse layers (e.g., `Conv2d`, `BatchNorm2d`, `ReLU`), we need to do this manually here. So, we'll define our own observer now and insert it into the FP32 model. Layer fusion mainly concerns weight recalculation, so we'll handle that later in step three. Below is an example of defining a min/max observer and inserting it into the initial `bn1` layer.

In [8]:
##### YOUR CODE START #####

# Define the Observer class (can be used as a hook)
class Observer:
    def __init__(self, obs_skip_connect: bool=False):
        # Initialize statistics
        self.min_val = float('inf')
        self.max_val = float('-inf')

        # For skip connection
        self.obs_skip_connect = obs_skip_connect
        self.obs_skip_connect_counter = 0

    def __call__(self, module: nn.Module, inputs: tuple[torch.Tensor], output: torch.Tensor):
        """
        Hook function executed after the module's forward pass.
        """
        if not self.obs_skip_connect:
            batch_min = output.detach().min().item()
            batch_max = output.detach().max().item()
        else:
            # For skip connections, observe the input tensors instead.
            batch_min = inputs[0].detach().min().item()
            batch_max = inputs[0].detach().max().item()

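        # Update the running min/max. Each BasicBlock calls its single ReLU module
        # twice per forward pass; only the second call (counter == 1) follows the
        # residual add, so only that call is recorded for skip connections.
        if not self.obs_skip_connect or self.obs_skip_connect_counter % 2 == 1:
            self.min_val = min(self.min_val, batch_min)
            self.max_val = max(self.max_val, batch_max)

        # Toggle the call counter for the reused ReLU module.
        if self.obs_skip_connect:
            self.obs_skip_connect_counter = (self.obs_skip_connect_counter + 1) % 2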

    def get_min_max(self) -> tuple[float, float]:
        # Returns the overall observed min and max values.
        return self.min_val, self.max_val

    def reset(self):
        # Resets the observed min/max values.
        self.min_val = float('inf')
        self.max_val = float('-inf')


# Create deepcopy of model
model_prepared = copy.deepcopy(model)

# Dictionary to hold observers for later inspection
observers, removers = {}, {}

# Input quantization layer
observers["quant"] = Observer()
removers["quant"] = None

# Register observers to layers
for name, module in model_prepared.named_modules():
    if isinstance(module, (nn.Linear, nn.BatchNorm2d)):
        obs = Observer()
        observers[name] = obs
        removers[name] = module.register_forward_hook(obs)
    elif isinstance(module, nn.ReLU) and "layer" in name:
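        # The block's ReLU runs right after the residual add, so its input is the
        # add result; store this observer under the matching "add" name.
        # Note: these ReLUs are inplace=True, so by the time the hook runs,
        # inputs[0] should already hold the rectified values (minimum >= 0).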
        obs = Observer(obs_skip_connect=True)
        observers[name.replace("relu", "add")] = obs
        removers[name.replace("relu", "add")] = module.register_forward_hook(obs)

print(f"Registered {len(observers)} observers:")
for k in observers.keys():
    print(f"  - {k}")

##### YOUR CODE END #####

Registered 32 observers:
  - quant
  - bn1
  - layer1.0.bn1
  - layer1.0.add
  - layer1.0.bn2
  - layer1.1.bn1
  - layer1.1.add
  - layer1.1.bn2
  - layer1.2.bn1
  - layer1.2.add
  - layer1.2.bn2
  - layer2.0.bn1
  - layer2.0.add
  - layer2.0.bn2
  - layer2.0.downsample.1
  - layer2.1.bn1
  - layer2.1.add
  - layer2.1.bn2
  - layer2.2.bn1
  - layer2.2.add
  - layer2.2.bn2
  - layer3.0.bn1
  - layer3.0.add
  - layer3.0.bn2
  - layer3.0.downsample.1
  - layer3.1.bn1
  - layer3.1.add
  - layer3.1.bn2
  - layer3.2.bn1
  - layer3.2.add
  - layer3.2.bn2
  - fc
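The call counter in the observer exists because a forward hook fires on every call of a module, and a block that reuses one `ReLU` instance triggers it twice per forward pass. The toy block below is a hypothetical stand-in for `BasicBlock` that makes this visible. It uses a non-inplace `nn.ReLU()`, so the hook sees the values before rectification; with `inplace=True` the input tensor would already be overwritten when the hook runs.

```python
import torch
from torch import nn

class TinyBlock(nn.Module):
    """Toy residual block that reuses a single ReLU module twice."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 4)
        self.fc2 = nn.Linear(4, 4)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.fc1(x))  # first ReLU call: input is the fc1 output
        out = self.fc2(out) + x       # residual add
        return self.relu(out)         # second ReLU call: input is the add output

calls = []

def record_input_range(module, inputs, output):
    # Forward hooks receive the positional inputs as a tuple.
    calls.append((inputs[0].min().item(), inputs[0].max().item()))

block = TinyBlock().eval()
handle = block.relu.register_forward_hook(record_input_range)
with torch.no_grad():
    block(torch.randn(8, 4))
handle.remove()  # detach the hook once the statistics are collected

add_min, add_max = calls[1]  # the second call corresponds to the residual sum
```

One forward pass produces two entries in `calls`; keeping only the odd-numbered call is what isolates the range of the residual sum.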


#### 2. Calibrate the Model

Next, we need to calibrate the model. This step is quite similar to the process when using PyTorch's FX graph mode. It simply involves feeding data from the training set (or a representative subset of it) through the model we prepared earlier (the one with observers attached). Please note: <font color="red">**it is crucial not to use the test set data for calibration.**</font>

In [9]:
##### YOUR CODE START #####

# Prepare training data for calibration
train_size = 10000
train_subset = torch.utils.data.Subset(trainset, range(train_size))
trainloader = torch.utils.data.DataLoader(train_subset, batch_size=128, shuffle=False)

# Calibrate with training data
model_prepared.eval()
with torch.no_grad():
    for images, _ in trainloader:
        images = images.to(device)
        observers["quant"](None, None, images)
        model_prepared(images)

##### YOUR CODE END #####

#### 3. Convert the Model

Finally, we need to populate the `QuantizedCifarResNet` with its parameters. You will need to iterate through all layers in the quantized model and set the required parameters (such as quantized weights, scales, zero-points, and biases) based on their type. The necessary data should be obtained from the original model and the observers inserted previously. Additionally, note that consecutive `Conv2d`, `BatchNorm2d`, and `ReLU` layers in the original model have been fused into corresponding single layers in `QuantizedCifarResNet`. You must adjust the `Conv2d` weights and biases according to the parameters of the corresponding `BatchNorm2d` layer.
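As a sanity check on the fusion step, the sketch below folds a `BatchNorm2d` into the preceding `Conv2d` and compares the result against the unfused pair in eval mode. The layer sizes and the BatchNorm statistics are made-up values for illustration; the folding formulas are the standard ones, with the convolution bias taken as zero because the ResNet20 convolutions have `bias=False`.

```python
import torch
from torch import nn

torch.manual_seed(0)
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)
bn = nn.BatchNorm2d(8)
# Give BatchNorm non-trivial parameters and statistics (illustrative values).
with torch.no_grad():
    bn.weight.uniform_(0.5, 1.5)
    bn.bias.uniform_(-0.5, 0.5)
    bn.running_mean.uniform_(-1.0, 1.0)
    bn.running_var.uniform_(0.5, 2.0)
conv.eval()
bn.eval()

# Fold BN into the convolution:
#   w' = w * gamma / sqrt(var + eps)
#   b' = beta + (b - mean) * gamma / sqrt(var + eps), with b = 0 here
factor = bn.weight / torch.sqrt(bn.running_var + bn.eps)
fused = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=True)
with torch.no_grad():
    fused.weight.copy_(conv.weight * factor.reshape(-1, 1, 1, 1))
    fused.bias.copy_(bn.bias - bn.running_mean * factor)

x = torch.randn(4, 3, 16, 16)
with torch.no_grad():
    reference = bn(conv(x))
    folded = fused(x)
max_diff = (reference - folded).abs().max().item()
```

`max_diff` should be on the order of floating-point rounding error. Running a check like this before quantizing the fused weights helps separate fusion bugs from quantization error.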

In [10]:
##### YOUR CODE START #####

def fuse_conv_bn(
    conv: nn.Conv2d,
    bn: nn.BatchNorm2d
    ) -> tuple[torch.Tensor, torch.Tensor]:
    """Fuse convolution and batch normalization parameters."""
    w = conv.weight.detach().cpu()
    if conv.bias is None:
        b = torch.zeros(w.size(0))
    else:
        b = conv.bias.detach().cpu()
    gamma = bn.weight.detach().cpu()
    beta = bn.bias.detach().cpu()
    mu = bn.running_mean.detach().cpu()
    var = bn.running_var.detach().cpu()
    eps = bn.eps

    std = torch.sqrt(var + eps)
    scale = (gamma / std).reshape(-1, 1, 1, 1)
    w_fused = w * scale
    b_fused = beta + (gamma / std) * (b - mu)
    return w_fused, b_fused

def calc_quant_params(
    min_val: float,
    max_val: float,
    num_bits: int=8,
    symmetric: bool=False,
    unsigned: bool=False
    ) -> tuple[float, int]:
    """Calculate scale and zero-point for quantization."""
    if symmetric:
        max_abs = max(abs(min_val), abs(max_val))
        scale = max_abs / (2 ** (num_bits - 1) - 1)
        zero_point = 0
    else:
        if unsigned:
            qmin, qmax = 0, 2**num_bits - 1
        else:
            qmin, qmax = -1 * 2**(num_bits - 1), 2**(num_bits - 1) - 1
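        # The 0.99 factor slightly shrinks the covered range: values at the extremes
        # are clipped in exchange for a finer step size.
        scale = (max_val - min_val) / float(qmax - qmin) * 0.99
        zero_point = int(round(qmin - min_val / scale))
        zero_point = max(qmin, min(qmax, zero_point))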
    return scale, zero_point

# Fill in quantized model parameters
quant_modules = list(model_manual.named_modules())
for i, (name, module) in enumerate(quant_modules):
    print(f"Quantizing module {i+1:2d}/{len(quant_modules)}: {name}")

    # QuantizeLayer (input/output quantization node)
    if isinstance(module, QuantizeLayer):
        min_val, max_val = observers["quant"].get_min_max()
        out_scale, out_zero_point = calc_quant_params(min_val, max_val, num_bits=8, unsigned=True)
        module.set_output_quant_params(out_scale, out_zero_point)

    # QuantizedConv2d / QuantizedConvReLU2d
    elif isinstance(module, (QuantizedConv2d, QuantizedConvReLU2d)):
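        # Name of the matching BatchNorm in the FP32 model:
        # "...convN" -> "...bnN", and "...downsample.0" -> "...downsample.1".
        bname = name.replace("conv", "bn") if "conv" in name else name[:-1] + "1"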

        # Weight quantization (per-output-channel)
        with torch.no_grad():
            # Fuse conv and bn parameters
            w, bias_fp32 = fuse_conv_bn(
                dict(model.named_modules())[name],  # Convolution layer
                dict(model.named_modules())[bname]  # BatchNorm layer
            )

            # (channel_out, channel_in, kernel_h, kernel_w) -> (channel_out, -1) -> (channel_out,)
            w_min = torch.min(w.view(w.size(0), -1), dim=1).values
            w_max = torch.max(w.view(w.size(0), -1), dim=1).values

            # Quantization
            weight_scale = torch.zeros((w.size(0),), dtype=torch.float32)
            weight_zero_point = torch.zeros((w.size(0),), dtype=torch.int32)
            weight_int8 = torch.zeros_like(w, dtype=torch.int8)
            # Use `c` for the channel index so the outer loop variable `i` is not shadowed.
            for c in range(w.size(0)):
                weight_scale[c], weight_zero_point[c] = calc_quant_params(
                    w_min[c].item(), w_max[c].item()
                )
                weight_int8[c] = torch.clamp((
                    w[c] / weight_scale[c] + weight_zero_point[c]
                ).round(), -128, 127).to(torch.int8)

        module.set_weight_quant_params(weight_scale, weight_zero_point)
        module.set_int8_weight(weight_int8)
        module.set_fp32_bias(bias_fp32)

        # Output quantization
        min_val, max_val = observers[bname].get_min_max()
        if isinstance(module, QuantizedConvReLU2d):
            min_val = max(0.0, min_val)  # ReLU clamps negative values to zero
        out_scale, out_zero_point = calc_quant_params(min_val, max_val, num_bits=8, unsigned=True)

        if isinstance(module, QuantizedConvReLU2d):
            module.set_output_quant_params(out_scale)
        else:
            module.set_output_quant_params(out_scale, out_zero_point)

    # QuantizedAdd (for skip connections)
    elif isinstance(module, QuantizedAdd):
        min_val, max_val = observers[name].get_min_max()
        output_scale, output_zero_point = calc_quant_params(min_val, max_val, num_bits=8, unsigned=True)
        module.set_output_quant_params(output_scale, output_zero_point)

    # QuantizedLinear
    elif isinstance(module, QuantizedLinear):
        # Weight quantization (asymmetric, per-output-feature)
        with torch.no_grad():
            fc = dict(model.named_modules())[name]
            w = fc.weight.detach().cpu()
            bias_fp32 = fc.bias.detach().cpu()

            # (channel_out, channel_in) -> (channel_out,)
            w_min = torch.min(w, dim=1).values
            w_max = torch.max(w, dim=1).values

            # Quantization
            weight_scale = torch.zeros((w.size(0),), dtype=torch.float32)
            weight_zero_point = torch.zeros((w.size(0),), dtype=torch.int32)
            weight_int8 = torch.zeros_like(w, dtype=torch.int8)
            # Use `c` for the output-feature index so the outer loop variable `i` is not shadowed.
            for c in range(w.size(0)):
                weight_scale[c], weight_zero_point[c] = calc_quant_params(
                    w_min[c].item(), w_max[c].item()
                )
                weight_int8[c] = torch.clamp((
                    w[c] / weight_scale[c] + weight_zero_point[c]
                ).round(), -128, 127).to(torch.int8)

        module.set_weight_quant_params(weight_scale, weight_zero_point)
        module.set_int8_weight(weight_int8)
        module.set_fp32_bias(bias_fp32)

##### YOUR CODE END #####

Quantizing module  1/58: 
Quantizing module  2/58: quant
Quantizing module  3/58: conv1
Quantizing module  4/58: layer1
Quantizing module  5/58: layer1.0
Quantizing module  6/58: layer1.0.conv1
Quantizing module  7/58: layer1.0.conv2
Quantizing module  8/58: layer1.0.relu
Quantizing module  9/58: layer1.0.add
Quantizing module 10/58: layer1.1
Quantizing module 11/58: layer1.1.conv1
Quantizing module 12/58: layer1.1.conv2
Quantizing module 13/58: layer1.1.relu
Quantizing module 14/58: layer1.1.add
Quantizing module 15/58: layer1.2
Quantizing module 16/58: layer1.2.conv1
Quantizing module 17/58: layer1.2.conv2
Quantizing module 18/58: layer1.2.relu
Quantizing module 19/58: layer1.2.add
Quantizing module 20/58: layer2
Quantizing module 21/58: layer2.0
Quantizing module 22/58: layer2.0.conv1
Quantizing module 23/58: layer2.0.conv2
Quantizing module 24/58: layer2.0.relu
Quantizing module 25/58: layer2.0.downsample
Quantizing module 26/58: layer2.0.downsample.0
Quantizing module 27/58: layer

In [11]:
# Let's see the result.
acc = test_acc(model_manual)

print("\n===========================================\n")

if acc < 90.0:
    print("Oh no! Your test accuracy is too low!")
else:
    print("Congratulations! You've achieved the goal of this task. Remember to save your model!")
    print("You can also try increasing accuracy further to earn a higher score!")

Accuracy on CIFAR-10 test set: 92.70%


Congratulations! You've achieved the goal of this task. Remember to save your model!
You can also try increasing accuracy further to earn a higher score!


### Save Model

You can use the code below to save your model as `[student_id]_quantization.pt`, where `[student_id]` is replaced with the student ID you entered in the first cell of this notebook.

In [12]:
file_name = student_id + "_quantization.pt"
# Save model.state_dict() instead of the entire model.
torch.save(model_manual.state_dict(), file_name)
print("Your model is saved to \"" + file_name + "\".")

Your model is saved to "313510156_quantization.pt".
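Because only the `state_dict` is saved, loading it later requires building the same architecture first and then restoring the tensors into it. The round trip below is a minimal sketch using a toy `nn.Linear` and a temporary, hypothetical file name. It does not show how `check_quantization.py` loads your model.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_quantization.pt")  # hypothetical file name
    torch.save(model.state_dict(), path)

    # Re-create the architecture, then load the saved tensors into it.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path, map_location="cpu"))

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
```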


### Final Check

The TA has provided `check_quantization.py` so that you can check whether your model passes the tests. <font color="red">**Please make sure to run it before submission.**</font>

In [13]:
!python check_quantization.py --path {file_name}

Congratulations! You've achieved the goals of this task.
Your model's test accuracy is 92.70%.
