
# Problem 5: Model parameter computations

In this problem we will refer to some of the architectures that the previous problems were based on.

## MNIST and synthetic example's MLPs

(1) Derive **by hand** the number of parameters of the architectures used in the MNIST and synthetic example's MLPs.

  * The MNIST MLP is a feedforward neural network with one hidden layer of width 20 for MNIST classification. Use ReLU in the hidden layer and an output layer of size 10. Note that the images are 28 by 28 grayscale, so the flattened input has 784 features.
  * The synthetic example's MLP is given below:
  ```{python}
    # Synthetic example's MLP
    class SmallMLP(nn.Module):
        def __init__(self):
            super(SmallMLP, self).__init__()
            self.layers = nn.Sequential(
                nn.Linear(1, 8),
                nn.ReLU(),
                nn.Linear(8, 8),
                nn.ReLU(),
                nn.Linear(8, 1)
            )

        def forward(self, x):
            return self.layers(x)
  ```







**Your answer**:
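
A hand count can be organized layer by layer: a fully connected layer with $n_{in}$ inputs and $n_{out}$ outputs has $n_{in} \cdot n_{out}$ weights plus $n_{out}$ biases, and ReLU has no parameters. The sketch below does this arithmetic in plain Python. The helper name `linear_params` is introduced here for illustration only and is not part of PyTorch.

```python
def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Parameters of a fully connected layer: weight matrix plus optional bias vector."""
    return n_in * n_out + (n_out if bias else 0)

# MNIST MLP: 28*28 = 784 inputs -> 20 hidden units -> 10 classes
mnist_hand = linear_params(28 * 28, 20) + linear_params(20, 10)

# Synthetic example's SmallMLP: 1 -> 8 -> 8 -> 1
small_hand = linear_params(1, 8) + linear_params(8, 8) + linear_params(8, 1)

print("MNIST MLP:", mnist_hand)  # (15680 + 20) + (200 + 10) = 15910
print("SmallMLP: ", small_hand)  # (8 + 8) + (64 + 8) + (8 + 1) = 97
```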

(2) Check your answer by using `model.parameters()` to compute the total number of parameters of both models.

In [None]:
##### YOUR CODE HERE #####
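# Import PyTorch and its neural-network module
import torch
import torch.nn as nn

# MNIST MLP with one hidden layer: 784 -> 20 -> 10
mnist_mlp = nn.Sequential(                 # nn.Sequential chains the layers in order
    nn.Linear(784, 20),                    # linear layer: 784 inputs (28*28 pixels), 20 outputs (hidden width)
    nn.ReLU(),                             # ReLU activation (no trainable parameters)
    nn.Linear(20, 10)                      # linear layer: 20 inputs, 10 outputs (10 classes)
)

# Synthetic example's SmallMLP as given in the problem statement: 1 -> 8 -> 8 -> 1
class SmallMLP(nn.Module):                  # custom nn.Module subclass
    def __init__(self):                     # constructor: defines the network structure
        super(SmallMLP, self).__init__()    # run the parent constructor for base initialization
        self.layers = nn.Sequential(        # pack the layers into a sequential container
            nn.Linear(1, 8),                # linear layer: 1 input, 8 outputs
            nn.ReLU(),                      # ReLU activation (no parameters)
            nn.Linear(8, 8),                # linear layer: 8 inputs, 8 outputs
            nn.ReLU(),                      # ReLU activation (no parameters)
            nn.Linear(8, 1)                 # linear layer: 8 inputs, 1 output
        )

    def forward(self, x):                    # forward pass: how data flows through the layers
        return self.layers(x)                # apply each layer in the container in turn

small_mlp = SmallMLP()                       # instantiate the synthetic-example model

# Small helper: print a model's total parameter count and the shape of each parameter tensor
def count_params(model):                     # accepts any nn.Module
    total = sum(p.numel() for p in model.parameters())  # numel() is the element count of a tensor; sum over all parameters
    print(model.__class__.__name__, 'total params =', total)  # print the model name and total count
    for name, p in model.named_parameters():             # iterate over named parameters to check each layer's shape
        print(f'  {name:30s} shape={tuple(p.shape)} numel={p.numel()}')
    return total                                         # return the total for later use

# Count the parameters of both models (expected: MNIST MLP = 15910, SmallMLP = 97)
mnist_total = count_params(mnist_mlp)         # parameters of the MNIST MLP
small_total = count_params(small_mlp)         # parameters of SmallMLP

# Compare against the hand-derived values; a mismatch raises AssertionError
assert mnist_total == 15910, f'expected 15910, got {mnist_total}'
assert small_total == 97,    f'expected 97, got {small_total}'
print('same')               # reached only if both assertions pass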


Sequential total params = 15910
  0.weight                       shape=(20, 784) numel=15680
  0.bias                         shape=(20,) numel=20
  2.weight                       shape=(10, 20) numel=200
  2.bias                         shape=(10,) numel=10
SmallMLP total params = 97
  layers.0.weight                shape=(8, 1) numel=8
  layers.0.bias                  shape=(8,) numel=8
  layers.2.weight                shape=(8, 8) numel=64
  layers.2.bias                  shape=(8,) numel=8
  layers.4.weight                shape=(1, 8) numel=8
  layers.4.bias                  shape=(1,) numel=1
same


## CNN and MobileNetV2

(3) We now refer to the models used in Problem 4. The CNN given in that problem is reproduced below.
```{python}

original_CNN = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),  # 32 filters, 3x3 kernel, input_shape=IMG_SHAPE
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 64 filters, 3x3 kernel
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # 64 filters, 3x3 kernel
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(64 * (IMG_SHAPE[0] // 4) * (IMG_SHAPE[0] // 4), 128),  # Adjust input size based on image shape
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 102),  # Adjust output size based on the number of classes (102),
)
```
 * Halve the number of filters in each Conv2d layer, change the image size from 224x224 to 8x8, and assume there are only 50 classes. Assume that the image is still RGB with 3 channels.
 * Compute **by hand** the number of parameters of the first Conv layer, ReLU, and MaxPool layer. Break down your calculation in terms of the number of filters and the weights per filter (note that the layers use `bias=True` by default). What do ReLU and MaxPool contribute to the model's parameter count?
 * Use `model.parameters()` to obtain the total number of parameters of the CNN and the number of parameters of the first block of Conv2d+ReLU+MaxPool2d.

**Your answer**:
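
The hand count for the modified CNN can be laid out as plain arithmetic: each of the `c_out` filters in a Conv2d layer has `c_in * k * k` weights plus one bias, while ReLU and MaxPool2d contribute nothing. The following is a minimal sketch under the assumptions stated in the problem (halved filters, 8x8 RGB input, 50 classes). The helper names `conv2d_params` and `linear_params` are made up for this illustration.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Each of the c_out filters has c_in * k * k weights, plus one bias if enabled."""
    return c_out * (c_in * k * k) + (c_out if bias else 0)

def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)

conv1 = conv2d_params(3, 16, 3)    # 16 filters * 27 weights + 16 biases = 448
conv2 = conv2d_params(16, 32, 3)   # 32 filters * 144 weights + 32 biases = 4640
conv3 = conv2d_params(32, 32, 3)   # 32 filters * 288 weights + 32 biases = 9248

# Two 2x2 max-pools shrink 8x8 to 2x2, so Flatten yields 32 * 2 * 2 = 128 features
fc1 = linear_params(32 * (8 // 4) * (8 // 4), 128)  # 128 * 128 + 128 = 16512
fc2 = linear_params(128, 50)                         # 128 * 50 + 50 = 6450

relu_params = 0      # ReLU is a fixed element-wise function
maxpool_params = 0   # MaxPool2d only selects local maxima

first_block = conv1 + relu_params + maxpool_params
total = conv1 + conv2 + conv3 + fc1 + fc2

print("First block:", first_block)  # 448
print("Whole CNN:  ", total)        # 37298
```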

In [None]:
##### YOUR CODE HERE #####
# ====== Modified CNN for Q3 ======

import torch                           # PyTorch core package (tensors and autograd)
import torch.nn as nn                  # neural-network module (layers, activations, containers)

# The problem asks us to halve the number of filters in every conv layer, shrink the input from 224x224 to 8x8, and use 50 classes.
# Original: Conv2d(3->32) -> ReLU -> MaxPool2d -> Conv2d(32->64) -> ReLU -> MaxPool2d
#       -> Conv2d(64->64) -> ReLU -> Flatten -> Linear(64 * (H//4) * (W//4) -> 128) -> ReLU -> Dropout -> Linear(128 -> 102)
# Modified: Conv2d(3->16) -> ReLU -> MaxPool2d -> Conv2d(16->32) -> ReLU -> MaxPool2d
#       -> Conv2d(32->32) -> ReLU -> Flatten -> Linear(32 * (8//4) * (8//4) -> 128) -> ReLU -> Dropout -> Linear(128 -> 50)

# Use a named Sequential so the first block can be looked up by name when counting parameters
from collections import OrderedDict    # lets us name each layer inside nn.Sequential

IMG_H = 8                              # input image height 8 (required by the problem)
IMG_W = 8                              # input image width 8 (required by the problem)
NUM_CLASSES = 50                       # 50 classes (required by the problem)

# The two stride-2 poolings reduce the spatial size 8 -> 4 -> 2,
# so the feature map before Flatten is 2x2 with 32 channels from the last conv layer
flatten_in_features = 32 * (IMG_H // 4) * (IMG_W // 4)   # = 32 * 2 * 2 = 128

# Build the modified CNN (same structure as in the problem; only the channel counts, input size, and classifier head change)
modified_cnn = nn.Sequential(OrderedDict([
    ('conv1',   nn.Conv2d(in_channels=3,  out_channels=16, kernel_size=3, padding=1, bias=True)),  # first conv: 3->16, 3x3, padding=1
    ('relu1',   nn.ReLU()),                                                                        # first ReLU (no trainable parameters)
    ('pool1',   nn.MaxPool2d(kernel_size=2, stride=2)),                                            # first 2x2 max-pool (no trainable parameters)
    ('conv2',   nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1, bias=True)),  # second conv: 16->32
    ('relu2',   nn.ReLU()),                                                                        # second ReLU (no trainable parameters)
    ('pool2',   nn.MaxPool2d(kernel_size=2, stride=2)),                                            # second 2x2 max-pool (no trainable parameters)
    ('conv3',   nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, padding=1, bias=True)),  # third conv: 32->32 (channels unchanged)
    ('relu3',   nn.ReLU()),                                                                        # third ReLU (no trainable parameters)
    ('flatten', nn.Flatten()),                                                                      # flatten to a vector per sample
    ('fc1',     nn.Linear(in_features=flatten_in_features, out_features=128, bias=True)),           # fully connected: 128 -> 128 (since 32*2*2 = 128)
    ('relu4',   nn.ReLU()),                                                                         # ReLU after the fully connected layer
    ('drop',    nn.Dropout(p=0.2)),                                                                 # Dropout (no trainable parameters)
    ('fc2',     nn.Linear(in_features=128, out_features=NUM_CLASSES, bias=True)),                   # classifier head: 128 -> 50
]))

# -------- (A) Hand count for the first block (Conv2d+ReLU+MaxPool2d) --------
# Only Conv2d(3 -> 16, k=3x3, bias=True) has trainable parameters:
#   weights = out_channels * in_channels * kH * kW = 16 * 3 * 3 * 3 = 432
#   biases  = out_channels = 16
#   first block total = 432 + 16 = 448
hand_count_first_block = 16 * 3 * 3 * 3 + 16

# ReLU and MaxPool2d in terms of parameter count:
# - ReLU: element-wise nonlinearity with no trainable parameters (count = 0).
# - MaxPool2d: downsampling that takes local maxima, with no trainable parameters (count = 0).

# -------- (B) Count the whole model and the first block with model.parameters() --------
def count_params(module: nn.Module) -> int:
    """Total number of parameter elements in any module."""
    return sum(p.numel() for p in module.parameters())

total_params = count_params(modified_cnn)         # parameters of the whole CNN
first_block_params = count_params(modified_cnn.conv1)  # first block: only Conv2d counts, since ReLU/MaxPool have none

# -------- (C) Print and compare --------
print("=== Q3: Modified CNN Stats ===")
print("Hand-count first block params (Conv2d only; ReLU/Pool have 0):", hand_count_first_block)
print("Code-count first block params (Conv2d):                         ", first_block_params)
print("Total params of modified CNN:                                   ", total_params)

# Check that the hand count matches the code count (raises AssertionError on mismatch)
assert hand_count_first_block == first_block_params, "First block: hand count and code count disagree!"
print("both 448")


=== Q3: Modified CNN Stats ===
Hand-count first block params (Conv2d only; ReLU/Pool have 0): 448
Code-count first block params (Conv2d):                          448
Total params of modified CNN:                                    37298
both 448


We now load the pretrained MobileNetV2 architecture. We remove the classifier head as in Problem 4 and count the number of parameters in each block of the backbone.



(4) Can you compute **by hand** how many parameters are in an inverted residual block? Note that a BatchNorm layer has two learnable parameters (a scale and a shift) for each input feature. Assume the following structure for an inverted residual block:
```{python}
(conv): Sequential(
(0): Conv2dNormActivation(
       (0): Conv2d(C_in, C_mid, kernel_size=k1, stride=s1, padding=p1, bias=False)
       (1): BatchNorm2d(C_mid)
       (2): ReLU6
     )
(1): Conv2dNormActivation(
       (0): Conv2d(C_mid, C_mid, kernel_size=k2, stride=s2, padding=p2, groups=C_mid, bias=False)
       (1): BatchNorm2d(C_mid)
       (2): ReLU6
     )
(2): Conv2d(C_mid, C_out, kernel_size=k3, stride=s3, padding=p3, bias=False)
(3): BatchNorm2d(C_out)
)
```
Your answer should be a function of $C_{in}, C_{mid}, C_{out}, k_1, s_1, p_1, k_2, s_2, p_2, k_3, s_3, p_3$. Double-check your expression by plugging in the values for the second inverted residual block and comparing with what you get using `model.parameters()`.

**Your answer**:
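
One way to write the hand count, assuming square $k_i \times k_i$ kernels and the block structure given above (the symbol $P$ is introduced here for the total): a bias-free convolution has (output channels) × (input channels per group) × $k^2$ weights, the depthwise convolution with `groups=C_mid` sees a single input channel per filter, and each BatchNorm layer adds two parameters per channel.

$$
P = \underbrace{C_{mid}\, C_{in}\, k_1^2}_{\text{expand conv}}
  + \underbrace{C_{mid}\, k_2^2}_{\text{depthwise conv}}
  + \underbrace{C_{out}\, C_{mid}\, k_3^2}_{\text{project conv}}
  + \underbrace{2 C_{mid} + 2 C_{mid} + 2 C_{out}}_{\text{three BatchNorm layers}}
$$

The strides $s_i$ and paddings $p_i$ do not appear because they change only the size of the feature maps, not the number of weights. With $C_{in}=16$, $C_{mid}=96$, $C_{out}=24$ and $(k_1, k_2, k_3) = (1, 3, 1)$ this gives $1536 + 864 + 2304 + 432 = 5136$.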

In [None]:
import torch
import torch.nn as nn

# --- Values for the second inverted residual block of MobileNetV2, used as a worked example ---
C_in, C_mid, C_out = 16, 96, 24   # typical: expansion factor t=6, 16 -> 96 -> 24
k1, k2, k3 = 1, 3, 1              # 1x1 expand, 3x3 depthwise, 1x1 project
s1, s2, s3 = 1, 2, 1              # stride changes only the feature-map size, not the parameter count
p1, p2, p3 = 0, 1, 0              # the same holds for padding

# --- Build the block as specified in the problem (bias=False; BN has learnable gamma/beta) ---
block = nn.Sequential(
    nn.Sequential(                                # (1) expand: C_in -> C_mid
        nn.Conv2d(C_in, C_mid, k1, s1, p1, bias=False),
        nn.BatchNorm2d(C_mid),
        nn.ReLU6(inplace=False),
    ),
    nn.Sequential(                                # (2) depthwise: C_mid -> C_mid (groups=C_mid)
        nn.Conv2d(C_mid, C_mid, k2, s2, p2, groups=C_mid, bias=False),
        nn.BatchNorm2d(C_mid),
        nn.ReLU6(inplace=False),
    ),
    nn.Conv2d(C_mid, C_out, k3, s3, p3, bias=False),  # (3) project: C_mid -> C_out
    nn.BatchNorm2d(C_out),
)

# --- Count parameters in code ---
def count_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

code_total = count_params(block)

# --- Hand count with the values above plugged in ---
hand_total = (
    C_mid*C_in*(k1**2)      # expand conv
  + C_mid*(k2**2)           # depthwise conv
  + C_out*C_mid*(k3**2)     # project conv
  + 4*C_mid + 2*C_out       # BN(C_mid)+BN(C_mid)+BN(C_out)
)

print("Hand-count total:", hand_total)  # expected 5136
print("Code-count total:", code_total)

assert hand_total == code_total, "Hand count and code count disagree!"


Hand-count total: 5136
Code-count total: 5136


(5) After removing the classifier head, the full MobileNetV2 backbone has 19 blocks.
  * Can you use the same code as before to plot the number of parameters in each block?
  * In Problem 4 you froze all parameters except those in the last three blocks. How many trainable parameters does the backbone have after unfreezing the top three blocks? You can reuse code given in Problem 4.


In [None]:
import torchvision
from torch import nn

# MobileNetV2 model
MobileNetV2 = torchvision.models.mobilenet_v2(pretrained=True)
MobileNetV2.classifier = nn.Identity()  # Remove the classifier layers
MobileNetV2




MobileNetV2(
  (features): Sequential(
    (0): Conv2dNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): Conv2dNormActivation(
          (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=

In [None]:
##### YOUR CODE HERE #####
# Number of parameters in each block
import torch
import torchvision
from torch import nn

# ====== Load the pretrained MobileNetV2 and remove the classifier head ======
MobileNetV2 = torchvision.models.mobilenet_v2(pretrained=True)
MobileNetV2.classifier = nn.Identity()   # drop the classifier; only the 19 backbone blocks in .features remain

# ====== Count the parameters of each block ======
print("=== (1) Number of parameters in each block ===")

# MobileNetV2.features is a sequential container of 19 modules:
# a stem convolution, 17 inverted residual blocks, and a final 1x1 convolution
for i, block in enumerate(MobileNetV2.features):
    # parameters of a block = sum of the element counts of its parameter tensors
    num_params = sum(p.numel() for p in block.parameters())
    print(f"Block {i+1:02d}: {num_params:>10} parameters")

# Total number of parameters in the backbone (classifier excluded)
total_params = sum(p.numel() for p in MobileNetV2.features.parameters())
print(f"\nTotal parameters (backbone only): {total_params:,}")


=== (1) Number of parameters in each block ===
Block 01:        928 parameters
Block 02:        896 parameters
Block 03:       5136 parameters
Block 04:       8832 parameters
Block 05:      10000 parameters
Block 06:      14848 parameters
Block 07:      14848 parameters
Block 08:      21056 parameters
Block 09:      54272 parameters
Block 10:      54272 parameters
Block 11:      54272 parameters
Block 12:      66624 parameters
Block 13:     118272 parameters
Block 14:     118272 parameters
Block 15:     155264 parameters
Block 16:     320000 parameters
Block 17:     320000 parameters
Block 18:     473920 parameters
Block 19:     412160 parameters

Total parameters (backbone only): 2,223,872
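
The question asks for a plot, while the cell above only prints the counts. A minimal sketch of a bar chart with matplotlib follows. It hard-codes the per-block counts copied from the printed output above so that it runs without downloading the model. The output file name is a placeholder chosen for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt

# Per-block parameter counts copied from the printed output (Block 01 .. Block 19)
block_params = [
    928, 896, 5136, 8832, 10000, 14848, 14848, 21056, 54272, 54272,
    54272, 66624, 118272, 118272, 155264, 320000, 320000, 473920, 412160,
]
block_ids = list(range(1, len(block_params) + 1))

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(block_ids, block_params)
ax.set_xticks(block_ids)
ax.set_xlabel("Block index")
ax.set_ylabel("Number of parameters")
ax.set_title("MobileNetV2 backbone: parameters per block")
fig.tight_layout()
fig.savefig("mobilenetv2_block_params.png")  # placeholder file name
```

Inside a notebook, the `matplotlib.use("Agg")` line can be dropped and `plt.show()` called instead of `fig.savefig(...)`. The list can also be built directly with `[sum(p.numel() for p in b.parameters()) for b in MobileNetV2.features]`.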


In [None]:
##### YOUR CODE HERE #####
# Number of trainable parameters after only unfreezing top 3 blocks
# ====== Freeze all parameters (exclude them from training) ======
for param in MobileNetV2.features.parameters():
    param.requires_grad = False

# ====== Unfreeze the last three blocks (make them trainable again) ======
for block in MobileNetV2.features[-3:]:
    for param in block.parameters():
        param.requires_grad = True

# ====== Count trainable and frozen parameters ======
trainable_params = sum(p.numel() for p in MobileNetV2.features.parameters() if p.requires_grad)
frozen_params = sum(p.numel() for p in MobileNetV2.features.parameters() if not p.requires_grad)

print("=== (2) Trainable Parameters Summary ===")
print(f"Trainable parameters (only top 3 blocks unfrozen): {trainable_params:,}")
print(f"Frozen parameters: {frozen_params:,}")
print(f"Total backbone parameters check: {trainable_params + frozen_params:,}")


=== (2) Trainable Parameters Summary ===
Trainable parameters (only top 3 blocks unfrozen): 1,206,080
Frozen parameters: 1,017,792
Total backbone parameters check: 2,223,872
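
As a cross-check, the trainable count should equal the sum of the last three per-block counts printed earlier (Blocks 17 to 19). The short sketch below redoes that arithmetic in plain Python, with the numbers copied from the outputs in this notebook.

```python
# Parameter counts of the last three blocks, copied from the per-block output
top3_block_params = [320000, 473920, 412160]
backbone_total = 2_223_872  # total backbone parameters reported in this notebook

trainable = sum(top3_block_params)
frozen = backbone_total - trainable

print(f"Trainable: {trainable:,}")  # 1,206,080
print(f"Frozen:    {frozen:,}")     # 1,017,792
print(f"Trainable fraction: {trainable / backbone_total:.1%}")
```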
