# YOLO Model Freezing in Practice

Sections 3.1 and 3.2 covered the concepts behind Transfer Learning and Train from scratch. This section puts them into practice by freezing layers of a YOLO model.

# Preparing the Required Files

First, unzip the required files and change into the extracted folder.

In [None]:
!unzip freeze_model.zip

In [None]:
%cd freeze_model

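# Downloading the Model

We use yolov7_training.pt for freezing. If you would like to try the same steps on another model, feel free to download one and experiment.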

In [None]:
!wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt

--2022-11-18 17:23:14--  https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/13e046d1-f7f0-43ab-910b-480613181b1f?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20221118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20221118T172314Z&X-Amz-Expires=300&X-Amz-Signature=c6a73d55f66b5db45d5dabeed9723f56ca627e5fd0b9a7630653046b3e81c587&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=511187726&response-content-disposition=attachment%3B%20filename%3Dyolov7_training.pt&response-content-type=application%2Foctet-stream [following]
--2022-11-18 17:23:14--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/511187726/13e046d1-f7f0-43ab-910b-480613181b1f?X

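# YOLO Model Freezing in Practice

Before starting, prepare 'hyp.scratch.custom.yaml' and 'coco.yaml'. The former holds the training hyperparameters; the latter is the dataset configuration file used during training (for example, it supplies the class count 'nc' read further down).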


In [None]:
from models.yolo import Model
import torch
import torch.nn as nn
import torch.optim as optim
import yaml

In [None]:
opt_hyp = 'hyp.scratch.custom.yaml'

with open(opt_hyp, errors='ignore') as f:
  hyp = yaml.safe_load(f)  # load hyps dict
  if 'anchors' not in hyp:  # anchors commented out in hyp.yaml
    hyp['anchors'] = 3  # fall back to 3 anchors

In [None]:
opt_data = 'coco.yaml'
with open(opt_data) as f:
  data_dict = yaml.load(f, Loader=yaml.SafeLoader)

## Building the Model

Next, read the model config file and load the weights file.

In [None]:
model = Model('yolov7.yaml', ch=3, nc=int(data_dict['nc']), anchors=hyp.get('anchors'))

In [None]:
cuda = torch.cuda.is_available()
device = torch.device('cuda:0' if cuda else 'cpu')

In [None]:
ckpt = torch.load('yolov7_training.pt', map_location=device)
state_dict = ckpt['model'].float().state_dict()
model.load_state_dict(state_dict, strict=False)

<All keys matched successfully>
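
Because `strict=False` lets `load_state_dict` skip keys that do not line up instead of raising, it can be worth inspecting the returned result rather than relying only on the printed message. The following is a minimal sketch on two toy `nn.Sequential` models, which are stand-ins chosen for illustration and not the YOLOv7 network:

```python
import torch.nn as nn

# Toy stand-ins for a checkpoint model and a freshly built model (not YOLOv7).
source = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
target = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2), nn.Linear(2, 1))

# With strict=False, keys absent from the checkpoint are reported, not raised.
result = target.load_state_dict(source.state_dict(), strict=False)

print('missing keys:   ', result.missing_keys)     # in target, not in checkpoint
print('unexpected keys:', result.unexpected_keys)  # in checkpoint, not in target
```

An empty `missing_keys` and `unexpected_keys` corresponds to the "All keys matched successfully" message. Note that `strict=False` only relaxes key-name mismatches; a tensor whose shape differs from the checkpoint still raises an error.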

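## Freezing the Model

As a recap of Sections 3.1 and 3.2: model freezing is a technique used in Transfer Learning. There are two common approaches, Feature Extraction and Fine-tuning, and both adjust the weights of a pre-trained model.

* Feature Extraction: freeze the parameters of the layers before the output layer. When training on your own dataset, only the later, unfrozen layers (or additional layers attached on top) are trained and updated.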

<img src="https://imgur.com/goSL2Na.png" width=700>


* Fine-tuning: train and update the parameters of every layer in the pre-trained model, or freeze only the lower layers and train the remaining unfrozen part.

<img src="https://imgur.com/f6iORsC.png" width=700>
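
The difference between the two approaches can be sketched on a toy network. The `backbone` and `head` modules and the helper `count_trainable` below are hypothetical names used only for illustration; they are not part of YOLOv7:

```python
import torch.nn as nn

# Toy network: a "backbone" followed by a "head" (hypothetical, not YOLOv7).
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                         nn.Conv2d(8, 16, 3), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 5))
net = nn.Sequential(backbone, head)

def count_trainable(module):
    """Number of parameter elements that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Feature extraction: freeze the whole backbone, train only the head.
for p in backbone.parameters():
    p.requires_grad = False
feature_extraction = count_trainable(net)

# Fine-tuning: unfreeze everything, then freeze only the lowest layer.
for p in net.parameters():
    p.requires_grad = True
for p in backbone[0].parameters():
    p.requires_grad = False
fine_tuning = count_trainable(net)

print(feature_extraction, fine_tuning)
```

Feature extraction leaves far fewer trainable parameters than fine-tuning, which is why it tends to be cheaper but less flexible.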


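### Freezing Layers

First, list the model's parameter names together with their requires_grad flags. requires_grad indicates whether a gradient is computed and kept for that parameter during backpropagation.

The output below shows that requires_grad is currently True for every parameter listed.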

In [None]:
cnt = 0
for k, v in model.named_parameters():
  print('name: ', k)
  print('requires_grad: ', v.requires_grad)

  cnt += 1

  if cnt == 10:
    break

name:  model.0.conv.weight
requires_grad:  True
name:  model.0.bn.weight
requires_grad:  True
name:  model.0.bn.bias
requires_grad:  True
name:  model.1.conv.weight
requires_grad:  True
name:  model.1.bn.weight
requires_grad:  True
name:  model.1.bn.bias
requires_grad:  True
name:  model.2.conv.weight
requires_grad:  True
name:  model.2.bn.weight
requires_grad:  True
name:  model.2.bn.bias
requires_grad:  True
name:  model.3.conv.weight
requires_grad:  True
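
To make the effect of `requires_grad` concrete, here is a small sketch with two toy `nn.Linear` layers (illustrative only, not the YOLO model). After a backward pass, the frozen layer has no gradient while the trainable one does:

```python
import torch
import torch.nn as nn

layer_a = nn.Linear(3, 3)
layer_b = nn.Linear(3, 1)

# Freeze layer_a: autograd will not compute gradients for its parameters.
for p in layer_a.parameters():
    p.requires_grad = False

loss = layer_b(layer_a(torch.randn(4, 3))).sum()
loss.backward()

print(layer_a.weight.grad)              # None: no gradient was computed
print(layer_b.weight.grad is not None)  # True: this layer is still trainable
```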


Next, freeze the layers "model.0" and "model.1".

In [None]:
opt_freeze = [0,1]

freeze_list = []
for x in opt_freeze:
  freeze_list.append(f'model.{x}.')

In [None]:
for k, v in model.named_parameters():
  v.requires_grad = True  # train all layers
  if any(x in k for x in freeze_list):
    print('freezing %s' % k)
    v.requires_grad = False

freezing model.0.conv.weight
freezing model.0.bn.weight
freezing model.0.bn.bias
freezing model.1.conv.weight
freezing model.1.bn.weight
freezing model.1.bn.bias
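
The trailing dot in `f'model.{x}.'` matters because the check is a substring match. The sketch below uses a hypothetical list of parameter names and a helper `frozen` (both made up for illustration) to show what would happen without it:

```python
# Parameter names as they might appear in a model with more than ten layers.
names = ['model.0.conv.weight', 'model.1.conv.weight',
         'model.10.conv.weight', 'model.11.bn.bias']

def frozen(names, patterns):
    """Names containing any pattern, mirroring the `x in k` check above."""
    return [k for k in names if any(x in k for x in patterns)]

with_dot = frozen(names, ['model.0.', 'model.1.'])
without_dot = frozen(names, ['model.0', 'model.1'])

print(with_dot)     # ['model.0.conv.weight', 'model.1.conv.weight']
print(without_dot)  # all four names: model.10 and model.11 match as well
```

A stricter alternative would be `k.startswith(x)`, which also rules out a match in the middle of a longer name.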


The output below shows that requires_grad is now False for the parameters of "model.0" and "model.1".

In [None]:
cnt = 0
for k, v in model.named_parameters():
  print('name: ', k)
  print('requires_grad: ', v.requires_grad)

  cnt += 1

  if cnt == 10:
    break

name:  model.0.conv.weight
requires_grad:  False
name:  model.0.bn.weight
requires_grad:  False
name:  model.0.bn.bias
requires_grad:  False
name:  model.1.conv.weight
requires_grad:  False
name:  model.1.bn.weight
requires_grad:  False
name:  model.1.bn.bias
requires_grad:  False
name:  model.2.conv.weight
requires_grad:  True
name:  model.2.bn.weight
requires_grad:  True
name:  model.2.bn.bias
requires_grad:  True
name:  model.3.conv.weight
requires_grad:  True
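
One caveat worth keeping in mind, offered here as an aside rather than as part of the original walkthrough: setting `requires_grad = False` on a BatchNorm layer stops updates to its weight and bias, but its running statistics are buffers and, in training mode, are still updated on every forward pass. A minimal sketch on a standalone `nn.BatchNorm2d` (not taken from the YOLO model):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(4)
for p in bn.parameters():          # freeze weight and bias
    p.requires_grad = False

before = bn.running_mean.clone()
bn.train()
bn(torch.randn(8, 4, 5, 5) + 3.0)  # forward pass in training mode
changed_in_train = not torch.equal(before, bn.running_mean)

before = bn.running_mean.clone()
bn.eval()                          # eval mode reads the statistics only
bn(torch.randn(8, 4, 5, 5) + 3.0)
changed_in_eval = not torch.equal(before, bn.running_mean)

print(changed_in_train, changed_in_eval)  # True False
```

If the frozen layers should behave exactly as in the pre-trained model, one option is to also put those modules in `eval()` mode during training.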


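### Configuring the Optimizer

In addition, different layers can be given different optimizer hyperparameters.

First, take a look at the model's architecture.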

In [None]:
cnt = 0
for k, v in model.named_modules():
  print('structure: ', v)
  cnt += 1

  if cnt == 1:
    break

structure:  Model(
  (model): Sequential(
    (0): Conv(
      (conv): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): SiLU()
    )
    (1): Conv(
      (conv): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): SiLU()
    )
    (2): Conv(
      (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(64, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): SiLU()
    )
    (3): Conv(
      (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(128, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): SiLU()
    )
    (4): Conv(
      (conv): Conv2d(128, 64, kernel_s

In [None]:
# Inspect the bias parameters
cnt = 0
for k, v in model.named_modules():
  if hasattr(v, 'bias') and isinstance(v.bias, nn.Parameter):
    print(v.bias)
    print("==================")
    cnt += 1

    if cnt == 5:
      break

Parameter containing:
tensor([-4.30469,  0.83594, -2.74414, -4.00000,  2.36523,  1.36230, -0.06348,  2.24414,  0.90625, -0.26367,  2.95703,  3.07031, -0.96094, -2.79883, -2.45508, -4.37891,  1.42676,  0.60742, -0.82227,  2.24023, -2.50391,  1.85156,  2.09766,  2.62500, -0.92334, -0.83008, -3.04883,  1.70508,  1.22363, -2.69922, -2.44922,
         2.18164])
Parameter containing:
tensor([-0.22314,  1.29688, -0.04346, -0.38623, -0.46582, -0.32129, -0.54248,  0.40430,  0.06335,  0.86572, -0.50879, -0.12646, -0.12927, -0.28394, -0.55566,  1.67871,  0.64258, -0.63086,  2.23047,  1.00195, -0.70166, -0.22229, -0.49683,  2.32031,  1.56738,  1.13867, -3.05469,  0.47412, -0.59814, -0.76318, -0.58398,
         0.91406, -0.85352,  1.82031, -0.53271,  2.65820,  0.95557, -0.29468, -0.47144, -0.29248, -0.56592, -0.62207, -0.52881,  1.37793, -2.27734, -0.24866, -0.49512, -0.06229, -0.26514,  0.19543, -0.79590,  1.75586, -0.64893,  1.62109, -0.28271, -0.12817,  2.33203,  2.97266,  0.38525, -0.52148,  0.

In [None]:
# Inspect the BatchNorm weights
cnt = 0
for k, v in model.named_modules():
  if isinstance(v, nn.BatchNorm2d):
    print(v.weight)
    print("==================")
    cnt += 1

    if cnt == 5:
      break

Parameter containing:
tensor([2.68555, 2.06641, 2.85938, 3.56641, 1.76953, 2.67578, 1.25195, 2.16211, 1.23242, 2.44141, 2.42773, 2.63086, 2.55664, 2.70703, 2.72461, 3.54688, 2.46289, 2.52344, 2.55859, 1.68555, 2.32617, 2.33594, 1.95117, 2.51367, 1.23535, 1.46777, 2.92188, 0.63086, 2.00391, 1.43945, 2.66016, 1.16895])
Parameter containing:
tensor([3.23633, 4.15625, 2.39062, 2.88867, 2.41211, 2.61719, 2.73438, 2.60352, 2.60156, 2.36914, 3.66797, 2.75977, 2.29297, 2.70703, 4.01953, 2.35938, 3.55078, 3.08203, 2.91016, 3.46289, 3.04102, 3.61133, 2.50977, 2.86133, 3.96289, 1.68359, 2.14062, 3.40039, 2.86719, 4.41797, 2.36133, 3.00586, 2.33008, 2.73633,
        2.50781, 1.29102, 3.72852, 2.22852, 2.43945, 1.55664, 2.48047, 3.54688, 3.32812, 3.09570, 1.72949, 3.06445, 3.42383, 2.69727, 2.60742, 2.36523, 2.59570, 1.22949, 3.13867, 2.27344, 2.78516, 3.04688, 1.17676, 2.02734, 4.39062, 3.18945, 2.63867, 2.64453, 2.55273, 4.25000])
Parameter containing:
tensor([1.87402, 0.84033, 2.66016, 1.66113, 

Parameters in the same parameter group share the same hyperparameters, so you can decide for yourself which layers' parameters go into each group.


In [None]:
pg0, pg1, pg2 = [], [], []  # optimizer parameter groups
for k, v in model.named_modules():
  if hasattr(v, 'bias') and isinstance(v.bias, nn.Parameter):
    pg2.append(v.bias)  # biases
  if isinstance(v, nn.BatchNorm2d):
    pg0.append(v.weight)  # no decay
  elif hasattr(v, 'weight') and isinstance(v.weight, nn.Parameter):
    pg1.append(v.weight)  # apply decay
  if hasattr(v, 'im'):
    if hasattr(v.im, 'implicit'):           
        pg0.append(v.im.implicit)
    else:
      for iv in v.im:
        pg0.append(iv.implicit)
  if hasattr(v, 'imc'):
    if hasattr(v.imc, 'implicit'):           
        pg0.append(v.imc.implicit)
    else:
      for iv in v.imc:
        pg0.append(iv.implicit)
  if hasattr(v, 'imb'):
    if hasattr(v.imb, 'implicit'):           
        pg0.append(v.imb.implicit)
    else:
      for iv in v.imb:
        pg0.append(iv.implicit)
  if hasattr(v, 'imo'):
    if hasattr(v.imo, 'implicit'):           
        pg0.append(v.imo.implicit)
    else:
      for iv in v.imo:
        pg0.append(iv.implicit)
  if hasattr(v, 'ia'):
    if hasattr(v.ia, 'implicit'):           
        pg0.append(v.ia.implicit)
    else:
      for iv in v.ia:
        pg0.append(iv.implicit)
  if hasattr(v, 'attn'):
    if hasattr(v.attn, 'logit_scale'):   
      pg0.append(v.attn.logit_scale)
    if hasattr(v.attn, 'q_bias'):   
      pg0.append(v.attn.q_bias)
    if hasattr(v.attn, 'v_bias'):  
      pg0.append(v.attn.v_bias)
    if hasattr(v.attn, 'relative_position_bias_table'):  
      pg0.append(v.attn.relative_position_bias_table)
  if hasattr(v, 'rbr_dense'):
    if hasattr(v.rbr_dense, 'weight_rbr_origin'):  
      pg0.append(v.rbr_dense.weight_rbr_origin)
    if hasattr(v.rbr_dense, 'weight_rbr_avg_conv'): 
      pg0.append(v.rbr_dense.weight_rbr_avg_conv)
    if hasattr(v.rbr_dense, 'weight_rbr_pfir_conv'):  
      pg0.append(v.rbr_dense.weight_rbr_pfir_conv)
    if hasattr(v.rbr_dense, 'weight_rbr_1x1_kxk_idconv1'): 
      pg0.append(v.rbr_dense.weight_rbr_1x1_kxk_idconv1)
    if hasattr(v.rbr_dense, 'weight_rbr_1x1_kxk_conv2'):   
      pg0.append(v.rbr_dense.weight_rbr_1x1_kxk_conv2)
    if hasattr(v.rbr_dense, 'weight_rbr_gconv_dw'):   
      pg0.append(v.rbr_dense.weight_rbr_gconv_dw)
    if hasattr(v.rbr_dense, 'weight_rbr_gconv_pw'):   
      pg0.append(v.rbr_dense.weight_rbr_gconv_pw)
    if hasattr(v.rbr_dense, 'vector'):   
      pg0.append(v.rbr_dense.vector)

Finally, create the optimizer and set the hyperparameters for each parameter group.

In [None]:
optimizer = optim.Adam(pg0, lr=hyp['lr0'], betas=(hyp['momentum'], 0.999))  # pg0: no weight decay
optimizer.add_param_group({'params': pg1, 'weight_decay': hyp['weight_decay']})  # add pg1 with weight_decay
optimizer.add_param_group({'params': pg2})  # add pg2 (biases), optimizer defaults
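
To check what each group ended up with, the optimizer's `param_groups` list can be inspected. The sketch below rebuilds the same three-group layout on a toy model; the model and the values in `hyp_demo` are hypothetical and are not read from 'hyp.scratch.custom.yaml':

```python
import torch.nn as nn
import torch.optim as optim

# Toy model and made-up hyperparameter values, for illustration only.
net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
hyp_demo = {'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005}

bn_weights = [net[1].weight]          # plays the role of pg0 (no decay)
conv_weights = [net[0].weight]        # plays the role of pg1 (decay)
biases = [net[0].bias, net[1].bias]   # plays the role of pg2

opt = optim.Adam(bn_weights, lr=hyp_demo['lr0'],
                 betas=(hyp_demo['momentum'], 0.999))
opt.add_param_group({'params': conv_weights,
                     'weight_decay': hyp_demo['weight_decay']})
opt.add_param_group({'params': biases})

# Each group carries its own copy of the hyperparameters.
for i, g in enumerate(opt.param_groups):
    print(i, len(g['params']), g['lr'], g['weight_decay'])
```

Groups added without an explicit `weight_decay` inherit the optimizer's default, so only the second group applies decay here.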

With that, the model freezing part is complete. You can now go on to try the Transfer Learning approach!