
Fine-tune Model Assignment

Task Requirements

Perform face recognition as an image-classification task, fine-tuning a pretrained model:

  • Baseline: First, re-train the softmax layer of the pretrained ResNet18 with the face training samples (40 samples per subject). This is equivalent to freezing the rest of the layers in the network so that only a linear classifier remains. Note that the number of output nodes should be changed from 1000 to 100. This serves as the baseline model. Determine the testing accuracy on the testing set, which is composed of 10 samples per subject.
  • ModelA: Fine tune Conv5_x.
  • ModelB: Fine tune Conv4_x and Conv5_x.
  • ModelC: Fine tune ALL convolution layers.
  • ModelD: Freeze all the convolution blocks, introduce two FC layers before the softmax layer, and train them together with the softmax layer. This is equivalent to an MLP with two hidden layers. Specify the number of neurons in the FC layers.

Dataset

100 classes with 50 samples each: 40 for training and 10 for validation.

tar -xzvf face_dataset.tar.gz

Viewing Results

View the results with TensorBoard:

tensorboard --logdir=runs

The results are also saved as .log files in the logs directory.
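How train.py produces these records is not shown here; below is a minimal sketch of the usual pattern, assuming the standard torch.utils.tensorboard API (the run name and placeholder values are illustrative):

from torch.utils.tensorboard import SummaryWriter

# Illustrative sketch (not the repository's train.py): log per-epoch scalars
# so that `tensorboard --logdir=runs` can plot the curves.
writer = SummaryWriter(log_dir="runs/example")  # hypothetical run name
for epoch in range(10):
    train_loss = 1.0 / (epoch + 1)  # placeholder values for demonstration
    train_acc = 1.0 - train_loss
    writer.add_scalar("Loss/train", train_loss, epoch)
    writer.add_scalar("Accuracy/train", train_acc, epoch)
writer.close()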

File Structure

.
├── face_dataset  # dataset
│   ├── facescrub_test
│   └── facescrub_train
├── images    # data
├── logs      # training results
│   ├── .log  # training loss and accuracy
│   └── .txt  # parameter differences before and after training
├── runs      # TensorBoard records
├── plot.py   # script for plotting loss and accuracy
├── README.md
├── train.py
└── train.sh

Fine-tuning Settings and Training

  • Baseline

    Notation: k = kernel size, s = stride, p = zero padding, f = feature map size, RF = receptive field.

| Layer   | k | s/p | f         | # of weights | RF  |
|---------|---|-----|-----------|--------------|-----|
| Input   | - | -   | 3×64×64   | -            | 1   |
| Conv1   | 7 | 1/3 | 64×64×64  | 9,408        | 7   |
| Pooling | 3 | 2/1 | 64×32×32  | 0            | 9   |
| Conv2-1 | 3 | 1/1 | 64×32×32  | 36,864       | 13  |
| Conv2-2 | 3 | 1/1 | 64×32×32  | 36,864       | 17  |
| Conv2-3 | 3 | 1/1 | 64×32×32  | 36,864       | 21  |
| Conv2-4 | 3 | 1/1 | 64×32×32  | 36,864       | 25  |
| Conv3-1 | 3 | 1/1 | 128×32×32 | 73,728       | 29  |
| Conv3-2 | 3 | 1/1 | 128×32×32 | 147,456      | 33  |
| Conv3-3 | 1 | 2/0 | 128×32×32 | 8,192        | 33  |
| Conv3-4 | 3 | 1/1 | 128×32×32 | 147,456      | 41  |
| Conv3-5 | 3 | 1/1 | 128×32×32 | 147,456      | 45  |
| Conv4-1 | 3 | 2/1 | 256×16×16 | 294,912      | 53  |
| Conv4-2 | 3 | 1/1 | 256×16×16 | 589,824      | 69  |
| Conv4-3 | 1 | 2/0 | 256×16×16 | 32,768       | 69  |
| Conv4-4 | 3 | 1/1 | 256×16×16 | 589,824      | 101 |
| Conv4-5 | 3 | 1/1 | 256×16×16 | 589,824      | 133 |
| Conv5-1 | 3 | 2/1 | 512×8×8   | 1,179,648    | 165 |
| Conv5-2 | 3 | 1/1 | 512×8×8   | 2,359,296    | 229 |
| Conv5-3 | 1 | 2/0 | 512×8×8   | 131,072      | 229 |
| Conv5-4 | 3 | 1/1 | 512×8×8   | 2,359,296    | 357 |
| Conv5-5 | 3 | 1/1 | 512×8×8   | 2,359,296    | 485 |
| avgpool | - | -   | 512×1×1   | 0            | -   |
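The RF column follows the standard receptive-field recurrence RF_l = RF_{l-1} + (k_l - 1) * j_{l-1}, where the jump j doubles at every stride-2 layer (j_l = j_{l-1} * s_l). A small sketch checking the first rows of the table:

# Check the first few RF values with the standard recurrence:
#   RF_l = RF_{l-1} + (k_l - 1) * j_{l-1},   j_l = j_{l-1} * s_l
layers = [("Conv1", 7, 1), ("Pooling", 3, 2),
          ("Conv2-1", 3, 1), ("Conv2-2", 3, 1)]  # (name, kernel, stride)
rf, jump = 1, 1
for name, k, s in layers:
    rf += (k - 1) * jump
    jump *= s
    print(f"{name}: RF = {rf}")
# prints RF = 7, 9, 13, 17, matching the table above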
  • Results summary

| Architecture                   | Learning rate and hyperparameters       | Training Accuracy | Test Accuracy |
|--------------------------------|-----------------------------------------|-------------------|---------------|
| Baseline                       | lr=0.001, batch size=256                | 99.77%            | 61.90%        |
| ModelA                         | lr=0.001, batch size=256                | 100.0%            | 90.70%        |
| ModelB                         | lr=0.001, batch size=256                | 100.0%            | 93.70%        |
| ModelC                         | lr=0.001, batch size=256                | 100.0%            | 95.50%        |
| ModelD (fc:256+128)            | lr=0.001, batch size=256                | 99.75%            | 54.00%        |
| ModelD (fc:256+128 + dropout)  | lr=0.001, batch size=256, dropout=0.5   | 43.50%            | 47.40%        |
| ModelD (fc:1024+512)           | lr=0.001, batch size=256                | 100.0%            | 59.40%        |
| ModelD (fc:1024+512 + dropout) | lr=0.001, batch size=256, dropout=0.5   | 94.82%            | 61.60%        |
| ModelD (fc:2048+1024)          | lr=0.001, batch size=128                | 99.97%            | 60.40%        |
| ModelD (fc:2048+1024 + dropout)| lr=0.001, batch size=128, dropout=0.5   | 91.72%            | 60.30%        |

Baseline

The original ResNet18 model (tail of the layer-by-layer summary):

           Conv2d-63            [-1, 512, 2, 2]       2,359,296
      BatchNorm2d-64            [-1, 512, 2, 2]           1,024
             ReLU-65            [-1, 512, 2, 2]               0
       BasicBlock-66            [-1, 512, 2, 2]               0
AdaptiveAvgPool2d-67            [-1, 512, 1, 1]               0
           Linear-68                 [-1, 1000]         513,000
================================================================
Total params: 11,689,512
Trainable params: 11,689,512
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 5.14
Params size (MB): 44.59
Estimated Total Size (MB): 49.78
----------------------------------------------------------------

ResNet-18 is adjusted for the Baseline: the stride of the Conv2d layers conv1, layer2[0].conv1, and layer2[0].downsample[0] is changed from 2 to 1, so the final feature map is 8×8×512 for 64×64 inputs. Since only the strides change (the weight shapes stay the same), the pretrained state dict still loads cleanly.

import copy

import torch.nn as nn
from torchvision.models import resnet18


def Baseline(num_classes: int = 10, pretrained: bool = True):
    model = resnet18(pretrained=pretrained)
    original_state_dict = copy.deepcopy(model.state_dict())
    # ====== fine tune model ======
    # modify the model to make the last feature map size 8x8x512:
    # change stride 2 -> 1 in conv1 and in the first block of layer2
    model.conv1 = nn.Conv2d(3, 64, (7, 7), (1, 1), padding=(3, 3), bias=False)
    model.layer2[0].conv1 = nn.Conv2d(64, 128, (3, 3), (1, 1), padding=(1, 1), bias=False)
    model.layer2[0].downsample[0] = nn.Conv2d(64, 128, (1, 1), (1, 1), bias=False)
    # =============================
    # the weight shapes are unchanged, so the pretrained weights still fit
    model.load_state_dict(original_state_dict)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

The adjusted Baseline:

           Conv2d-63            [-1, 512, 8, 8]       2,359,296
      BatchNorm2d-64            [-1, 512, 8, 8]           1,024
             ReLU-65            [-1, 512, 8, 8]               0
       BasicBlock-66            [-1, 512, 8, 8]               0
AdaptiveAvgPool2d-67            [-1, 512, 1, 1]               0
           Linear-68                  [-1, 100]          51,300
================================================================
Total params: 11,227,812
Trainable params: 51,300
Non-trainable params: 11,176,512
----------------------------------------------------------------
Input size (MB): 0.05
Forward/backward pass size (MB): 41.50
Params size (MB): 42.83
Estimated Total Size (MB): 84.38
----------------------------------------------------------------
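The layer summaries above match the output format of the torchsummary package; a sketch to reproduce them, assuming torchsummary is installed:

from torchsummary import summary

# Print a per-layer summary for a 3x64x64 input, as in the tables above.
model = Baseline(num_classes=100, pretrained=True)
summary(model, input_size=(3, 64, 64), device="cpu")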

Load the Baseline model and train it

model = Baseline(num_classes=len(train_set.classes), pretrained=True)
# freeze everything, then unfreeze only the classifier
for param in model.parameters():
    param.requires_grad = False
model.fc.weight.requires_grad = True
model.fc.bias.requires_grad = True
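The full training loop lives in train.py and is not reproduced in this README; below is a minimal sketch consistent with the hyperparameters in the results table (lr=0.001, batch size 256), assuming cross-entropy loss and the Adam optimizer (the repository's actual optimizer and epoch count are not shown):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

# Minimal training-loop sketch (not the repository's train.py).
# `train_set` is the training dataset over face_dataset/facescrub_train used above.
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
criterion = nn.CrossEntropyLoss()
# Only the parameters left trainable above (the fc layer) are optimized.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.001)

for epoch in range(50):  # placeholder epoch count
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()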

This model is loaded as the pre-fine-tuning starting point for each of the models below.

# -- Load Baseline model  --
  • Baseline training result: the validation set accuracy finally stabilizes at around 61%.

ModelA

  • Fine tune Conv5_x.

Conv5_x corresponds to the model's layer4; gradient backpropagation is enabled for all conv layers within it:

# -- Load Baseline model  --
# layer4 / conv5_x
model.layer4[0].conv1.weight.requires_grad = True
model.layer4[0].conv2.weight.requires_grad = True
model.layer4[0].downsample[0].weight.requires_grad = True
model.layer4[1].conv1.weight.requires_grad = True
model.layer4[1].conv2.weight.requires_grad = True
  • ModelA training result: the validation set accuracy finally stabilizes at around 90%.
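A quick way to confirm that only the intended layers were unfrozen is to count trainable parameters (helper sketch, not from the repository):

def count_trainable(model) -> int:
    """Count parameters that will receive gradient updates."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# After unfreezing layer4's conv weights plus the fc layer, this should
# report about 8.44M of the ~11.2M total parameters.
print(f"trainable: {count_trainable(model):,} / "
      f"{sum(p.numel() for p in model.parameters()):,}")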

ModelB

  • Fine tune Conv4_x and Conv5_x. Conv4_x and Conv5_x correspond to the model's layer3 and layer4; gradient backpropagation is enabled for all conv layers in both:
# -- Load Baseline model  --
# layer3 / conv4_x
model.layer3[0].conv1.weight.requires_grad = True
model.layer3[0].conv2.weight.requires_grad = True
model.layer3[0].downsample[0].weight.requires_grad = True
model.layer3[1].conv1.weight.requires_grad = True
model.layer3[1].conv2.weight.requires_grad = True
# layer4 / conv5_x
model.layer4[0].conv1.weight.requires_grad = True
model.layer4[0].conv2.weight.requires_grad = True
model.layer4[0].downsample[0].weight.requires_grad = True
model.layer4[1].conv1.weight.requires_grad = True
model.layer4[1].conv2.weight.requires_grad = True
  • ModelB training result: the validation set accuracy finally stabilizes at around 93%.

ModelC

  • Fine tune ALL convolution layers.

Gradient backpropagation is enabled for all conv layers (a compact equivalent loop follows this block):

# -- Load Baseline model  --
# conv1_x
model.conv1.weight.requires_grad = True
# layer1 / conv2_x
model.layer1[0].conv1.weight.requires_grad = True
model.layer1[0].conv2.weight.requires_grad = True
model.layer1[1].conv1.weight.requires_grad = True
model.layer1[1].conv2.weight.requires_grad = True
# layer2 / conv3_x
model.layer2[0].conv1.weight.requires_grad = True
model.layer2[0].conv2.weight.requires_grad = True
model.layer2[0].downsample[0].weight.requires_grad = True
model.layer2[1].conv1.weight.requires_grad = True
model.layer2[1].conv2.weight.requires_grad = True
# layer3 / conv4_x
model.layer3[0].conv1.weight.requires_grad = True
model.layer3[0].conv2.weight.requires_grad = True
model.layer3[0].downsample[0].weight.requires_grad = True
model.layer3[1].conv1.weight.requires_grad = True
model.layer3[1].conv2.weight.requires_grad = True
# layer4 / conv5_x
model.layer4[0].conv1.weight.requires_grad = True
model.layer4[0].conv2.weight.requires_grad = True
model.layer4[0].downsample[0].weight.requires_grad = True
model.layer4[1].conv1.weight.requires_grad = True
model.layer4[1].conv2.weight.requires_grad = True
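The explicit assignments above can be collapsed into a loop; an equivalent sketch that unfreezes every convolution weight by parameter name:

# Equivalent to the block above: unfreeze all conv weights, including the
# 1x1 downsample convolutions, by matching the parameter names.
for name, param in model.named_parameters():
    if "conv" in name or "downsample.0" in name:
        param.requires_grad = True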
  • ModelC training result: the validation set accuracy finally stabilizes at around 95%.

ModelD


Freezing all the convolutional layers and adding two FC layers does not improve the model's accuracy much; the best variant only reaches about 60%.

ModelD(fc:256+128)

The added fully connected layers have 256 and 128 neurons:

# -- Load Baseline model  --
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.Linear(256, 128),
    nn.Linear(128, len(train_set.classes)))
  • ModelD(fc:256+128) training result

    The validation set accuracy finally stabilizes at around 54%.

ModelD(fc:256+128 + dropout)

The added fully connected layers have 256 and 128 neurons, each followed by a dropout layer with p=0.5:

# -- Load Baseline model  --
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, len(train_set.classes)))
  • ModelD(fc:256+128 with dropout) training result

    The validation set accuracy finally stabilizes at around 47%.
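One caveat for the dropout variants: dropout must be disabled when measuring validation accuracy, or the reported numbers will be understated. A reminder sketch (val_images stands in for a validation batch):

import torch

model.train()  # nn.Dropout layers are active during training steps
# ... training step ...
model.eval()   # nn.Dropout layers are disabled for validation/testing
with torch.no_grad():
    outputs = model(val_images)  # val_images: a batch from the validation loader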

ModelD(fc:1024+512)

The added fully connected layers have 1024 and 512 neurons:

# -- Load Baseline model  --
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 1024),
    nn.Linear(1024, 512),
    nn.Linear(512, len(train_set.classes)))
  • ModelD(fc:1024+512) training result

    The validation set accuracy finally stabilizes at around 60%.

ModelD(fc:1024+512 + dropout)

The added fully connected layers have 1024 and 512 neurons, each followed by a dropout layer with p=0.5:

# -- Load Baseline model  --
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, 512), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(512, len(train_set.classes)))
  • ModelD(fc:1024+512 with dropout) training result

    The validation set accuracy finally stabilizes at around 61%.

ModelD(fc:2048+1024)

The added fully connected layers have 2048 and 1024 neurons:

# -- Load Baseline model  --
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 2048),
    nn.Linear(2048, 1024),
    nn.Linear(1024, len(train_set.classes)))
  • ModelD(fc:2048+1024) training result

    The validation set accuracy finally stabilizes at around 60%.

ModelD(fc:2048+1024 + dropout)

The added fully connected layers have 2048 and 1024 neurons, each followed by a dropout layer with p=0.5:

# -- Load Baseline model  --
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 2048), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(2048, 1024), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(1024, len(train_set.classes)))
  • ModelD(fc:2048+1024 with dropout) training result

    The validation set accuracy finally stabilizes at around 60%.
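All the ModelD variants follow the same pattern, so they can be generated from one helper; a sketch with an illustrative name and signature (not from the repository):

import torch.nn as nn

def make_mlp_head(in_features, hidden, num_classes, dropout=0.0):
    """Build a ModelD-style head from a list of hidden-layer sizes."""
    layers, width = [], in_features
    for h in hidden:
        layers.append(nn.Linear(width, h))
        if dropout > 0:  # the dropout variants above also add ReLU
            layers += [nn.ReLU(), nn.Dropout(p=dropout)]
        width = h
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

# e.g. ModelD(fc:1024+512 + dropout):
# model.fc = make_mlp_head(model.fc.in_features, [1024, 512], 100, dropout=0.5)

Note that the plain (no-dropout) variants stack nn.Linear layers with no nonlinearity between them, so they compose into a single linear map; this may be why their accuracy stays close to the Baseline's.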
