## Effect of freezing parameters when fine-tuning BERT

### Freezing all parameters
After loading the pretrained model, iterate over its parameters and set requires_grad to False.

In [3]:
from transformers import BertForSequenceClassification

model_path = "/data/projects/NLP_achuan/semantic_similairy/text-semantic-similarity-bert/input/bert-base-chinese"
model = BertForSequenceClassification.from_pretrained(model_path)

Some weights of the model checkpoint at /data/projects/NLP_achuan/semantic_similairy/text-semantic-similarity-bert/input/bert-base-chinese were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights o

First, check the requires_grad status of the model's parameters before freezing anything.

In [18]:
# The model exposes several interfaces for accessing its parameters
## named_parameters() yields (parameter_name, parameter) tuples; the parameter itself is the second element
for item in model.named_parameters():
    print('name_parameter:')
    print(type(item), len(item), item[0], type(item[1]), item[1].requires_grad)
    break

## parameters() yields only the parameters, without their names
for item in model.parameters():
    print('only parameter:')
    print(type(item), len(item), item.requires_grad)
    break

## The parameters of the BERT backbone can also be accessed on their own
for param in model.bert.base_model.named_parameters():
    print(param[0])
    break

name_parameter:
<class 'tuple'> 2 bert.embeddings.word_embeddings.weight <class 'torch.nn.parameter.Parameter'> True
only parameter:
<class 'torch.nn.parameter.Parameter'> 21128 True
embeddings.word_embeddings.weight
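
Printing only the first parameter shows the interface but not the overall picture. A minimal sketch of a per-module summary follows, assuming a toy stand-in model rather than the real checkpoint: `ToyClassifier` and `grad_status_by_module` are hypothetical names introduced here, and the toy only mimics the top-level layout (`bert` plus `classifier`) of BertForSequenceClassification.

```python
from collections import Counter

import torch.nn as nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in: a `bert` backbone plus a `classifier` head."""

    def __init__(self, hidden=8, num_labels=2):
        super().__init__()
        # Not a real BERT, just enough structure to produce prefixed names.
        self.bert = nn.Sequential(nn.Embedding(16, hidden), nn.Linear(hidden, hidden))
        self.classifier = nn.Linear(hidden, num_labels)


def grad_status_by_module(model):
    """Count parameter tensors per top-level submodule, split by requires_grad."""
    counts = Counter()
    for name, param in model.named_parameters():
        top_level = name.split(".")[0]  # e.g. "bert" or "classifier"
        counts[(top_level, param.requires_grad)] += 1
    return dict(counts)


toy = ToyClassifier()
status = grad_status_by_module(toy)
print(status)
```

The same helper should work on the real model, since it relies only on named_parameters() and on the name prefixes seen in the output above.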


Now set requires_grad to False for the BERT base model inside model, then print the status of every parameter via named_parameters.

In [20]:
# set false
for param in model.bert.base_model.parameters():
    param.requires_grad = False

for param in model.named_parameters():
    print(f'name: {param[0]}, status: {param[1].requires_grad}')

name: bert.embeddings.word_embeddings.weight, status: False
name: bert.embeddings.position_embeddings.weight, status: False
name: bert.embeddings.token_type_embeddings.weight, status: False
name: bert.embeddings.LayerNorm.weight, status: False
name: bert.embeddings.LayerNorm.bias, status: False
name: bert.encoder.layer.0.attention.self.query.weight, status: False
name: bert.encoder.layer.0.attention.self.query.bias, status: False
name: bert.encoder.layer.0.attention.self.key.weight, status: False
name: bert.encoder.layer.0.attention.self.key.bias, status: False
name: bert.encoder.layer.0.attention.self.value.weight, status: False
name: bert.encoder.layer.0.attention.self.value.bias, status: False
name: bert.encoder.layer.0.attention.output.dense.weight, status: False
name: bert.encoder.layer.0.attention.output.dense.bias, status: False
name: bert.encoder.layer.0.attention.output.LayerNorm.weight, status: False
name: bert.encoder.layer.0.attention.output.LayerNorm.bias, status: False
na

The output above shows that every parameter except those of the classifier layer now has requires_grad set to False.
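
The frozen state can also be checked programmatically instead of reading the printed list. The sketch below is an illustration under assumptions, not the notebook's own code: it uses a hypothetical `ToyClassifier` in place of the real checkpoint, freezes the backbone with `nn.Module.requires_grad_(False)` (which has the same effect as the loop above), and hands only the trainable parameters to the optimizer. The learning rate and tensor sizes are arbitrary.

```python
import torch
import torch.nn as nn


class ToyClassifier(nn.Module):
    """Hypothetical stand-in: a `bert` backbone plus a `classifier` head."""

    def __init__(self, hidden=8, num_labels=2):
        super().__init__()
        self.bert = nn.Linear(hidden, hidden)
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, x):
        return self.classifier(self.bert(x))


toy = ToyClassifier()

# Freeze the backbone; equivalent to looping over toy.bert.parameters().
toy.bert.requires_grad_(False)

# Names and sizes of what is still trainable.
trainable = [name for name, p in toy.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in toy.parameters())
print(trainable, f"{n_trainable}/{n_total} trainable")

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in toy.parameters() if p.requires_grad), lr=1e-3
)

# One dummy step: frozen parameters receive no gradient at all.
loss = toy(torch.randn(4, 8)).sum()
loss.backward()
optimizer.step()
print(toy.bert.weight.grad is None, toy.classifier.weight.grad is not None)
```

Filtering the optimizer's parameter list is optional, because parameters without gradients are skipped anyway, but it makes the intent explicit and avoids allocating optimizer state for frozen weights.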