### 1. Inspect the parameter dtypes of a dummy model

In [9]:
from helper import DummyModel 

In [10]:
model = DummyModel()

Inspect the model structure

In [11]:
model

DummyModel(
  (token_embedding): Embedding(2, 2)
  (linear_1): Linear(in_features=2, out_features=2, bias=True)
  (layernorm_1): LayerNorm((2,), eps=1e-05, elementwise_affine=True)
  (linear_2): Linear(in_features=2, out_features=2, bias=True)
  (layernorm_2): LayerNorm((2,), eps=1e-05, elementwise_affine=True)
  (head): Linear(in_features=2, out_features=2, bias=True)
)
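Every layer here has 2 features, so the parameter count and memory footprint can be worked out by hand from the printout. The sketch below does that arithmetic in plain Python. The shape table is transcribed from the structure above (an `nn.Linear` weight is stored as `(out_features, in_features)`), and the bytes-per-element values are the standard storage sizes of FP32, FP16 and BF16.

```python
# Parameter shapes transcribed from the DummyModel printout above.
# nn.Linear stores its weight as (out_features, in_features).
PARAM_SHAPES = {
    "token_embedding.weight": (2, 2),  # Embedding(num_embeddings=2, embedding_dim=2)
    "linear_1.weight": (2, 2),
    "linear_1.bias": (2,),
    "layernorm_1.weight": (2,),
    "layernorm_1.bias": (2,),
    "linear_2.weight": (2, 2),
    "linear_2.bias": (2,),
    "layernorm_2.weight": (2,),
    "layernorm_2.bias": (2,),
    "head.weight": (2, 2),
    "head.bias": (2,),
}

# Storage size of one element for each floating-point format.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def total_params(shapes):
    """Total number of scalar parameters across all tensors."""
    return sum(numel(s) for s in shapes.values())


def param_bytes(shapes, dtype):
    """Memory needed to store all parameters in the given dtype."""
    return total_params(shapes) * BYTES_PER_ELEMENT[dtype]


print(total_params(PARAM_SHAPES))  # 30
for dtype in BYTES_PER_ELEMENT:
    print(dtype, param_bytes(PARAM_SHAPES, dtype))  # float32 120, float16 60, bfloat16 60
```

With the real model, `sum(p.numel() for p in model.parameters())` returns the count directly, and `p.element_size()` gives the bytes per element of a tensor.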

Define a function that prints the data type of every model parameter

In [12]:
def print_param_dtype(model):
    for name, param in model.named_parameters():
        print(f"{name} is loaded in {param.dtype}")
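Printing one line per parameter gets long for real models. A more compact check, sketched below, groups parameters by dtype with `collections.Counter`. The name `summarize_dtypes` is hypothetical; it accepts any iterable of `(name, dtype)` pairs, so with PyTorch you could pass `((n, p.dtype) for n, p in model.named_parameters())`. Plain strings stand in for `torch.dtype` objects here so the sketch runs without PyTorch.

```python
from collections import Counter


def summarize_dtypes(named_dtypes):
    """Map each dtype to the number of parameter tensors stored in it.

    `named_dtypes` is an iterable of (parameter_name, dtype) pairs.
    """
    return dict(Counter(dtype for _name, dtype in named_dtypes))


# Stand-in data: (name, dtype) pairs as strings instead of torch.dtype objects.
example = [
    ("token_embedding.weight", "torch.float32"),
    ("linear_1.weight", "torch.float32"),
    ("linear_1.bias", "torch.float32"),
    ("head.weight", "torch.float16"),  # hypothetical mixed-precision entry
]
print(summarize_dtypes(example))  # {'torch.float32': 3, 'torch.float16': 1}
```

A single-key result confirms that the whole model is stored in one dtype.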

In [13]:
print_param_dtype(model)

token_embedding.weight is loaded in torch.float32
linear_1.weight is loaded in torch.float32
linear_1.bias is loaded in torch.float32
layernorm_1.weight is loaded in torch.float32
layernorm_1.bias is loaded in torch.float32
linear_2.weight is loaded in torch.float32
linear_2.bias is loaded in torch.float32
layernorm_2.weight is loaded in torch.float32
layernorm_2.bias is loaded in torch.float32
head.weight is loaded in torch.float32
head.bias is loaded in torch.float32
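`torch.float32` is IEEE 754 single precision: 1 sign bit, 8 exponent bits (bias 127) and 23 mantissa bits. The sketch below uses only the standard library's `struct` module to split a Python float, first rounded to FP32, into those three fields. `fp32_fields` is an illustrative helper, not part of PyTorch.

```python
import struct


def fp32_fields(x):
    """Return (sign, biased_exponent, mantissa) of x rounded to IEEE 754 float32."""
    # Pack as big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8 bits, bias 127
    mantissa = bits & 0x7FFFFF      # 23 bits; normal numbers have an implicit leading 1
    return sign, exponent, mantissa


print(fp32_fields(1.0))   # (0, 127, 0)
print(fp32_fields(-2.0))  # (1, 128, 0)
print(fp32_fields(0.75))  # (0, 126, 4194304): 0.75 = 1.5 * 2**-1
```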


Run a test inference

In [17]:
import torch

In [40]:
dummy_input = torch.LongTensor([[1, 0, 1]])

In [41]:
dummy_input

tensor([[1, 0, 1]])
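`Embedding(2, 2)` is a lookup table with 2 rows (a vocabulary of 2 tokens) and 2 columns (embedding dimension 2). That is why `dummy_input` is an integer (`LongTensor`) tensor of token ids, each of which must be 0 or 1. The plain-Python sketch below mimics the lookup with made-up table values (hypothetical, not the model's weights) to show how a `(1, 3)` batch of ids becomes a `(1, 3, 2)` batch of vectors, the same `(batch, sequence, 2)` layout the model's output has.

```python
# Hypothetical 2 x 2 embedding table: row i is the vector for token id i.
embedding_table = [
    [0.1, -0.2],  # token 0
    [0.5, 0.3],   # token 1
]


def embed(batch_of_ids, table):
    """Replace every integer token id with a copy of its row from the table."""
    vocab_size = len(table)
    out = []
    for sequence in batch_of_ids:
        rows = []
        for token_id in sequence:
            if not 0 <= token_id < vocab_size:
                raise IndexError(f"token id {token_id} out of range for vocab size {vocab_size}")
            rows.append(list(table[token_id]))
        out.append(rows)
    return out


vectors = embed([[1, 0, 1]], embedding_table)
print(vectors)  # [[[0.5, 0.3], [0.1, -0.2], [0.5, 0.3]]]
```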

In [42]:
logits_fp32 = model(dummy_input)

In [43]:
logits_fp32

tensor([[[-0.6872,  0.7132],
         [-0.6872,  0.7132],
         [-0.6872,  0.7132]]], grad_fn=<ViewBackward0>)
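All three rows of `logits_fp32` are identical, even though the input mixes token ids 0 and 1. One likely contributor, assuming the forward pass applies the layers in the order printed above, is that `LayerNorm` over only 2 features can produce just two outputs before its affine step: subtracting the mean leaves `(d, -d)`, and dividing by the standard deviation `|d|` gives roughly `(1, -1)` or `(-1, 1)`. The sketch below re-implements that normalization in plain Python (biased variance and `eps=1e-5` as in the printout; the affine weight 1 and bias 0 are assumed) to check the claim.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm over one vector, without the affine step (weight=1, bias=0)."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / len(x)  # biased variance, as LayerNorm uses
    return [c / math.sqrt(var + eps) for c in centered]


print(layer_norm([0.3, -0.5]))  # approximately [1, -1]
print(layer_norm([2.0, 5.0]))   # approximately [-1, 1]
```

Whatever the two token vectors are, each position reaches the next layer as one of these two patterns, so when both tokens land on the same pattern their logits coincide.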

### 2. Convert to Float16

In [15]:
model_fp16 = DummyModel().half()

In [16]:
print_param_dtype(model_fp16)

token_embedding.weight is loaded in torch.float16
linear_1.weight is loaded in torch.float16
linear_1.bias is loaded in torch.float16
layernorm_1.weight is loaded in torch.float16
layernorm_1.bias is loaded in torch.float16
linear_2.weight is loaded in torch.float16
linear_2.bias is loaded in torch.float16
layernorm_2.weight is loaded in torch.float16
layernorm_2.bias is loaded in torch.float16
head.weight is loaded in torch.float16
head.bias is loaded in torch.float16


Run a test inference

In [44]:
logits_fp16 = model_fp16(dummy_input)

In [45]:
logits_fp16

tensor([[[-0.6870,  0.7134],
         [-0.6870,  0.7134],
         [-0.6870,  0.7134]]], dtype=torch.float16, grad_fn=<ViewBackward0>)
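FP16 (IEEE 754 half precision) keeps 1 sign bit, 5 exponent bits and 10 mantissa bits. For values in [0.5, 1) the gap between neighbouring FP16 numbers is therefore 2^-1 × 2^-10 = 2^-11 ≈ 0.000488, so FP16 logits can only be close to the FP32 ones, not identical. The sketch below rounds a Python float to FP16 with the standard library's `struct` format `'e'` (binary16). Rounding the displayed FP32 logits this way reproduces the displayed FP16 logits, although the real FP16 forward pass also rounds every intermediate result, so this only illustrates the precision and does not model the computation.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # struct's 'e' format packs a float into 2 bytes as binary16.
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_fp16(-0.6872))  # -0.68701171875  (= -1407 / 2048)
print(to_fp16(0.7132))   # 0.71337890625   (=  1461 / 2048)
```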

### 3. Convert to BFloat16

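Here `copy.deepcopy` is used to copy the FP32 model before converting it, so that the BF16 and FP32 versions have exactly the same weights. The copy is needed because `nn.Module.to()` converts the parameters in place, so calling it on `model` directly would turn the FP32 model into BF16 as well.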

**deepcopy**

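`copy.deepcopy` creates a copy of the model that is fully independent of the original object, including all nested objects such as its parameter tensors. Because it is a "deep copy", changes you make to the copy do not affect the original. For details, see the Python docs: [copy](https://docs.python.org/3/library/copy.html).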
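To see why the copy matters when a conversion method works in place (as `nn.Module.to()` does), here is a stdlib-only sketch. `TinyModule` is a hypothetical stand-in whose `to()` mutates `self` and returns it; it mimics that behaviour and is not a reimplementation of PyTorch.

```python
from copy import deepcopy


class TinyModule:
    """Hypothetical stand-in for nn.Module: to() mutates self and returns self."""

    def __init__(self):
        self.dtype = "float32"
        self.weights = [0.1, -0.2]

    def to(self, dtype):
        self.dtype = dtype  # in-place change, like nn.Module.to()
        return self


original = TinyModule()
alias = original.to("bfloat16")  # no copy: alias and original are the same object
print(original.dtype, alias is original)  # bfloat16 True

original = TinyModule()
copy_bf16 = deepcopy(original).to("bfloat16")  # convert an independent copy instead
print(original.dtype, copy_bf16.dtype)  # float32 bfloat16
```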

In [46]:
from copy import deepcopy

In [47]:
model_bf16 = deepcopy(model)

In [48]:
model_bf16 = model_bf16.to(torch.bfloat16)

In [49]:
print_param_dtype(model_bf16)

token_embedding.weight is loaded in torch.bfloat16
linear_1.weight is loaded in torch.bfloat16
linear_1.bias is loaded in torch.bfloat16
layernorm_1.weight is loaded in torch.bfloat16
layernorm_1.bias is loaded in torch.bfloat16
linear_2.weight is loaded in torch.bfloat16
linear_2.bias is loaded in torch.bfloat16
layernorm_2.weight is loaded in torch.bfloat16
layernorm_2.bias is loaded in torch.bfloat16
head.weight is loaded in torch.bfloat16
head.bias is loaded in torch.bfloat16


Run a test inference

In [50]:
logits_bf16 = model_bf16(dummy_input)

In [51]:
logits_bf16

tensor([[[-0.6875,  0.7148],
         [-0.6875,  0.7148],
         [-0.6875,  0.7148]]], dtype=torch.bfloat16, grad_fn=<ViewBackward0>)
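BF16 keeps FP32's 8 exponent bits (hence the same range) but only 7 mantissa bits, so the spacing for values in [0.5, 1) is 2^-8 ≈ 0.0039, eight times coarser than FP16's 2^-11. That is why the BF16 logits drift further from FP32 than the FP16 ones. Since BF16 is the upper 16 bits of an FP32 value, it can be emulated in plain Python by rounding the FP32 bit pattern. The sketch below uses round-to-nearest-even and, for brevity, does not handle NaN. Rounding the displayed FP32 logits this way reproduces the displayed BF16 values, though the real forward pass also rounds every intermediate result.

```python
import struct


def to_bf16(x):
    """Round a Python float to the nearest bfloat16 value (round-half-to-even).

    NaN inputs are not handled specially in this sketch.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # Round to nearest even on the lower 16 bits that bfloat16 drops.
    lsb = (bits >> 16) & 1
    rounded = (bits + 0x7FFF + lsb) & 0xFFFF0000
    # Read the kept upper 16 bits (padded with zeros) back as a float32.
    return struct.unpack(">f", struct.pack(">I", rounded))[0]


print(to_bf16(-0.6872))  # -0.6875     (= -176 / 256)
print(to_bf16(0.7132))   # 0.71484375  (=  183 / 256)
```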

Compute the difference between the FP32 and BF16 outputs

In [52]:
# bfloat16 - float32 is promoted to float32, so the difference is computed in FP32
diff = torch.abs(logits_bf16 - logits_fp32)
mean_diff = diff.mean().item()
max_diff = diff.max().item()

print(f"Mean diff: {mean_diff} | Max diff: {max_diff}")

Mean diff: 0.0009978810558095574 | Max diff: 0.0016907453536987305
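The size of these differences is in line with the formats' spacing. Rounding to nearest introduces an absolute error of at most half the gap between neighbouring representable values; for a format with p explicit mantissa bits and |x| in [2^e, 2^(e+1)), that is 2^(e-p-1). For logits around 0.7 (e = -1) this gives 2^-9 ≈ 0.00195 for BF16 (p = 7) and 2^-12 ≈ 0.00024 for FP16 (p = 10). The measured max diff of about 0.0017 is below the BF16 figure, but the bound covers a single rounding only; errors accumulated through the layers of a real forward pass are not limited by it. `max_rounding_error` is a hypothetical helper that evaluates the formula.

```python
import math


def max_rounding_error(x, mantissa_bits):
    """Largest absolute error from rounding x to nearest in a binary float format.

    Assumes x is a normal (non-zero, non-subnormal) number in the target format.
    """
    # math.frexp gives |x| = m * 2**exp with 0.5 <= m < 1, so |x| lies in [2**(exp-1), 2**exp).
    _, exp = math.frexp(abs(x))
    spacing = 2.0 ** (exp - 1 - mantissa_bits)  # gap between neighbouring values
    return spacing / 2


print(max_rounding_error(0.7132, 7))   # 0.001953125     (bfloat16)
print(max_rounding_error(0.7132, 10))  # 0.000244140625  (float16)
```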
