- eton@251105
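### nn.Parameter is PyTorch's main tool for creating learnable parameters. It is mainly used to:

- Mark a tensor as a model parameter, so the module registers it (it appears in `model.parameters()`) and the optimizer updates it automatically
- Create trainable weights and biases in custom layers and models
- Implement parameter sharing and custom initialization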
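A minimal sketch of the three uses listed above. The module `Demo` and its attribute names (`scale`, `plain`, `enc`, `dec`) are hypothetical, chosen only for illustration. A tensor wrapped in `nn.Parameter` is registered and returned by `named_parameters()`, a plain tensor attribute is not, assigning one layer's weight to another shares it, and `nn.init` functions initialize a parameter in place.

```python
import torch
import torch.nn as nn


class Demo(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Registered: an nn.Parameter attribute shows up in parameters()
        self.scale = nn.Parameter(torch.empty(4))
        # Not registered: a plain tensor attribute is invisible to the optimizer
        self.plain = torch.ones(4)
        # Parameter sharing: both Linear layers use the same weight tensor
        self.enc = nn.Linear(4, 4, bias=False)
        self.dec = nn.Linear(4, 4, bias=False)
        self.dec.weight = self.enc.weight
        # Custom initialization, applied in place
        nn.init.constant_(self.scale, 0.5)

    def forward(self, x):
        return self.dec(self.enc(x) * self.scale) + self.plain


m = Demo()
names = [name for name, _ in m.named_parameters()]
print(names)  # ['scale', 'enc.weight'] (the shared weight is listed once)
```

Because `dec.weight` is the same object as `enc.weight`, gradients from both layers accumulate into one tensor, and an optimizer built from `m.parameters()` updates it only once per step.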

In [1]:
import os
import torch
import torch.nn as nn
from torchvision import transforms
from PIL import Image
import json

In [6]:
import torch
import torch.nn as nn
import torch.optim as optim

# Create a simple model
class ModelWithParameters(nn.Module):
    def __init__(self):
        super().__init__()
        # Define the learnable parameters
        self.weight = nn.Parameter(torch.randn(3, 2))
        self.bias = nn.Parameter(torch.zeros(3))
        
    def forward(self, x):
        # x: (10, 2) @ weight.t(): (2, 3) -> logits: (10, 3); bias (3,) broadcasts over the rows
        # Debug print: shows the bias changing between epochs
        print("bias=", self.bias)
        return torch.matmul(x, self.weight.t()) + self.bias



In [7]:
# Create the model, data, loss function, and optimizer
model = ModelWithParameters()
x = torch.randn(10, 2)  # 10 samples, 2 features each
y = torch.randint(0, 3, (10,))  # 10 labels drawn from 3 classes
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

print("x:",x)
print("y:", y)

x: tensor([[-0.6055, -1.3876],
        [ 1.1386,  1.3463],
        [-0.9156,  0.3331],
        [-1.0399,  1.0583],
        [-0.6382,  1.1240],
        [-0.2581,  0.8191],
        [-1.2863,  0.2016],
        [ 0.6478, -0.5764],
        [ 0.0757,  1.0395],
        [-0.7400,  0.6699]])
y: tensor([0, 0, 2, 0, 2, 1, 1, 2, 2, 2])


In [8]:
# Training loop
for epoch in range(5):
    # Forward pass
    outputs = model(x)
    loss = criterion(outputs, y)
    
    # Backward pass and optimization
    optimizer.zero_grad()  # clear old gradients
    loss.backward()        # backpropagate to compute gradients
    optimizer.step()       # update the parameters
    
    print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')
    # Print how the parameter values change
    print(f'Weight mean: {model.weight.mean().item():.4f}')
    print(f'Bias mean: {model.bias.mean().item():.4f}')

bias= Parameter containing:
tensor([0., 0., 0.], requires_grad=True)
Epoch 1, Loss: 1.2970
Weight mean: 0.2148
Bias mean: 0.0000
bias= Parameter containing:
tensor([-0.0016, -0.0010,  0.0026], requires_grad=True)
Epoch 2, Loss: 1.2946
Weight mean: 0.2148
Bias mean: -0.0000
bias= Parameter containing:
tensor([-0.0032, -0.0020,  0.0051], requires_grad=True)
Epoch 3, Loss: 1.2923
Weight mean: 0.2148
Bias mean: -0.0000
bias= Parameter containing:
tensor([-0.0047, -0.0030,  0.0077], requires_grad=True)
Epoch 4, Loss: 1.2899
Weight mean: 0.2148
Bias mean: -0.0000
bias= Parameter containing:
tensor([-0.0063, -0.0039,  0.0102], requires_grad=True)
Epoch 5, Loss: 1.2876
Weight mean: 0.2148
Bias mean: -0.0000
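In the log above, `Weight mean` stays at 0.2148 and `Bias mean` stays at 0 even though the individual bias entries move. This is expected, not a bug: for `nn.CrossEntropyLoss`, the gradient with respect to each sample's logits is softmax(logits) minus the one-hot target, which sums to zero over the classes. Since the logits are `x @ weight.t() + bias`, the gradients of `weight` and `bias` also sum to zero over the class dimension, so each SGD step leaves their means unchanged. The sketch below checks this numerically on fresh random data; it repeats the layer's computation inline (without the debug print) so it runs on its own.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
weight = nn.Parameter(torch.randn(3, 2))
bias = nn.Parameter(torch.zeros(3))
x = torch.randn(10, 2)
y = torch.randint(0, 3, (10,))

# Same computation as ModelWithParameters.forward
logits = x @ weight.t() + bias
loss = nn.CrossEntropyLoss()(logits, y)
loss.backward()

# Summing the gradients over the class dimension gives (numerically) zero,
# so the update w -= lr * grad cannot change the sum or mean over classes
print(weight.grad.sum(dim=0), bias.grad.sum())
```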


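### - Using the optim.LBFGS optimizer in PyTorch
optim.LBFGS is PyTorch's optimizer implementing the L-BFGS (Limited-memory Broyden-Fletcher-Goldfarb-Shanno) algorithm. It is a quasi-Newton method suited to deterministic, typically full-batch training where precise optimization is needed.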

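### - Characteristics of the LBFGS optimizer
Compared with common optimizers such as SGD or Adam, LBFGS has the following characteristics:

- Uses curvature (second-order) information: it approximates the inverse Hessian from a limited history of recent gradient and parameter changes (`history_size`, default 100), so on smooth problems it often converges in far fewer iterations
- Needs the loss computed over the full data (usually full-batch), because noisy mini-batch gradients corrupt the curvature estimate
- Requires a closure that clears the gradients and recomputes the loss and gradients, because `step()` may evaluate the objective several times per call
- Works well on small datasets and problems that need precise optimization, such as calibrating a temperature parameter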
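As a sketch of the temperature-calibration use case, the code below fits a single temperature `T` so that `logits / T` minimizes the cross-entropy on held-out data. The synthetic logits and labels are assumptions made only so the example runs on its own; in practice they would be a trained model's validation-set logits. Optimizing `log T` instead of `T` is an assumed design choice that keeps the temperature positive.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
# Hypothetical validation data: 200 samples, 5 classes, overconfident logits
labels = torch.randint(0, 5, (200,))
logits = torch.randn(200, 5)
logits[torch.arange(200), labels] += 2.0  # make the true class usually win
logits = logits * 3.0                      # exaggerate the confidence

log_t = nn.Parameter(torch.zeros(1))  # T = exp(log_t), starts at 1.0
criterion = nn.CrossEntropyLoss()
optimizer = optim.LBFGS([log_t], lr=1.0, max_iter=50, line_search_fn="strong_wolfe")

def closure():
    optimizer.zero_grad()
    loss = criterion(logits / log_t.exp(), labels)
    loss.backward()
    return loss.detach()  # LBFGS only needs the value, not the graph

before = criterion(logits, labels).item()
optimizer.step(closure)  # full-batch: every evaluation uses all 200 samples
temperature = log_t.exp().item()
after = criterion(logits / temperature, labels).item()
print(f"T = {temperature:.3f}, NLL {before:.4f} -> {after:.4f}")
```

Only one scalar is learned and the whole validation set fits in a single batch, which is exactly the small, deterministic setting where LBFGS tends to do well.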

In [11]:
import torch
import torch.optim as optim

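# Create a learnable parameter
params = torch.nn.Parameter(torch.tensor([1.0]))

# Create the LBFGS optimizer with the parameter list and settings
optimizer = optim.LBFGS([params], lr=0.1, max_iter=20)

# Objective function: f(x) = (x - 2)^2, minimized at x = 2
def criterion(x):
    return (x - 2) ** 2

# Optimization loop (a single outer step here)
for i in range(1):
    # LBFGS needs a closure that re-evaluates the loss and gradients;
    # step() calls it repeatedly, once per inner iteration
    def closure():
        # Clear the previous gradients
        optimizer.zero_grad()
        # Compute the loss
        loss = criterion(params)
        # Backpropagate to compute the gradients
        loss.backward()
        # Print the current state
        print(f'Step: {i}, x: {params.item():.4f}, loss: {loss.item():.4f}')
        # Return the loss as a plain Python float (see the note below)
        return loss.item()
    
    # Run one optimization step, passing in the closure
    optimizer.step(closure)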
warning_in_closure="""
torch/optim/lbfgs.py:457: UserWarning: Converting a tensor with requires_grad=True to a scalar may lead to unexpected behavior.
Consider using tensor.detach() first. (Triggered internally at /pytorch/torch/csrc/autograd/generated/python_variable_methods.cpp:835.)
  loss = float(closure())
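This warning appears when:
  1. the LBFGS optimizer internally converts the loss returned by the closure into a Python float, and
  2. the returned loss tensor still tracks gradients (requires_grad=True);
  3. converting such a tensor straight to a float silently cuts it off from the computation graph, which PyTorch flags as possibly unintended.

Solution: return a value that no longer tracks gradients from the closure:
- item() - converts a single-element tensor to a Python scalar
- detach() - detaches the tensor from the computation graph, stopping gradient tracking
"""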

Step: 0, x: 1.0000, loss: 1.0000
Step: 0, x: 1.1000, loss: 0.8100
Step: 0, x: 1.1900, loss: 0.6561
Step: 0, x: 1.2710, loss: 0.5314
Step: 0, x: 1.3439, loss: 0.4305
Step: 0, x: 1.4095, loss: 0.3487
Step: 0, x: 1.4686, loss: 0.2824
Step: 0, x: 1.5217, loss: 0.2288
Step: 0, x: 1.5695, loss: 0.1853
Step: 0, x: 1.6126, loss: 0.1501
Step: 0, x: 1.6513, loss: 0.1216
Step: 0, x: 1.6862, loss: 0.0985
Step: 0, x: 1.7176, loss: 0.0798
Step: 0, x: 1.7458, loss: 0.0646
Step: 0, x: 1.7712, loss: 0.0523
Step: 0, x: 1.7941, loss: 0.0424
Step: 0, x: 1.8147, loss: 0.0343
Step: 0, x: 1.8332, loss: 0.0278
Step: 0, x: 1.8499, loss: 0.0225
Step: 0, x: 1.8649, loss: 0.0182
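Two things in this log are worth noting. Every line says `Step: 0` because the print runs once per closure evaluation, and the single `optimizer.step(closure)` call evaluates the closure 20 times here (`max_iter=20`). Also, x approaches 2 only gradually: on this quadratic, after the first iteration the quasi-Newton direction already points exactly at the minimum, but `lr=0.1` scales every step, so the distance to 2 shrinks by only 10% per iteration (0.9, 0.81, 0.729, ...). The sketch below instead uses the default `lr=1.0` together with the optional `line_search_fn="strong_wolfe"` argument of `torch.optim.LBFGS`; the `calls` list is added only for illustration, to count closure evaluations.

```python
import torch
import torch.optim as optim

params = torch.nn.Parameter(torch.tensor([1.0]))
# Default lr=1.0 plus a strong-Wolfe line search, instead of lr=0.1 without one
optimizer = optim.LBFGS([params], lr=1.0, max_iter=20, line_search_fn="strong_wolfe")

calls = []  # records the loss at every closure evaluation (illustration only)

def closure():
    optimizer.zero_grad()
    loss = (params - 2) ** 2
    loss.backward()
    calls.append(loss.item())
    return loss.detach()  # a value without gradient tracking is all LBFGS needs

optimizer.step(closure)
print(f"x = {params.item():.4f} after {len(calls)} closure evaluations")
```

Here x should reach 2 within a few evaluations instead of stopping at about 1.86 after 20, which is the fast convergence LBFGS is usually chosen for.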
