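We will implement the entire method from scratch, including the data pipeline, the model, the loss function, and the minibatch stochastic gradient descent optimizer.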

In [7]:
%matplotlib inline
import random
import torch

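Construct an artificial dataset from a linear model with additive noise. We generate the dataset and its labels using the model parameters $\mathbf{w} = [2, -3.4]^\top$ and $b = 4.2$ plus a noise term $\mathbf{z}$:

$$\mathbf{y} = \mathbf{X}\mathbf{w} + b + \mathbf{z}$$

Each row of $\mathbf{X}$ is drawn from a standard normal distribution, and each entry of $\mathbf{z}$ is normally distributed with mean 0 and standard deviation 0.01.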

In [8]:
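def synthetic_data(w,b,num_examples):
    """Generate y = Xw + b + noise."""
    X=torch.normal(0,1,(num_examples,len(w)))  # features ~ N(0, 1)
    y=torch.matmul(X,w)+b
    y += torch.normal(0,0.01,y.shape)  # Gaussian noise with std 0.01
    return X,y.reshape((-1,1))  # labels as a (num_examples, 1) column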

true_w=torch.tensor([2,-3.4])
true_b=4.2
features,labels=synthetic_data(true_w,true_b,1000)

Each row of features contains a two-dimensional data sample, and each row of labels contains a one-dimensional label value (a scalar).

In [9]:
print('features:',features[0],'\nlabel:',labels[0])

features: tensor([-0.7528, -0.4446])
label: tensor([4.2052])
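As a quick sanity check, the printed sample can be plugged back into the generating equation by hand. The sketch below uses plain Python (no torch) and hard-codes the values printed above, which are rounded to four decimals; the names y_noise_free and residual are illustrative only. The residual should be small compared with the noise standard deviation of 0.01.

```python
# Values copied from the printed first sample, plus the true parameters.
w = [2.0, -3.4]
b = 4.2
x = [-0.7528, -0.4446]
y_observed = 4.2052

# Noise-free prediction x^T w + b; whatever is left over is attributed to noise.
y_noise_free = sum(wi * xi for wi, xi in zip(w, x)) + b
residual = y_observed - y_noise_free

print(f"noise-free label: {y_noise_free:.5f}")  # noise-free label: 4.20604
print(f"residual:         {residual:.5f}")      # residual:         -0.00084
```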


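Define a data_iter function that takes the batch size, the feature matrix, and the label vector as input and yields minibatches of size batch_size (the last minibatch is smaller if the number of examples is not a multiple of batch_size).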

In [10]:
def data_iter(batch_size,features,labels):
    num_examples=len(features)
    indices=list(range(num_examples))
    # The examples are read in random order, with no particular ordering
    random.shuffle(indices)
    for i in range(0,num_examples,batch_size):
        batch_indices=torch.tensor(indices[i:min(i+batch_size,num_examples)])
        yield features[batch_indices],labels[batch_indices]

batch_size=10
for X,y in data_iter(batch_size,features,labels):
    print(X,'\n',y)
    break


tensor([[ 1.2564,  1.0902],
        [-2.6264, -1.3179],
        [-1.7285,  0.0992],
        [ 0.4420, -0.0416],
        [ 0.4906,  0.6904],
        [-0.8442,  1.4893],
        [ 0.9545,  0.8187],
        [-1.3342, -2.7616],
        [ 0.1839, -0.5431],
        [-1.1986,  1.0301]]) 
 tensor([[ 3.0126],
        [ 3.4198],
        [ 0.4012],
        [ 5.2131],
        [ 2.8336],
        [-2.5373],
        [ 3.3066],
        [10.9428],
        [ 6.4039],
        [-1.6957]])
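The index logic of data_iter does not depend on torch, so it can be checked in plain Python. The sketch below uses a hypothetical helper, minibatch_indices, that reproduces the shuffling and slicing. It shows that each example is visited exactly once per epoch and that, when the number of examples is not a multiple of the batch size (here 23 examples with a batch size of 10), the last minibatch is smaller. With the 1000 examples and batch_size=10 used above, every minibatch holds exactly 10 examples.

```python
import random

def minibatch_indices(num_examples, batch_size, seed=None):
    """Yield lists of shuffled indices, mirroring the index logic of data_iter."""
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)  # random order, no particular ordering
    for i in range(0, num_examples, batch_size):
        # Python slicing stops at the end of the list, so the final batch
        # simply contains whatever indices remain (no explicit min() needed).
        yield indices[i:i + batch_size]

batches = list(minibatch_indices(23, 10, seed=0))
print([len(batch) for batch in batches])  # [10, 10, 3]
```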


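Initialize the model parameters: the weights are sampled from a normal distribution with mean 0 and standard deviation 0.01, and the bias is set to 0. Setting requires_grad=True lets autograd compute gradients for both.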

In [29]:
w=torch.normal(0,0.01,size=(2,1),requires_grad=True)
b=torch.zeros(1,requires_grad=True)
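Because the weights start near zero and the bias at zero, the first predictions are close to 0, so the initial loss is roughly half the mean squared label. For standard-normal features independent of the noise, E[y²] = ‖w‖² + b² + σ², where σ = 0.01 is the noise standard deviation. The sketch below evaluates this for the true parameters; it assumes the initial weights are exactly zero, which is only approximately true here, so it indicates roughly where training starts rather than an exact value.

```python
true_w = [2.0, -3.4]
true_b = 4.2
noise_std = 0.01

# With w = 0 and b = 0 the prediction is 0, so each example's loss is y**2 / 2.
# For x ~ N(0, I) independent of the noise, E[y**2] = ||w||^2 + b^2 + sigma^2.
expected_y_squared = sum(wi ** 2 for wi in true_w) + true_b ** 2 + noise_std ** 2
expected_initial_loss = expected_y_squared / 2
print(f"{expected_initial_loss:.5f}")  # 16.60005
```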

Define the model.

In [12]:
def linreg(X,w,b):
    """线性回归模型"""
    return torch.matmul(X,w)+b
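In linreg, torch.matmul(X, w) has shape (n, 1) and b has shape (1,), so broadcasting adds the same bias to every row. The plain-Python sketch below uses a hypothetical linreg_py helper on nested lists to spell out that computation for two made-up rows; it returns a flat list instead of an (n, 1) column.

```python
def linreg_py(X, w, b):
    """Plain-Python analogue of linreg: one prediction x^T w + b per row of X."""
    return [sum(xj * wj for xj, wj in zip(row, w)) + b for row in X]

X = [[1.0, 2.0],
     [3.0, 4.0]]
preds = linreg_py(X, [2.0, -3.4], 4.2)
print([round(p, 4) for p in preds])  # [-0.6, -3.4]
```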

Define the loss function.

In [13]:
def squared_loss(y_hat,y):
    """均方损失"""
    return (y_hat-y.reshape(y_hat.shape))**2/2
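The factor of 1/2 is a convenience: the derivative of (ŷ - y)²/2 with respect to ŷ is simply ŷ - y. The y.reshape(y_hat.shape) call is a guard against shape mismatches; in PyTorch, subtracting a tensor of shape (n,) from one of shape (n, 1) broadcasts to an (n, n) matrix instead of raising an error. The plain-Python sketch below, with hypothetical helper names, computes the per-example loss and its derivative for two made-up predictions.

```python
def squared_loss_py(y_hat, y):
    """Per-example squared loss (y_hat - y)**2 / 2, without summing or averaging."""
    return [(yh - yi) ** 2 / 2 for yh, yi in zip(y_hat, y)]

def squared_loss_grad_py(y_hat, y):
    """Derivative of each per-example loss with respect to its prediction: y_hat - y."""
    return [yh - yi for yh, yi in zip(y_hat, y)]

y_hat = [2.5, 0.0]
y = [3.0, -1.0]
print(squared_loss_py(y_hat, y))       # [0.125, 0.5]
print(squared_loss_grad_py(y_hat, y))  # [-0.5, 1.0]
```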

Define the optimization algorithm.

In [16]:
def sgd(params,lr,batch_size):
    """小批量随机梯度下降"""
    with torch.no_grad():
        for param in params:
            param-=lr*param.grad/batch_size
            param.grad.zero_()
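Because the training loop calls backward() on the summed loss, param.grad holds the sum of the per-example gradients, and dividing by batch_size turns it into an average. The call to param.grad.zero_() is needed because PyTorch accumulates gradients across backward() calls. The sketch below uses a hypothetical sgd_step_py on plain floats to reproduce one update with made-up numbers: lr=0.04 as in the training cell, batch_size=10, and a summed gradient of 20.0.

```python
def sgd_step_py(params, summed_grads, lr, batch_size):
    """One minibatch SGD update on floats: p <- p - lr * (summed_grad / batch_size)."""
    new_params = [p - lr * g / batch_size for p, g in zip(params, summed_grads)]
    # Mirrors param.grad.zero_(): the next minibatch starts from zero gradients.
    zeroed_grads = [0.0 for _ in summed_grads]
    return new_params, zeroed_grads

params, grads = sgd_step_py([1.0], [20.0], lr=0.04, batch_size=10)
print([round(p, 6) for p in params], grads)  # [0.92] [0.0]
```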

Train the model.

In [31]:
lr=0.04
num_epochs=3
net=linreg
loss=squared_loss

for epoch in range(num_epochs):
    for X,y in data_iter(batch_size,features,labels):
        l=loss(net(X,w,b),y)  # minibatch loss on X and y
        # l has shape (batch_size, 1) rather than being a scalar, because the
        # loss function above does not sum; sum it before calling backward()
        l.sum().backward()  # compute gradients with respect to [w, b]
        sgd([w,b],lr,batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l=loss(net(features,w,b),labels)
        print(f'epoch {epoch+1}, loss {float(train_l.mean()):f}')

epoch 1, loss 0.000051
epoch 2, loss 0.000051
epoch 3, loss 0.000051
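The training loss sits at about 0.000051 and barely changes between epochs. This is consistent with the noise floor: even with the exact parameters, the residual ŷ - y is the negated noise, with z ~ N(0, 0.01²), so the expected value of squared_loss is E[z²]/2 = 0.01²/2 = 0.00005. The standard-library sketch below estimates this floor by simulation; the sample size and seed are arbitrary choices.

```python
import random

noise_std = 0.01
rng = random.Random(0)  # fixed seed so the estimate is reproducible

# With perfect parameters, y_hat - y equals minus the noise, so each loss is z**2 / 2.
num_samples = 100_000
losses = [rng.gauss(0.0, noise_std) ** 2 / 2 for _ in range(num_samples)]
estimated_floor = sum(losses) / num_samples

print(f"theoretical floor: {noise_std ** 2 / 2:.6f}")  # theoretical floor: 0.000050
print(f"simulated floor:   {estimated_floor:.6f}")
```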


Evaluate how successful training was by comparing the true parameters with the learned ones.

In [20]:
print(f'Estimation error of w: {true_w-w.reshape(true_w.shape)}')
print(f'Estimation error of b: {true_b-b}')

Estimation error of w: tensor([0.0002, 0.0003], grad_fn=<SubBackward0>)
Estimation error of b: tensor([-0.0005], grad_fn=<RsubBackward1>)
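To see all the pieces working together without PyTorch, the sketch below re-implements the pipeline in plain Python, using hand-derived gradients instead of autograd: for the per-example loss (ŷ - y)²/2, the gradient with respect to w is (ŷ - y)x and with respect to b is (ŷ - y). The function name train_linreg_py, the seed, and the zero initialization are illustrative assumptions rather than part of the notebook; the hyperparameters (1000 examples, lr=0.04, batch size 10, 3 epochs) match the cells above.

```python
import random

def train_linreg_py(true_w, true_b, num_examples=1000, batch_size=10,
                    lr=0.04, num_epochs=3, noise_std=0.01, seed=0):
    """Minibatch SGD for linear regression with hand-derived gradients (no autograd)."""
    rng = random.Random(seed)
    # Synthetic data: x ~ N(0, I), y = x^T w + b + noise.
    features = [[rng.gauss(0.0, 1.0) for _ in true_w] for _ in range(num_examples)]
    labels = [sum(xj * wj for xj, wj in zip(x, true_w)) + true_b + rng.gauss(0.0, noise_std)
              for x in features]

    w = [0.0] * len(true_w)  # zero init for simplicity (the notebook samples N(0, 0.01))
    b = 0.0
    indices = list(range(num_examples))
    for _ in range(num_epochs):
        rng.shuffle(indices)
        for i in range(0, num_examples, batch_size):
            batch = indices[i:i + batch_size]
            grad_w = [0.0] * len(w)
            grad_b = 0.0
            for k in batch:
                # d/d(y_hat) of (y_hat - y)**2 / 2 is (y_hat - y); the chain rule adds x for w, 1 for b.
                err = sum(xj * wj for xj, wj in zip(features[k], w)) + b - labels[k]
                grad_w = [g + err * xj for g, xj in zip(grad_w, features[k])]
                grad_b += err
            # Summed gradient divided by the actual batch size (always 10 here, as 1000 % 10 == 0).
            w = [wj - lr * g / len(batch) for wj, g in zip(w, grad_w)]
            b -= lr * grad_b / len(batch)
    return w, b

w_hat, b_hat = train_linreg_py([2.0, -3.4], 4.2)
print("estimation error of w:", [round(t - e, 4) for t, e in zip([2.0, -3.4], w_hat)])
print("estimation error of b:", round(4.2 - b_hat, 4))
```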
