Weight decay is a regularization technique used to mitigate overfitting.

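Overfitting generally arises in two situations:
* Too little training data, so the network memorizes the training set instead of learning a general pattern
* Too many features relative to the number of samples, which likewise lets the network memorize the training set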

In [1]:
import torch
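# Helper functions (synthetic_data, load_array, linreg, squared_loss, sgd) come from d2l.
# With the pip-installed d2l package the usual import is `from d2l import torch as d2l`.
import d2l as d2l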
from torch import nn

Prepare the training and test data.

In [7]:
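n_train = 20  # deliberately small training set
n_test = 100
num_inputs = 200  # far more features than training samples, so overfitting is easy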
batch_size = 5
num_epochs, lr = 100, 0.003

true_w, true_b = torch.ones((num_inputs, 1)) * 0.01, 0.05

train_data = d2l.synthetic_data(true_w, true_b, n_train)
train_iter = d2l.load_array(train_data, batch_size)

test_data = d2l.synthetic_data(true_w, true_b, n_test)
test_iter = d2l.load_array(test_data, batch_size, is_train=False)
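
With 20 training samples and 200 features, a linear model has enough freedom to reproduce the training labels exactly. The sketch below illustrates this without d2l. It assumes the data follow y = Xw + b + Gaussian noise with standard deviation 0.01 (an assumption about what `synthetic_data` produces), and `make_data` is a hypothetical helper introduced only for this illustration. It fits an unpenalized minimum-norm least-squares solution and compares training and test error.

```python
import torch

torch.manual_seed(0)
n_train, n_test, num_inputs = 20, 100, 200
true_w = torch.ones(num_inputs, 1, dtype=torch.float64) * 0.01
true_b = 0.05


def make_data(n):
    # Hypothetical generator (assumption): y = Xw + b + noise, noise std 0.01
    X = torch.randn(n, num_inputs, dtype=torch.float64)
    y = X @ true_w + true_b + 0.01 * torch.randn(n, 1, dtype=torch.float64)
    return X, y


X_train, y_train = make_data(n_train)
X_test, y_test = make_data(n_test)

# Minimum-norm least-squares fit with no penalty and no bias term
w_hat = torch.linalg.pinv(X_train) @ y_train

train_mse = ((X_train @ w_hat - y_train) ** 2).mean().item()
test_mse = ((X_test @ w_hat - y_test) ** 2).mean().item()
print('train MSE:', train_mse)
print('test MSE:', test_mse)
```

The training error is essentially zero because there are more unknowns than equations, while the test error stays much larger. This is the memorization behaviour that weight decay is meant to counteract.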

In [8]:
# Initialize the weights and bias
def init_params():
    w = torch.normal(0, 1, size=(num_inputs, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    return [w, b]


# Define the L2 norm penalty
def l2_penalty(w):
    return torch.sum(w.pow(2)) / 2  # dividing by 2 simplifies the derivative
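
Because of the factor 1/2, the gradient of the penalty with respect to `w` is simply `w`. A gradient step on the penalty alone therefore multiplies `w` by `(1 - lr * lambd)`, which is where the name "weight decay" comes from. The following is a minimal check of both facts, with the values of `w`, `lr`, and `lambd` chosen arbitrarily for illustration.

```python
import torch


def l2_penalty(w):
    return torch.sum(w.pow(2)) / 2


# Arbitrary example weights (illustrative values only)
w = torch.tensor([[1.0], [-2.0], [3.0]], requires_grad=True)
l2_penalty(w).backward()

# The gradient of ||w||^2 / 2 is w itself
grad_matches = torch.allclose(w.grad, w.detach())

# One SGD step on lambd * l2_penalty(w) alone shrinks w by (1 - lr * lambd)
lr, lambd = 0.1, 3.0
with torch.no_grad():
    w_new = w - lr * lambd * w.grad
    shrunk = torch.allclose(w_new, (1 - lr * lambd) * w)

print(grad_matches, shrunk)
```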

In [26]:
# Training
def train(lambd):
    w, b = init_params()
    #    net, loss = lambda X: d2l.linreg(X, w, b), d2l.squared_loss

    for epoch in range(num_epochs):
        for X, y in train_iter:
            with torch.enable_grad():
                l = d2l.squared_loss(d2l.linreg(X, w, b),
                                     y) + lambd * l2_penalty(w)
            l.sum().backward()
            d2l.sgd([w, b], lr, batch_size)
    print('L2 norm of w:', torch.norm(w).item())


train(lambd=9)

L2 norm of w: 0.1392524391412735
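
The same idea can also be sketched with PyTorch's built-in `weight_decay` option of `torch.optim.SGD`, which applies the decay inside the optimizer step instead of adding a penalty to the loss. This is not the notebook's own method: the data generation, the `train_concise` helper, and the use of `nn.MSELoss` (which lacks the 1/2 factor of a squared loss) are assumptions for illustration, so the resulting norms are not expected to match the output above.

```python
import torch
from torch import nn

torch.manual_seed(0)
num_inputs, n_train, batch_size = 200, 20, 5

# Assumed synthetic data: y = Xw + b + small Gaussian noise
X = torch.randn(n_train, num_inputs)
y = X @ (torch.ones(num_inputs, 1) * 0.01) + 0.05 + 0.01 * torch.randn(n_train, 1)


def train_concise(wd, num_epochs=100, lr=0.003):
    # Hypothetical helper: returns the L2 norm of the trained weights
    torch.manual_seed(1)  # same initialization for every call
    net = nn.Linear(num_inputs, 1)
    nn.init.normal_(net.weight, mean=0.0, std=1.0)
    loss = nn.MSELoss()
    # Decay only the weights; the bias is left unpenalized
    optimizer = torch.optim.SGD([
        {'params': [net.weight], 'weight_decay': wd},
        {'params': [net.bias]}], lr=lr)
    for _ in range(num_epochs):
        for i in range(0, n_train, batch_size):
            optimizer.zero_grad()
            loss(net(X[i:i + batch_size]), y[i:i + batch_size]).backward()
            optimizer.step()
    return net.weight.norm().item()


norm_no_decay = train_concise(wd=0)
norm_decay = train_concise(wd=3)
print('L2 norm of w without decay:', norm_no_decay)
print('L2 norm of w with decay:', norm_decay)
```

With decay enabled, the weight norm ends up far smaller than without it, which mirrors the effect of the manual penalty in `train`.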
