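## Overfitting and Underfitting
Using a linear regression model as the baseline, the target to fit is $$y = 5 + 1.2x - 3.4\frac{x^2}{2!} + 5.6 \frac{x^3}{3!} + \epsilon, \quad \epsilon \sim \mathcal{N}(0, 0.1^2).$$ The experiment adjusts model complexity by varying the number of training epochs and the highest polynomial degree. The dataset is generated synthetically; keep the training set and the validation set separate.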
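
Because the model is linear in its features, it can help to write the target in feature form. The block below is only a sketch of that view; the symbols $\phi_i$, $w_i$ and $d$ are notation introduced here for illustration and do not appear in the original notes.

$$
\phi_i(x) = \frac{x^i}{i!}, \qquad
y = \sum_{i=0}^{d-1} w_i\,\phi_i(x) + \epsilon, \qquad
(w_0, w_1, w_2, w_3) = (5,\ 1.2,\ -3.4,\ 5.6), \quad w_i = 0 \ \text{for } i \ge 4 .
$$

Under this reading, a model given fewer than four features cannot represent the cubic term, while a model given many more than four has spare capacity that it can spend on fitting the noise.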

In [3]:
# Generate synthetic data: 100 training samples and 100 validation samples
import torch
import math
import numpy as np

max_degree = 20  # number of polynomial features (degrees 0 through 19)
n_train, n_test = 100, 100
w_true = np.zeros(max_degree)
w_true[0:4] = np.array([5, 1.2, -3.4, 5.6])
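# x1: 200 rows, 1 column; so far the only feature is x itself
x1 = np.random.normal(size=(n_train+n_test, 1))
np.random.shuffle(x1)
# Build the polynomial features from x1: x^0 = 1, x^1, x^2, ... (the division by i! happens in a later cell)
X = np.power(x1, np.arange(max_degree).reshape(1, -1))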
X.shape, X[0, :]

((200, 20),
 array([ 1.00000000e+00, -4.10563942e-01,  1.68562751e-01, -6.92057875e-02,
         2.84134010e-02, -1.16655179e-02,  4.78944102e-03, -1.96637179e-03,
         8.07321354e-04, -3.31457038e-04,  1.36084308e-04, -5.58713101e-05,
         2.29387453e-05, -9.41782171e-06,  3.86661801e-06, -1.58749393e-06,
         6.51767768e-07, -2.67592344e-07,  1.09863768e-07, -4.51061017e-08]))

In [7]:
# Scale each feature by the factorial of its degree
for i in range(max_degree):
    X[:, i] /= math.gamma(i+1)  # gamma(i+1) = i!
# Generate the labels and add noise
y = np.dot(X, w_true)
y += np.random.normal(scale=0.1, size=y.shape)
X_train, y_train, X_val, y_val = X[:n_train, :], y[:n_train], X[n_train:, :], y[n_train:]
w_true, X_train, y_train, X_val, y_val = [torch.tensor(i, dtype=torch.float32) for i in [w_true, X_train, y_train, X_val, y_val]]
w_true.shape, X_train.shape, y_train.shape, X_val.shape, y_val.shape

(torch.Size([20]),
 torch.Size([100, 20]),
 torch.Size([100]),
 torch.Size([100, 20]),
 torch.Size([100]))
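
The scaling step above divides the i-th feature by `math.gamma(i+1)`, which equals i!. Below is a minimal sketch that checks this identity and shows how the factorial keeps high-degree features from blowing up. The sample value `x = 2.0` and the variable names `raw` and `scaled` are illustrative assumptions, not part of the original notebook.

```python
import math

# Gamma(i + 1) equals i! for non-negative integers i
for i in range(6):
    assert math.isclose(math.gamma(i + 1), math.factorial(i))

# Compare raw and factorial-scaled polynomial features for one sample value
x = 2.0  # illustrative value, not taken from the dataset above
raw = [x ** i for i in range(20)]
scaled = [x ** i / math.factorial(i) for i in range(20)]

print(max(raw))     # largest raw feature is x**19
print(max(scaled))  # largest scaled feature stays of order one
```

Keeping the features on a comparable scale matters here because plain SGD with a single learning rate is sensitive to features that differ by many orders of magnitude.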

In [11]:
from torch import nn
from torch.utils import data
# Wrap the tensors in a DataLoader
def load_data(X, y, batchsize, is_train=True):
    dataset = data.TensorDataset(X, y)
    return data.DataLoader(dataset, shuffle=is_train, batch_size=batchsize)

def train(X_train, y_train, X_val, y_val, num_epoch=400):
    loss = nn.MSELoss()
    input_size = X_train.shape[1]
    # The constant feature x^0 already plays the role of the bias, hence bias=False
    model = nn.Sequential(nn.Linear(input_size, 1, bias=False))
    trainer = torch.optim.SGD(model.parameters(), lr=0.01)
    # Reshape labels to (n, 1) so they match the model output and MSELoss does not broadcast
    y_train, y_val = y_train.reshape(-1, 1), y_val.reshape(-1, 1)
    # Full-batch training: the batch size equals the training-set size
    train_iter = load_data(X_train, y_train, batchsize=X_train.shape[0])
    # Training loop
    train_loss = []
    val_loss = []
    for epoch in range(num_epoch):
        for X, y in train_iter:
            l = loss(model(X), y)
            trainer.zero_grad()
            l.backward()
            trainer.step()
        # Record the loss on both sets after each epoch
        with torch.no_grad():
            train_loss.append(loss(model(X_train), y_train).item())
            val_loss.append(loss(model(X_val), y_val).item())
    return train_loss, val_loss, model[0].weight.data

from matplotlib import pyplot as plt
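def lossPlot(train_loss, val_loss):
    # Plot both curves against the epoch index
    plt.plot(range(len(train_loss)), train_loss, label='train loss')
    plt.plot(range(len(val_loss)), val_loss, label='validation loss')
    plt.xlabel('epoch')
    plt.ylabel('MSE')
    plt.legend()
    plt.show()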


In [12]:
def demostrate(dim=3, num_epoch=400):
    _train, _val = X_train[:, :dim], X_val[:, :dim]
    tl, vl, w = train(_train, y_train, _val, y_val, num_epoch=num_epoch)
    lossPlot(tl, vl)
    return w
# Normal case: dim=4 keeps degrees 0 through 3, which matches the true polynomial
demostrate(dim=4)

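
The effect of the feature count `dim` can also be checked independently of the learning rate and epoch count. The sketch below is not the SGD training used in this notebook: it swaps in a closed-form least-squares fit (`np.linalg.lstsq`) and regenerates its own data with a fixed seed. The helper name `fit_and_score`, the seed, and the choice of `dim` values 2, 4 and 20 are assumptions made for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen here only for reproducibility
max_degree, n_train, n_val = 20, 100, 100
w_true = np.zeros(max_degree)
w_true[:4] = [5, 1.2, -3.4, 5.6]

# Features x^i / i! and noisy labels, following the data-generation cells above
x = rng.normal(size=(n_train + n_val, 1))
factorials = np.array([math.factorial(i) for i in range(max_degree)])
X = np.power(x, np.arange(max_degree)) / factorials
y = X @ w_true + rng.normal(scale=0.1, size=n_train + n_val)
X_tr, y_tr, X_va, y_va = X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def fit_and_score(dim):
    """Least-squares fit on the first `dim` features; returns (train MSE, validation MSE)."""
    w, *_ = np.linalg.lstsq(X_tr[:, :dim], y_tr, rcond=None)
    mse = lambda A, b: float(np.mean((A[:, :dim] @ w - b) ** 2))
    return mse(X_tr, y_tr), mse(X_va, y_va)

# dim=2: too simple, dim=4: matches the true polynomial, dim=20: all features
results = {dim: fit_and_score(dim) for dim in (2, 4, 20)}
for dim, (tr, va) in results.items():
    print(f"dim={dim:2d}  train MSE={tr:.4f}  val MSE={va:.4f}")
```

With `dim=2` both errors stay high because the quadratic and cubic terms are missing (underfitting). With `dim=4` both should land near the noise variance of 0.01. With `dim=20` the training error cannot get worse, but how far the validation error drifts from it depends on the particular sample, so treat any single run as indicative only.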

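#### Dataset + DataLoader: creating and reading your own dataset
`from torch.utils.data import Dataset, TensorDataset, DataLoader`
[PyTorch (5) Getting Started: DataLoader and Dataset](https://blog.csdn.net/zw__chen/article/details/82806900)
1. Dataset: wraps the data so that individual samples can be indexed, e.g. `deal_dataset = TensorDataset(x_data, y_data)`.

2. DataLoader: an important class that iterates over a Dataset in batches. Its commonly used parameters are `batch_size` (the size of each batch), `shuffle` (whether to shuffle the data), and `num_workers` (how many subprocesses to use for loading data).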
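
The two classes above can be combined as in the following minimal sketch. The toy tensors `x_data` and `y_data`, the batch size of 4, and the list `batch_shapes` are illustrative assumptions; only `TensorDataset` and `DataLoader` with the `batch_size`, `shuffle` and `num_workers` parameters come from the notes.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy data (illustrative): 10 samples with 3 features each, and one label per sample
x_data = torch.arange(30, dtype=torch.float32).reshape(10, 3)
y_data = torch.arange(10, dtype=torch.float32)

# Dataset: pairs each row of x_data with the matching entry of y_data
deal_dataset = TensorDataset(x_data, y_data)

# DataLoader: batches of 4, reshuffled every epoch, loaded in the main process
loader = DataLoader(deal_dataset, batch_size=4, shuffle=True, num_workers=0)

batch_shapes = []
for xb, yb in loader:
    # Each iteration yields one batch of features and the matching labels
    batch_shapes.append((tuple(xb.shape), tuple(yb.shape)))

print(len(deal_dataset), batch_shapes)
```

Since 10 is not a multiple of 4, the last batch is smaller than the others; passing `drop_last=True` to `DataLoader` would discard it instead.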