# 单隐含层感知机网络的实现
1. 数据集： FashionMinist数据集, 训练集60000条数据, 验证集1000条, 输入特征维数$28*28=784$，输出分类数为10, 以batch_size=256的大小读取数据集。
<br>
2. 感知机网络结构：隐含层256个节点，两个权重矩阵W1和W2, W1连接了输入层和隐含层，形状是(d, h) = (784, 256), W2连接的是隐含层和输出层，形状是(h, q) = (256, 10)。两个偏置列向量b1和b2，维度分别是h=256以及q=10。另外隐含层节点的非线性变换是ReLU函数。输入数据是维度是(n, d)=(256, 784)，输出是(n, q) = (256, 10)
<br>
3. 交叉熵损失和优化函数SGD：进行参数训练

In [2]:
import torch
from torchvision.datasets import FashionMNIST
from torch.utils.data import DataLoader
from torchvision import transforms
# 1. 导入数据
batch_size = 256
trans = transforms.ToTensor()
minist_train = FashionMNIST(root="../data", train=True, transform=trans)
minist_test = FashionMNIST(root="../data", train=False, transform=trans)
train_iter = DataLoader(minist_train, shuffle=True, num_workers=4, batch_size=256)
test_iter = DataLoader(minist_test, shuffle=False, num_workers=4, batch_size=256)
for X, y in train_iter:
    print(X.shape)
    break

torch.Size([256, 1, 28, 28])


In [16]:
# 2. 构造感知器网络
from torch import nn
input_size = 28*28
output_size = 10
hidden_size = 256
# W1(input_size, hidden_size)  W2(hidden_size, output_size)
# b1(hidden_size, 1) b2(output_size, 1)
# X(batch_size, input_size) y_hat(batch_size, output_size)
W1 = nn.Parameter(torch.normal(0, 0.1, size=(input_size, hidden_size), requires_grad=True))
W2 = nn.Parameter(torch.normal(0, 0.1, size=(hidden_size, output_size), requires_grad=True))
b1 = nn.Parameter(torch.zeros(hidden_size, requires_grad=True))
b2 = nn.Parameter(torch.zeros(output_size, requires_grad=True))

def relu(X):
    return torch.max(X, torch.zeros_like(X))

# 这里“@”代表矩阵乘法
def net(X):
    Hz = torch.matmul(X, W1) + b1 # 第一层的仿射变换
    return torch.matmul(relu(Hz), W2) + b2 # 套上ReLU再进行第二层仿射变换

In [17]:
# 3. 损失函数以及优化函数
lr = 0.03
cross_entropy = nn.CrossEntropyLoss()
sgd = torch.optim.SGD(params=[W1, W2, b1, b2], lr=lr)
# 4. 进行训练
def train(train_iter, epoch_num, net, loss, optim):
    for epoch in range(epoch_num):
        tl, en = 0, 0
        for X, y in train_iter:
            y_hat = net(X.reshape(-1, input_size))
            l = loss(y_hat, y)
            optim.zero_grad()
            l.backward()
            optim.step()
            en += len(y)
            tl += len(y)*l
        with torch.no_grad():
            print(f"epoch {epoch+1}: train-loss = {tl/en:.5f}")

train(train_iter, 10, net, cross_entropy, sgd)

epoch 1: train-loss = 0.84860
epoch 2: train-loss = 0.58720
epoch 3: train-loss = 0.53083
epoch 4: train-loss = 0.49957
epoch 5: train-loss = 0.47825
epoch 6: train-loss = 0.46328
epoch 7: train-loss = 0.45256
epoch 8: train-loss = 0.44205
epoch 9: train-loss = 0.43252
epoch 10: train-loss = 0.42468


In [26]:
# 计算训练集和验证集的准确度
def evaluate(data_iter, net):
    acc, tt = 0, 0
    for X, y in data_iter:
        with torch.no_grad():
            y_hat = net(X.reshape(-1, input_size))
            y_pre = torch.argmax(y_hat, dim=1)
            acc += (y_pre==y).sum()
            tt += len(y)
    return acc, tt

acc, tt = evaluate(train_iter, net)
print(f"Training set: acc={acc}, total={tt}, accuracy={(acc/tt)*100:.3f}%")
acc, tt = evaluate(test_iter, net)
print(f"Testing set: acc={acc}, total={tt}, accuracy={(acc/tt)*100:.3f}%")


Training set: acc=51163, total=60000, accuracy=85.272%
Testing set: acc=8393, total=10000, accuracy=83.930%


1. `torch.zeros_like(X)`: 创建一个形状和X相同的0张量
2. `torch.max(S, V)`: 是逐元素操作的函数，S和V形状相同
```py
tst_aa = torch.tensor([[1, -2.0], [2, -1.0]])
relu(tst_aa)
```
---

广播机制
```py
a = torch.tensor([[1, -2.0], [2, -1.0], [-1.0, -2.0]])
b = torch.tensor([[1], [2], [3]])
a + b # ok, a is (3, 2) and b is (3, 1) --> 按列广播
b = torch.tensor([[1, 2]])
a + b # ok, now b is (1, 2), we can transpose b by row
b = torch.tensor([1, 2])
a + b 
# of course right, now b is a vector, just add b to each row of a
```
---

`loss = nn.CrossEntropyLoss(reduction='mean')`
> reduction: （string，可选）”none”：不应用任何缩减，“mean”：取输出的加权平均值，“sum”：输出将被求和。

定义loss后，通常这样使用 l = loss(y_hat, y), 其中y_hat和y是相同维度的向量, 顺序不可颠倒
交叉熵损失的内部实现其实包含了softmax操作，也即对于y_hat的每一行进行了e指数归一化，随后是熵操作，也即取负对数，随后按照每一个y的分量进行index后求和或取平均或者直接返回向量形式

参考:
【1】[nn.CrossEntropyLoss详解](https://blog.csdn.net/Lucinda6/article/details/116162198)

---
## 使用高级的API进行训练
主要是参数定义的简化，Seq+init模式

In [31]:
# Sequtial + init
model = nn.Sequential(
    nn.Flatten(), # 类似于reshape
    nn.Linear(input_size, hidden_size), # input_layer
    nn.ReLU(), # 激活函数
    nn.Linear(hidden_size, hidden_size) # output_layer
)
def init_params(m):
    if type(m) == nn.Linear:
        nn.init.normal_(m.weight, 0.1)
model.apply(init_params)
loss = nn.CrossEntropyLoss()
optim = torch.optim.SGD(model.parameters(), lr=0.03)
train(train_iter, 10, model, cross_entropy, optim)

epoch 1: train-loss = 115.44628
epoch 2: train-loss = 19.59120
epoch 3: train-loss = 12.28244
epoch 4: train-loss = 8.63245
epoch 5: train-loss = 6.89842
epoch 6: train-loss = 5.89781
epoch 7: train-loss = 4.87764
epoch 8: train-loss = 4.25964
epoch 9: train-loss = 3.84621
epoch 10: train-loss = 3.39110


In [32]:
train(train_iter, 10, model, cross_entropy, optim)
acc, tt = evaluate(train_iter, model)
print(f"Training set: acc={acc}, total={tt}, accuracy={(acc/tt)*100:.3f}%")
acc, tt = evaluate(test_iter, model)
print(f"Testing set: acc={acc}, total={tt}, accuracy={(acc/tt)*100:.3f}%")


epoch 1: train-loss = 3.10593
epoch 2: train-loss = 2.85907
epoch 3: train-loss = 2.58393
epoch 4: train-loss = 2.41372
epoch 5: train-loss = 2.23298
epoch 6: train-loss = 2.11691
epoch 7: train-loss = 1.93269
epoch 8: train-loss = 1.87558
epoch 9: train-loss = 1.73258
epoch 10: train-loss = 1.64393
Training set: acc=50396, total=60000, accuracy=83.993%
Testing set: acc=8133, total=10000, accuracy=81.330%
