This page practices building PyTorch models, following the section of the same name in the official PyTorch tutorials:

https://pytorch.org/tutorials/beginner/introyt/modelsyt_tutorial.html

## `torch.nn.Module` and `torch.nn.Parameter`
When defining a new neural network, we usually subclass `torch.nn.Module`. All of the model's learnable parameters are instances of `torch.nn.Parameter`. These parameters may be accessed through the `parameters()` method on the `Module` class.

In [1]:
import torch

class TinyModel(torch.nn.Module):

    def __init__(self):
        super(TinyModel, self).__init__()

        self.linear1 = torch.nn.Linear(100, 200)
        self.activation = torch.nn.ReLU()
        self.linear2 = torch.nn.Linear(200, 10)
        self.softmax = torch.nn.Softmax()

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation(x)
        x = self.linear2(x)
        x = self.softmax(x)
        return x

tinymodel = TinyModel()

print('The model:')
print(tinymodel)

print('\n\nJust one layer:')
print(tinymodel.linear2)

print('\n\nModel params:')
for param in tinymodel.parameters():
    print(param)

print('\n\nLayer params:')
for param in tinymodel.linear2.parameters():
    print(param)

The model:
TinyModel(
  (linear1): Linear(in_features=100, out_features=200, bias=True)
  (activation): ReLU()
  (linear2): Linear(in_features=200, out_features=10, bias=True)
  (softmax): Softmax(dim=None)
)


Just one layer:
Linear(in_features=200, out_features=10, bias=True)


Model params:
Parameter containing:
tensor([[-0.0706,  0.0302,  0.0636,  ...,  0.0524, -0.0039,  0.0952],
        [-0.0879,  0.0449, -0.0947,  ...,  0.0522, -0.0343,  0.0918],
        [-0.0735,  0.0914, -0.0796,  ...,  0.0145,  0.0166, -0.0992],
        ...,
        [ 0.0501, -0.0721,  0.0627,  ..., -0.0322,  0.0625,  0.0443],
        [ 0.0007,  0.0360,  0.0270,  ...,  0.0230,  0.0898, -0.0896],
        [-0.0099,  0.0504,  0.0816,  ..., -0.0648,  0.0191,  0.0771]],
       requires_grad=True)
Parameter containing:
tensor([-0.0509,  0.0045,  0.0606, -0.0004,  0.0384,  0.0341, -0.0694, -0.0241,
        -0.0970,  0.0419, -0.0061,  0.0358,  0.0389,  0.0709, -0.0919,  0.0003,
         0.0051, -0.0936,  0.0576,  0.06
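
Printing every parameter tensor quickly becomes hard to read. As a minimal sketch, the snippet below uses `named_parameters()` to list only names and shapes, then counts the trainable scalars. The `torch.nn.Sequential` model here is a hypothetical stand-in that reuses the layer sizes of `TinyModel`; it is not part of the original notebook.

```python
import torch

# Hypothetical stand-in with the same layer sizes as TinyModel (100 -> 200 -> 10).
model = torch.nn.Sequential(
    torch.nn.Linear(100, 200),
    torch.nn.ReLU(),
    torch.nn.Linear(200, 10),
)

# named_parameters() yields (name, Parameter) pairs; ReLU contributes none.
for name, param in model.named_parameters():
    print(f"{name:10s} shape={tuple(param.shape)} requires_grad={param.requires_grad}")

# Trainable scalars: 100*200 + 200 + 200*10 + 10.
total = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("trainable parameters:", total)
```

Activation layers such as `ReLU` hold no parameters, so only the two `Linear` layers show up in the listing.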

## Common Layer Types
### Linear Layers
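If a layer has m inputs and n outputs, it computes y = xWᵀ + b. PyTorch stores the weight W as an n x m matrix, i.e. with shape `(out_features, in_features)`. For example:

Note that the parameters also include a bias: a constant offset vector of length n that is added to the output.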

In [2]:
lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)
print('Input:')
print(x)

print('\n\nWeight and Bias parameters:')
for param in lin.parameters():
    print(param)

y = lin(x)
print('\n\nOutput:')
print(y)

Input:
tensor([[0.6346, 0.6580, 0.4436]])


Weight and Bias parameters:
Parameter containing:
tensor([[-0.3030,  0.2797, -0.1077],
        [ 0.3205,  0.2898, -0.3379]], requires_grad=True)
Parameter containing:
tensor([0.3193, 0.3791], requires_grad=True)


Output:
tensor([[0.2632, 0.6232]], grad_fn=<AddmmBackward0>)
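
To make the weight layout concrete, the following sketch recomputes the output of a `Linear(3, 2)` layer by hand and compares it with the layer's own result. The seed is arbitrary and only there for repeatability.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
lin = torch.nn.Linear(3, 2)
x = torch.rand(1, 3)

# The weight is stored as (out_features, in_features), the bias as (out_features,).
print(lin.weight.shape, lin.bias.shape)

# Recompute the layer output by hand: y = x @ W^T + b.
manual = x @ lin.weight.T + lin.bias
print(torch.allclose(manual, lin(x)))
```

The transpose is what lets a `(1, 3)` input multiply a weight stored as `(2, 3)` and produce a `(1, 2)` output.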


### Convolutional Layers
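`torch.nn.Conv2d(in_channels, out_channels, kernel_size)`: the first argument is the number of input channels (1 for a black-and-white image, 3 for a three-channel RGB image), the second is the number of output channels (the number of convolutional feature maps produced), and the third is the kernel size. In `conv1` below, the "5" means we've chosen a 5x5 kernel. (If you want a kernel with height different from width, you can specify a tuple for this argument - e.g., (3, 5) to get a 3x5 convolution kernel.)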

In [3]:
import torch.nn.functional as F  # relu, max_pool2d and log_softmax live in torch.nn.functional


class LeNet(torch.nn.Module):

    def __init__(self):
        super(LeNet, self).__init__()
        # 1 input image channel (black & white), 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = torch.nn.Conv2d(1, 6, 5)
        self.conv2 = torch.nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = torch.nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = torch.nn.Linear(120, 84)
        self.fc3 = torch.nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
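
The `16 * 6 * 6` input size of `fc1` follows from the spatial sizes after each convolution and pooling step. The sketch below traces those shapes with standalone layers, assuming a single 32x32 one-channel input image (the input size is an assumption; the notebook does not state it).

```python
import torch

# Assumed input: one 32x32 single-channel image, laid out as (batch, channels, height, width).
x = torch.rand(1, 1, 32, 32)
conv1 = torch.nn.Conv2d(1, 6, 5)
conv2 = torch.nn.Conv2d(6, 16, 3)
pool = torch.nn.MaxPool2d(2)

h1 = pool(torch.relu(conv1(x)))      # 32 -> 28 after the 5x5 conv, 28 -> 14 after 2x2 pooling
print(h1.shape)
h2 = pool(torch.relu(conv2(h1)))     # 14 -> 12 after the 3x3 conv, 12 -> 6 after 2x2 pooling
print(h2.shape)
flat = torch.flatten(h2, start_dim=1)  # keep the batch dimension, flatten the rest
print(flat.shape)
```

With a different input resolution the flattened size changes, and `fc1` would need a matching `in_features`.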

### Recurrent Layers
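The figure below shows the computation of a recurrent neural network at three adjacent time steps. At any time step $t$, computing the hidden state can be viewed as:

1. concatenating the input $\mathbf{X}_{t}$ of the current time step $t$ with the hidden state $\mathbf{H}_{t-1}$ of the previous time step $t - 1$;
2. feeding the concatenated result into a fully connected layer with activation function $\phi$. The output of this layer is the hidden state $\mathbf{H}_{t}$ of the current time step $t$.

In this example, the model parameters are the concatenation of $\mathbf{W}_{xh}$ and $\mathbf{W}_{hh}$, together with the bias $\mathbf{b}_{h}$, all of which come from (8.4.5). The hidden state $\mathbf{H}_{t}$ of the current time step $t$ takes part in computing the hidden state $\mathbf{H}_{t+1}$ of the next time step $t+1$. In addition, $\mathbf{H}_{t}$ is fed into the fully connected output layer to compute the output $\mathbf{O}_{t}$ of the current time step $t$.
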
![rnn](https://zh.d2l.ai/_images/rnn.svg)
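
The two-step view above can be checked numerically. In this minimal sketch, the sizes are hypothetical and $\phi$ is assumed to be `tanh` (the text leaves the activation unspecified). It shows that multiplying the concatenated input by the concatenated weights gives the same hidden state as the two separate products.

```python
import torch

torch.manual_seed(0)       # arbitrary seed, only for repeatability
n, d, h = 2, 4, 3          # hypothetical batch size, input size, hidden size
X_t = torch.randn(n, d)    # input at time step t
H_prev = torch.randn(n, h) # hidden state at time step t - 1
W_xh = torch.randn(d, h)
W_hh = torch.randn(h, h)
b_h = torch.randn(h)

# Separate form: H_t = phi(X_t W_xh + H_{t-1} W_hh + b_h), with phi assumed to be tanh.
H_sep = torch.tanh(X_t @ W_xh + H_prev @ W_hh + b_h)

# Concatenated form: [X_t, H_{t-1}] multiplied by W_xh stacked on top of W_hh.
XH = torch.cat((X_t, H_prev), dim=1)   # shape (n, d + h)
W = torch.cat((W_xh, W_hh), dim=0)     # shape (d + h, h)
H_cat = torch.tanh(XH @ W + b_h)

print(torch.allclose(H_sep, H_cat, atol=1e-5))
```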

In [4]:
class LSTMTagger(torch.nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocabulary_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = torch.nn.Embedding(
            vocabulary_size, embedding_dim)
        self.lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
        self.hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

    def forward(self, sentence):
        embeds = self.word_embeddings(sentence)
        print(embeds)
        lstm_out, _ = self.lstm(embeds.view(len(sentence), 1, -1))  
        # input of shape (seq_len, batch, input_size)
        # output of shape (seq_len, batch, num_directions * hidden_size):
        tag_space = self.hidden2tag(lstm_out.view(len(sentence),-1))  
        tag_scores = F.log_softmax(tag_space,dim=1)
        return tag_scores
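
To see the tensor shapes that flow through such a tagger, the sketch below runs the same sequence of layers outside a class. All sizes and the word indices are hypothetical values chosen only for illustration.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
vocabulary_size, embedding_dim, hidden_dim, tagset_size = 10, 6, 4, 3
embedding = torch.nn.Embedding(vocabulary_size, embedding_dim)
lstm = torch.nn.LSTM(embedding_dim, hidden_dim)
hidden2tag = torch.nn.Linear(hidden_dim, tagset_size)

sentence = torch.tensor([1, 5, 2, 7])   # four made-up word indices
embeds = embedding(sentence)             # (seq_len, embedding_dim)

# nn.LSTM expects (seq_len, batch, input_size) by default, so add a batch dimension of 1.
lstm_out, (h_n, c_n) = lstm(embeds.view(len(sentence), 1, -1))
print(lstm_out.shape)                    # (seq_len, batch, hidden_dim)

# One row of log-probabilities over the tag set per word.
tag_scores = torch.log_softmax(hidden2tag(lstm_out.view(len(sentence), -1)), dim=1)
print(tag_scores.shape)
print(tag_scores.exp().sum(dim=1))       # each row of probabilities sums to about 1
```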
    

## Max Pooling

In [5]:
my_tensor = torch.rand(1, 6, 6)
print(my_tensor)

maxpool_layer = torch.nn.MaxPool2d(3)
print(maxpool_layer(my_tensor))

tensor([[[0.5973, 0.2610, 0.5401, 0.3649, 0.4927, 0.2900],
         [0.5783, 0.4137, 0.2434, 0.4632, 0.5726, 0.0423],
         [0.1840, 0.3327, 0.1273, 0.8292, 0.5662, 0.3883],
         [0.9159, 0.8889, 0.7790, 0.0520, 0.8544, 0.1143],
         [0.8702, 0.8769, 0.1738, 0.8905, 0.8930, 0.5439],
         [0.8401, 0.1985, 0.3133, 0.3256, 0.4264, 0.8857]]])
tensor([[[0.5973, 0.8292],
         [0.9159, 0.8930]]])
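
`MaxPool2d(3)` splits the 6x6 input into non-overlapping 3x3 blocks and keeps the maximum of each. As a rough check, the sketch below reproduces that result by reshaping the tensor into blocks by hand; the seed is arbitrary.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
t = torch.rand(1, 6, 6)
pooled = torch.nn.MaxPool2d(3)(t)

# View the 6x6 map as a 2x2 grid of 3x3 blocks:
# (channel, block_row, row_in_block, block_col, col_in_block).
blocks = t.view(1, 2, 3, 2, 3)
manual = blocks.amax(dim=(2, 4))   # maximum inside each block

print(pooled.shape)
print(torch.equal(pooled, manual))
```

The reshape trick only works because the kernel size divides the input size evenly and the default stride equals the kernel size.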


## Dropout Layers
Dropout layers are a tool for encouraging sparse representations in your model - that is, pushing it to do inference with less data.

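Dropout layers work by randomly setting parts of the input tensor to zero during training, which forces the model to learn against this masked or reduced input. The surviving elements are scaled by 1/(1-p), which is why some values in the output below exceed 1. Dropout is turned off for inference, i.e. once the module is put into evaluation mode with `eval()`. For example: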

In [7]:
my_tensor = torch.rand(1, 4, 4)

dropout = torch.nn.Dropout(p=0.4)
print(my_tensor)
print(dropout(my_tensor))

tensor([[[0.3020, 0.7684, 0.0335, 0.0034],
         [0.9903, 0.7513, 0.4531, 0.1564],
         [0.7358, 0.1040, 0.3232, 0.5796],
         [0.9785, 0.0677, 0.5032, 0.1122]]])
tensor([[[0.5034, 1.2807, 0.0558, 0.0056],
         [1.6505, 1.2521, 0.7552, 0.0000],
         [1.2263, 0.1734, 0.5387, 0.0000],
         [1.6309, 0.1129, 0.8387, 0.1870]]])
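
The difference between training and evaluation mode can be seen directly. This sketch feeds a tensor of ones through the same dropout layer in both modes; the seed is arbitrary, and an all-ones input is used only to make the scaling easy to read.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
p = 0.4
dropout = torch.nn.Dropout(p=p)
x = torch.ones(1, 4, 4)

# Training mode (the default for a new module): elements are zeroed with
# probability p and the survivors are scaled by 1 / (1 - p).
dropout.train()
y_train = dropout(x)
print(y_train)

# Evaluation mode: dropout is the identity.
dropout.eval()
y_eval = dropout(x)
print(torch.equal(y_eval, x))
```

Scaling the survivors keeps the expected value of each element the same in both modes, so no extra rescaling is needed at inference time.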
