### LSTM and Bi-LSTM in PyTorch

#### Arguments for constructing an LSTM, i.e. the parameters used to initialize an LSTM layer

    input_size – The number of expected features in the input x       (the dimension of one word's embedding vector)
    hidden_size – The number of features in the hidden state h         (the dimension of the hidden state vector)
    num_layers – Number of recurrent layers. E.g., setting num_layers=2 would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
    nonlinearity – The non-linearity to use. Can be either 'tanh' or 'relu'. Default: 'tanh' (the activation function; this argument belongs to nn.RNN only, nn.LSTM does not accept it)
    bias – If False, then the layer does not use bias weights b_ih and b_hh. Default: True    (whether to use bias terms)
    batch_first – If True, then the input and output tensors are provided as (batch, seq, feature). Default: False    (the hidden and cell states keep their (num_layers * num_directions, batch, hidden_size) layout either way)
    dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to dropout. Default: 0
    bidirectional – If True, becomes a bidirectional LSTM. Default: False
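
To illustrate the `batch_first` argument described above, here is a minimal sketch. The variable names (`lstm_bf`, `x_bf`, and so on) are introduced only for this illustration. It assumes the same sizes used later in this section (4 sentences, 5 words each, 10 features per word, 20 hidden features).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# With batch_first=True the input and output are laid out as (batch, seq, feature)
lstm_bf = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x_bf = torch.randn(4, 5, 10)          # 4 sentences, 5 words each, 10 features per word

# When (h_0, c_0) is omitted, both initial states default to zeros
out_bf, (h_bf, c_bf) = lstm_bf(x_bf)

print(out_bf.shape)  # laid out as (batch, seq, hidden_size)
print(h_bf.shape)    # laid out as (num_layers * num_directions, batch, hidden_size)
```

Only the input and output tensors change layout; the hidden and cell states keep the batch in their second dimension.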



In [2]:
# Import the required packages
import torch
import torch.nn as nn

In [35]:
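# LSTM example
num_layers = 1
num_directions = 1  # 1 for a unidirectional LSTM, 2 for a bidirectional one
# Each word vector has 10 features, the hidden state has 20 features, and there is one layer
rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=num_layers)
# Input of shape (seq_len, batch, input_size): batch size is 4, i.e. 4 sentences of 5 words each, every word a 1x10 vector
input = torch.randn(5, 4, 10)
# Initial hidden state (short-term memory); this is a state tensor, not a trainable weight
# h_0 of shape (num_layers * num_directions, batch, hidden_size)
h0 = torch.randn(num_layers * num_directions, 4, 20)
# Initial cell state (long-term memory); also a state tensor, not a trainable weight
# c_0 of shape (num_layers * num_directions, batch, hidden_size)
c0 = torch.randn(num_layers * num_directions, 4, 20)
output, (hn, cn) = rnn(input, (h0, c0))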

In [29]:
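# Bidirectional LSTM: two layers, each with a forward and a backward direction
brnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

# Initial states of shape (num_layers * num_directions, batch, hidden_size) = (2 * 2, 4, 20)
bih0 = torch.randn(2*2, 4, 20)
bic0 = torch.randn(2*2, 4, 20)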
bioutput, (bihn, bicn) = brnn(input, (bih0, bic0))

In [30]:
print(output.size())
print(bioutput.size())

torch.Size([5, 4, 20])
torch.Size([5, 4, 40])
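
The last dimension of the bidirectional output is 40 because the forward and backward hidden states (20 features each) are concatenated at every time step. A minimal sketch of separating the two directions follows. The names `bi_lstm`, `split`, `forward_out`, and `backward_out` are introduced here for illustration only.

```python
import torch
import torch.nn as nn

seq_len, batch, hidden_size = 5, 4, 20
bi_lstm = nn.LSTM(input_size=10, hidden_size=hidden_size, num_layers=2, bidirectional=True)
x = torch.randn(seq_len, batch, 10)
bi_out, _ = bi_lstm(x)  # shape (seq_len, batch, 2 * hidden_size)

# Reshape to (seq_len, batch, num_directions, hidden_size);
# direction 0 is the forward pass, direction 1 is the backward pass
split = bi_out.reshape(seq_len, batch, 2, hidden_size)
forward_out = split[:, :, 0, :]
backward_out = split[:, :, 1, :]

print(forward_out.shape, backward_out.shape)
```

Note that `output` only contains the hidden states of the last layer, which is why `num_layers` does not appear in its shape.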


In [32]:
print(hn.size())
print(cn.size())
print(bihn.size())
print(bicn.size())

torch.Size([1, 4, 20])
torch.Size([1, 4, 20])
torch.Size([4, 4, 20])
torch.Size([4, 4, 20])
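
The first dimension of the final states is `num_layers * num_directions`, so for the bidirectional model it is 2 * 2 = 4. The sketch below, which uses illustrative names such as `hn_by_layer`, separates that dimension into layers and directions. It then checks how the last layer's final hidden states relate to the output tensor.

```python
import torch
import torch.nn as nn

num_layers, batch, hidden_size = 2, 4, 20
bi_lstm = nn.LSTM(input_size=10, hidden_size=hidden_size,
                  num_layers=num_layers, bidirectional=True)
x = torch.randn(5, batch, 10)
bi_out, (bi_hn, bi_cn) = bi_lstm(x)

# (num_layers * num_directions, batch, hidden) -> (num_layers, num_directions, batch, hidden)
hn_by_layer = bi_hn.reshape(num_layers, 2, batch, hidden_size)
last_fwd = hn_by_layer[-1, 0]  # last layer, forward direction
last_bwd = hn_by_layer[-1, 1]  # last layer, backward direction

# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step
fwd_matches = torch.allclose(last_fwd, bi_out[-1, :, :hidden_size], atol=1e-6)
bwd_matches = torch.allclose(last_bwd, bi_out[0, :, hidden_size:], atol=1e-6)
print(fwd_matches, bwd_matches)
```

Both checks should hold, since the output stores the last layer's hidden state at every time step.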


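The shapes of the hidden state and cell state show that each sequence in a batch carries its own state: the batch dimension holds one hidden state and one cell state per sequence. These states are activations, not trained weights. The LSTM's weights are shared by all sequences in the batch, so the number of trained parameters does not depend on batch_size.
The number of layers and the number of directions multiply the number of short-term (hidden) and long-term (cell) state tensors, giving num_layers * num_directions of each. Every extra layer or direction also adds its own set of weights.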
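
A quick way to see what batch size, layers, and directions actually change is to count the trainable parameters directly. The sketch below is illustrative only, and `count_params` is a hypothetical helper introduced here. It uses the two configurations from this section.

```python
import torch
import torch.nn as nn

def count_params(module):
    """Total number of trainable scalar parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

uni = nn.LSTM(input_size=10, hidden_size=20, num_layers=1)
bi = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

# Per layer and direction: 4 * hidden * (layer_input + hidden) weights + 2 * 4 * hidden biases
print(count_params(uni))
print(count_params(bi))

# The same module accepts any batch size; only the state and output shapes change
for batch in (1, 4, 32):
    out, (h, c) = uni(torch.randn(5, batch, 10))
    print(batch, tuple(out.shape), tuple(h.shape), count_params(uni))
```

The parameter count stays fixed while the batch dimension of the states and output follows the input, and the two-layer bidirectional model has more parameters because each layer and direction has its own weights.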