# Recurrent Neural Networks (RNN): Character Prediction
## Using an Embedding Layer Instead of One-Hot Encoding
![dmwb6Q](https://gitee.com/pxqp9W/testmarkdown/raw/master/imgs/2020/07/dmwb6Q.png)
![s9VCVx](https://gitee.com/pxqp9W/testmarkdown/raw/master/imgs/2020/07/s9VCVx.png)
![MQHFCF](https://gitee.com/pxqp9W/testmarkdown/raw/master/imgs/2020/07/MQHFCF.png)
- [Everything Is an Embedding: From Classic word2vec to item2vec, a Basic Deep Learning Operation - Zhihu](https://zhuanlan.zhihu.com/p/53194407)
- [Understanding Embedding - Zhihu](https://zhuanlan.zhihu.com/p/46016518)
- [Embedding in Deep Learning - CSDN blog](https://blog.csdn.net/qq_35799003/article/details/84780289)
- [A Personal Understanding of the Two Main Roles of the Embedding Layer in Deep Learning - CSDN blog](https://blog.csdn.net/weixin_42078618/article/details/82999906)

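## Understanding the Embedding Layer
- Dimensionality reduction:
    - Suppose we multiply a 2 x 6 matrix by a 6 x 3 matrix. The result is a 2 x 3 matrix, so a 12-element matrix becomes a 6-element one.
    - At scale: multiplying a 1,000,000 x 100,000 matrix by a 100,000 x 20 matrix gives a 1,000,000 x 20 matrix, which shrinks the feature dimension by a factor of 100,000 / 20 = 5,000.
    - This is one role of the embedding layer: dimensionality reduction. The 100,000 x 20 matrix in the middle can be thought of as a **lookup table**, a **mapping table**, or a **transition table**. When each row of the left-hand matrix is a one-hot vector, the product simply selects one row of this table.
- Since it can reduce dimensionality, it can of course also increase it.
    - This is the second role of an embedding: projecting low-dimensional data into a higher-dimensional space can amplify certain features or separate features that were previously lumped together.
    - The embedding is also learned and continually optimized during training, so this process of pulling points closer together and pushing them apart gradually settles into a good vantage point on the data.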
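The lookup-table view above can be checked directly. The following is a minimal sketch, assuming a vocabulary of 6 symbols and 3 embedding dimensions to mirror the 2 x 6 by 6 x 3 example; the sizes and variable names are illustrative and not part of the original notebook. It shows that multiplying one-hot rows by the table gives the same result as fetching rows by index with `torch.nn.Embedding`.

```python
import torch

torch.manual_seed(0)

vocab_size, emb_dim = 6, 3                     # mirrors the 2 x 6 times 6 x 3 example
emb = torch.nn.Embedding(vocab_size, emb_dim)  # lookup table of shape (6, 3)

indices = torch.tensor([4, 1])                 # two symbols, given as integer ids
one_hot = torch.nn.functional.one_hot(indices, num_classes=vocab_size).float()  # (2, 6)

via_matmul = one_hot @ emb.weight              # (2, 6) @ (6, 3) -> (2, 3)
via_lookup = emb(indices)                      # the same rows, fetched by index

print(tuple(one_hot.shape), '->', tuple(via_matmul.shape))
print(torch.allclose(via_matmul, via_lookup))
```

Indexing avoids ever materializing the one-hot matrix, which is why the model below feeds integer ids straight into the embedding layer.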


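```python
import torch

# Parameters
num_class = 4        # number of output classes (characters)
input_size = 4       # vocabulary size seen by the embedding layer
hidden_size = 8      # size of the RNN hidden state
embedding_size = 10  # size of each embedding vector
num_layers = 2       # number of stacked RNN layers
batch_size = 1
seq_len = 5
```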

```python
# Dataset
idx2char = ['e', 'h', 'l', 'o']  # vocabulary: index -> character
x_data = [[1, 0, 2, 2, 3]]       # "hello"; note the shape: (batch_size, seq_len)
y_data = [3, 1, 2, 3, 2]         # "ohlol"; shape: (batch_size * seq_len,)

inputs = torch.LongTensor(x_data)
labels = torch.LongTensor(y_data)
```
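To make the index encoding concrete, the sketch below rebuilds both sequences from their strings. `char2idx`, `encode`, and `decode` are hypothetical helpers introduced only for illustration; the notebook itself writes the index lists by hand.

```python
idx2char = ['e', 'h', 'l', 'o']
char2idx = {ch: i for i, ch in enumerate(idx2char)}  # hypothetical reverse mapping

def encode(text):
    """Map each character to its index in the vocabulary."""
    return [char2idx[ch] for ch in text]

def decode(indices):
    """Map a sequence of indices back to a string."""
    return ''.join(idx2char[i] for i in indices)

x_data = [encode('hello')]  # nested list -> shape (batch_size=1, seq_len=5)
y_data = encode('ohlol')    # flat list   -> shape (batch_size * seq_len,)

print(x_data)  # [[1, 0, 2, 2, 3]]
print(y_data)  # [3, 1, 2, 3, 2]
print(decode(x_data[0]), '->', decode(y_data))  # hello -> ohlol
```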

```python
# Design Model
class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Lookup matrix of Embedding: (input_size, embedding_size)
        self.emb = torch.nn.Embedding(input_size, embedding_size)
        # Input of RNN: (batch_size, seq_len, embedding_size)
        # Output of RNN: (batch_size, seq_len, hidden_size)
        self.rnn = torch.nn.RNN(input_size=embedding_size,
                                hidden_size=hidden_size,
                                num_layers=num_layers,
                                batch_first=True)

        self.fc = torch.nn.Linear(hidden_size, num_class)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch_size, hidden_size)
        hidden = torch.zeros(num_layers, x.size(0), hidden_size)
        x = self.emb(x)  # (batch_size, seq_len, embedding_size)
        x, _ = self.rnn(x, hidden)
        # Input of FC layer: (batch_size, seq_len, hidden_size)
        x = self.fc(x)
        # Output of FC layer: (batch_size, seq_len, num_class)
        return x.view(-1, num_class)  # Reshape to (batch_size * seq_len, num_class) for CrossEntropyLoss

net = Model()
```
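The shape comments in the model can be verified by running each layer on its own. This is a minimal sketch that rebuilds the three layers outside the `Model` class with the same hyperparameters; the standalone variable names (`emb`, `rnn`, `fc`, `h0`) are chosen for illustration.

```python
import torch

batch_size, seq_len = 1, 5
input_size, embedding_size, hidden_size, num_layers, num_class = 4, 10, 8, 2, 4

emb = torch.nn.Embedding(input_size, embedding_size)
rnn = torch.nn.RNN(input_size=embedding_size, hidden_size=hidden_size,
                   num_layers=num_layers, batch_first=True)
fc = torch.nn.Linear(hidden_size, num_class)

x = torch.LongTensor([[1, 0, 2, 2, 3]])                # (1, 5) integer ids
h0 = torch.zeros(num_layers, batch_size, hidden_size)  # (2, 1, 8)

e = emb(x)                          # (1, 5, 10)
out, hn = rnn(e, h0)                # out: (1, 5, 8), hn: (2, 1, 8)
logits = fc(out)                    # (1, 5, 4)
flat = logits.view(-1, num_class)   # (5, 4)

for name, t in [('emb', e), ('rnn out', out), ('rnn h_n', hn),
                ('fc', logits), ('flat', flat)]:
    print(f'{name:8s}', tuple(t.shape))
```

Note that `batch_first=True` only changes the layout of the input and output tensors; the hidden state keeps the shape `(num_layers, batch_size, hidden_size)`.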

```python
# Loss and Optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.5)
```
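`CrossEntropyLoss` takes raw scores of shape `(N, num_class)` and integer targets of shape `(N,)`, which is why the model flattens its output. The sketch below uses made-up random scores (an assumption, not output from the model) to show that the loss equals the mean negative log-softmax probability of the target class.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 4)                  # (batch_size * seq_len, num_class), made-up values
labels = torch.LongTensor([3, 1, 2, 3, 2])  # one class index per position

loss = torch.nn.CrossEntropyLoss()(logits, labels)

# The same quantity by hand: log-softmax over classes, then the mean
# negative log-probability assigned to each target class
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(5), labels].mean()

print(loss.item(), manual.item())
```

For reference, a model that predicts all 4 classes with equal probability has a loss of ln 4 ≈ 1.386.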

```python
# Training Cycle
for epoch in range(15):
    optimizer.zero_grad()
    outputs = net(inputs)              # (batch_size * seq_len, num_class)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    _, idx = outputs.max(dim=1)        # index of the highest-scoring class at each position
    idx = idx.numpy()
    print('Predicted: ', ''.join([idx2char[x] for x in idx]), end='')
    print(', Epoch [%d/15] loss = %.3f' % (epoch + 1, loss.item()))
```

```text
Predicted:  hhhhh, Epoch [1/15] loss = 1.398
Predicted:  loolo, Epoch [2/15] loss = 2.079
Predicted:  olllh, Epoch [3/15] loss = 1.616
Predicted:  ohloo, Epoch [4/15] loss = 0.700
Predicted:  ohool, Epoch [5/15] loss = 0.881
Predicted:  ohool, Epoch [6/15] loss = 0.586
Predicted:  lhlll, Epoch [7/15] loss = 0.583
Predicted:  lhlll, Epoch [8/15] loss = 0.557
Predicted:  ohlol, Epoch [9/15] loss = 0.176
Predicted:  ohlol, Epoch [10/15] loss = 0.037
Predicted:  ohlol, Epoch [11/15] loss = 0.021
Predicted:  ohlol, Epoch [12/15] loss = 0.018
Predicted:  ohlol, Epoch [13/15] loss = 0.015
Predicted:  ohlol, Epoch [14/15] loss = 0.012
Predicted:  ohlol, Epoch [15/15] loss = 0.013
```
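The decoding step inside the training loop relies on `max(dim=1)` returning both the maximum values and their indices. A small sketch with hand-written scores (assumed values, not taken from the trained model) shows how each row is turned into a character:

```python
import torch

idx2char = ['e', 'h', 'l', 'o']

# Made-up scores of shape (seq_len, num_class); the largest entry
# in each row marks the predicted class for that position
scores = torch.tensor([[0.1, 0.2, 0.3, 2.0],
                       [0.0, 1.5, 0.2, 0.1],
                       [0.3, 0.1, 0.9, 0.2],
                       [0.2, 0.1, 0.4, 1.1],
                       [0.1, 0.0, 2.2, 0.3]])

values, idx = scores.max(dim=1)   # returns (max values, argmax indices)
same_idx = scores.argmax(dim=1)   # indices only

print(''.join(idx2char[i] for i in idx.tolist()))  # ohlol
```

Because only the indices are needed for decoding, `argmax(dim=1)` is an equivalent and slightly more direct choice.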
