In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

## Embedding
![Embedding](./img/embedding.png)

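A fully connected layer that takes a one-hot vector as input. Multiplying a one-hot vector by the weight matrix simply selects one row, so the layer takes the index ${\rm idx}$ of the $1$ entry as input and returns the corresponding row of ${\bf W}$.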

$$
{\bf h} = {\bf W}_{{\rm idx}, *}
$$
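
The row lookup above gives the same result as multiplying a one-hot vector by ${\bf W}$. Here is a minimal sketch to check this, assuming `F.one_hot` is used to build the one-hot vector; the sizes and the index `4` are illustrative choices, not values fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_alphabet, embed_dim = 10, 3
embedding = nn.Embedding(n_alphabet, embed_dim)
W = embedding.weight                      # shape (n_alphabet, embed_dim)

idx = torch.tensor(4)                     # illustrative index
onehot = F.one_hot(idx, num_classes=n_alphabet).float()  # shape (n_alphabet,)

h_matmul = onehot @ W                     # one-hot vector times the weight matrix
h_lookup = embedding(idx)                 # direct row lookup W[idx, :]

print(torch.allclose(h_matmul, h_lookup))
```

The lookup avoids building the one-hot vector and the full matrix product, which is why `nn.Embedding` takes indices rather than one-hot vectors.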

### Sizes
| Variable | Name | Size |
|:---:|:---:|:---:|
|-|one-hot vector|`(n_alphabet)`|
|${\rm idx}$|index|`(1)`|
|${\bf h}$|embedding vector|`(embed_dim)`|
|${\bf W}$|weight|`(n_alphabet, embed_dim)`|

In [2]:
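n_alphabet = 10; embed_dim = 3
embedding = nn.Embedding(n_alphabet, embed_dim)
# Index tensor of shape (2, 4); every value must lie in [0, n_alphabet)
idxs = torch.LongTensor([[0,2,4,5],[4,0,2,9]])
print(idxs)
embed = embedding(idxs)  # shape (2, 4, embed_dim)
print(embed)

# Zero padding: the row at padding_idx is initialized to zeros and is not updated by gradients
PAD = 0
embedding = nn.Embedding(n_alphabet, embed_dim, padding_idx=PAD)
embed = embedding(idxs)
print(embed)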

tensor([[0, 2, 4, 5],
        [4, 0, 2, 9]])
tensor([[[ 1.6995,  0.9900, -0.4197],
         [-0.6903, -1.0622,  0.0646],
         [-0.7134,  0.3108,  0.2643],
         [-0.6781, -0.6527,  0.7753]],

        [[-0.7134,  0.3108,  0.2643],
         [ 1.6995,  0.9900, -0.4197],
         [-0.6903, -1.0622,  0.0646],
         [ 0.7821,  0.2614, -3.2782]]], grad_fn=<EmbeddingBackward>)
tensor([[[ 0.0000,  0.0000,  0.0000],
         [ 0.0819,  0.0736,  0.9592],
         [-0.2574,  0.0104,  1.1147],
         [-0.8644,  0.2818,  1.0298]],

        [[-0.2574,  0.0104,  1.1147],
         [ 0.0000,  0.0000,  0.0000],
         [ 0.0819,  0.0736,  0.9592],
         [-0.2122,  1.0935, -0.4656]]], grad_fn=<EmbeddingBackward>)
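
The zero rows for `PAD` in the second output can be checked directly, along with the fact that this row receives no gradient. This is a small sketch; the sum-based loss is only an illustrative stand-in for a real objective.

```python
import torch
import torch.nn as nn

PAD = 0
embedding = nn.Embedding(10, 3, padding_idx=PAD)
idxs = torch.tensor([[0, 2, 4, 5], [4, 0, 2, 9]])

embed = embedding(idxs)
# Positions that hold PAD map to the all-zero row
print(embed[idxs == PAD])

# Illustrative loss: backpropagate through the lookup
embed.sum().backward()
# The gradient of the PAD row stays zero, so an optimizer step does not move it
print(embedding.weight.grad[PAD])
```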


## LSTM
![LSTM](./img/lstm.png)

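Using the input ${\bf x}_t$ `(in_dim)` and the previous hidden state ${\bf h}_{t-1}$ `(hid_dim)`, the LSTM updates the memory cell from ${\bf c}_{t-1}$ to ${\bf c}_t$ `(hid_dim)` and computes the new hidden state ${\bf h}_{t}$ `(hid_dim)`.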

$$
\begin{aligned}
{\bf c}_t &= {\bf f}_t \odot {\bf c}_{t-1} + {\bf i}_t \odot {\bf g}_t\\
{\bf h}_t &= {\bf o}_t \odot \tanh\left({\bf c}_t\right)
\end{aligned}
$$

Here $\odot$ denotes the element-wise product.

The forget gate ${\bf f}_t$ decides how much of the previous memory cell ${\bf c}_{t-1}$ to keep:
$$
{\bf f}_t = \sigma\left( {\bf W}_{if}{\bf x}_t + {\bf W}_{hf}{\bf h}_{t-1} \right)
$$

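The candidate ${\bf g}_t$ is the value added to the memory cell, and the input gate ${\bf i}_t$ decides how much of it is written:
$$
\begin{aligned}
{\bf i}_t &= \sigma\left( {\bf W}_{ii}{\bf x}_t + {\bf W}_{hi}{\bf h}_{t-1} \right)\\
{\bf g}_t &= \tanh\left( {\bf W}_{ig}{\bf x}_t + {\bf W}_{hg}{\bf h}_{t-1} \right)
\end{aligned}
$$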

The output gate ${\bf o}_t$ decides how much of the memory cell content $\tanh( {\bf c}_t )$ is passed to the hidden state ${\bf h}_{t}$:
$$
{\bf o}_t = \sigma\left( {\bf W}_{io}{\bf x}_t + {\bf W}_{ho}{\bf h}_{t-1} \right)
$$
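
The gate equations above can be sketched as a single hand-written time step and compared against `nn.LSTM`. The sketch assumes `bias=False` so that the module matches the bias-free equations in this section, and relies on PyTorch stacking the gate weights in the order i, f, g, o. The helper name `lstm_step` and the sizes are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
in_dim, hid_dim = 3, 4
lstm = nn.LSTM(in_dim, hid_dim, bias=False)  # no bias, to match the equations

# PyTorch stores the four gate matrices stacked along dim 0 in the order i, f, g, o
W_ii, W_if, W_ig, W_io = lstm.weight_ih_l0.chunk(4, dim=0)
W_hi, W_hf, W_hg, W_ho = lstm.weight_hh_l0.chunk(4, dim=0)

def lstm_step(x_t, h_prev, c_prev):
    """One LSTM step for inputs of shape (batch, dim); hypothetical helper."""
    f_t = torch.sigmoid(x_t @ W_if.T + h_prev @ W_hf.T)  # forget gate
    i_t = torch.sigmoid(x_t @ W_ii.T + h_prev @ W_hi.T)  # input gate
    g_t = torch.tanh(x_t @ W_ig.T + h_prev @ W_hg.T)     # candidate values
    o_t = torch.sigmoid(x_t @ W_io.T + h_prev @ W_ho.T)  # output gate
    c_t = f_t * c_prev + i_t * g_t                       # element-wise update
    h_t = o_t * torch.tanh(c_t)
    return h_t, c_t

x = torch.randn(1, 1, in_dim)    # (seq=1, batch=1, in_dim)
h0 = torch.randn(1, 1, hid_dim)  # (num_layers=1, batch=1, hid_dim)
c0 = torch.randn(1, 1, hid_dim)

with torch.no_grad():
    out, (h1, c1) = lstm(x, (h0, c0))
    h_manual, c_manual = lstm_step(x[0], h0[0], c0[0])

print(torch.allclose(h1[0], h_manual, atol=1e-6))
print(torch.allclose(c1[0], c_manual, atol=1e-6))
```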

### Sizes
| Variable | Name | Size |
|:---:|:---:|:---:|
|${\bf x}$|input|`(in_dim)` per time step; `(seq, batch_size, in_dim)` for the whole input tensor|
|${\bf h}$|hidden state|`(hid_dim)`|
|${\bf c}$|memory cell|`(hid_dim)`|
|${\bf W}_{i*}$|weights applied to the input|`(hid_dim, in_dim)`|
|${\bf W}_{h*}$|weights applied to the hidden state|`(hid_dim, hid_dim)`|
|${\bf i, g, f, o}$|gates and candidate|`(hid_dim)`|
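
To relate these sizes to the parameters that `nn.LSTM` actually stores, the following sketch prints the weight shapes. The values `in_dim = 3` and `hid_dim = 5` are illustrative and deliberately different from each other so the two dimensions can be told apart.

```python
import torch
import torch.nn as nn

in_dim, hid_dim = 3, 5  # illustrative, chosen to differ from each other
lstm = nn.LSTM(in_dim, hid_dim)

# The four W_i* matrices are stacked along dim 0, and likewise the four W_h* matrices
print(lstm.weight_ih_l0.shape)  # (4 * hid_dim, in_dim)
print(lstm.weight_hh_l0.shape)  # (4 * hid_dim, hid_dim)
```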

In [3]:
in_dim = 3; hid_dim = 3; seq_len = 5; batch_size = 1
lstm = nn.LSTM(in_dim, hid_dim)
# Input layout is (seq, batch, feature) because batch_first=False by default
inputs = torch.randn(seq_len, batch_size, in_dim)
print(inputs)

# Initialize the hidden state h_0 and cell state c_0, each of shape (num_layers, batch, hid_dim)
hidden = (torch.randn(1, batch_size, hid_dim), torch.randn(1, batch_size, hid_dim))
out, hidden = lstm(inputs, hidden)
print(out)    # hidden state h_t at every time step, shape (seq, batch, hid_dim)
print(hidden) # (last hidden state h_n, last cell state c_n)

tensor([[[-0.3707,  1.6203,  0.9399]],

        [[ 1.1986, -0.2543, -0.2785]],

        [[-0.1050, -2.2704, -1.5767]],

        [[ 0.4714,  0.3945,  1.2863]],

        [[-1.3864,  0.2574, -0.0920]]])
tensor([[[-0.4008, -0.0932,  0.2109]],

        [[-0.3955,  0.1253,  0.1145]],

        [[-0.1316,  0.1825, -0.0205]],

        [[-0.5473,  0.1051,  0.1221]],

        [[-0.0443,  0.1136,  0.1150]]], grad_fn=<CatBackward>)
(tensor([[[-0.0443,  0.1136,  0.1150]]], grad_fn=<ViewBackward>), tensor([[[-0.1209,  0.2656,  0.2066]]], grad_fn=<ViewBackward>))
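
In the output above, the last row of `out` equals the first tensor of the returned tuple, because that tensor is the final hidden state. A short sketch that checks this relation and the output shapes, assuming a single-layer, unidirectional LSTM; the sizes are illustrative.

```python
import torch
import torch.nn as nn

seq_len, batch_size, in_dim, hid_dim = 5, 2, 3, 4
lstm = nn.LSTM(in_dim, hid_dim)
inputs = torch.randn(seq_len, batch_size, in_dim)

# Omitting the initial state makes h_0 and c_0 default to zeros
out, (h_n, c_n) = lstm(inputs)

print(out.shape)  # (seq_len, batch_size, hid_dim)
print(h_n.shape)  # (num_layers, batch_size, hid_dim)
print(c_n.shape)  # (num_layers, batch_size, hid_dim)

# For one layer and one direction, the last output step is the final hidden state
print(torch.allclose(out[-1], h_n[0]))
```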
