[toc]

# RNN in NumPy, Part 2: Code Implementation

Here we take `I am learning RNN` as the input sequence and use it to generate our training samples. Each character, including the spaces, is encoded with one-hot encoding.

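```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

input_word = list("I am learning RNN")
words = set(input_word)  # the 12 distinct characters, including the space
# scikit-learn >= 1.2 calls this argument `sparse_output` (the old `sparse` was removed in 1.4)
input_word_onehot = OneHotEncoder(sparse_output=False).fit_transform(np.array(input_word).reshape(-1, 1))
```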
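To make the encoding concrete, here is a minimal sketch that builds an equivalent matrix with plain NumPy. It assumes the columns follow sorted character order, which is what `OneHotEncoder` does with its default `categories='auto'`; `vocab` and `char_to_idx` are names introduced only for this illustration.

```python
import numpy as np

text = "I am learning RNN"
# Sorted vocabulary, matching the column order OneHotEncoder uses for categories='auto'
vocab = sorted(set(text))
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

# Picking row k of the identity matrix gives the one-hot vector for index k
onehot = np.eye(len(vocab))[[char_to_idx[ch] for ch in text]]

print(len(vocab), onehot.shape)  # 12 (17, 12)
print(vocab[:4])                 # [' ', 'I', 'N', 'R']
```

Each of the 17 characters becomes a row containing a single 1, and there are 12 columns, one per distinct character.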

We will tackle a character-level language model task: predict the next character from the characters that precede it.

Suppose we have the word "hello". We want:

- input h, output e;
- input he, output l;
- input hel, output l.

Here we take a sequence length of 3, so one sample is

((h, e, l), (e, l, l))
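The windowing rule can be sketched in a few lines of plain Python: slide a window of length 3 over the text and pair it with the same window shifted one character to the right. `make_windows` is a hypothetical helper used only for this illustration; it uses the same loop bounds as the notebook code that follows.

```python
def make_windows(seq, seq_len):
    """Hypothetical helper: pair each input window with the window shifted one step right."""
    samples = []
    for i in range(len(seq) - seq_len):
        samples.append((tuple(seq[i:i + seq_len]), tuple(seq[i + 1:i + 1 + seq_len])))
    return samples

for inp, target in make_windows("hello", 3):
    print(inp, "->", target)
# ('h', 'e', 'l') -> ('e', 'l', 'l')
# ('e', 'l', 'l') -> ('l', 'l', 'o')
```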

```python
# Sequence length of 3
sequenceLen = 3

x = []
y = []
# Slide a window over the encoded text; the target is the input shifted one character right
for i in range(len(input_word_onehot) - sequenceLen):
    x.append(input_word_onehot[i:i + sequenceLen])
    y.append(input_word_onehot[i + 1: i + 1 + sequenceLen])

x_train = np.array(x)
y_train = np.array(y)
```
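As a quick sanity check, the following sketch rebuilds the samples with plain NumPy (using `np.eye` in place of `OneHotEncoder`, an assumption that only affects column order) and confirms their shapes: 17 characters give 17 - 3 = 14 samples, each of shape 3 x 12.

```python
import numpy as np

text = "I am learning RNN"
vocab = sorted(set(text))
onehot = np.eye(len(vocab))[[vocab.index(ch) for ch in text]]

seq_len = 3
x = np.array([onehot[i:i + seq_len] for i in range(len(onehot) - seq_len)])
y = np.array([onehot[i + 1:i + 1 + seq_len] for i in range(len(onehot) - seq_len)])

print(x.shape, y.shape)  # (14, 3, 12) (14, 3, 12)
# The target of each sample overlaps its input shifted by one position
print(np.array_equal(x[:, 1:], y[:, :-1]))  # True
```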

We define two functions, `get_weights` and `get_bias`, that initialize a weight or bias array from a given shape and dtype.

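```python
def get_weights(shape, dtype=np.float32):
    # Note: the seed is reset on every call, so every weight matrix is
    # filled from the start of the same random stream (reproducible, but shared values)
    np.random.seed(123)
    return np.array(np.random.randn(*shape), dtype=dtype)

def get_bias(shape, dtype=np.float32):
    # Biases start at zero
    return np.zeros(shape, dtype=dtype)
```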
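Because `get_weights` calls `np.random.seed(123)` on every call, the matrices are not independent draws. The sketch below checks this for `Wxh` and `Whh`, then shows one possible alternative, assuming independent draws are what you want: seed a single `np.random.default_rng` generator once (`get_weights_rng` is a hypothetical name). Switching would change the gradient values printed later in this post.

```python
import numpy as np

def get_weights(shape, dtype=np.float32):
    # Same as above: the seed is reset on every call
    np.random.seed(123)
    return np.array(np.random.randn(*shape), dtype=dtype)

Wxh = get_weights((12, 4))
Whh = get_weights((4, 4))
# Both start from the same random stream, so Whh repeats the first 4 rows of Wxh
print(np.array_equal(Whh, Wxh[:4]))  # True

# Hypothetical alternative: seed one Generator once and draw every matrix from it
rng = np.random.default_rng(123)

def get_weights_rng(shape, dtype=np.float32):
    return rng.standard_normal(shape).astype(dtype)

Wxh2 = get_weights_rng((12, 4))
Whh2 = get_weights_rng((4, 4))
print(np.array_equal(Whh2, Wxh2[:4]))  # False
```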

Define some shape-related parameters:

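```python
n_class = len(words)  # number of distinct characters
nx = n_class  # input size (one-hot length)
ny = n_class  # output size (one score per character)
nh = 4        # hidden-state size
```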

To keep things manageable, we put the weights and biases in a dictionary and initialize them.

```python
weights = {
    'Wxh': get_weights((nx, nh)),
    'Why': get_weights((nh, ny)),
    'Whh': get_weights((nh, nh)),
    'bh': get_bias((1, nh)),
    'by': get_bias((1, ny))
}
```
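For a sense of scale, this short calculation counts the trainable parameters for the sizes used here (nx = ny = 12, nh = 4). Inputs are row vectors multiplied from the left (`x @ Wxh`), which is why `Wxh` has shape `(nx, nh)` rather than `(nh, nx)`.

```python
# Parameter shapes for nx = ny = 12, nh = 4
nx, ny, nh = 12, 12, 4
shapes = {
    'Wxh': (nx, nh),  # input -> hidden
    'Whh': (nh, nh),  # hidden -> hidden (the recurrence)
    'Why': (nh, ny),  # hidden -> output
    'bh': (1, nh),
    'by': (1, ny),
}
sizes = {name: r * c for name, (r, c) in shapes.items()}
print(sizes)                # {'Wxh': 48, 'Whh': 16, 'Why': 48, 'bh': 4, 'by': 12}
print(sum(sizes.values()))  # 128
```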

Define a softmax function for later use; subtract the maximum before exponentiating to avoid overflow.

```python
def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)
    sum_exp = np.sum(exp_a)
    y = exp_a / sum_exp
    return y
```
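Subtracting the maximum is safe because softmax does not change when the same constant c is subtracted from every input: e^(a_i - c) / Σ_j e^(a_j - c) = e^(a_i) / Σ_j e^(a_j). The sketch below contrasts it with a naive version (`naive_softmax` is a name introduced only for comparison) on inputs large enough to overflow.

```python
import numpy as np

def naive_softmax(a):
    exp_a = np.exp(a)  # overflows to inf for large inputs
    return exp_a / np.sum(exp_a)

def softmax(a):
    c = np.max(a)
    exp_a = np.exp(a - c)  # the largest term becomes exp(0) = 1
    return exp_a / np.sum(exp_a)

a = np.array([1000.0, 1000.0])
with np.errstate(over='ignore', invalid='ignore'):
    print(naive_softmax(a))  # [nan nan]
print(softmax(a))            # [0.5 0.5]
```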

## Forward Propagation

To keep things simple, we first consider feeding in a single sample.

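Each sample has shape (3 x 12, 3 x 12). X contains three characters, and each character becomes a vector of length 12 after one-hot encoding, so X is 3 x 12. Y also contains three characters, so its shape is also 3 x 12.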
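Read off the `forward` code, with row vectors (matching `np.matmul(x, Wxh)`), the recurrence computed for t = 1, ..., T (here T = 3), starting from a zero hidden state, is:

$$
\begin{aligned}
a_t &= x_t W_{xh} + h_{t-1} W_{hh} + b_h, \qquad h_0 = 0 \\
h_t &= \tanh(a_t) \\
o_t &= h_t W_{hy} + b_y \\
\hat{y}_t &= \mathrm{softmax}(o_t)
\end{aligned}
$$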

```python
def forward(xs, weights):
    Why = weights['Why']
    Whh = weights['Whh']
    Wxh = weights['Wxh']
    bh = weights['bh']
    by = weights['by']

    n_sequence = xs.shape[0]  # number of time steps
    ny = Why.shape[1]
    nh = Wxh.shape[1]

    a = np.zeros((n_sequence, nh))     # hidden pre-activations
    h = np.zeros((n_sequence, nh))     # hidden states
    o = np.zeros((n_sequence, ny))     # output scores (logits)
    yhat = np.zeros((n_sequence, ny))  # softmax probabilities
    hprev = None

    for t, x in enumerate(xs):
        # the hidden state before the first step is all zeros
        if t == 0:
            hprev = np.zeros((1, nh))
        else:
            hprev = h[t - 1]

        a[t] = np.matmul(x, Wxh) + np.matmul(hprev, Whh) + bh
        h[t] = np.tanh(a[t])
        o[t] = np.matmul(h[t], Why) + by
        yhat[t] = softmax(o[t])
    return yhat, a, h, o
```
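`forward` returns probabilities but no loss value. The backward pass starts from `yhat - y`, which is the gradient of softmax followed by cross-entropy with respect to `o_t`, so a cross-entropy loss is the natural quantity to monitor. The original code does not define a loss; the sketch below is an assumption, and `cross_entropy` is a hypothetical helper. Dividing by the number of time steps mirrors how `backward` averages its gradients.

```python
import numpy as np

def cross_entropy(yhat, ys, eps=1e-12):
    """Hypothetical helper: mean cross-entropy over the time steps of one sample.

    yhat: (T, n_class) softmax outputs from forward; ys: (T, n_class) one-hot targets.
    """
    # Only the probability assigned to the correct character contributes
    return -np.sum(ys * np.log(yhat + eps)) / ys.shape[0]

yhat = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1]])
ys = np.array([[1, 0, 0],
               [0, 1, 0]])
print(round(cross_entropy(yhat, ys), 4))  # 0.2899
```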

## Backpropagation

Backpropagation simply applies the formulas derived in the previous post.
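For reference, these are the gradients the `backward` code implements, written in the same row-vector notation as the forward equations (the notation may differ from the previous post). They assume the loss is the mean cross-entropy over the T time steps, L = (1/T) Σ_t ℓ_t with ℓ_t = -Σ_c y_{t,c} log ŷ_{t,c}. The code implies this through `yhat - ys` and the final division by `n_sequences` but never states it. The δ terms are gradients of the summed loss Σ_t ℓ_t, and ⊙ is element-wise multiplication.

$$
\begin{aligned}
\delta o_t &= \hat{y}_t - y_t \\
\delta h_t &= \delta o_t W_{hy}^\top + \delta a_{t+1} W_{hh}^\top, \qquad \delta a_{T+1} = 0 \\
\delta a_t &= \delta h_t \odot (1 - h_t \odot h_t) \\
\frac{\partial L}{\partial W_{hy}} &= \frac{1}{T}\sum_{t=1}^{T} h_t^\top \delta o_t, \qquad
\frac{\partial L}{\partial b_y} = \frac{1}{T}\sum_{t=1}^{T} \delta o_t \\
\frac{\partial L}{\partial W_{xh}} &= \frac{1}{T}\sum_{t=1}^{T} x_t^\top \delta a_t, \qquad
\frac{\partial L}{\partial b_h} = \frac{1}{T}\sum_{t=1}^{T} \delta a_t \\
\frac{\partial L}{\partial W_{hh}} &= \frac{1}{T}\sum_{t=1}^{T-1} h_t^\top \delta a_{t+1}
\end{aligned}
$$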

```python
def backward(xs, ys, weights, a, o, h, yhat):
    n_sequences = xs.shape[0]  # number of time steps

    Why = weights['Why']
    Whh = weights['Whh']
    Wxh = weights['Wxh']
    bh = weights['bh']
    by = weights['by']

    grads = {name: np.zeros_like(weights[name]) for name in weights}
    danext = None  # gradient w.r.t. a at the next time step
    # walk backwards through time
    for i in range(n_sequences - 1, -1, -1):
        if i == n_sequences - 1:
            # nothing flows back from beyond the last step
            danext = np.zeros_like(a[i:i + 1])

        # gradient of softmax + cross-entropy w.r.t. o_t
        dot = yhat[i:i + 1] - ys[i:i + 1]

        # backprop through ot
        dby = dot
        dWhy = np.matmul(h[i:i + 1].T, dot)
        dht = np.matmul(dot, Why.T) + np.matmul(danext, Whh.T)
        # h_t feeds a_{t+1} through Whh
        dWhh = np.matmul(h[i:i + 1].T, danext)

        # backprop through ht (tanh' = 1 - tanh^2)
        dat = dht * (1 - h[i:i + 1] ** 2)

        # backprop through at
        dWxh = np.matmul(xs[i:i + 1].T, dat)
        dbh = dat

        # accumulate gradients over the time steps
        grads['by'] += dby
        grads['bh'] += dbh
        grads['Whh'] += dWhh
        grads['Wxh'] += dWxh
        grads['Why'] += dWhy
        danext = dat

    # average over the time steps
    for k in grads:
        grads[k] = grads[k] / n_sequences
    return grads
```
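A common way to confirm that backpropagation code is correct is a numerical gradient check: nudge each parameter by ±ε, measure the change in the loss with central differences, and compare the result with the analytic gradient. The sketch below is self-contained, so it re-implements the same recurrence and gradients in compact form. It works in float64 (float32 would make the finite differences noisy) and uses a mean cross-entropy loss, which is an assumption since the original code never computes the loss. `loss_and_grads` and `max_abs_diff` are hypothetical names, and the sizes and random weights are arbitrary test values.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

def loss_and_grads(xs, ys, W):
    """Same recurrence and gradients as forward/backward, plus a mean cross-entropy loss."""
    T, nh = xs.shape[0], W['Wxh'].shape[1]
    h = np.zeros((T + 1, nh))  # h[0] is the zero initial state; h[t + 1] belongs to step t
    yhat = np.zeros(ys.shape)
    for t in range(T):
        h[t + 1] = np.tanh(xs[t] @ W['Wxh'] + h[t] @ W['Whh'] + W['bh'][0])
        yhat[t] = softmax(h[t + 1] @ W['Why'] + W['by'][0])
    loss = -np.sum(ys * np.log(yhat)) / T

    g = {k: np.zeros_like(v) for k, v in W.items()}
    da_next = np.zeros(nh)
    for t in reversed(range(T)):
        do = yhat[t] - ys[t]
        g['Why'] += np.outer(h[t + 1], do)
        g['by'][0] += do
        dh = do @ W['Why'].T + da_next @ W['Whh'].T
        da = dh * (1 - h[t + 1] ** 2)
        g['Wxh'] += np.outer(xs[t], da)
        # h_{t-1}^T da_t form; summed over t it equals the h_t^T da_{t+1} form in backward
        g['Whh'] += np.outer(h[t], da)
        g['bh'][0] += da
        da_next = da
    return loss, {k: v / T for k, v in g.items()}

def max_abs_diff(xs, ys, W, eps=1e-5):
    """Largest gap between analytic gradients and central finite differences."""
    _, grads = loss_and_grads(xs, ys, W)
    worst = 0.0
    for k, P in W.items():
        for idx in np.ndindex(P.shape):
            old = P[idx]
            P[idx] = old + eps
            loss_plus, _ = loss_and_grads(xs, ys, W)
            P[idx] = old - eps
            loss_minus, _ = loss_and_grads(xs, ys, W)
            P[idx] = old  # restore the parameter
            numeric = (loss_plus - loss_minus) / (2 * eps)
            worst = max(worst, abs(numeric - grads[k][idx]))
    return worst

rng = np.random.default_rng(0)
nx, nh, T = 5, 4, 3
W = {'Wxh': rng.standard_normal((nx, nh)) * 0.5,
     'Whh': rng.standard_normal((nh, nh)) * 0.5,
     'Why': rng.standard_normal((nh, nx)) * 0.5,
     'bh': np.zeros((1, nh)), 'by': np.zeros((1, nx))}
xs = np.eye(nx)[rng.integers(0, nx, T)]  # random one-hot inputs
ys = np.eye(nx)[rng.integers(0, nx, T)]  # random one-hot targets
print(max_abs_diff(xs, ys, W))  # a tiny number means the analytic gradients agree
```

A mistake in any single gradient formula typically shows up as a difference many orders of magnitude larger than the finite-difference noise.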

Compute the gradients; here we feed in just one sample to test the result.

```python
x, y = x_train[0], y_train[0]
yhat, a, h, o = forward(x, weights)
grads = backward(x, y, weights, a, o, h, yhat)
for name in grads:
    print(name)
    print(grads[name])
```

Wxh
[[-5.8980067e-03 -1.0567507e-03  5.3060856e-02  3.2835844e-01]
 [-2.1574606e-01 -4.6572205e-02 -1.2665960e-03 -2.1078084e-01]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 2.6740572e-01 -2.2931943e-04 -1.3003336e-01  3.1455082e-03]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  0.0000000e+00]]
Why
[[ 8.50889087e-02  6.36707619e-03 -7.47825659e-04 -1.98840220e-02
   2.58626133e-01 -2.30374616e-02 -1.09052859e-01 -2.62917131e-02
   4.05798629e-02 -2.19466284e-01  3.74572002e-03  4.07243753e-03]
 [-1.46826804e-01  1.54953711e-02  2