## How a recurrent neural network works

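A recurrent neural network (RNN) is a neural network architecture for processing sequential data such as time series. It helps to picture a time axis: the network starts from an initial state $s_0$, and at each time step $t$ it processes the current input $x_t$, updates its own state to $s_t$, and produces an output $o_t$.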

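The core of an RNN is its state $s$, a vector of fixed dimension that acts as the network's "memory". At the initial time $t=0$, $s_0$ is assigned an initial value (commonly the all-zeros vector). The network's operation can then be described recursively: at time $t$, assume $s_{t-1}$ has already been computed, and focus on how to obtain $s_{t}$ from it: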

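- Apply a linear transformation to the input vector $x_t$ with the matrix $U$; $U x_t$ has the same dimension as the state $s$;

- Apply a linear transformation to $s_{t-1}$ with the matrix $W$; $W s_{t-1}$ has the same dimension as the state $s$;

- Add the two resulting vectors and pass the sum through an activation function $f$ to obtain the current state, that is, $s_t = f(U x_t + W s_{t-1})$. In other words, the current state combines information from the previous state and the current input;

- Apply a linear transformation to the current state $s_t$ with the matrix $V$ to obtain the output at the current time step, $o_t = V s_t$.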
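
The four steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the text does not fix the activation $f$, so `tanh` is assumed here, and the dimensions and random weights are made up for the example.

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, V, f=np.tanh):
    """One step of the recurrence; returns (s_t, o_t)."""
    s_t = f(U @ x_t + W @ s_prev)  # s_t = f(U x_t + W s_{t-1})
    o_t = V @ s_t                  # o_t = V s_t
    return s_t, o_t

# Illustrative sizes and random weights (assumptions, not from the text)
input_dim, state_dim, output_dim = 3, 4, 2
rng = np.random.default_rng(0)
U = rng.normal(size=(state_dim, input_dim))
W = rng.normal(size=(state_dim, state_dim))
V = rng.normal(size=(output_dim, state_dim))

s = np.zeros(state_dim)               # s_0: all-zeros initial state
xs = rng.normal(size=(5, input_dim))  # a sequence of 5 input vectors
for x_t in xs:
    s, o = rnn_step(x_t, s, U, W, V)  # the same U, W, V are reused at every step

print(s.shape, o.shape)  # (4,) (2,)
```

Note that the same three matrices are shared across all time steps; only the state changes as the sequence is consumed.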

![title](rnn_cell_zh.png)

## Basic structure of a recurrent neural network

A common RNN structure is shown below:

![title](rnn.png)

The cell A can be a fully connected (simple) RNN, an LSTM, or a GRU.

TensorFlow 2.0 exposes these three through the following interfaces:

- keras.layers.SimpleRNN

- keras.layers.GRU

- keras.layers.LSTM

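For sequence prediction, such as machine translation, we need the outputs of all the A cells at once, $(h_0, h_1, \cdots, h_t)$. For classification and regression problems, only the output of the last cell, $h_t$, is needed. This is controlled by the return_sequences parameter: return_sequences=True returns the output at every time step, while return_sequences=False (the default) returns only the last one.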
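
To make the difference concrete, here is a small NumPy analogy of the two modes. It is not how Keras implements its layers; the function `run_rnn`, the `tanh` activation, and the sizes are assumptions chosen for illustration, using the recurrence $s_t = f(U x_t + W s_{t-1})$ from the first section.

```python
import numpy as np

def run_rnn(xs, U, W, return_sequences=False):
    """Run s_t = tanh(U x_t + W s_{t-1}) over xs of shape (T, input_dim)."""
    s = np.zeros(W.shape[0])
    states = []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)
        states.append(s)
    # Either every per-step state, or only the final one
    return np.stack(states) if return_sequences else states[-1]

rng = np.random.default_rng(0)
T, input_dim, units = 6, 3, 4
xs = rng.normal(size=(T, input_dim))
U = rng.normal(size=(units, input_dim))
W = rng.normal(size=(units, units))

all_h = run_rnn(xs, U, W, return_sequences=True)
last_h = run_rnn(xs, U, W, return_sequences=False)
print(all_h.shape)   # (6, 4): one vector per time step
print(last_h.shape)  # (4,): only the last time step
```

The last row of `all_h` equals `last_h`, which is why a layer feeding another recurrent layer needs return_sequences=True, while a layer feeding a Dense classifier usually does not.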


In [1]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time
import matplotlib.pyplot as plt
%matplotlib inline

In [2]:
num_words = 300
sequence_length = 300
embedding_dimension = 100

# Load the data
def gen_datasets():
    '''
    generate train datasets and testing datasets from imdb data
    :return:
        x_train, y_train,
        x_test, y_test
    '''
    (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=num_words)
    return  (x_train, y_train), (x_test, y_test)

(x_train, y_train), (x_test, y_test) = gen_datasets()

In [3]:
word_index = keras.datasets.imdb.get_word_index()
print(len(word_index))

88584
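
As a hedged aside, the word index can be used to turn an encoded review back into text. The sketch below uses a tiny made-up dictionary in place of the real `word_index`, and assumes the default settings of `imdb.load_data`, where index 0 is padding, 1 marks the start of a review, 2 stands for out-of-vocabulary words, and real word ranks are shifted by 3. The placeholder tokens `<pad>`, `<start>`, and `<unk>` are names chosen here for readability.

```python
# Toy stand-in for keras.datasets.imdb.get_word_index(); the entries are hypothetical.
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

INDEX_FROM = 3  # default offset applied by load_data to real word ranks
SPECIAL = {0: "<pad>", 1: "<start>", 2: "<unk>"}  # reserved indices

def decode_review(encoded, word_index):
    """Map a list of integer indices back to a space-separated string."""
    reverse = {rank + INDEX_FROM: word for word, rank in word_index.items()}
    reverse.update(SPECIAL)
    return " ".join(reverse.get(i, "<unk>") for i in encoded)

encoded = [1, 4, 5, 6, 2, 7]
decoded = decode_review(encoded, toy_word_index)
print(decoded)  # <start> the movie was <unk> great
```

With the real data, passing `word_index` and a row of `x_train` in place of the toy values should work the same way, provided `load_data` was called with its default offsets.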


In [4]:
(x_train, y_train), (x_test, y_test) = gen_datasets()
x_train = pad_sequences(x_train, maxlen=sequence_length) 
x_test = pad_sequences(x_test,maxlen=sequence_length)
print(x_train.shape)
print(x_test.shape)
print(y_train.shape)
print(y_test.shape)

(25000, 300)
(25000, 300)
(25000,)
(25000,)
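
Every review ends up with exactly 300 columns because `pad_sequences`, with its default settings, pads short sequences with zeros at the front and truncates long ones from the front. The pure-Python function below is a hypothetical re-implementation of that default behaviour for illustration only; `pad_pre` is not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep the last `maxlen` items of long ones."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]                           # truncate from the front
        padded.append([value] * (maxlen - len(seq)) + seq)  # pad at the front
    return padded

toy = [[5, 6], [1, 2, 3, 4, 5, 6, 7]]
result = pad_pre(toy, maxlen=4)
print(result)  # [[0, 0, 5, 6], [4, 5, 6, 7]]
```

Padding at the front keeps the real tokens next to the final time step, which is the one a layer with return_sequences=False actually reports.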



## Unidirectional RNN

In [5]:
embedding_dimension = 100
single_rnn_model = keras.models.Sequential([
    layers.Embedding(input_dim=num_words,
                     output_dim=embedding_dimension,
                     input_length=sequence_length),
    layers.SimpleRNN(units=64, return_sequences=False),  # False: return only the last time step's output
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

single_rnn_model.compile(optimizer=keras.optimizers.Adam(),
                         loss=keras.losses.BinaryCrossentropy(),
                         metrics=['accuracy'])

single_rnn_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, 300, 100)          30000     
_________________________________________________________________
simple_rnn (SimpleRNN)       (None, 64)                10560     
_________________________________________________________________
dense (Dense)                (None, 64)                4160      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
Total params: 44,785
Trainable params: 44,785
Non-trainable params: 0
_________________________________________________________________
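
The parameter counts in this summary can be checked by hand. The short calculation below is a sanity check rather than anything Keras runs; the variable names are chosen here, and the SimpleRNN count assumes one input weight matrix, one recurrent weight matrix, and one bias vector.

```python
vocab_size, embedding_dim, units = 300, 100, 64

embedding_params = vocab_size * embedding_dim     # one 100-d vector per word index
rnn_params = units * (embedding_dim + units + 1)  # input weights + recurrent weights + bias
dense_params = units * 64 + 64                    # Dense(64) on a 64-d input
output_params = 64 * 1 + 1                        # Dense(1)

total = embedding_params + rnn_params + dense_params + output_params
print(embedding_params, rnn_params, dense_params, output_params, total)
# 30000 10560 4160 65 44785
```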


In [6]:
history = single_rnn_model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.1)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Unidirectional LSTM

In [7]:
embedding_dimension = 100
single_lstm_model = keras.models.Sequential([
    layers.Embedding(input_dim=num_words,
                     output_dim=embedding_dimension,
                     input_length=sequence_length),
    # layers.SimpleRNN(units=64, return_sequences=False),  # False: return only the last time step's output
    layers.LSTM(units=64, return_sequences=False),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

single_lstm_model.compile(optimizer=keras.optimizers.Adam(),
                          loss=keras.losses.BinaryCrossentropy(),
                          metrics=['accuracy'])

single_lstm_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 300, 100)          30000     
_________________________________________________________________
lstm (LSTM)                  (None, 64)                42240     
_________________________________________________________________
dense_2 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 65        
Total params: 76,465
Trainable params: 76,465
Non-trainable params: 0
_________________________________________________________________
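
The LSTM layer reports exactly four times the parameters of a simple recurrent layer of the same size, which matches the usual description of an LSTM as four weight sets (three gates plus the candidate cell update). The arithmetic below is a hand check under that assumption, with names chosen here for illustration.

```python
input_dim, units = 100, 64

per_gate = units * (input_dim + units + 1)  # input weights + recurrent weights + bias
lstm_params = 4 * per_gate                  # input, forget, output gates + candidate update

total = 300 * 100 + lstm_params + (64 * 64 + 64) + (64 * 1 + 1)
print(per_gate, lstm_params, total)  # 10560 42240 76465
```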


In [8]:
history = single_lstm_model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.1)

Train on 22500 samples, validate on 2500 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Bidirectional structure

![title](bidirection.png)
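
The idea in the figure can be sketched with NumPy: one recurrent pass reads the sequence left to right, a second pass with its own weights reads it right to left, and the two results are concatenated at each time step. This is a simplified illustration, not the Keras implementation; it uses a plain `tanh` cell instead of an LSTM, and `run_rnn` and all sizes are assumptions made for the example.

```python
import numpy as np

def run_rnn(xs, U, W):
    """Return every state of s_t = tanh(U x_t + W s_{t-1}); shape (T, units)."""
    s = np.zeros(W.shape[0])
    states = []
    for x_t in xs:
        s = np.tanh(U @ x_t + W @ s)
        states.append(s)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, units = 5, 3, 4
xs = rng.normal(size=(T, input_dim))

# Each direction has its own, independent weights
U_f, W_f = rng.normal(size=(units, input_dim)), rng.normal(size=(units, units))
U_b, W_b = rng.normal(size=(units, input_dim)), rng.normal(size=(units, units))

forward = run_rnn(xs, U_f, W_f)
backward = run_rnn(xs[::-1], U_b, W_b)[::-1]  # read reversed input, then re-align with time
bi = np.concatenate([forward, backward], axis=-1)
print(bi.shape)  # (5, 8): twice the number of units per time step
```

Concatenation is why a bidirectional layer built from a 64-unit cell produces 128 features per time step.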

In [9]:
embedding_dimension = 100
bidirection_lstm_model = keras.models.Sequential([
    layers.Embedding(input_dim=num_words,
                     output_dim=embedding_dimension,
                     input_length=sequence_length),
    layers.Bidirectional(layers.LSTM(units=64,
                                     return_sequences=True)),  # True: return the output at every time step
    layers.Bidirectional(layers.LSTM(units=64, return_sequences=False)),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

bidirection_lstm_model.compile(optimizer=keras.optimizers.Adam(),
                               loss=keras.losses.BinaryCrossentropy(),
                               metrics=['accuracy'])

bidirection_lstm_model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 300, 100)          30000     
_________________________________________________________________
bidirectional (Bidirectional (None, 300, 128)          84480     
_________________________________________________________________
bidirectional_1 (Bidirection (None, 128)               98816     
_________________________________________________________________
dense_4 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 65        
Total params: 221,617
Trainable params: 221,617
Non-trainable params: 0
_________________________________________________________________
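
The two bidirectional layers have different parameter counts even though both wrap a 64-unit LSTM, because the second one receives the 128-dimensional concatenated output of the first. The hand check below assumes the standard LSTM layout of four weight sets per direction; `lstm_params` is a helper defined here, not a Keras function.

```python
def lstm_params(input_dim, units):
    """Parameters of one LSTM direction: 4 weight sets of (input + recurrent + bias)."""
    return 4 * units * (input_dim + units + 1)

bi_1 = 2 * lstm_params(100, 64)  # forward + backward, fed by 100-d embeddings
bi_2 = 2 * lstm_params(128, 64)  # forward + backward, fed by the 128-d output of bi_1
dense = 128 * 64 + 64            # Dense(64) on the 128-d concatenated output
output = 64 * 1 + 1              # Dense(1)

total = 300 * 100 + bi_1 + bi_2 + dense + output
print(bi_1, bi_2, dense, total)  # 84480 98816 8256 221617
```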


In [10]:
history = bidirection_lstm_model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.1)

Train on 22500 samples, validate on 2500 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Testing

In [11]:
result = bidirection_lstm_model.evaluate(x_test, y_test)
print("loss=%s,accuracy=%s" %(result[0],result[1]))



loss=0.41429665595054627,accuracy=0.8064
