In [None]:
%%HTML
<link rel="stylesheet" type="text/css" href="../css/custom.css">

# Recurrent neural networks

![footer_logo](../images/logo.png)

## Goal

We will discuss recurrent neural networks (RNNs) and how they enable us to model sequential data.

## Program

- [Sequences and neural networks]()
- [Recurrent neural networks (RNNs)]()
- [Applications]()
- [Types of RNNs]()
- [Training an RNN (BPTT)]()

# Sequential data 

### The order of a sequence holds information

> proudly part of Xebia Group

vs

> Group part of Xebia proudly



# Feed-forward 💔 sequences

We need a different kind of unit!

![center quarter](../images/rnn/feedforward-sequence.png)


# Recurrent ❤️ sequences

Internal loop feeds back the previous state 

<center><img src="../images/rnn/rnn-architecture.png" width="800"></center>


# Formal comparison

**Feed forward**

<img src="../images/rnn/formula-feedforward.png" style="height: 50px; display: block; margin-left: auto !important; margin-right: auto !important;" align="center"/>

**Recurrent**

<img src="../images/rnn/formula-recurrent.png" style="height: 56px; display: block; margin-left: auto !important; margin-right: auto !important;" align="center"/>


# Recurrent neural network (RNN)

- contains at least one feed-back connection
- enables the neural network to do temporal processing and learn sequences

![center quarter](../images/rnn/rnn_loop.png)



# Recurrent Neural Network (RNN)

<img src="../images/rnn/rnn_unit.png" align='right' width='300'>

- Proposed in the 80s for modeling time series

- An RNN does not start its "thinking" from scratch

- Networks can persist information

- Take earlier inputs into account when making predictions

# Recurrent Neural Network (RNN)

Network architectures:

- Simple RNNs
- Long short-term memory
- Gated recurrent units

Use cases:

- Forecasting
- Classification
- Outlier detection

# Applications: Language translation

![three_quarters center](../images/rnn/google-translate.png)


# Applications: Image captioning

![half center](../images/rnn/image-captioning.png)

# Applications: Speech recognition

![half center](../images/rnn/speech_recognition.jpg)


# Applications: Session-based recommendations

![center](../images/rnn/gru4rec.png)

<sup>Source: [Hidasi et al., 2015, "Session-based Recommendations with Recurrent Neural Networks"](https://arxiv.org/abs/1511.06939)</sup>


# Applications: Sentiment analysis


![center three_quarters](../images/rnn/sentiment-neuron.gif)

<sup>Source: [Unsupervised Sentiment Neuron](https://blog.openai.com/unsupervised-sentiment-neuron/)</sup>
    


# Applications: Entity recognition


![center three_quarters](../images/rnn/entity_recognition.gif)

<sup>Source: [DL experiment for a GDD client by Marcel Raas](https://godatadriven.com/players/marcel-raas)</sup>


# Modeling sequences

- Feedforward: activation determined by the input
- RNN: architecture contains loops
- RNN: activation can also depend on the unit's own activation at an earlier time step.

> An RNN can be thought of as multiple copies of the same network linked through time

![center three_quarters](../images/rnn/RNN_unrolled.png)
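
The difference between the two kinds of unit can be sketched with scalars. The weights `w_in` and `w_rec`, the `tanh` activation, and the function names are all hypothetical choices for illustration. After the first input, the sequence contains only zeros: the feed-forward unit immediately outputs zero, while the recurrent unit still carries a trace of the first input.

```python
import math

# Hypothetical scalar weights, chosen for illustration only.
w_in = 1.0   # weight on the current input
w_rec = 0.5  # weight on the previous activation (the loop)

def feedforward_unit(x):
    """Activation depends on the current input only."""
    return math.tanh(w_in * x)

def recurrent_unit(x, a_prev):
    """Activation depends on the current input and the previous activation."""
    return math.tanh(w_in * x + w_rec * a_prev)

sequence = [1.0, 0.0, 0.0]

ff_outputs = [feedforward_unit(x) for x in sequence]

rnn_outputs = []
a = 0.0  # initial state (assumed to be zero)
for x in sequence:
    # Unrolling through time: the same unit with the same weights at every step.
    a = recurrent_unit(x, a)
    rnn_outputs.append(a)

print("feed-forward:", ff_outputs)
print("recurrent:   ", rnn_outputs)
```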

# Types of RNNs

- An RNN provides a natural and flexible architecture for modeling all kinds of sequence data

![center half](../images/rnn/rnn_sequence.jpeg)


# One to many example task?

![center](../images/rnn/rnn_one_to_many.jpeg)

> Image captioning

# Many to one example task?

![center](../images/rnn/rnn_many_to_one.jpeg)

> Sentiment analysis

# Many to many example task?

![center](../images/rnn/rnn_many_to_many_a.jpeg)

> Language translation

# Many to many example task?

![center](../images/rnn/rnn_many_to_many_b.jpeg)

> Video frames classification
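
In code, two of these variants differ mainly in which hidden states are passed to the output layer. The sketch below assumes the hidden states have already been computed (random stand-ins here) and uses illustrative sizes. It covers many to one and the per-step many to many case only. One to many and the encoder-decoder style of many to many need an extra generation loop that is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

T, H, O = 5, 4, 3  # time steps, hidden size, output size (illustrative values)
hidden_states = rng.normal(size=(T, H))  # stand-in for a_1 ... a_T of some RNN
W_o = rng.normal(size=(O, H))            # output weights, shared across time steps

# Many to one (e.g. sentiment analysis): read one output from the final state.
y_many_to_one = W_o @ hidden_states[-1]    # shape (O,)

# Many to many, one output per step (e.g. video frame classification).
y_many_to_many = hidden_states @ W_o.T     # shape (T, O)

print(y_many_to_one.shape, y_many_to_many.shape)
```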

# Training an RNN: Backpropagation through time ([BPTT](http://ir.hit.edu.cn/~jguo/docs/notes/bptt.pdf))

Below is a simple RNN (many-to-one)

![third center](../images/rnn/simple_recurrent.png)

$$\begin{align}
a_t &= \varphi(W_h\cdot x_t + W_r\cdot a_{t-1})\\
y_t &= W_o\cdot a_t
\end{align}$$
<!-- 
- One-step ahead forecasting: $x_t\rightarrow y_{t-1}$ -->
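
The two equations above translate almost line by line into NumPy. This is a sketch under assumptions: $\varphi$ is taken to be `tanh`, the initial state $a_0$ is a zero vector, there are no bias terms (as in the equations), and the sizes and random weights are placeholders.

```python
import numpy as np

def rnn_forward(xs, W_h, W_r, W_o, a0=None):
    """Run a simple RNN over a sequence xs of shape (T, D).

    Returns all activations, shape (T, H), and all outputs, shape (T, O).
    """
    hidden_size = W_r.shape[0]
    a = np.zeros(hidden_size) if a0 is None else a0
    activations, outputs = [], []
    for x_t in xs:
        a = np.tanh(W_h @ x_t + W_r @ a)  # a_t = phi(W_h x_t + W_r a_{t-1})
        activations.append(a)
        outputs.append(W_o @ a)           # y_t = W_o a_t
    return np.array(activations), np.array(outputs)

rng = np.random.default_rng(42)
D, H, O, T = 2, 3, 1, 6  # input, hidden, output sizes and sequence length
W_h = rng.normal(scale=0.5, size=(H, D))
W_r = rng.normal(scale=0.5, size=(H, H))
W_o = rng.normal(scale=0.5, size=(O, H))
xs = rng.normal(size=(T, D))

activations, outputs = rnn_forward(xs, W_h, W_r, W_o)
y_final = outputs[-1]  # many-to-one: keep only the prediction at the last step

print(activations.shape, outputs.shape, y_final.shape)
```

For a many-to-one task only `y_final` would enter the loss. The intermediate outputs are returned here to make the per-step computation visible.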


# Training an RNN: Backpropagation through time ([BPTT](http://ir.hit.edu.cn/~jguo/docs/notes/bptt.pdf))

To train an RNN we backpropagate the error through time.

$$\frac{\partial L}{\partial w} = \sum_t\frac{\partial L_t}{\partial w}$$

The $\delta$ error vectors are backpropagated through the network.

![center third](../images/rnn/bptt_recurrent.png)
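
A minimal sketch of BPTT for a scalar RNN, to make the sum over time steps concrete. Several things here are assumptions rather than part of this notebook: scalar weights `w_h`, `w_r`, `w_o`, a `tanh` activation, a zero initial state, and a squared-error loss $L_t = \frac{1}{2}(y_t - \text{target}_t)^2$ at every step. The backward loop walks from the last step to the first, carries `delta` (the error at the pre-activation) back through `w_r`, and adds each step's contribution to the gradients of the shared weights.

```python
import math

def forward(xs, targets, w_h, w_r, w_o):
    """Forward pass of a scalar RNN; returns the activations and the total loss."""
    a_prev, activations, loss = 0.0, [], 0.0
    for x_t, target_t in zip(xs, targets):
        a_t = math.tanh(w_h * x_t + w_r * a_prev)
        y_t = w_o * a_t
        loss += 0.5 * (y_t - target_t) ** 2  # L = sum_t L_t
        activations.append(a_t)
        a_prev = a_t
    return activations, loss

def bptt(xs, targets, w_h, w_r, w_o):
    """Gradients of the total loss with respect to the three shared weights."""
    activations, _ = forward(xs, targets, w_h, w_r, w_o)
    g_h = g_r = g_o = 0.0
    delta_next = 0.0  # error arriving from step t+1 (none after the last step)
    for t in reversed(range(len(xs))):
        a_t = activations[t]
        a_prev = activations[t - 1] if t > 0 else 0.0
        dy = w_o * a_t - targets[t]       # dL_t / dy_t
        da = dy * w_o + delta_next * w_r  # from this step's output and from the future
        delta = da * (1.0 - a_t ** 2)     # back through tanh
        g_o += dy * a_t                   # each step adds to the same weights
        g_h += delta * xs[t]
        g_r += delta * a_prev
        delta_next = delta
    return g_h, g_r, g_o

xs = [0.5, -1.0, 0.25, 0.8]
targets = [0.1, -0.2, 0.0, 0.3]
w_h, w_r, w_o = 0.7, -0.4, 1.2

g_h, g_r, g_o = bptt(xs, targets, w_h, w_r, w_o)

# Sanity check against a central finite difference for the recurrent weight.
eps = 1e-6
num_g_r = (forward(xs, targets, w_h, w_r + eps, w_o)[1]
           - forward(xs, targets, w_h, w_r - eps, w_o)[1]) / (2 * eps)
print("analytic and numeric gradient agree:", abs(g_r - num_g_r) < 1e-6)
```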


# Parameter sharing

Note that we do not have separate weights at each time step.

![center third](../images/rnn/bptt_recurrent.png)

# Parameter sharing

- Without parameter sharing the network would need far more resources, since every time step would have its own weights.
- We can generalize to sequences of different lengths.
- It reflects the fact that we perform the same task at each step, so we don't have to relearn the rules at each point in the sequence.
- Often, the components of a sequence behave the same way wherever they occur. For instance, in NLP:

                                                     "On Monday it was snowing"

                                                     "It was snowing on Monday"
    *i.e. what matters is the order of the words relative to each other, not their absolute position.*
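
The resource argument can be made concrete by counting weights. The sketch below uses illustrative sizes and a hypothetical helper, `rnn_parameter_count`, which counts the entries of $W_h$, $W_r$ and $W_o$ (no biases, matching the RNN equations earlier in this notebook). The unshared variant is a thought experiment in which every time step gets its own copy of the weights.

```python
def rnn_parameter_count(input_size, hidden_size, output_size):
    """Number of weights in W_h, W_r and W_o (biases omitted)."""
    return (hidden_size * input_size      # W_h
            + hidden_size * hidden_size   # W_r
            + output_size * hidden_size)  # W_o

D, H, O = 10, 32, 1  # illustrative sizes, not taken from the notebook

shared = rnn_parameter_count(D, H, O)

unshared_by_length = {}
for T in (5, 50, 500):
    # Hypothetical network with a separate set of weights per time step.
    unshared_by_length[T] = T * shared
    print(f"T={T:>3}: shared={shared}, unshared={unshared_by_length[T]}")
```

With shared weights the count does not depend on the sequence length, which is also why the same network can be applied to sequences of different lengths.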



# Summary

In this notebook we have covered:

- The issues with sequences and standard neural networks.
- Neural networks with a recurrent connection (RNNs) and their applications.
- Training an RNN with BPTT and the importance of parameter sharing.

# RNN Exercise
[Exercise: RNN forecast airline passengers](../exercises/03_01_rnn_forecast_airline_passenger.ipynb)
