In [1]:
from IPython.display import Audio, Image, YouTubeVideo

# LESSON 7: Recurrent Neural Network

## CHAPTER 1: Intro to RNNs

![meme.png](attachment:meme.png)
*Hi, it's Luis again!*

### Recurrent Neural Networks

Hi! It's Luis again!

Now that you have some experience with PyTorch and deep learning, I'll be teaching you about recurrent neural networks (__RNNs__) and long short-term memory (__LSTM__). RNNs are designed specifically to learn from sequences of data by passing the hidden state from one step in the sequence to the next step in the sequence, combined with the input. LSTMs are an improvement on RNNs, and are quite useful when our neural network needs to switch between remembering recent things and things from a long time ago. But first, I want to give you some great references to study this further. There are many posts out there about LSTMs; here are a few of my favorites:

* [Chris Olah's LSTM post](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [Edwin Chen's LSTM post](http://blog.echen.me/2017/05/30/exploring-lstms/)
* [Andrej Karpathy's blog post](http://karpathy.github.io/2015/05/21/rnn-effectiveness/) on RNNs
* [Andrej Karpathy's lecture](https://www.youtube.com/watch?v=iX5V1WpxxkY) on RNNs and LSTMs from CS231n

So, let's dig in!


## CHAPTER 2: RNN vs LSTM

In [2]:
id = '70MgF-IwAr8'
YouTubeVideo(id=id, width=600)

## CHAPTER 3: Basics of LSTM

In [3]:
id = 'gjb68a4XsqE'
YouTubeVideo(id=id, width=600)

## CHAPTER 4: Architecture of LSTM

In [4]:
id = 'ycwthhdx8ws'
YouTubeVideo(id=id, width=600)

## CHAPTER 5: The Learn Gate

In [5]:
id = 'aVHVI7ovbHY'
YouTubeVideo(id=id, width=600)

The output of the Learn Gate is $N_t i_t$ where:

![screen-shot-2017-11-16-at-4.26.22-pm.png](attachment:screen-shot-2017-11-16-at-4.26.22-pm.png)
*Equation 1*
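The equation above is only available as a screenshot, so the sketch below assumes the usual learn-gate form: $N_t = \tanh(W_n[STM_{t-1}, E_t] + b_n)$ and $i_t = \sigma(W_i[STM_{t-1}, E_t] + b_i)$, where $E_t$ is the current event (input) and $[\cdot, \cdot]$ is concatenation. The weights are random placeholders and `learn_gate` is a hypothetical helper, not code from the course notebooks.

```python
import torch

torch.manual_seed(0)
hidden_size, input_size = 4, 3

# Hypothetical random weights: W_n/b_n produce the candidate N_t, W_i/b_i the ignore factor i_t
W_n = torch.randn(hidden_size, hidden_size + input_size)
b_n = torch.zeros(hidden_size)
W_i = torch.randn(hidden_size, hidden_size + input_size)
b_i = torch.zeros(hidden_size)

def learn_gate(stm_prev, e_t):
    """Return N_t * i_t for a single time step (1-D tensors, no batch dimension)."""
    x = torch.cat([stm_prev, e_t])        # [STM_{t-1}, E_t]
    n_t = torch.tanh(W_n @ x + b_n)       # new information, values in (-1, 1)
    i_t = torch.sigmoid(W_i @ x + b_i)    # how much of it to keep, values in (0, 1)
    return n_t * i_t

out = learn_gate(torch.zeros(hidden_size), torch.randn(input_size))
print(out.shape)  # torch.Size([4])
```

Because $i_t$ lies between 0 and 1, it acts as a per-unit dial that lets through all, some, or none of the new information $N_t$.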

## CHAPTER 6: The Forget Gate

In [6]:
id = 'iWxpfxLUPSU'
YouTubeVideo(id=id, width=600)

The output of the Forget Gate is $LTM_{t-1} f_t$ where:
![screen-shot-2017-11-16-at-4.27.58-pm.png](attachment:screen-shot-2017-11-16-at-4.27.58-pm.png)
*Equation 2*

## CHAPTER 7: The Remember Gate

In [7]:
id = '0qlm86HaXuU'
YouTubeVideo(id=id, width=600)

The output of the Remember Gate is:

$$LTM_{t-1} f_t + N_t i_t$$

*Equation 3*

($N_t$, $i_t$, and $f_t$ are calculated in Equations 1 and 2.)
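Equation 3 is purely elementwise, so a small worked example with hand-picked numbers shows what each factor does. All values below are hypothetical, chosen so that each memory unit is treated differently.

```python
import torch

# Hypothetical values for a 3-unit long-term memory
ltm_prev = torch.tensor([1.0, -2.0, 0.5])  # LTM_{t-1}
f_t = torch.tensor([1.0, 0.5, 0.0])        # forget factor: keep, halve, erase
n_t = torch.tensor([0.2, 0.4, -0.6])       # new information N_t
i_t = torch.tensor([0.0, 0.5, 1.0])        # ignore factor: drop, halve, keep

# Equation 3: everything is multiplied and added elementwise
ltm_t = ltm_prev * f_t + n_t * i_t
print(ltm_t)  # approximately [1.0, -0.8, -0.6]
```

The first unit keeps its old memory and ignores the new information, the second blends both, and the third forgets the old value entirely and stores only the new one.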


## CHAPTER 8: The Use Gate

In [8]:
id = '5Ifolm1jTdY'
YouTubeVideo(id=id, width=600)

At 00:27, Luis refers to obtaining the New Short Term Memory, but it should be the New Long Term Memory.

The output of the Use Gate is $U_t V_t$ where:
![screen-shot-2017-11-16-at-4.31.41-pm.png](attachment:screen-shot-2017-11-16-at-4.31.41-pm.png)
*Equation 4*

## CHAPTER 9: Putting it All Together

In [9]:
id = 'IF8FlKW-Zo0'
YouTubeVideo(id=id, width=600)
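To see how the four gates fit together, here is a minimal sketch of one LSTM step written directly from the gate outputs above. Equations 1, 2, and 4 appear only as screenshots, so the exact forms used here are an assumption: $N_t = \tanh(W_n[STM_{t-1}, E_t] + b_n)$, $i_t = \sigma(W_i[STM_{t-1}, E_t] + b_i)$, $f_t = \sigma(W_f[STM_{t-1}, E_t] + b_f)$, $U_t = \tanh(W_u LTM_{t-1} f_t + b_u)$, and $V_t = \sigma(W_v[STM_{t-1}, E_t] + b_v)$. The weights are random placeholders, and `lstm_step` is a hypothetical helper.

```python
import torch

torch.manual_seed(0)
H, E = 4, 3  # hidden (memory) size and event (input) size, chosen arbitrarily

def make_params(in_features):
    # Hypothetical small random weight matrix and zero bias
    return torch.randn(H, in_features) * 0.1, torch.zeros(H)

W_n, b_n = make_params(H + E)  # learn gate: candidate N_t
W_i, b_i = make_params(H + E)  # learn gate: ignore factor i_t
W_f, b_f = make_params(H + E)  # forget gate: forget factor f_t
W_u, b_u = make_params(H)      # use gate: U_t, computed from the forgotten LTM
W_v, b_v = make_params(H + E)  # use gate: V_t

def lstm_step(ltm_prev, stm_prev, e_t):
    x = torch.cat([stm_prev, e_t])                  # [STM_{t-1}, E_t]
    n_t = torch.tanh(W_n @ x + b_n)
    i_t = torch.sigmoid(W_i @ x + b_i)
    f_t = torch.sigmoid(W_f @ x + b_f)
    ltm_t = ltm_prev * f_t + n_t * i_t              # remember gate -> new long-term memory
    u_t = torch.tanh(W_u @ (ltm_prev * f_t) + b_u)  # use gate, first factor
    v_t = torch.sigmoid(W_v @ x + b_v)              # use gate, second factor
    stm_t = u_t * v_t                               # new short-term memory (also the output)
    return ltm_t, stm_t

ltm, stm = torch.zeros(H), torch.zeros(H)
for e in torch.randn(5, E):  # a sequence of 5 events
    ltm, stm = lstm_step(ltm, stm, e)
print(ltm.shape, stm.shape)
```

PyTorch's built-in `nn.LSTM` uses a slightly different output step: it multiplies an output gate by $\tanh$ of the *new* cell state, rather than computing $U_t$ from $LTM_{t-1} f_t$. The overall idea of two memories flowing through four gates is the same.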

## CHAPTER 10: Other architectures

In [10]:
id = 'MsxFDuYlTuQ'
YouTubeVideo(id=id, width=600)

Additional information about GRUs can be found in the following links:

* [Michael Guerzhoy's post](http://www.cs.toronto.edu/~guerzhoy/321/lec/W09/rnn_gated.pdf)
* [Steve Carell](http://despicableme.wikia.com/wiki/Felonius_Gru)
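One practical difference between the two architectures shows up directly in PyTorch: `nn.GRU` keeps a single hidden state, while `nn.LSTM` carries both a hidden state and a cell state, and the GRU has three gate blocks instead of four. The layer sizes below are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 5, 10)  # (batch, seq_len, input_size) because batch_first=True

lstm = nn.LSTM(input_size=10, hidden_size=16, batch_first=True)
gru = nn.GRU(input_size=10, hidden_size=16, batch_first=True)

out_lstm, (h_lstm, c_lstm) = lstm(x)  # LSTM returns a hidden state AND a cell state
out_gru, h_gru = gru(x)               # GRU returns only a hidden state

# Gate weights are stacked row-wise: 4 blocks for the LSTM, 3 for the GRU
print(lstm.weight_ih_l0.shape)  # torch.Size([64, 10])
print(gru.weight_ih_l0.shape)   # torch.Size([48, 10])

n_lstm = sum(p.numel() for p in lstm.parameters())
n_gru = sum(p.numel() for p in gru.parameters())
print(n_lstm, n_gru)  # 1792 1344
```

With the same sizes, the GRU has exactly three quarters of the LSTM's parameters, which is one reason it can be faster to train.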



## CHAPTER 11: Implementing RNNs

![cezanne-head.jpg](attachment:cezanne-head.jpg)
*I'm back! This time, showing you RNNs*

### Implementing Recurrent Neural Networks

Now that you've learned about RNNs and LSTMs from Luis, it's time to see how we implement them in PyTorch. With a bit of an assist from Mat, I'll be leading you through a couple of notebooks showing how to build RNNs with PyTorch. First, I'll show you how to learn from time-series data. Then, you'll implement a character-level RNN. That is, it will learn from some text one character at a time, then generate new text one character at a time.

## CHAPTER 12: Time-Series Prediction

### Code Walkthrough & Repository

The video below is a walkthrough of code that you can find in our public GitHub repository, if you navigate to ``recurrent-neural-networks > time-series`` and [the Simple_RNN.ipynb notebook](https://github.com/udacity/deep-learning-v2-pytorch/blob/master/recurrent-neural-networks/time-series/Simple_RNN.ipynb). Feel free to go through this code on your own, locally.

This example is meant to give you an idea of how PyTorch represents RNNs and how you might represent memory in code. Later, you'll be given more complex exercise and solution notebooks in the classroom.
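As a rough preview of the idea, here is a minimal sketch of an RNN that learns to predict the next point of a sine wave. The `SimpleRNN` class, the sizes, and the training settings are hypothetical choices for illustration; they are not taken from the Simple_RNN notebook.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

class SimpleRNN(nn.Module):
    """Hypothetical model: an nn.RNN followed by a linear layer that
    predicts one value per time step."""
    def __init__(self, input_size=1, hidden_dim=32, n_layers=1, output_size=1):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x, hidden):
        # x: (batch, seq_len, input_size); hidden: (n_layers, batch, hidden_dim) or None
        r_out, hidden = self.rnn(x, hidden)
        r_out = r_out.reshape(-1, self.hidden_dim)  # one row per time step
        return self.fc(r_out), hidden

# Input: points on a sine wave; target: the same points shifted one step ahead
seq_length = 20
time_steps = np.linspace(0, np.pi, seq_length + 1)
data = np.sin(time_steps).reshape(-1, 1)
x = torch.tensor(data[:-1], dtype=torch.float32).unsqueeze(0)  # (1, 20, 1)
y = torch.tensor(data[1:], dtype=torch.float32)                # (20, 1)

model = SimpleRNN()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
hidden = None  # nn.RNN starts from a zero hidden state when given None
for step in range(100):
    prediction, hidden = model(x, hidden)
    hidden = hidden.detach()  # keep the memory, but stop gradients flowing into earlier iterations
    loss = criterion(prediction, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

The returned `hidden` tensor is the network's memory: feeding it back in on the next call lets the model continue from where it left off instead of starting fresh.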


In [11]:
id = 'xV5jHLFfJbQ'
YouTubeVideo(id=id, width=600)

## CHAPTER 13: Training & Memory

In [12]:
id = 'sx7T_KP5v9I'
YouTubeVideo(id=id, width=600)

### Recurrent Layers

Here is the documentation for the main types of [recurrent layers in PyTorch](https://pytorch.org/docs/stable/nn.html#recurrent-layers). Take a look and read about the three main types: RNN, LSTM, and GRU.

### Quiz Question: Hidden State Dimensions

Say you've defined a GRU layer with ``input_size=100``, ``hidden_size=20``, and ``num_layers=1``. What will the dimensions of the hidden state be if you're passing in data, batch first, in batches of 3 sequences at a time?
#### ANSWER:
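(1, 3, 20)

The hidden state always has shape ``(num_layers * num_directions, batch, hidden_size)``. Setting ``batch_first=True`` changes the layout of the input and output tensors, but not of the hidden state.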
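You can check this answer directly by running a batch through a GRU with the quiz's settings. The sequence length of 7 is an arbitrary choice; it affects the output shape but not the hidden state.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=100, hidden_size=20, num_layers=1, batch_first=True)
x = torch.randn(3, 7, 100)  # 3 sequences, 7 steps each, 100 features per step

output, h_n = gru(x)
print(output.shape)  # torch.Size([3, 7, 20]): batch first
print(h_n.shape)     # torch.Size([1, 3, 20]): NOT batch first
```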


## CHAPTER 14: Character-wise RNNs

In [13]:
id = 'dXl3eWCGLdU'
YouTubeVideo(id=id, width=600)

## CHAPTER 15: Sequence Batching

In [14]:
id = 'Z4OiyU0Cldg'
YouTubeVideo(id=id, width=600)

## CHAPTER 16: Notebook: Character-Level RNN

### Notebook: Character-Level RNN

Now you have all the information you need to implement an RNN of your own. The next few videos will be all about character-level text prediction with an LSTM!

__It's suggested that you open the notebook in a new, working tab and continue working on it as you go through the instructional videos in this tab__. This way you can toggle between learning new skills and coding/applying new skills.

To open this notebook, go to our notebook repo (available [from here on GitHub](https://github.com/udacity/deep-learning-v2-pytorch)) and open the notebook __Character_Level_RNN_Exercise.ipynb__ in the __recurrent-neural-networks > char-rnn__ folder. You can either clone the repository with ``git clone https://github.com/udacity/deep-learning-v2-pytorch.git``, or download it as an archive file from [this link](https://github.com/udacity/deep-learning-v2-pytorch/archive/master.zip).

### Instructions

* Load in text data
* Pre-process that data, encoding characters as integers and creating one-hot input vectors
* Define an RNN that predicts the next character when given an input sequence
* Train the RNN and use it to generate new text
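The pre-processing step above can be sketched as follows. This is a minimal illustration on a short stand-in string rather than the notebook's text file, and the `one_hot_encode` helper here is our own version, which may differ from the one in the exercise notebook.

```python
import numpy as np

text = "hello world"  # hypothetical stand-in for the text loaded in the notebook

# Map each unique character to an integer, and back again
chars = tuple(sorted(set(text)))
int2char = dict(enumerate(chars))
char2int = {ch: i for i, ch in int2char.items()}
encoded = np.array([char2int[ch] for ch in text])

def one_hot_encode(arr, n_labels):
    """Turn an integer array of any shape into one-hot vectors along a new last axis."""
    one_hot = np.zeros((arr.size, n_labels), dtype=np.float32)
    one_hot[np.arange(arr.size), arr.flatten()] = 1.0
    return one_hot.reshape(*arr.shape, n_labels)

one_hot = one_hot_encode(encoded, len(chars))
print(len(chars), one_hot.shape)  # 8 (11, 8)
```

Each row of `one_hot` has a single 1 in the column of its character's integer code, which is the input format the RNN expects.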

This is a self-assessed lab. If you need any help or want to check your answers, feel free to check out the solutions notebook in the same folder, or by clicking [here](https://github.com/udacity/deep-learning-v2-pytorch/blob/master/recurrent-neural-networks/char-rnn/Character_Level_RNN_Solution.ipynb).

### Note about GPUs

In this notebook, you'll find training these networks is much faster if you use a GPU. However, you can still complete the exercises without a GPU. If you can't use a local GPU, we suggest you use cloud platforms such as [AWS](https://docs.aws.amazon.com/dlami/latest/devguide/gpu.html), [GCP](https://cloud.google.com/gpu/), and [FloydHub](https://www.floydhub.com/) to train your networks on a GPU.
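Whichever platform you use, the code can pick the device at runtime. This is a common PyTorch pattern rather than code from the notebook; the small LSTM is only there to show that the model and its inputs must live on the same device.

```python
import torch
import torch.nn as nn

# Use a GPU if PyTorch can see one; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Training on {device}")

model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True).to(device)
x = torch.randn(2, 5, 8, device=device)  # inputs must be on the same device as the model
out, (h, c) = model(x)
```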


## CHAPTER 17: Implementing a Char-RNN

In [15]:
id = 'MMtgZXzFB10'
YouTubeVideo(id=id, width=600)

*Typo: Above you may see the title, ``Chararacter_Level_RNN_Exercise.`` This is a mistake on my part and the in-classroom notebooks have been updated with the correct spelling.*

Know that the code is correct even if the title has a typo :)

## CHAPTER 18: Batching Data, Solution

In [16]:
id = '9Eg0wf3eW-k'
YouTubeVideo(id=id, width=600)

## CHAPTER 19: Defining the Model

In [17]:
id = '_LWzyqq4hCY'
YouTubeVideo(id=id, width=600)

### Contiguous variables

If you are stacking up multiple LSTM outputs, it may be necessary to use ``.contiguous()`` to reshape the output. The notebook and GitHub repo code have been updated to include this use case in the ``forward`` function of the model:
```python
# stack up LSTM outputs
out = out.contiguous().view(-1, self.n_hidden)
```
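Why is `.contiguous()` sometimes needed? `view()` only works when the tensor's memory layout is compatible with the new shape. A transposed tensor is a simple way to see the failure and the fix; it is a stand-in for any non-contiguous tensor, not the LSTM output itself.

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.t()                 # transpose: same storage, different strides
print(t.is_contiguous())  # False

try:
    t.view(-1)            # view() needs a compatible memory layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = t.contiguous().view(-1)  # copy into contiguous memory first, then view
print(flat.tolist())            # [0, 3, 1, 4, 2, 5]
```

When a tensor is already contiguous, `.contiguous()` returns it unchanged, so adding the call before `.view()` is cheap insurance.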

## CHAPTER 20: Char-RNN, Solution

In [18]:
id = 'ed33qePHrJM'
YouTubeVideo(id=id, width=600)

### Representing Memory

You’ve learned that RNNs work well for sequences of data because they have a kind of memory. This memory is represented by something called the ``hidden state``.

In the character-level LSTM example, each LSTM cell, in addition to accepting a character as input and generating an output character, also has some hidden state, and each cell will pass along its hidden state to the next cell.

This connection creates a kind of memory by which a series of cells can remember which characters they’ve just seen and use that information to inform the next prediction!

For example, if a cell has just generated the character ``a``, it likely will not generate another ``a`` right after that!
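You can see this memory at work in PyTorch: feeding a sequence one step at a time and passing the `(h, c)` hidden state forward by hand gives the same outputs as processing the whole sequence at once. Dropping the hidden state between steps does not. The sizes below are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
seq = torch.randn(1, 6, 4)  # one sequence of 6 steps

# Option 1: process the whole sequence in one call
out_full, _ = lstm(seq)

# Option 2: one step at a time, carrying the hidden state forward
hidden = None
steps = []
for t in range(seq.size(1)):
    out_t, hidden = lstm(seq[:, t:t + 1, :], hidden)
    steps.append(out_t)
out_steps = torch.cat(steps, dim=1)

# Option 3: one step at a time, but starting from a blank memory each time
out_reset = torch.cat([lstm(seq[:, t:t + 1, :])[0] for t in range(seq.size(1))], dim=1)

print(torch.allclose(out_full, out_steps, atol=1e-5))  # True
print(torch.allclose(out_full, out_reset, atol=1e-5))  # False
```

This is why, when generating text one character at a time, the hidden state returned for one character is passed back in with the next.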

### ``net.eval()``

There is an omission in the code above: it never calls ``net.eval()``!

``net.eval()`` will set all the layers in your model to evaluation mode. This affects layers like dropout layers that turn "off" nodes during training with some probability, but should allow every node to be "on" for evaluation. So, you should set your model to evaluation mode __before testing or validating your model__, and before, for example, sampling and making predictions about the likely next character in a given sequence. I'll set ``net.train()`` (training mode) only during the training loop.

This is reflected in the previous notebook code and in our [Github repository](https://github.com/udacity/deep-learning-v2-pytorch/blob/master/recurrent-neural-networks/char-rnn).
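The effect of the two modes is easy to observe on a dropout layer. The tiny `nn.Sequential` model below is a hypothetical stand-in for the char-RNN; the point is that calling `.train()` or `.eval()` on the parent model switches every submodule.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))  # hypothetical tiny model
dropout = net[1]
x = torch.ones(8)

net.train()              # training mode propagates to every submodule
out_train = dropout(x)   # elements zeroed with probability 0.5, survivors scaled by 1/(1-p) = 2
print(out_train)

net.eval()               # evaluation mode: dropout passes values through unchanged
out_eval = dropout(x)
print(out_eval)          # all ones
print(dropout.training)  # False
```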



## CHAPTER 21: Making Predictions

In [19]:
id = 'BhrpV3kwATo'
YouTubeVideo(id=id, width=600)

### Examples of RNNs

Take a look at one of my favorite examples of RNNs making predictions based on some user-generated input data: the [sketch-rnn by Magenta](https://magenta.tensorflow.org/assets/sketch_rnn_demo/index.html). This RNN takes as input a starting sketch, drawn by you, and then tries to complete your sketch using a particular model. For example, it can learn to complete a sketch of a pineapple or the Mona Lisa!

![screen-shot-2018-10-15-at-8.35.15-pm.png](attachment:screen-shot-2018-10-15-at-8.35.15-pm.png)
*Example sketch-rnn output of the Mona Lisa.*