# Chapter 14: Recurrent Neural Networks

## Exercises

### 1. Can you think of a few applications for a sequence-to-sequence RNN? What about a sequence-to-vector RNN? And a vector-to-sequence RNN?

A sequence-to-sequence RNN is typically used to predict how an input sequence will continue, for example forecasting the next values of a time series. The same idea applies to predictive text: given the words typed so far, the model predicts the next word you are about to type.

A sequence-to-vector RNN is good for classifying sequences, as in sentiment analysis, where a whole review is reduced to a single score. More generally, a sequence-to-vector RNN can encode an entire sequence, such as a sentence, as a single vector in a denser, smaller vector space.

In machine translation, a vector-to-sequence RNN can act as the decoder: it takes the vector that encodes the input sentence in one language and generates the words of the target language. You can also use vector-to-sequence RNNs to generate captions for images.
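The three configurations differ only in which inputs are fed and which outputs are kept. The NumPy sketch below illustrates this with a basic tanh RNN; the helper `rnn_forward`, the weight names, and the sizes are hypothetical choices made for illustration, not code from the book.

```python
import numpy as np

def rnn_forward(inputs, Wx, Wh, b):
    """Run a basic tanh RNN over inputs of shape [n_steps, n_inputs].

    Returns the hidden state at every step, shape [n_steps, n_neurons].
    """
    h = np.zeros(Wh.shape[0])
    outputs = []
    for x_t in inputs:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
n_steps, n_inputs, n_neurons = 5, 3, 4
Wx = rng.normal(size=(n_inputs, n_neurons))
Wh = rng.normal(size=(n_neurons, n_neurons))
b = np.zeros(n_neurons)

sequence = rng.normal(size=(n_steps, n_inputs))

# Sequence-to-sequence: keep the output at every time step.
seq_to_seq = rnn_forward(sequence, Wx, Wh, b)

# Sequence-to-vector: ignore every output except the last one.
seq_to_vec = seq_to_seq[-1]

# Vector-to-sequence: feed one vector at the first step, zeros afterwards.
vector = rng.normal(size=n_inputs)
vec_inputs = np.zeros((n_steps, n_inputs))
vec_inputs[0] = vector
vec_to_seq = rnn_forward(vec_inputs, Wx, Wh, b)
```

Here `seq_to_seq` and `vec_to_seq` each hold one output per time step, while `seq_to_vec` is a single vector summarizing the whole sequence.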

### 2. Why do people use encoder-decoder RNNs rather than plain sequence-to-sequence RNNs for automatic translation?

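A plain sequence-to-sequence RNN emits an output at every time step, so it would start translating after reading only the first word of the sentence. Translation generally requires reading the whole sentence first, because the right translation of a word can depend on words that come later. An encoder-decoder model handles this by having the encoder read the entire input sentence before the decoder starts generating the translation.

There is also an efficiency argument. Most models for automatic translation start from a vocabulary in which each word is a one-hot vector, that is, a unit vector perpendicular to every other word's vector. For a vocabulary of 50,000 words, each step of the input sequence is then a vector in a 50,000-dimensional space, and training directly on such inputs takes a large amount of memory. Training the encoder to find a denser representation of the words makes training more efficient, and learning an embedding also helps the model learn which words are closely related to one another.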
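To make the memory argument concrete, here is a minimal NumPy sketch comparing one-hot inputs with an embedding lookup. The embedding size of 150, the word IDs, and the randomly initialized `embeddings` matrix are assumptions for illustration; in a real model the matrix would be learned during training.

```python
import numpy as np

vocab_size, embedding_size = 50_000, 150     # embedding size is an assumption
sentence_ids = np.array([12, 4071, 33, 49_999])  # hypothetical word IDs

rng = np.random.default_rng(42)
# Stand-in for a trained embedding matrix: one dense row per vocabulary word.
embeddings = rng.normal(size=(vocab_size, embedding_size)).astype(np.float32)

# One-hot encoding: each word is a 50,000-dimensional unit vector.
one_hot = np.zeros((len(sentence_ids), vocab_size), dtype=np.float32)
one_hot[np.arange(len(sentence_ids)), sentence_ids] = 1.0

# Multiplying a one-hot vector by the matrix just selects one row,
# so a direct row lookup gives the same result without the huge input.
dense_via_matmul = one_hot @ embeddings
dense_via_lookup = embeddings[sentence_ids]

floats_per_word_one_hot = one_hot.shape[1]
floats_per_word_dense = dense_via_lookup.shape[1]
```

Each word costs 50,000 floats as a one-hot vector but only 150 as an embedding, and the lookup avoids materializing the one-hot vectors at all.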

### 3. How could you combine a convolutional neural network and an RNN to classify videos?

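Since a video is a sequence of images, you could create a recurrent neural network in which each cell applies convolutional layers that learn feature maps for each frame of the video. Each cell could learn one set of feature maps for the current input frame and another for its previous output. A simpler alternative is to run a convolutional neural network on each frame and feed the resulting feature vectors to a sequence-to-vector RNN, followed by a softmax layer that outputs the class probabilities.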
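One simple way to combine the two is to extract features from each frame with a convolutional network and pass the sequence of feature vectors to a sequence-to-vector RNN. The NumPy sketch below shows that data flow only. The function names, the single 3x3 convolution standing in for a full CNN, and the random untrained weights are all assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def frame_features(frame, filters):
    """Tiny CNN stand-in: valid 3x3 convolution, ReLU, global average pooling.

    frame: [height, width]; filters: [n_filters, 3, 3] -> [n_filters]
    """
    patches = sliding_window_view(frame, (3, 3))       # [H-2, W-2, 3, 3]
    maps = np.einsum("hwij,fij->fhw", patches, filters)  # one map per filter
    return np.maximum(maps, 0.0).mean(axis=(1, 2))

def classify_video(video, filters, Wx, Wh, Wout):
    """Run the per-frame features through a tanh RNN, then a softmax layer."""
    h = np.zeros(Wh.shape[0])
    for frame in video:                      # one RNN step per frame
        x_t = frame_features(frame, filters)
        h = np.tanh(x_t @ Wx + h @ Wh)
    logits = h @ Wout                        # only the last state is used
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(1)
n_frames, n_filters, n_neurons, n_classes = 6, 4, 5, 3
video = rng.normal(size=(n_frames, 8, 8))    # 6 grayscale 8x8 frames
filters = rng.normal(size=(n_filters, 3, 3))
Wx = rng.normal(size=(n_filters, n_neurons))
Wh = rng.normal(size=(n_neurons, n_neurons))
Wout = rng.normal(size=(n_neurons, n_classes))

probs = classify_video(video, filters, Wx, Wh, Wout)
```

The result `probs` holds one probability per class for the whole video. In practice the convolutional part would be a deep, trained network rather than a single random filter bank.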

### 4. What are the advantages of building an RNN using `dynamic_rnn()` rather than `static_rnn()`?

The `static_rnn()` function creates new graph nodes for each time step in the sequence. This means that if you are processing a sequence with a large number of steps, the graph becomes very large and you risk getting an out-of-memory (OOM) error. The `dynamic_rnn()` function instead uses a while loop to run the same nodes once per time step. The `dynamic_rnn()` function also lets you swap memory between the GPU and the CPU during training by setting the `swap_memory` parameter, which helps avoid OOM errors on the GPU. Finally, it accepts a single tensor as input, and returns a single tensor as output, instead of a list of tensors with one entry per time step.
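The difference in input layout can be sketched with plain NumPy arrays. This is only an illustration of the two layouts, not TensorFlow code: a single `[batch, n_steps, n_inputs]` array corresponds to what `dynamic_rnn()` takes, and a Python list with one `[batch, n_inputs]` array per step corresponds to what `static_rnn()` takes. The sizes are arbitrary.

```python
import numpy as np

batch_size, n_steps, n_inputs = 4, 2, 3

# Single array in the dynamic_rnn() style layout: [batch, n_steps, n_inputs].
X = np.arange(batch_size * n_steps * n_inputs, dtype=np.float32).reshape(
    batch_size, n_steps, n_inputs)

# static_rnn() style layout: a list with one [batch, n_inputs] array per step.
X_steps = list(np.transpose(X, (1, 0, 2)))

# A per-step list can be stacked back into a single array along the step axis.
X_roundtrip = np.stack(X_steps, axis=1)
```

With the list layout, the number of steps is fixed by the length of the list when the graph is built, which is why each step needs its own nodes.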

### 5. How can you deal with variable-length input sequences? What about variable-length output sequences?

The `dynamic_rnn()` function takes a `sequence_length` parameter: a 1D tensor of integers giving the length of each input sequence in the batch. Input sequences that are shorter than the longest sequence are padded with zeros so that they all fit in a single tensor, and the RNN outputs zero vectors for every time step past the end of a sequence.
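The effect of per-sequence lengths on a padded batch can be sketched in NumPy as follows. This is a simplified imitation of the masking behavior, assuming a basic tanh cell; `rnn_with_lengths` and the weight names are hypothetical, and the example batch values are arbitrary.

```python
import numpy as np

def rnn_with_lengths(X, seq_lengths, Wx, Wh, b):
    """Basic tanh RNN over a zero-padded batch X of shape
    [batch, max_steps, n_inputs], honoring per-sequence lengths.

    Returns (outputs, final_states).
    """
    batch_size, max_steps, _ = X.shape
    n_neurons = Wh.shape[0]
    h = np.zeros((batch_size, n_neurons))
    outputs = np.zeros((batch_size, max_steps, n_neurons))
    for t in range(max_steps):
        h_new = np.tanh(X[:, t] @ Wx + h @ Wh + b)
        active = (t < seq_lengths)[:, None]   # True while a sequence is running
        outputs[:, t] = np.where(active, h_new, 0.0)  # zeros past the end
        h = np.where(active, h_new, h)        # finished sequences keep their state
    return outputs, h

rng = np.random.default_rng(0)
n_inputs, n_neurons = 3, 5
Wx = rng.normal(size=(n_inputs, n_neurons))
Wh = rng.normal(size=(n_neurons, n_neurons))
b = np.zeros(n_neurons)

X_batch = np.array([
    [[0., 1., 2.], [9., 8., 7.]],   # length 2
    [[3., 4., 5.], [0., 0., 0.]],   # length 1, padded with zeros
])
seq_length_batch = np.array([2, 1])

outputs, states = rnn_with_lengths(X_batch, seq_length_batch, Wx, Wh, b)
```

For the second sequence, the output at the padded step is a zero vector and the final state is the state after its one real step, so the padding does not influence the result.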

For variable-length output sequences, it is generally not possible to know in advance how long each output will be, so each output sequence ends with an end-of-sequence (EOS) token that marks where it stops. Anything the model produces after the EOS token is ignored.
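As a minimal sketch of the EOS convention, the function below cuts a decoded sequence at the first EOS token. The token string `"<eos>"`, the function name, and the sample tokens are hypothetical; real models usually work with integer token IDs instead of strings.

```python
EOS = "<eos>"  # hypothetical end-of-sequence marker

def truncate_at_eos(tokens, eos=EOS):
    """Return the tokens before the first EOS token.

    If no EOS token is present, all tokens are returned.
    """
    result = []
    for token in tokens:
        if token == eos:
            break
        result.append(token)
    return result

# The decoder always runs for a fixed number of steps; everything it
# produces after the EOS token is discarded.
decoded = ["je", "bois", "du", "lait", "<eos>", "<pad>", "lait"]
sentence = truncate_at_eos(decoded)
```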

### 6. What is a common way to distribute training and execution of a deep RNN across multiple GPUs?

In order to distribute an RNN across devices, you cannot simply create each cell inside a `tf.device()` block. This is because TensorFlow's built-in RNN cell classes like `BasicRNNCell` do not create the graph nodes themselves when they are constructed; rather, they are cell factories, so the device block has nothing to place.

Instead, you must define a new cell factory that wraps an existing cell and creates its nodes on the chosen device when the cell is actually called. For an example, see `DeviceCellWrapper` in `RecurrentNeuralNetworks.ipynb`.