# Exploring RNNs

In this assignment, you will modify the [notebook](https://colab.research.google.com/drive/1Ge7HNinj0riX56ayukvbVpKpg-BFgni0?usp=sharing) that we went over in class, which explores the use of RNNs.

We begin by downloading and unzipping the dataset.

In [None]:
!wget https://download.pytorch.org/tutorial/data.zip

In [None]:
!unzip data.zip

## Loading and Formatting the Data

We provide helper functions for loading the data, which is stored in dictionaries with one entry per nationality. The data is also split into a training set and a test set.

In [None]:
from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os
import unicodedata
import string
import torch

# Return the paths of all files matching the given pattern
def findFiles(path):
    return glob.glob(path)

# Specifying list of characters
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

# This function converts unicode to ascii
def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split it into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

# Fraction of the data used for training
perTrain = 0.9

# Build the train and test dictionaries, each mapping a language to a list of names
category_lines_train = {}
category_lines_test = {}
all_categories = []

# Loading all the data
for filename in findFiles('data/names/*.txt'):
    category = os.path.splitext(os.path.basename(filename))[0]
    all_categories.append(category)
    lines = readLines(filename)
    category_lines_train[category] = lines[0:int(perTrain*len(lines))]
    category_lines_test[category] = lines[int(perTrain*len(lines)):]

# Specifying the number of categories
n_categories = len(all_categories)

# Find letter index from all_letters, e.g. "a" = 0
def letterToIndex(letter):
    return all_letters.find(letter)

# Turn a line into a <line_length x 1 x n_letters> tensor,
# i.e. an array of one-hot letter vectors
def lineToTensor(line):
    tensor = torch.zeros(len(line), 1, n_letters)
    for li, letter in enumerate(line):
        tensor[li][0][letterToIndex(letter)] = 1
    return tensor
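
As a quick sanity check of the encoding above, the following minimal sketch rebuilds the same alphabet and encodes a sample name. The helper name `line_to_tensor` and the sample name "Jones" are illustrative assumptions; the logic mirrors `lineToTensor`.

```python
import string
import torch

# Same alphabet as in the cell above: 52 ASCII letters plus 5 punctuation characters
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)

def line_to_tensor(line):
    # One one-hot row per character, with a batch dimension of size 1
    tensor = torch.zeros(len(line), 1, n_letters)
    for i, letter in enumerate(line):
        tensor[i][0][all_letters.find(letter)] = 1
    return tensor

example = line_to_tensor("Jones")  # "Jones" is an arbitrary sample name
print(example.shape)               # torch.Size([5, 1, 57])
print(example[0].argmax().item())  # 35, the index of "J" in all_letters
```

Note that `str.find` returns -1 for a character outside `all_letters`, which would silently set the last position of the vector, so names should pass through `unicodeToAscii` before being encoded.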

## [Task 1] Using Adam Optimizer [30 pts]

We will replace the standard gradient descent implementation in the baseline model with a call to the Adam optimizer. Do the following:

1. [12 pts] Train the baseline model with standard gradient descent, and train a second version with the Adam optimizer. Plot the learning curves for both approaches. Train each model for only 50,000 iterations.
2. [12 pts] Evaluate the performance of both models on the test set.
3. [6 pts] Comment on the performance of both methods on the test set, and the shape of their learning curves.
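
To illustrate the difference between a hand-written gradient descent update and an optimizer call, here is a minimal sketch. The stand-in `nn.Linear` model, the random batch, the learning rates, and the helper names `manual_sgd_step` and `adam_step` are all assumptions for illustration; they are not the baseline notebook's RNN or training loop.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and data (assumptions; the class notebook uses an RNN on names)
n_letters, n_categories = 57, 18   # n_categories is a placeholder value
model = nn.Linear(n_letters, n_categories)
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, n_letters)
y = torch.randint(0, n_categories, (8,))

def manual_sgd_step(model, lr=0.005):
    # Plain gradient descent written by hand: p <- p - lr * grad
    model.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p.add_(p.grad, alpha=-lr)
    return loss.item()

def adam_step(model, optimizer):
    # Same forward and backward pass, but the update is delegated to the optimizer
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

sgd_loss = manual_sgd_step(model)

# The optimizer is created once and reused, since Adam keeps per-parameter state
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
adam_losses = [adam_step(model, optimizer) for _ in range(5)]
print(sgd_loss, adam_losses)
```

Appending the returned loss values to a list at regular intervals is one way to collect the data for a learning curve.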


### Task 1.1

In [None]:
### TO DO - Enter Your Code here... You are welcome to add more cells if needed

### Task 1.2

In [None]:
### TO DO - Enter Your Code here... You are welcome to add more cells if needed

### Task 1.3

TO DO - Enter Your Response here



## [Task 2] Implementing a More Complex RNN [30 pts]

Replace the linear layer $g$ in the baseline model with a two-layer fully connected neural network with ReLU activations. The new subnetwork should implement:

$$h^{(t)} = g(c^{(t)}) = \sigma\left(W_2 \cdot \sigma \left(W_1 \cdot c^{(t)} + b_1 \right) + b_2 \right),$$

where $\sigma$ is a ReLU activation, and $(W_k,b_k)$ are the parameters for a linear layer. Then, answer the following:

1. [12 pts] Compare the learning curves of the baseline and of this more complex model, both trained with Adam. Train each model for only 50,000 iterations.
2. [12 pts] Evaluate the performance of both models on the test set.
3. [6 pts] Comment on the performance of both methods on the test set, and the shape of their learning curves.
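
The equation for $g$ above can be sketched as a small PyTorch module. This is a minimal sketch, not the required solution: the class name `TwoLayerG`, the argument names, and the layer sizes are hypothetical, and it assumes $c^{(t)}$ is a single vector per time step (for example, the input concatenated with the previous hidden state).

```python
import torch
import torch.nn as nn

class TwoLayerG(nn.Module):
    """Sketch of h = relu(W_2 relu(W_1 c + b_1) + b_2)."""

    def __init__(self, in_size, mid_size, hidden_size):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_size, mid_size),      # W_1, b_1
            nn.ReLU(),                         # inner sigma
            nn.Linear(mid_size, hidden_size),  # W_2, b_2
            nn.ReLU(),                         # outer sigma
        )

    def forward(self, c):
        return self.net(c)

# Hypothetical sizes: 57 input letters concatenated with a 128-dim hidden state
g = TwoLayerG(in_size=57 + 128, mid_size=128, hidden_size=128)
c = torch.randn(1, 57 + 128)
h = g(c)
print(h.shape)  # torch.Size([1, 128])
```

Because the outer activation is a ReLU, every entry of the resulting hidden state is non-negative, which may be worth keeping in mind when you interpret the learning curves.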


### Task 2.1

In [None]:
### TO DO - Enter Your Code here... You are welcome to add more cells if needed

### Task 2.2


In [None]:
### TO DO - Enter Your Code here... You are welcome to add more cells if needed

### Task 2.3

TO DO - Enter Your Response here




## [Task 3] Using LSTM [40 pts]

Replace the custom-built RNN with a standard LSTM layer. This may require you to make significant changes to the network class and training functions.

1. [16 pts] Compare the learning curves of the baseline and of the LSTM model, both trained with Adam. Train each model for only 50,000 iterations.
2. [16 pts] Evaluate the performance of both models on the test set.
3. [8 pts] Comment on the performance of both methods on the test set, and the shape of their learning curves.
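
One possible shape for an LSTM-based classifier is sketched below, using `torch.nn.LSTM`. The class name `LSTMClassifier`, the hidden size, and the number of output categories are assumptions for illustration. Unlike a hand-written RNN cell that is called once per letter, `nn.LSTM` consumes the whole `<line_length x 1 x n_letters>` tensor in a single call, which is one reason the training function may need to change.

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # By default nn.LSTM expects input of shape (seq_len, batch, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, line_tensor):
        # h_n holds the final hidden state, shape (num_layers, batch, hidden_size)
        _, (h_n, _) = self.lstm(line_tensor)
        # Classify from the last layer's final hidden state
        return self.out(h_n[-1])

# Hypothetical sizes: 57 letters, 128 hidden units, 18 categories
model = LSTMClassifier(input_size=57, hidden_size=128, output_size=18)
line_tensor = torch.zeros(5, 1, 57)  # stand-in for lineToTensor output of a 5-letter name
logits = model(line_tensor)
print(logits.shape)  # torch.Size([1, 18])
```

This sketch returns raw scores, which pair with `nn.CrossEntropyLoss`. If your baseline ends in a log-softmax and trains with `nn.NLLLoss`, you would add an `nn.LogSoftmax(dim=1)` after the linear layer instead.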

### Task 3.1

In [None]:
### TO DO - Enter Your Code here... You are welcome to add more cells if needed

### Task 3.2

In [None]:
### TO DO - Enter Your Code here... You are welcome to add more cells if needed

### Task 3.3

TO DO - Enter Your Response here
