<a href="https://colab.research.google.com/github/ALMerrill/cs474_labs_f2019/blob/master/DL_Lab6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 6: Sequence-to-sequence models

## Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.
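
To make "trained on sequences of characters" concrete, here is a minimal sketch of how a chunk of text turns into (input, target) pairs for next-character prediction. The helper names `encode` and `make_training_pair` are hypothetical and used only for illustration; the vocabulary is the same `string.printable` set the scaffold code uses later.

```python
import string

# Same vocabulary the lab uses later: the printable ASCII characters.
all_characters = string.printable

def encode(text):
    """Map each character to its index in the vocabulary (hypothetical helper)."""
    return [all_characters.index(ch) for ch in text]

def make_training_pair(chunk):
    """Inputs are all characters but the last; targets are the same text shifted by one."""
    return encode(chunk[:-1]), encode(chunk[1:])

inputs, targets = make_training_pair("hello")
for i, t in zip(inputs, targets):
    # Each input character is paired with the character that follows it.
    print(repr(all_characters[i]), "->", repr(all_characters[t]))
```

At every position the network sees one character and is asked to predict the next one, so a chunk of N + 1 characters yields N training steps.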

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

## There are two parts of this lab:
###  1.   Wiring up a basic sequence-to-sequence computation graph
###  2.   Implementing your own GRU cell.


An example of my final samples is shown below (more detail in the
final section of this writeup), after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling 
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


---

## Part 0: Readings, data loading, and high level training

---

The tutorial below will help you build out scaffolding code and understand how to work with sequences in PyTorch.

* Read the following:
  * [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html)
  * [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [4]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch
import torch

def char_tensor(string):
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
      tensor[c] = all_characters.index(string[c])
  return tensor



In [0]:
# Notes on the embedding layer used by the model:
# input_size = dimension of the vocabulary
# hidden_size = dimension of the embedding (a hyperparameter)
# It makes a lookup table in which each row is the embedding vector for one character in the vocabulary.
  # These values are trained.
  # It takes in the index of a character like 'a' and returns the current embedding for that character.
  # That embedding becomes the input to the network.

import unidecode
import string
import random
import re
import pdb
from torch.utils.data import Dataset, DataLoader

all_characters = string.printable
n_characters = len(all_characters)


class TextDataset(Dataset):
  def __init__(self, chunk_len=200, file_name='./text_files/lotr.txt'):
    self.chunk_len = chunk_len
    self.file = unidecode.unidecode(open(file_name).read())
    self.file = self.file[2000:]
    self.len = len(self.file)
  
  def __getitem__(self, idx):
    # Unused in this lab: training draws random chunks via random_training_set().
    pass
  
  def __len__(self):
    return self.len
  
  def random_training_set(self):    
    chunk = self.random_chunk(self.chunk_len)
    inp = char_tensor(chunk[:-1])
    target = char_tensor(chunk[1:])
    return inp, target
  
  def random_chunk(self, chunk_len):
    # randint is inclusive, so leave room for the chunk_len + 1 characters sliced below.
    start_index = random.randint(0, self.len - chunk_len - 1)
    end_index = start_index + chunk_len + 1
    return self.file[start_index:end_index]
  



---

## Part 4: Creating your own GRU cell 

**(Come back to this later - it's defined here so that the GRU will be defined before it is used)**

---

The cell that you used in Part 1 was a pre-defined PyTorch layer. Now, write your own GRU class that takes the same parameters as the built-in PyTorch class.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!
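
For reference, one common way to write the GRU update is sketched below. This is an assumed formulation, not necessarily the exact one the built-in layer uses: inputs and hidden states are treated as row vectors, the weight and bias names mirror those in the code cell that follows, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

$$
\begin{aligned}
r_t &= \sigma\left(x_t W_{xr} + h_{t-1} W_{hr} + b_r\right) \\
z_t &= \sigma\left(x_t W_{xz} + h_{t-1} W_{hz} + b_z\right) \\
\tilde{h}_t &= \tanh\left(x_t W_{xh} + (r_t \odot h_{t-1})\, W_{hh} + b_h\right) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
$$

Here $r_t$ is the reset gate, $z_t$ is the update gate, and $\tilde{h}_t$ is the candidate hidden state. Note that PyTorch's `nn.GRU` applies the reset gate after the hidden-to-hidden product rather than before it, so outputs from the two variants will not match exactly even with identical weights.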

**TODO:**
* Create a custom GRU cell

**DONE:**



In [0]:
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

# Fall back to the CPU when no GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.num_layers = num_layers  # stored for API parity; only one layer is implemented

    self.sigmoid = nn.Sigmoid()
    self.tanh = nn.Tanh()
    # Input-to-hidden weights are (input_size, hidden_size);
    # hidden-to-hidden weights are (hidden_size, hidden_size).
    self.W_xr = Parameter(torch.empty(input_size, hidden_size))
    self.W_hr = Parameter(torch.empty(hidden_size, hidden_size))
    self.W_xz = Parameter(torch.empty(input_size, hidden_size))
    self.W_hz = Parameter(torch.empty(hidden_size, hidden_size))
    self.W_xh = Parameter(torch.empty(input_size, hidden_size))
    self.W_hh = Parameter(torch.empty(hidden_size, hidden_size))
    # Biases are trainable too; the update-gate bias starts at 1.
    self.b_r = Parameter(torch.zeros(hidden_size))
    self.b_z = Parameter(torch.ones(hidden_size))
    self.b_h = Parameter(torch.zeros(hidden_size))

    # All-zero weights would give every hidden unit the same gradient,
    # so initialize the weight matrices randomly.
    bound = 1.0 / math.sqrt(hidden_size)
    for name, param in self.named_parameters():
      if name.startswith('W_'):
        nn.init.uniform_(param, -bound, bound)

  def forward(self, inputs, hidden):
    # inputs: (1, 1, input_size), hidden: (1, 1, hidden_size)
    r_t = self.sigmoid(torch.matmul(inputs, self.W_xr) + torch.matmul(hidden, self.W_hr) + self.b_r)
    z_t = self.sigmoid(torch.matmul(inputs, self.W_xz) + torch.matmul(hidden, self.W_hz) + self.b_z)
    ht_t = self.tanh(torch.matmul(inputs, self.W_xh) + torch.matmul(r_t * hidden, self.W_hh) + self.b_h)
    h_t = (z_t * hidden) + ((1 - z_t) * ht_t)
    # With a single layer, the output and the new hidden state are the same tensor.
    return h_t, h_t
  
  


---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form, and we can switch out which text file we read from and try to imitate.

We now want to build an RNN model. In this section, we will use only built-in PyTorch pieces when building our RNN class.


**TODO:**
* Create an RNN class that extends from nn.Module.

**DONE:**



In [0]:
class RNN(nn.Module): # This is just a decoder. There is no encoder, because we are only "decoding" the input string to predict the next character.
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers
    
    self.embedding = nn.Embedding(input_size, hidden_size)
    # The GRU consumes embeddings, so its input size is hidden_size.
#     self.gru = nn.GRU(hidden_size, hidden_size, n_layers)
    self.gru = GRU(hidden_size, hidden_size, n_layers)
    self.out_layer = nn.Linear(hidden_size, output_size)

  def forward(self, input_char, hidden):
    embedding = self.embedding(input_char).view(1,1,-1)
    output, hidden = self.gru(embedding, hidden)
    # Return raw logits: CrossEntropyLoss applies log-softmax itself,
    # and a ReLU here would discard all negative scores.
    out_decoded = self.out_layer(output)
    
    return out_decoded, hidden

  def init_hidden(self):
    return torch.zeros(self.n_layers, 1, self.hidden_size)

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

This function outlines how training a sequence style network goes. 
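
A useful sanity check while filling in the training loop is to know what loss an untrained model should report. The sketch below is an illustration, not part of the scaffold, and the helper name `average_cross_entropy` is hypothetical. It computes the average cross-entropy of a model that assigns equal probability to every character in the `string.printable` vocabulary.

```python
import math
import string

n_characters = len(string.printable)  # size of the lab's vocabulary

def average_cross_entropy(probabilities_of_targets):
    """Mean negative log-likelihood of the correct characters (hypothetical helper)."""
    total = -sum(math.log(p) for p in probabilities_of_targets)
    return total / len(probabilities_of_targets)

# A model that has learned nothing gives every character probability 1 / n_characters.
uniform = [1.0 / n_characters] * 199
baseline = average_cross_entropy(uniform)
print(round(baseline, 3))  # equals ln(n_characters), rounded
```

If the first reported loss is far above this baseline, or the loss never drops below it, the loop likely has a wiring problem such as mismatched shapes or a missing optimizer step.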

**TODO:**
* Fill in the pieces.

**DONE:**




In [64]:
import time
from matplotlib import pyplot as plt

def train(n_layers, lr):
  hidden_size = n_characters
  in_size = n_characters
  out_size = n_characters
  file_name = './text_files/lotr.txt'
  train_dataset = TextDataset(file_name=file_name)
  decoder = RNN(in_size, hidden_size, out_size, n_layers=n_layers).to(device)
  optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
  objective = nn.CrossEntropyLoss()

  losses = []
  n_epochs = 200
  print_every = 10
  start = time.time()
  for epoch in range(1, n_epochs + 1):
    running_loss = 0
    optimizer.zero_grad()
    # Start every chunk from a fresh hidden state so the graph of the
    # previous chunk is not reused (this removes the need for retain_graph).
    hidden = decoder.init_hidden().to(device)
    input_string, target_string = train_dataset.random_training_set()
    for char, target_char in zip(input_string, target_string):
      out_char, hidden = decoder(char.to(device), hidden)
      running_loss += objective(out_char.squeeze(0), target_char.unsqueeze(0).to(device))
    loss = running_loss / len(target_string)
    # Store a plain float so the computation graph can be freed.
    losses.append(loss.item())
    running_loss.backward()
    optimizer.step()

    if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss.item()))
      print(evaluate(decoder, 'Wh', 100), '\n')
      plt.plot(range(len(losses)), losses)
      plt.xlabel("Epoch: {}".format(epoch))
      plt.ylabel("Loss")
      plt.show()

  # Return the trained model so it can be sampled from later.
  return decoder

train(1, .001)


---

## Part 3: Sample text and Training information

---

You can at this time, if you choose, also write out your train loop boilerplate that samples random sequences and trains your RNN. This will be helpful to have working before writing your own GRU class.

If you have finished training, or are in the middle of it, and you want to sample from the network, consider using the following function. If your RNN model is instantiated as `decoder`, it will probabilistically sample a sequence of length `predict_len`.
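
To illustrate what "probabilistically sample" means, here is a minimal standard-library sketch of temperature sampling. It assumes the network's raw output scores (logits) are available as a plain list, and the function name `sample_with_temperature` is hypothetical. Dividing the logits by a temperature below 1 sharpens the distribution; a temperature above 1 flattens it.

```python
import math
import random

def sample_with_temperature(logits, temperature=0.8, rng=random):
    """Draw an index from softmax(logits / temperature) (hypothetical helper)."""
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - largest) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    index = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return index, probs

logits = [2.0, 1.0, 0.1]
index, sharp = sample_with_temperature(logits, temperature=0.5)
_, flat = sample_with_temperature(logits, temperature=2.0)
# The most likely character gets more probability mass at the lower temperature.
print(sharp[0] > flat[0])
```

Low temperatures produce safer, more repetitive text, while high temperatures produce more varied text with more mistakes.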

**TODO:**
* Fill out the evaluate function to generate text from a primed string

**DONE:**



In [0]:
def evaluate(decoder, prime_str='A', predict_len=100, temperature=0.8):
  hidden = decoder.init_hidden().to(device)
  input_str = char_tensor(prime_str)
  
  # No gradients are needed while sampling.
  with torch.no_grad():
    # Feed all but the last priming character to build up the hidden state.
    for i in range(len(input_str) - 1):
      _, hidden = decoder(input_str[i].to(device), hidden)
    
    eval_input = input_str[-1]
  
    predicted = prime_str
    for _ in range(predict_len):
      out, hidden = decoder(eval_input.to(device), hidden)
      # Turn the logits into unnormalized probabilities; a lower
      # temperature sharpens the distribution.
      probs = torch.exp(out.view(-1) / temperature)
      candidate = torch.multinomial(probs, 1)[0].item()
      next_char = all_characters[candidate]
      eval_input = char_tensor(next_char)
      predicted += next_char
    
  return predicted
  

---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---

Assuming everything has gone well, you should be able to run the main function in the scaffold code, using either your custom GRU cell or the built-in layer, and see output something like the samples at the top of this writeup. I trained on the "lotr.txt" dataset with chunk_length=200 and hidden_size=100 for 2000 epochs.

**TODO:** 
* Create some cool output

**DONE:**



In [0]:
# Keep a handle on the trained model; this requires train() to return it.
decoder = train(1, .001)

In [0]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  start = random.randint(0,len(start_strings)-1)
  print(start_strings[start])
  # evaluate expects the trained model as its first argument.
  print(evaluate(decoder, start_strings[start], 200), '\n')

---

## Part 6: Generate output on a different dataset

---

**TODO:**

* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle 

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)

**DONE:**

