<a 
href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab6.ipynb"
  target="_parent">
  <img
    src="https://colab.research.google.com/assets/colab-badge.svg"
    alt="Open In Colab"/>
</a>

# Lab 6: Sequence-to-sequence models

### Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

### Deliverable:
- Fill in the code for the RNN (using PyTorch's built-in GRU).
- Fill in the training loop
- Fill in the evaluation loop. In this loop, rather than using a validation set, you will sample text from the RNN.
- Implement your own GRU cell.
- Train your RNN on a new domain of text (Star Wars, political speeches, etc. - have fun!)

### Grading Standards:
- 20% Implementation of the RNN
- 20% Implementation of the training loop
- 20% Implementation of the evaluation loop
- 20% Implementation of your own GRU cell
- 20% Training of your RNN on a domain of your choice

### Tips:
- Read through all the helper functions, run them, and make sure you understand what they are doing
- At each stage, ask yourself: What should the dimensions of this tensor be? Should its data type be float or int? (int is called `long` in PyTorch)
- Don't apply a softmax inside the RNN if you are using an nn.CrossEntropyLoss. This module already applies a log-softmax to its input, so it expects raw, unnormalized scores (logits).
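To see why the softmax should stay out of the model, here is a small pure-Python sketch (no PyTorch; the helper names are hypothetical) of what a cross-entropy loss computed from logits does for one example: a log-softmax over the raw scores, followed by the negative log-likelihood of the target. Passing it probabilities instead of logits applies the softmax twice.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def softmax(logits):
    return [math.exp(v) for v in log_softmax(logits)]

def cross_entropy(logits, target):
    """Cross-entropy for one example, computed from unnormalized scores."""
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]   # raw scores, e.g. from the final nn.Linear layer
target = 0                  # index of the correct next character

correct = cross_entropy(logits, target)
# Mistake: softmax inside the model, then the loss applies log-softmax again
double = cross_entropy(softmax(logits), target)
print(f"logits in: {correct:.4f}  probabilities in: {double:.4f}")
```

With the double softmax the loss can never get close to zero: its inputs all lie in [0, 1], so with 100 characters the re-softmaxed distribution gives any character at most e/(e+99), about 0.027, and the loss stays above roughly 3.6.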

### Example Output:
An example of my final samples is shown below (more detail in the
final section of this writeup), produced after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling 
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


---

## Part 0: Readings, data loading, and high level training

---

The readings below include a tutorial that will help you build scaffolding code and understand how to work with sequences in PyTorch.

Read the following:

* [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) (Note that you will not be implementing the encoder part of this tutorial.)
* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [None]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

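import unidecode
import string
import random
import re

# Vocabulary: the 100 printable ASCII characters
all_characters = string.printable
n_characters = len(all_characters)

# unidecode transliterates non-ASCII characters (e.g. curly quotes) to ASCII
with open('./text_files/lotr.txt') as f:
  file = unidecode.unidecode(f.read())
file_len = len(file)
print('file_len =', file_len)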

--2020-10-10 03:15:33--  https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz
Resolving piazza.com (piazza.com)... 54.205.50.50, 3.212.158.118, 52.5.213.57, ...
Connecting to piazza.com (piazza.com)|54.205.50.50|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://d1b10bmlvqabco.cloudfront.net/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz [following]
--2020-10-10 03:15:33--  https://d1b10bmlvqabco.cloudfront.net/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz
Resolving d1b10bmlvqabco.cloudfront.net (d1b10bmlvqabco.cloudfront.net)... 99.86.33.8, 99.86.33.29, 99.86.33.155, ...
Connecting to d1b10bmlvqabco.cloudfront.net (d1b10bmlvqabco.cloudfront.net)|99.86.33.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1533290 (1.5M) [application/x-gzip]
Saving to: ‘./text_files.tar.gz’


2020-10-10 03:15:33 (16.1 MB/s) - 

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install kaggle

!mkdir -p ~/.kaggle
!cp '/content/drive/My Drive/Colab Notebooks/kaggle.json' ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json
!mkdir -p /content/data

!kaggle datasets download -d ekrembayar/avatar-the-last-air-bender -p /content/data
from zipfile import ZipFile
with ZipFile("/content/data/avatar-the-last-air-bender.zip", "r") as zip_ref:
  zip_ref.extractall("/content/data")

import pandas as pd
avatar_data = pd.read_csv("/content/data/avatar.csv", encoding='windows-1252', usecols=["full_text"])

use_avatar = True

print(avatar_data['full_text'][0][:200])

Downloading avatar-the-last-air-bender.zip to /content/data
  0% 0.00/0.99M [00:00<?, ?B/s]
100% 0.99M/0.99M [00:00<00:00, 69.6MB/s]
Water. Earth. Fire. Air. My grandmother used to tell me stories about the old days: a time of peace when the Avatar kept balance between the Water Tribes, Earth Kingdom, Fire Nation and Air Nomads. Bu


In [None]:
print(len(avatar_data['full_text']))
print(sum([len(x) for x in avatar_data['full_text']]) / len(avatar_data['full_text']))

13385
134.47575644378034


In [None]:
chunk_len = 200

def random_chunk():
  """Return a random piece of training text.

  lotr.txt: a slice of chunk_len + 1 characters (one extra so that the input
  and the target can be offset by one character).
  Avatar: num_chunks consecutive script lines joined by spaces.
  """
  if not use_avatar:
    # Upper bound keeps the slice inside the file, so it is always chunk_len + 1 long
    start_index = random.randint(0, file_len - chunk_len - 1)
    end_index = start_index + chunk_len + 1
    return file[start_index:end_index]
  else:
    num_chunks = 3
    start_index = random.randint(0, len(avatar_data['full_text']) - (num_chunks + 1))
    final_string = ''
    for i in range(num_chunks):
      final_string += avatar_data['full_text'][start_index + i] + ' '

    # Drop characters outside the vocabulary so char_tensor() never fails
    final_string = ''.join([a for a in final_string if a in all_characters])
    return final_string


print(random_chunk())

I'm saying, I love you! [He kisses her. After he pulls back, she embraces and kisses him back.] [After kissing for a while she pulls back, whispering slightly.] What are we doing? What our hearts have been telling us to do for a long, long time. Baby, [He moves Katara down to the side.] you're my forever girl. [Aang puckers up his lips for another kiss.] 


In [None]:
import torch

def char_tensor(string):
  """Turn a string into a 1-D LongTensor of indices into all_characters (on the GPU)."""
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
    try:
      tensor[c] = all_characters.index(string[c])
    except ValueError as e:
      # Report the offending character before re-raising
      print(e, "'", string[c], "'")
      raise e
  return tensor.cuda()

abcDEF_tensor = char_tensor('abcDEF')
print(abcDEF_tensor)

def tensor_char(tensor):
  """Inverse of char_tensor: turn a tensor of character indices back into a string."""
  return ''.join(all_characters[int(index)] for index in tensor)

abcDEF_string = tensor_char(abcDEF_tensor)
print(abcDEF_string)

tensor([10, 11, 12, 39, 40, 41], device='cuda:0')
abcDEF


---

## Part 4: Creating your own GRU cell 

**(Come back to this later. It's defined here so that the GRU will be defined before it is used.)**

---

The cell that you used in Part 1 was a predefined PyTorch layer. Now, write your own GRU class that takes the same parameters as the built-in PyTorch class.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!
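For reference, here are the per-layer equations the cell should implement. They restate the comment in the `forward` docstring below, with $\odot$ for the elementwise (Hadamard) product and $\sigma$ for the logistic sigmoid. For the first layer $x_t$ is the input at time $t$; for deeper layers it is the $h_t$ of the layer below. Each $W_{h\cdot}$ is square (hidden_size × hidden_size), while the first layer's $W_{i\cdot}$ map input_size to hidden_size.

$$
\begin{aligned}
r_t &= \sigma\left(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}\right) \\
z_t &= \sigma\left(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}\right) \\
n_t &= \tanh\left(W_{in} x_t + b_{in} + r_t \odot \left(W_{hn} h_{t-1} + b_{hn}\right)\right) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
$$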

**TODO:**
* Create a custom GRU cell

**DONE:**



In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F  # F.relu is used by the RNN class below


class GRU(nn.Module):
  """Hand-written multi-layer GRU with the same constructor arguments and
  (seq_len, batch, features) layout as nn.GRU (no dropout, not bidirectional)."""

  class GRU_UNIT(nn.Module):
    """One GRU layer, applied to a single time step."""
    def __init__(self, input_size, hidden_size):
      super(GRU.GRU_UNIT, self).__init__()
      self.input_size = input_size
      self.hidden_size = hidden_size

      # Input-to-hidden (W_i*) and hidden-to-hidden (W_h*) maps for the reset (r),
      # update (z) and candidate (n) gates; each nn.Linear carries its own bias.
      self.W_ir = nn.Linear(input_size, hidden_size)
      self.W_hr = nn.Linear(hidden_size, hidden_size)

      self.W_iz = nn.Linear(input_size, hidden_size)
      self.W_hz = nn.Linear(hidden_size, hidden_size)

      self.W_in = nn.Linear(input_size, hidden_size)
      self.W_hn = nn.Linear(hidden_size, hidden_size)

    def forward(self, x_t, h_prev):
      # x_t: (batch, input_size), h_prev: (batch, hidden_size)
      r_t = torch.sigmoid(self.W_ir(x_t) + self.W_hr(h_prev))
      z_t = torch.sigmoid(self.W_iz(x_t) + self.W_hz(h_prev))
      # `*` is the elementwise (Hadamard) product, not matrix multiplication
      n_t = torch.tanh(self.W_in(x_t) + r_t * self.W_hn(h_prev))
      h_t = (1 - z_t) * n_t + z_t * h_prev
      return h_t

  def __init__(self, input_size, hidden_size, num_layers=1):
    super(GRU, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.num_layers = num_layers

    # Layer 0 reads the inputs; every later layer reads the hidden state below it
    self.blocks = nn.ModuleList()
    for i in range(self.num_layers):
      layer_input_size = self.input_size if i == 0 else self.hidden_size
      self.blocks.append(self.GRU_UNIT(layer_input_size, self.hidden_size))

  def forward(self, inputs, hidden):
    '''
      inputs: (seq_len, batch, input_size); hidden: (num_layers, batch, hidden_size)
      Returns (output, h_n), shaped like nn.GRU's: (seq_len, batch, hidden_size)
      and (num_layers, batch, hidden_size).

      Each layer does the following:
      r_t = sigmoid(W_ir*x_t + b_ir + W_hr*h_(t-1) + b_hr)
      z_t = sigmoid(W_iz*x_t + b_iz + W_hz*h_(t-1) + b_hz)
      n_t = tanh(W_in*x_t + b_in + r_t**(W_hn*h_(t-1) + b_hn))
      h_(t) = (1 - z_t)**n_t + z_t**h_(t-1)
      Where ** is the Hadamard product (elementwise, not matrix multiplication)
    '''
    # Keep per-layer states in a Python list: writing into `hidden` in place
    # can break autograd, which still needs the old values for backward.
    h = [hidden[i] for i in range(self.num_layers)]
    outputs = []
    for x_t in inputs:  # one time step at a time
      for i, block in enumerate(self.blocks):
        h[i] = block(x_t, h[i])
        x_t = h[i]  # this layer's new state is the next layer's input
      outputs.append(x_t)
    return torch.stack(outputs), torch.stack(h)
  


---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form. We can switch out which text file we read from, and therefore which text we try to simulate.

We now want to build an RNN model. In this section, we will use only built-in PyTorch pieces when building our RNN class.
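Before building the model, it can help to see what a single training pair looks like. This minimal pure-Python sketch mirrors `char_tensor` and `random_training_set` but uses plain lists instead of tensors (the `encode` and `make_pair` names are hypothetical): each character becomes its index in `string.printable`, and the target is the input shifted by one, so at every position the model is asked to predict the next character.

```python
import string

all_characters = string.printable

def encode(text):
    """Map each character to its index in string.printable (like char_tensor, as a list)."""
    return [all_characters.index(c) for c in text]

def make_pair(chunk):
    """Split a chunk into (input, target); the target is the input shifted left by one."""
    return encode(chunk[:-1]), encode(chunk[1:])

inp, target = make_pair("Aang!")
for i, t in zip(inp, target):
    print(repr(all_characters[i]), "->", repr(all_characters[t]))
```

A chunk of n characters therefore yields n - 1 prediction steps, which is why `random_chunk` returns one character more than `chunk_len` for the text-file case.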


**TODO:**

**DONE:**
* Create an RNN class that extends from nn.Module.




In [None]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1, soft_max=False):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers

    # Character index -> dense vector (vocabulary size is input_size)
    self.embedding = nn.Embedding(self.input_size, self.hidden_size)
    # Built-in multi-layer GRU with (seq_len, batch, features) inputs.
    # To use the custom cell from Part 4, replace nn.GRU with GRU (same arguments).
    self.gru = nn.GRU(input_size=self.hidden_size, hidden_size=self.hidden_size, num_layers=self.n_layers)
    # Hidden state -> one score (logit) per output character
    self.linOut = nn.Linear(self.hidden_size, self.output_size)
    # Leave soft_max=False when training with nn.CrossEntropyLoss
    self.softmax = nn.Softmax(dim=1) if soft_max else None

  def forward(self, input_char, hidden):
    # GRU input/output shapes: https://pytorch.org/docs/stable/generated/torch.nn.GRU.html
    # input_char holds one or more character indices; treat them as a sequence with batch size 1
    out_decoded = self.embedding(input_char).view(-1, 1, self.hidden_size)
    out_decoded = F.relu(out_decoded)
    out_decoded, hidden = self.gru(out_decoded, hidden)
    # (seq_len, 1, hidden_size) -> (seq_len, output_size)
    out_decoded = self.linOut(out_decoded.view(-1, self.hidden_size))
    if self.softmax is not None:
      out_decoded = self.softmax(out_decoded)

    return out_decoded, hidden

  def init_hidden(self):
    # (num_layers, batch=1, hidden_size), on the GPU like the inputs
    return torch.zeros(self.n_layers, 1, self.hidden_size).cuda()

In [None]:
def random_training_set():    
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target

# print(random_training_set())

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

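The function below outlines how a sequence-style network is trained: it runs one sequence through the decoder, accumulates the loss, and takes a single optimizer step.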
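The `teacher_forcing` flag in `train` controls what is fed back into the decoder at each step. This pure-Python sketch uses a hypothetical `rollout` helper and a toy `predict` function as stand-ins for the decoder. With teacher forcing the model always sees the true previous character; without it, the model's own guess is fed back, so early mistakes compound.

```python
def rollout(first, targets, predict, teacher_forcing=True):
    """Return the sequence of inputs the model would be fed at each step.

    first:   the first input character
    targets: the true next character at each step
    predict: function mapping the current input to the model's guess
    """
    inputs = []
    current = first
    for true_next in targets:
        inputs.append(current)
        guess = predict(current)
        # Teacher forcing feeds the ground truth; otherwise feed the model's own guess
        current = true_next if teacher_forcing else guess
    return inputs

# A toy "model" that always guesses the letter 'a'
always_a = lambda ch: "a"

print(rollout("T", list("he cat"), always_a, teacher_forcing=True))
print(rollout("T", list("he cat"), always_a, teacher_forcing=False))
```

Teacher forcing usually makes early training faster and more stable, since one bad guess cannot derail the rest of the sequence.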

**TODO:**

**DONE:**
* Fill in the pieces.





In [None]:
# NOTE: decoder_optimizer, decoder, and criterion will be defined below as global variables
def train(inp, target, per_char=True, teacher_forcing=True):
  """Run one optimization step on a single (input, target) sequence pair.

  Returns the average cross-entropy loss per character.
  """
  target_len = target.size(0)

  # Initialize the hidden state, clear old gradients, and reset the loss
  decoder.train()
  decoder_optimizer.zero_grad()
  hidden = decoder.init_hidden()
  loss = 0

  if per_char:
    # Feed one character at a time, accumulating the loss at every step
    next_char = inp[0]
    for index in range(inp.size(0)):
      output, hidden = decoder(next_char, hidden)
      loss += criterion(output, target[index].view(-1))
      if teacher_forcing:
        # Feed the ground-truth next character
        next_char = target[index]
      else:
        # Feed the model's own most likely prediction
        next_char = output.topk(1)[1].squeeze().detach()
    loss_per_char = loss.item() / target_len
  else:
    # Feed the whole sequence in one call; CrossEntropyLoss already averages over characters
    output, hidden = decoder(inp, hidden)
    loss = criterion(output, target)
    loss_per_char = loss.item()

  loss.backward()
  decoder_optimizer.step()

  return loss_per_char


---

## Part 3: Sample text and Training information

---

At this point you can also, if you choose, write the training-loop boilerplate that samples random sequences and trains your RNN. It helps to have this working before writing your own GRU class.

Once training is finished (or while it is still running), you can sample from the network with the function below. If your RNN model is instantiated as `decoder`, it will probabilistically sample a sequence of length `predict_len`.
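The `sample_outputs` function divides the model's scores by a temperature before sampling. This pure-Python sketch shows the same idea with plain lists (the `temperature_probs` and `sample_index` names are hypothetical): temperatures below 1 sharpen the distribution toward the most likely character, and temperatures above 1 flatten it toward uniform.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature, computed stably."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(logits, temperature, rng=random):
    """Draw one index with probability proportional to exp(logit / temperature)."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])

print(sample_index(logits, 0.8, random.Random(0)))
```

The default `temperature=0.8` in `evaluate` trades a little diversity for fewer nonsense words.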

**TODO:**

**DONE:**
* Fill out the evaluate function to generate text from a primed string




In [None]:
def sample_outputs(output, temperature):
  """Sample one character index from a vector of unnormalized scores (logits).

  Dividing by the temperature before the softmax sharpens (T < 1)
  or flattens (T > 1) the distribution.
  """
  # softmax(output / T) is exp(output / T), normalized and computed stably
  probs = torch.softmax(output / temperature, dim=-1)
  return torch.multinomial(probs, 1)

@torch.no_grad()  # disable gradient tracking while sampling
def evaluate(prime_str='A', predict_len=100, temperature=0.8):
  """Feed prime_str, then keep sampling; returns predict_len + 1 characters."""
  decoder.eval()
  tensor_str = char_tensor(prime_str)
  hidden = decoder.init_hidden()

  predicted = []
  next_char = char_tensor(' ').view(1, 1)
  for i in range(predict_len):
    # Feed the prime string first; afterwards feed back the sampled character
    if i < len(prime_str):
      next_char = tensor_str[i].view(1, 1)
    predicted.append(next_char)
    output, hidden = decoder(next_char, hidden)
    next_char = sample_outputs(output, temperature)

  predicted.append(next_char)  # keep the last sampled character
  return tensor_char(torch.cat(predicted))

---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---

Assuming everything has gone well, you should be able to run the training loop below, using either your custom GRU cell or the built-in layer, and see output similar to the example at the top of this lab. For reference, I trained on the “lotr.txt” dataset using chunk_len=200 and hidden_size=100 for 2000 epochs.

**TODO:** 
* Create some cool output

**DONE:**



In [None]:
import time
n_epochs = 7000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder = decoder.cuda()
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [None]:

# n_epochs = 2000 
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[128.37408328056335 (200 2%) 2.6856]
Whel tuat oolerhinrpirigea. rerson Tupan herovene lin. the thee oppetif. yhar'r. ifns hor it the Zy r 

[253.24490332603455 (400 5%) 2.2482]
Whis inly and.] Whes it dall of fater of dong lisesed. [Aens. Souklusll hasto hice diellull add. Zuko 

[386.6027681827545 (600 8%) 1.6603]
Why and danges hin fornt of the fiderly to kever stokey it of soth to at Sokka a watres but hears tim 

[493.6465849876404 (800 11%) 1.6084]
Whot on down how heres. I'cross, and hond wern of led her aolled and smying! He bech mall intined, [P 

[626.6996891498566 (1000 14%) 1.2262]
Whatwer, I'm father narly upward. We ba chall,. I trumbles for a gos and leaver tree stame. [Welling  

[752.8695955276489 (1200 17%) 1.4795]
What away. The can clive is expicl roff. Appa would show in the spearturing across and Sokka, and App 

[884.8338508605957 (1400 20%) 1.3625]
What? [Aver his hand.] I'm before the side-view of her. [To poisletion.] you holding the Beet his see 

[1025.78986

In [None]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # Use a different name than `start`, which holds the training start time
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 wh
 who the camera puried sliding pointing down to Azula. He looks at the two clouds as the the screen picks her left that the positure ensose in the side. [Shoutside the point of side of the chitters and 

 ca
 cample thanks at his painted for a point to compley at the news. Good inside the darkens under the mountainated in the corning here. She looks at Ty Lee as they look around to control a blast at Azula 

 ca
 can time to landed to this arrow. Cuts to Aang, who had a plamated reaching the doorway rises. Standing up the edgatures around the cliff considered. Cuts to Azula, who long with forth and looks to pr 

 lo
 looks at her.] This first is in the temple. Katara did an energy leaps impacts the wall, who die approaching the sourcated in the count friend. It's because this. It can find someone is trying to comm 

 G
 Grai.] If you feel this time humble going the ends up I'm indowing here. Concentrations one lost what? You ... I think I think perspective oh nothing. Zuko! [They h

In [None]:
@torch.no_grad()  # disable gradient tracking while sampling
def evaluate_sentences(prime_str='A', sentences=5, temperature=0.8, max_len=1000):
  """Generate text starting with prime_str until `sentences` periods have been produced.

  max_len caps the output length in case the model never emits a period.
  """
  decoder.eval()
  hidden = decoder.init_hidden()

  # Feed all but the last prime character to warm up the hidden state
  result_str = prime_str
  for c in prime_str[:-1]:
    _, hidden = decoder(char_tensor(c), hidden)
  next_char = char_tensor(prime_str[-1])

  sentences_done = 0
  while sentences_done < sentences and len(result_str) < max_len:
    output, hidden = decoder(next_char, hidden)
    next_char = sample_outputs(output, temperature)
    char = tensor_char(next_char.view(-1))
    result_str += char
    if char == '.':
      sentences_done += 1

  return result_str

print(evaluate_sentences("H"), '\n')
print(evaluate("Zuko and Aang ", 300), '\n')

 Sokka up the was boudly.  Azula lifts the first pulls up a rock toward the last walks onto the room.  He luts and looks at a mower who is amproaching the city.  Aang proceeds and narrows and arm continues to the blast of his return passed again.  Cuts to shot of the two ship turns out of the sandbence.   

Zuko and Aang as the camera shows her latter before the four machine and stop well in the armor. Cuts to shot of Azula. Because the camera zooms to the circle as Painted Lady animal through this time. [Annoyance the think coans.] I that didn't master this ice on here, but you're talking before. I let 



---

## Part 6: Generate output on a different dataset

---

**TODO:**



**DONE:**
* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle 

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)
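One simple way to put the printed training losses in context: `nn.CrossEntropyLoss` uses the natural logarithm, so the average per-character loss is in nats. This sketch (the `loss_to_metrics` helper is hypothetical) converts it to bits per character and to perplexity, roughly the number of characters the model is effectively choosing between. The two loss values are copied from the Part 5 training log; each comes from a single random chunk, so they are noisy.

```python
import math

def loss_to_metrics(loss_nats):
    """Convert an average cross-entropy (natural log) per character
    into bits per character and perplexity."""
    bits_per_char = loss_nats / math.log(2)
    perplexity = math.exp(loss_nats)
    return bits_per_char, perplexity

# Baseline: guessing uniformly over all len(string.printable) == 100 characters
uniform_loss = math.log(100)

for label, loss in [("uniform", uniform_loss), ("epoch 200", 2.6856), ("epoch 1000", 1.2262)]:
    bpc, ppl = loss_to_metrics(loss)
    print(f"{label:>10}: {loss:.3f} nats/char = {bpc:.2f} bits/char, perplexity {ppl:.1f}")
```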



## What it did well:
* Opening and closing brackets
* Capitalizing proper names
* Narration notes in brackets, dialogue outside
* Beginning-of-sentence capitalization
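The bracket observation can also be checked with a number rather than by eye. This small sketch uses a hypothetical `brackets_balanced` helper (not part of the lab) to test whether the `[...]` narration notes in a sample are properly opened and closed; running it over many samples from `evaluate` would give a balance rate. The two samples are copied from the outputs above. Samples that `predict_len` cuts off mid-note will naturally count as unbalanced.

```python
def brackets_balanced(text, open_ch="[", close_ch="]"):
    """Return True if every opening bracket is later closed and no
    closing bracket appears without a matching opener."""
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:  # closing bracket with no matching opener
                return False
    return depth == 0

samples = [
    "What? [Aver his hand.] I'm before the side-view of her.",
    "looks at her.] This first is in the temple.",
]
print([brackets_balanced(s) for s in samples])
```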

## What it did poorly:
* Loats of speelling mistkes (I did that on purpose)
* Sentences do not always end with punctuation. This may be a side effect of how `evaluate` works (it stops after a fixed number of characters), or of how I prepared the dataset.
* Gender consistency
* Scene consistency (needs to remember more, perhaps over a scene)