<a 
href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab6.ipynb"
  target="_parent">
  <img
    src="https://colab.research.google.com/assets/colab-badge.svg"
    alt="Open In Colab"/>
</a>

# Lab 6: Sequence-to-sequence models

### Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!

### Deliverable:
- Fill in the code for the RNN (using PyTorch's built-in GRU).
- Fill in the training loop
- Fill in the evaluation loop. In this loop, rather than using a validation set, you will sample text from the RNN.
- Implement your own GRU cell.
- Train your RNN on a new domain of text (Star Wars, political speeches, etc. - have fun!)

### Grading Standards:
- 20% Implementation of the RNN
- 20% Implementation of the training loop
- 20% Implementation of the evaluation loop
- 20% Implementation of your own GRU cell
- 20% Training of your RNN on a domain of your choice

### Tips:
- Read through all the helper functions, run them, and make sure you understand what they are doing
- At each stage, ask yourself: What should the dimensions of this tensor be? Should its data type be float or int? (int is called `long` in PyTorch)
- Don't apply a softmax inside the RNN if you are using an nn.CrossEntropyLoss (this module already applies a softmax to its input).
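
The last tip can be checked directly. The sketch below uses made-up logits (the tensor values are illustrative, not taken from the lab) to show that `nn.CrossEntropyLoss` on raw scores gives the same value as `log_softmax` followed by `nn.NLLLoss`, and that applying a softmax first changes the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative logits for 2 examples over 4 classes (made-up values).
logits = torch.tensor([[2.0, 0.5, -1.0, 0.0],
                       [0.1, 0.2, 3.0, -0.5]])
targets = torch.tensor([0, 2])  # class indices must be long (int64)

# CrossEntropyLoss on raw scores ...
ce = nn.CrossEntropyLoss()(logits, targets)
# ... equals log-softmax followed by negative log-likelihood.
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

# Applying softmax first and then CrossEntropyLoss normalizes twice,
# which flattens the distribution and gives a different loss.
double = nn.CrossEntropyLoss()(F.softmax(logits, dim=1), targets)

print(ce.item(), nll.item(), double.item())
```

The first two numbers agree; the third does not, which is why the model should return raw scores when this loss is used.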

### Example Output:
Examples of my final samples are shown below (more detail in the
final section of this writeup), after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling 
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


---

## Part 0: Readings, data loading, and high level training

---

There is a tutorial here that will help you build out the scaffolding code and get an understanding of using sequences in PyTorch.

* Read the following:
  * [PyTorch sequence-to-sequence tutorial](https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html) (Note that you will not be implementing the encoder part of this tutorial.)
  * [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)






In [1]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
import pdb
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)

--2022-02-10 18:17:14--  https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz
Resolving piazza.com (piazza.com)... 52.7.218.200, 18.214.211.171, 3.221.126.233, ...
Connecting to piazza.com (piazza.com)|52.7.218.200|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz [following]
--2022-02-10 18:17:14--  https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz
Resolving cdn-uploads.piazza.com (cdn-uploads.piazza.com)... 99.84.110.25, 99.84.110.97, 99.84.110.19, ...
Connecting to cdn-uploads.piazza.com (cdn-uploads.piazza.com)|99.84.110.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1533290 (1.5M) [application/x-gzip]
Saving to: ‘./text_files.tar.gz’


2022-02-10 18:17:14 (80.7 MB/s) - ‘./text_files.tar.gz’ saved [15332

In [2]:
chunk_len = 200
 
def random_chunk():
  start_index = random.randint(0, file_len - chunk_len - 1)  # randint is inclusive; leave room for chunk_len + 1 characters
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]
  
print(random_chunk())

 he set foot upon the far bank of Silverlode a strange 
feeling had come upon him, and it deepened as he walked on into the Naith: 
it seemed to him that he had stepped over a bridge of time into a cor


In [3]:
import torch
# Turn string into list of longs
def char_tensor(string):
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
      tensor[c] = all_characters.index(string[c])
  return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])
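
As a sanity check on the encoding, the mapping can be inverted. The sketch below works on plain Python lists rather than tensors; `encode` and `decode` are hypothetical helper names, not part of the lab scaffold.

```python
import string

all_characters = string.printable  # same 100-symbol vocabulary as above

def encode(text):
    """Map each character to its index in all_characters."""
    return [all_characters.index(ch) for ch in text]

def decode(indices):
    """Map indices back to characters."""
    return ''.join(all_characters[i] for i in indices)

indices = encode('abcDEF')
print(indices)          # [10, 11, 12, 39, 40, 41]
print(decode(indices))  # abcDEF
```

The same `decode` idea is what the sampling code relies on when it turns a sampled index back into a character.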


---

## Part 4: Creating your own GRU cell 

**(Come back to this later. It's defined here so that the GRU is defined before it is used.)**

---

The cell that you used in Part 1 was a pre-defined PyTorch layer. Now, write your own GRU class using the same parameters as the built-in PyTorch class.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!
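
For reference while implementing, the update a GRU cell computes at each time step can be written as follows. This is the formulation given in the PyTorch `nn.GRU` documentation, where $x_t$ is the input, $h_{t-1}$ the previous hidden state, $\sigma$ the sigmoid function, and $\odot$ elementwise multiplication:

$$
\begin{aligned}
r_t &= \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}) \\
z_t &= \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}) \\
n_t &= \tanh\bigl(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn})\bigr) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
$$

In a multi-layer GRU, each layer above the first receives the hidden state $h_t$ of the layer below it as its input $x_t$.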

**TODO:**
* Create a custom GRU cell

**DONE:**



In [4]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter
import torch.optim as optim




class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.num_layers = num_layers

    # Six linear maps per layer, stored in the order
    # [W_ir, W_hr, W_iz, W_hz, W_in, W_hn]. Each nn.Linear carries its own
    # bias, so no separate (or shared) bias parameters are needed.
    self.W = nn.ModuleList()
    for l in range(num_layers):
      # Layers above the first take the previous layer's hidden state as input
      in_size = input_size if l == 0 else hidden_size
      for _ in range(3):
        self.W.append(nn.Linear(in_size, hidden_size))      # input-to-hidden
        self.W.append(nn.Linear(hidden_size, hidden_size))  # hidden-to-hidden

  def forward(self, inputs, hidden):
    # inputs: (1, 1, input_size); hidden: (num_layers, 1, hidden_size)
    x = inputs.view(1, -1)
    hidden_new = []
    for l in range(self.num_layers):
      W_ir, W_hr, W_iz, W_hz, W_in, W_hn = (self.W[6 * l + k] for k in range(6))
      h_prev = hidden[l]
      r_t = torch.sigmoid(W_ir(x) + W_hr(h_prev))     # reset gate
      z_t = torch.sigmoid(W_iz(x) + W_hz(h_prev))     # update gate
      n_t = torch.tanh(W_in(x) + r_t * W_hn(h_prev))  # candidate state
      h_t = (1 - z_t) * n_t + z_t * h_prev
      hidden_new.append(h_t)
      x = h_t  # this layer's output feeds the next layer
    hidden = torch.stack(hidden_new)  # (num_layers, 1, hidden_size)
    outputs = h_t.unsqueeze(0)        # (1, 1, hidden_size), top layer only

    return outputs, hidden
  


---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form. We can switch out which text file we read from and try to imitate.

We now want to build out an RNN model. In this section, we will use built-in PyTorch pieces when building our RNN class.


**TODO:**
* Create an RNN class that extends from nn.Module.

**DONE:**



In [20]:
class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers
    
    # Embed each character index, run it through the GRU, then project to character scores
    self.embedding = nn.Embedding(output_size, input_size)
    self.GRU = GRU(input_size, hidden_size, n_layers)
    self.out = nn.Linear(hidden_size, output_size)
    # No softmax layer here: nn.CrossEntropyLoss applies log-softmax itself

  def forward(self, input_char, hidden):
    # Reshape the embedded character to (seq_len=1, batch=1, input_size)
    output = self.embedding(input_char).view(1, 1, -1)
    output = F.relu(output)

    out_decoded, hidden = self.GRU(output, hidden)

    # Drop the sequence dimension and map to raw scores of shape (1, output_size)
    out_decoded = self.out(out_decoded[0])

    return out_decoded, hidden

  def init_hidden(self):
    return torch.zeros(self.n_layers, 1, self.hidden_size)
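
For comparison, the same interface can be built on PyTorch's `nn.GRU`. This is a minimal sketch under assumptions: the class name `BuiltinRNN` and the sizes used below are illustrative, and it omits the ReLU applied above.

```python
import torch
import torch.nn as nn

class BuiltinRNN(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, vocab_size, hidden_size, n_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, input_char, hidden):
        # nn.GRU expects (seq_len, batch, features); both are 1 here.
        embedded = self.embedding(input_char).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return self.out(output[0]), hidden  # raw scores, shape (1, vocab_size)

    def init_hidden(self):
        return torch.zeros(self.n_layers, 1, self.hidden_size)

model = BuiltinRNN(vocab_size=100, hidden_size=32, n_layers=2)
logits, hidden = model(torch.tensor(10), model.init_hidden())
print(logits.shape, hidden.shape)  # torch.Size([1, 100]) torch.Size([2, 1, 32])
```

Checking these two shapes first is a quick way to confirm that a custom GRU is a drop-in replacement for the built-in layer.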

In [21]:
def random_training_set():    
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target
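
The slicing in `random_training_set` pairs each character with the one that follows it. A small illustration on a fixed string (the text is made up and stands in for a random chunk):

```python
chunk = "the ring"  # stand-in for a random chunk of chunk_len + 1 characters

inp = chunk[:-1]     # every character except the last
target = chunk[1:]   # every character except the first

# At step i the network reads inp[i] and is trained to predict target[i].
for x, y in zip(inp, target):
    print(repr(x), '->', repr(y))
```

This is why `random_chunk` returns one character more than `chunk_len`: both `inp` and `target` then have exactly `chunk_len` entries.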

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

The function below outlines a single training step for a sequence model.

**TODO:**
* Fill in the pieces.

**DONE:**




In [98]:
# NOTE: rnn, decoder_optimizer, and objective are defined here as global variables

# How a training pair is built from a random section of the text:
#   inp    = "i went to the store today"
#   target = " went to the store today."
# Both strings are converted to tensors of character indices, and the model
# is trained to predict target[i] after reading inp[i].

input_size, hidden_size, output_size = 100, 100, n_characters
learning_rate = 0.001
rnn = RNN(input_size, hidden_size, output_size)
decoder_optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
objective = nn.CrossEntropyLoss()

def train(inp, target):
  decoder_optimizer.zero_grad()
  hidden = rnn.init_hidden()
  loss = 0

  # Feed one character at a time and accumulate the loss over the chunk
  for i in range(inp.size(0)):
    output, hidden = rnn(inp[i], hidden)
    loss += objective(output, target[i].unsqueeze(0))

  loss.backward()
  decoder_optimizer.step()
  return loss.item()  # summed over all characters in the chunk
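
Because `train` sums the cross-entropy over every character in the chunk, the number it returns scales with the chunk length. Dividing by that length gives a per-character loss that can be compared with the uniform-guess baseline of `ln(n_characters)`. The sketch below assumes 200 predicted characters per step and the 100-symbol `string.printable` vocabulary; `per_char_loss` is a hypothetical helper, and the summed loss value is only an example.

```python
import math

chunk_len = 200      # characters predicted per training step (assumed)
n_characters = 100   # len(string.printable)

def per_char_loss(summed_loss, length=chunk_len):
    """Convert a loss summed over a chunk into an average per character."""
    return summed_loss / length

# Loss of a model that assigns equal probability to every character.
uniform_baseline = math.log(n_characters)

summed = 625.6779  # example summed loss for one chunk
print(round(per_char_loss(summed), 4), round(uniform_baseline, 4))
```

A per-character loss below the baseline means the model is doing better than guessing uniformly at random.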



    

---

## Part 3: Sample text and Training information

---

You can at this time, if you choose, also write out your train loop boilerplate that samples random sequences and trains your RNN. This will be helpful to have working before writing your own GRU class.

If you have finished training, or are partway through, and you want to sample from the network, consider using the following function. If your RNN model is instantiated as `rnn` (the name used in the code below), then this will probabilistically sample a sequence of length `predict_len`.

**TODO:**
* Fill out the evaluate function to generate text from a primed string

**DONE:**



In [23]:
def sample_outputs(output, temperature):
  """Takes in a vector of unnormalized scores (logits) and samples a character index from the distribution"""
  # softmax(output / temperature) is the normalized, overflow-safe form of exp(output / temperature)
  return torch.multinomial(F.softmax(output / temperature, dim=-1), 1)

def evaluate(prime_str='A', predict_len=100, temperature=.5):
  hidden = rnn.init_hidden()
  prime_input = char_tensor(prime_str)

  with torch.no_grad():  # sampling only, so no gradients are needed
    # Use the priming string to build up the hidden state
    for i in range(len(prime_input) - 1):
      _, hidden = rnn(prime_input[i], hidden)

    inp = prime_input[-1]
    for _ in range(predict_len):
      output, hidden = rnn(inp, hidden)
      # Sample the next character and feed it back in as the next input
      new_char = all_characters[sample_outputs(output, temperature).item()]
      prime_str += new_char
      inp = char_tensor(new_char)

  return prime_str
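
The effect of `temperature` can be seen without a trained model. The sketch below is plain Python on made-up scores; `softmax_with_temperature` is a hypothetical helper that mirrors dividing the scores by the temperature before normalizing.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three characters
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring character, giving more conservative samples; higher temperatures flatten the distribution and give more varied ones.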
  

---

## Part 4: (Create a GRU cell, requirements above)

---



---

## Part 5: Run it and generate some text!

---


**TODO:** 
* Create some cool output

**DONE:**




Assuming everything has gone well, you should be able to run the main function in the scaffold code, using either your custom GRU cell or the built-in layer, and see output something like this. I trained on the “lotr.txt” dataset, using chunk_length=200 and hidden_size=100, for 2000 epochs. These are the results, along with the prime string:




In [24]:
import time
n_epochs = 5000
print_every = 20
plot_every = 10
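hidden_size = 200  # NOTE: not picked up by the `rnn` already built in the training cell
n_layers = 3       # NOTE: likewise unused unless the model is rebuilt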

 
#decoder = RNN(n_characters, hidden_size, n_characters, n_layers)

#criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [25]:
# n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())    
  loss_avg += loss_
  

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[4.374957323074341 (20 0%) 625.6779]
Wh   i alhnne eat ae e ennnnJ  o e s ine e   etlr deia   nc dh n hdi en o thae e ihlon en i  ea n   hh 

[8.464582443237305 (40 0%) 620.8953]
Wh taoee   o . h eo ooe de  at ow s s sane tre  saf e ttll   t n ao  t thes  uam t  waa i o s n  t ete 

[12.24028205871582 (60 1%) 568.5573]
Wh t t tte e   he d t hr ata ahe th n he ae  ha tn.  
f ar he t  e thstloni h mtt it th mote heo t t t 

[16.061116695404053 (80 1%) 549.4240]
Wharth s  al orai tohe he thre 
ao we tie  the s at eth te twe i h ou no thn the te the tline to f he  

[19.87338876724243 (100 2%) 539.5125]
When te uanasrn tre thent w nao raea at aso 
af thehe he tanid and nd are noe d, se gin ind sas ans t  

[23.855522632598877 (120 2%) 520.1741]
Whe fe the the the 
o the gol de ad toat hero ano t hass ing the the the ait hhe me tthe oine foe the  

[27.724486589431763 (140 2%) 496.8302]
Whed wed rinwind awer ante the the h theet the ans anon the 
and ant here the the cee ad fo the a
ewe  



NameError: ignored

In [26]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # Pick a random priming string; a separate name avoids overwriting the `start` timestamp
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 Th
 There was the same out they took the great came the 
things that the 
sound the last the dreat of the dark and have the riders be called on the 
beather that sound of the walk of the passed the Roots, t 

 he
 he do not was that hell, and the hobbits of the end the Sam see better to the was that was a sunder of the Roother the said the drider to the now the now the soon the hight a that stranged but to do not 

 ca
 can the 











































































Chore far. And 
thought the fear and stalls and the sound that you have so shound the spoting a 
into the 
dear of the 

 Th
 Through the still to looked the rook that was but a shadow that we said the near back 
and the wonder of the 
they that good they still to the sound a still that here and the fire the last or strouses t 

 wh
 where to long in the word of the River of the right of the fing the 
door shast that be the might that the great the stood the she wear for the seemir to t

---

## Part 6: Generate output on a different dataset

---

**TODO:**

* Choose a textual dataset. Here are some [text datasets](https://www.kaggle.com/datasets?tags=14104-text+data%2C13205-text+mining) from Kaggle 

* Generate some decent looking results and evaluate your model's performance (say what it did well / not so well)

**DONE:**



In [94]:
file = unidecode.unidecode(open('/content/TrumpSpeaches.txt').read())
file_len = len(file)


In [103]:
n_epochs = 2500
print_every = 100
plot_every = 10
hidden_size = 200
n_layers = 3

 
#decoder = RNN(n_characters, hidden_size, n_characters, n_layers)

#criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses2 = []
loss_avg = 0

In [96]:
random_chunk()

"e a look at the deal he's making with Iran. He makes that deal, Israel maybe won't exist very long. It's a disaster, and we have to protect Israel. But...\nSo we need people -- I'm a free trader. But th"

In [105]:
# n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())    
  loss_avg += loss_
  

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('HUGE', 100), '\n')

  if epoch % plot_every == 0:
      all_losses2.append(loss_avg / plot_every)
      loss_avg = 0

[205.04574537277222 (100 4%) 350.9297]
HUGE BEMENE IT DINT THE DE BBE OU THE THE DAN THEYT I TTOE AVNE WE SO THE THE NOUUT THE EREE ALET THAT U 

[224.04309678077698 (200 8%) 375.5424]
HUGE THE THE ORE THE THE GOT THE TO THE EERE OOU THE ANET THE TRE DHERS TO OUN THE THE THE ORE THE ANEE  

[243.27290296554565 (300 12%) 280.8697]
HUGE OUS THE OULE OOU won't I can the very the waring country very be the se the want to the can the wil 

[262.2316403388977 (400 16%) 467.0982]
HUGE ING AIN THE BOUM GOU CORE ING UNG THAN GOE FOME OUT WE THO UNT ANG I TING ANG ARE HEND ING THAT TE  

[281.3165030479431 (500 20%) 354.3181]
HUGE TO BENT HAVE THE AN HENG THE LOY HE ERE SEAN AS LEN THE GEVENG TH THE THE HAVE THE FOUT TING IND WE 

[300.32251954078674 (600 24%) 305.9919]
HUGE We loth of the don't Pend the reating to the some to Madd and I was and I think in that and the ric 

[319.25694704055786 (700 28%) 348.7767]
HUGE I SOTHE SOREN OU THE WE HAN THE WE WE BE NDO PEON WE HANG THEN WE WE WE WOUR 

In [None]:
### Paragraph


We see that the model is OK at figuring out words. Other than that, the model is pretty bad at actually forming any coherent sentences. Markov chains do a much better job at forming plausible sentences that resemble the original text. The model does improve over time, but its loss seems to bottom out for the most part.
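
For readers who want to try the Markov-chain comparison mentioned above, here is a minimal character-level sketch. It is not the baseline used for the claim in this writeup: the function names, the order of 3, and the toy corpus are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_chain(text, order=3):
    """Map each `order`-character context to the characters that followed it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return chain

def generate(chain, seed, length=100, order=3):
    """Extend `seed` by sampling a successor of the last `order` characters."""
    out = seed
    for _ in range(length):
        followers = chain.get(out[-order:])
        if not followers:  # unseen context: stop early
            break
        out += random.choice(followers)
    return out

# Toy corpus (made up); swap in the contents of a real text file to compare.
corpus = "the quick brown fox jumps over the lazy dog and the quick red fox naps. " * 3
chain = build_chain(corpus, order=3)
print(generate(chain, "the", length=60))
```

Every four-character window such a chain emits already occurs in its training text, which is one reason its output can look locally fluent even though it has no longer-range memory.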