<a 
href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab6.ipynb"
  target="_parent">
  <img
    src="https://colab.research.google.com/assets/colab-badge.svg"
    alt="Open In Colab"/>
</a>

# Sequence-to-sequence models

### Description:
I developed this lab using code from the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.


### Example Output:
An example of my final samples is shown below (more detail in the
final section of this writeup), after 150 passes through the data.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling 
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


In [1]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
import pdb
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)

--2023-02-19 04:23:17--  https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz
Resolving piazza.com (piazza.com)... 18.215.222.38, 52.205.194.150, 52.206.193.161, ...
Connecting to piazza.com (piazza.com)|18.215.222.38|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz [following]
--2023-02-19 04:23:18--  https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz
Resolving cdn-uploads.piazza.com (cdn-uploads.piazza.com)... 18.161.111.39, 18.161.111.44, 18.161.111.80, ...
Connecting to cdn-uploads.piazza.com (cdn-uploads.piazza.com)|18.161.111.39|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1533290 (1.5M) [application/x-gzip]
Saving to: ‘./text_files.tar.gz’


2023-02-19 04:23:19 (1.40 MB/s) - ‘./text_files.tar.gz’ saved

In [2]:
chunk_len = 200
 
def random_chunk():
  start_index = random.randint(0, file_len - chunk_len)
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]
  
print(random_chunk())


unwatched. His legs were securely bound, but his arms were only tied about 
the wrists, and his hands were in front of him. He could move them both 
together, though the bonds were cruelly tight. He p


In [3]:
import torch
import torch.nn as nn
import torch.nn.functional as F

In [4]:
# Turn a string into a 1-D LongTensor of character indices
def char_tensor(string):
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
      tensor[c] = all_characters.index(string[c])
  return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])


---

## Part 4: Creating a GRU cell 

---

A custom GRU class that takes the same constructor parameters (`input_size`, `hidden_size`, `num_layers`) as the built-in PyTorch class.

In [5]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()
    # make the weights based on the parameter sizes
    
    self.num_layers = num_layers
    self.HR = nn.ModuleList([nn.Linear(hidden_size, hidden_size) 
                                       for k in range(num_layers)])
    self.IR = nn.ModuleList([nn.Linear(input_size, hidden_size) if k == 0 
                             else nn.Linear(hidden_size, hidden_size) 
                             for k in range(num_layers)])
    
    self.HZ = nn.ModuleList([nn.Linear(hidden_size, hidden_size) 
                                       for k in range(num_layers)])
    self.IZ = nn.ModuleList([nn.Linear(input_size, hidden_size) if k == 0 
                             else nn.Linear(hidden_size, hidden_size) 
                             for k in range(num_layers)])
    
    self.HH = nn.ModuleList([nn.Linear(hidden_size, hidden_size) 
                                       for k in range(num_layers)])
    self.IH = nn.ModuleList([nn.Linear(input_size, hidden_size) if k == 0 
                             else nn.Linear(hidden_size, hidden_size) 
                             for k in range(num_layers)])
  
  def forward(self, inputs, hidden):

    updated_hiddens = []
    xT = inputs

    for k, prev in enumerate(hidden):
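      # reset gate
      r_t = torch.sigmoid(self.IR[k](xT) + self.HR[k](prev))
      # update gate (uses its own hidden-to-hidden weights HZ, not HR)
      z_t = torch.sigmoid(self.IZ[k](xT) + self.HZ[k](prev))
      # candidate hidden state
      n_t = torch.tanh(self.IH[k](xT) + r_t * self.HH[k](prev))
      # blend the candidate with the previous hidden state
      h_t = (1 - z_t) * n_t + z_t * prev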

      xT = h_t
      updated_hiddens.append(h_t.unsqueeze(0))
    return xT, torch.cat(updated_hiddens, 0)
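
As a sanity check on the gate equations used in the class above, the sketch below recomputes one GRU step by hand and compares it with PyTorch's built-in `nn.GRUCell`. It assumes the gate ordering (reset, update, new) that PyTorch documents for `weight_ih` and `weight_hh`; the sizes and random inputs are made up for illustration, and this is only a minimal single-layer check, not a replacement for the class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 3          # illustrative sizes only
cell = nn.GRUCell(input_size, hidden_size)

x = torch.randn(1, input_size)          # one input vector (batch of 1)
h = torch.randn(1, hidden_size)         # previous hidden state

# GRUCell stacks the reset, update, and new-gate parameters along dim 0
W_ir, W_iz, W_in = cell.weight_ih.chunk(3, dim=0)
W_hr, W_hz, W_hn = cell.weight_hh.chunk(3, dim=0)
b_ir, b_iz, b_in = cell.bias_ih.chunk(3)
b_hr, b_hz, b_hn = cell.bias_hh.chunk(3)

with torch.no_grad():
    r = torch.sigmoid(x @ W_ir.T + b_ir + h @ W_hr.T + b_hr)     # reset gate
    z = torch.sigmoid(x @ W_iz.T + b_iz + h @ W_hz.T + b_hz)     # update gate
    n = torch.tanh(x @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))  # candidate state
    h_manual = (1 - z) * n + z * h
    h_builtin = cell(x, h)

gru_step_matches = torch.allclose(h_manual, h_builtin, atol=1e-5)
print(gru_step_matches)
```

Note that the update gate has its own hidden-to-hidden weights (`W_hz`), separate from the reset gate's `W_hr`.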

---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a usable form. We can switch out which text file we read from and try to imitate.

We now want to build an RNN model. In this section, the `RNN` class is assembled from built-in PyTorch pieces (`nn.Embedding`, `nn.Linear`) together with the custom `GRU` class defined above.

In [6]:
# PART 1

class RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers
    
    # the embedding table is indexed by input characters, so size it by input_size
    self.embedding = nn.Embedding(input_size, hidden_size)
    self.gru = GRU(hidden_size, hidden_size, self.n_layers)
    self.out = nn.Linear(hidden_size, output_size)

  # An RNN typically consumes a whole sequence, but this forward pass takes a single
  # character index (tensor) together with the hidden state (tensor).
  # It returns unnormalized scores (logits) over all characters and the updated hidden state.
  
  def forward(self, input_char, hidden):
    output = self.embedding(input_char).unsqueeze(0)
    output = F.relu(output)
    output, hidden = self.gru(output, hidden)
    output = self.out(output)
    return output, hidden
    
  def init_hidden(self):
    return torch.zeros(self.n_layers, 1, self.hidden_size)
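
For comparison, here is a minimal sketch of the same single-character decoder built on PyTorch's own `nn.GRU` instead of the custom class. The class name `BuiltinGRUDecoder` and the small sizes are hypothetical, chosen only to show the tensor shapes at each step; it is not the version trained in this writeup.

```python
import string
import torch
import torch.nn as nn

class BuiltinGRUDecoder(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_chars, hidden_size, n_layers=1):
        super().__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.embedding = nn.Embedding(n_chars, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers)
        self.out = nn.Linear(hidden_size, n_chars)

    def forward(self, input_char, hidden):
        # input_char is a 0-dim LongTensor; nn.GRU expects (seq_len, batch, features)
        embedded = self.embedding(input_char).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        logits = self.out(output.view(1, -1))   # shape (1, n_chars)
        return logits, hidden

    def init_hidden(self):
        return torch.zeros(self.n_layers, 1, self.hidden_size)

n_chars = len(string.printable)
model = BuiltinGRUDecoder(n_chars, hidden_size=16, n_layers=2)
logits, hidden = model(torch.tensor(10), model.init_hidden())
print(logits.shape, hidden.shape)
```

The logits have one score per printable character, and the hidden state keeps the `(n_layers, 1, hidden_size)` layout returned by `init_hidden`.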

In [7]:
def random_training_set():    
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target
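
The input/target pairing in `random_training_set` can be seen on a short string. The helper name `make_pair` and the sample word are made up for this sketch; the slicing is the same as in the cell above.

```python
def make_pair(chunk):
    """Split a chunk into (input, target), with the target shifted one character ahead."""
    return chunk[:-1], chunk[1:]

inp, target = make_pair("Frodo")
# Each input character is paired with the character that follows it.
pairs = list(zip(inp, target))
print(pairs)
```

At every position the network sees one character and is asked to predict the next one, which is why a chunk of `chunk_len + 1` characters yields `chunk_len` training steps.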

# Train

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

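The function below outlines one training step for a sequence model: it feeds the input one character at a time, accumulates the loss at each step, and then backpropagates through the whole chunk.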

In [8]:
# PART 2

# NOTE: decoder_optimizer, decoder, and criterion will be defined below as global variables
def train(inp, target):
  # initialize hidden layers, set up gradient and loss 
  decoder_optimizer.zero_grad()
  hidden = decoder.init_hidden()
  loss = 0
  length = len(inp)

  # how to handle a sequence of information. 
  # Use a for loop in the training function to iterate over characters
  for character, letter in zip(inp, target):
    y_hat, hidden = decoder(character, hidden)
    loss += criterion(y_hat, letter.unsqueeze(0))

  loss.backward()
  decoder_optimizer.step()
  return loss.item() / length    
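
The value returned by `train` is the summed per-character loss divided by the sequence length. As a small sketch with made-up logits and targets (not the model's real outputs), the following checks that this matches what `nn.CrossEntropyLoss` with its default mean reduction reports on the whole sequence at once.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, n_chars = 5, 7                        # illustrative sizes only
logits = torch.randn(seq_len, n_chars)         # stand-in for per-step model outputs
targets = torch.randint(0, n_chars, (seq_len,))
criterion = nn.CrossEntropyLoss()

# Accumulate the loss one character at a time, as the training loop does
loss_sum = 0
for step_logits, step_target in zip(logits, targets):
    loss_sum = loss_sum + criterion(step_logits.unsqueeze(0), step_target.unsqueeze(0))
per_char = loss_sum.item() / seq_len

# Same quantity computed in one call over the whole sequence
batched = criterion(logits, targets).item()
print(per_char, batched)
```

Reporting the per-character average keeps the printed loss comparable across different chunk lengths.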

---

## Part 3: Sample text and Training information

---


In [9]:
# PART 3

def sample_outputs(output, temperature):
    """Takes a vector of logits (unnormalized log-probabilities) and samples a character index.

    torch.multinomial normalizes the weights exp(output / temperature) internally.
    """
    return torch.multinomial(torch.exp(output / temperature), 1)

def evaluate(prime_str='A', predict_len=100, temperature=0.8):
  # initialize the hidden state and other useful variables,
  # then loop over predict_len steps (feeding prime_str first)
  with torch.no_grad():
    hidden_state = decoder.init_hidden()
    generated_text = prime_str
    prediction = ''

    for k in range(predict_len):

      if k < len(prime_str):
        prime = char_tensor(prime_str[k])
        output, hidden_state = decoder(prime.squeeze(), hidden_state)
      else: 
        generated_text += all_characters[prediction]
        output, hidden_state = decoder(prediction, hidden_state)

      prediction = sample_outputs(output.squeeze(0).squeeze(0), temperature)

    generated_text += all_characters[prediction]
    return generated_text
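
To see what the `temperature` argument does, the sketch below applies a temperature-scaled softmax to three made-up scores in plain Python, so the arithmetic is visible without any tensors. The function name and the logits are assumptions for illustration; `sample_outputs` above gets the same distribution by handing `exp(output / temperature)` to `torch.multinomial`.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

logits = [2.0, 1.0, 0.1]                     # made-up scores for three characters
cold = softmax_with_temperature(logits, 0.5)     # sharper distribution
neutral = softmax_with_temperature(logits, 1.0)  # ordinary softmax
hot = softmax_with_temperature(logits, 2.0)      # flatter distribution

for name, probs in [("T=0.5", cold), ("T=1.0", neutral), ("T=2.0", hot)]:
    print(name, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring character, giving more conservative text; higher temperatures flatten the distribution and give more varied, more error-prone samples.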


---

## Part 5: Run it and generate some text!

---
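I trained on the `lotr.txt` dataset with `chunk_len = 200`. The configuration cell below sets `hidden_size = 200`, `n_layers = 3`, and `n_epochs = 5000`, where each "epoch" is one random chunk; the run was interrupted (`KeyboardInterrupt`) shortly after iteration 1,400, so the log stops early. The results are shown along with the prime strings.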

---

In [10]:
# PART 5

import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

In [11]:
# n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  inp, targ = random_training_set()  # avoid shadowing the built-in input()
  loss_ = train(inp, targ)
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[111.99504733085632 (200 4%) 2.5684]
Wheres wong anld and gard he hot, hur the wel 
tod cI hasy geft tand, reiln 
vas annd Imer or wecolt  

[218.788715839386 (400 8%) 2.0271]
Whainins, atse sice fon.' Treat even endorlen'sbis onet in the bow the wat pokere undase 
the herpore 

[324.88765835762024 (600 12%) 1.9141]
Whilled, and nowly, lessed yourres in't loth he deally 
the romoung. 
wive 
dowr oumale his lenk fall 

[432.4325795173645 (800 16%) 1.8566]
Wherd streen a smape grownds houred the were on the like-and did cits were in shickth sparn, unkest s 

[547.4234738349915 (1000 20%) 1.8525]
Why fars, but luded longersh, that could gland. 'I reyestrest and 
striling. 'It was the more, you he 

[663.5982022285461 (1200 24%) 1.4807]
What way passed though to be 
were way 
sword epress sat the had obe have don't go not of the for ene 

[775.0870950222015 (1400 28%) 1.4664]
When when the sumpers was or you do us; creary of some was 
not 
hope fire. 

Frodo 

arching to from 

[885.759133

KeyboardInterrupt: ignored

In [12]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # pick a random prime string, print it, then print the sampled continuation
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 I 
 I day 
early 
and they flasing to thin, 
and that the tought out of the morning the 
boot of the tongue hornly 
fage and percally. If I said: 'We male they down and still side,' said Fard, After at th 

 he
 he last looked it stride: in my land. Then the Baggin begany with the that is the Half, you said the hall not looked is a down 
you go to they sent about the last went of pons; and not her wells of se 

 he
 heard away one to a 
morner and weel 
bent to be long his not left of the end they way, 
the Stone Merry of the puttless with a grieved fles ahead, though the emvers in the Moon that the Ding 
felt, a 

 ca
 call, I must not we wes and until let my for his, and all the darmed 
the black the slus the Falken there, the susped for the holly 
to the rose with 
such the Fraggons of my let they were 
sound sick 

 ca
 came the other of distance his feet, and 
then them. Mord the rimposs. Which the true the fath's fear what with I should had in we 
would return and aid. "They had

---

## Part 6: Generate output on a different dataset

---


I trained my model on tweets about President Trump during the 2020 election. The model learned proper names and grammatical and syntactic structure well, but many sentences do not make much sense and a lot of the words look made up. The model may have had trouble training because some tweets were not in English, and some contained non-Latin or non-alphanumeric characters.

In [13]:
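all_characters = string.printable
num_char = len(all_characters)
file_name = './text_files/tweets.txt'
# random_chunk() reads the globals `file` and `file_len`, so rebind them here;
# otherwise training would keep sampling chunks from the previous dataset.
with open(file_name) as f:
  file = unidecode.unidecode(f.read())
file_len = len(file)

print('File Length =', file_len)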

File Length = 1211307


In [14]:
n_epochs = 2001
print_every = 200
hidden_size = 200
n_layers = 3
lr = 0.001

In [16]:
decoder = RNN(num_char, hidden_size, num_char, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
total_loss = []

for e in range(1, n_epochs + 1):
  inp, targ = random_training_set()  # avoid shadowing the built-in input()
  loss_ = train(inp, targ)

  if e % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, e, e / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # pick a random prime string, print it, then print the sampled continuation
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

[116.40340971946716 (200 9%) 2.4978]
Whe, the wo wen anl the an let 
hehe his ais rgoB soles lod etlius kanotl 
nis aind clcirguk vithd ce 

[231.13353061676025 (400 19%) 2.1435]
When slado fred thing do shand werreet he the the the 
the ridwen the like sat mere the shems, do fre 

[344.92059111595154 (600 29%) 1.8615]
Wh mast haf for for and we weras bening.' Thit a darly 
sunt.wh stert about aly his hieghing ha>ome p 

[461.94024634361267 (800 39%) 1.8642]
Wher they the Frossess and agand for and 
louds sdaws. So ethert with houve a speme in the buat the s 

[577.1062970161438 (1000 49%) 1.7767]
Whing way had not Men them med. But hought on we 
reat the cold old crongain light in the fern inroth 

[692.2199897766113 (1200 59%) 1.6700]
Why brownly thing's nespay. Then more there matt, and promisil they shast, and they with brown 
that  

[807.2993059158325 (1400 69%) 1.7416]
Where out. 

'It a sunting. Where decome, as that pery turred of this stay a stearfart,' said they gr 

[921.69