<a 
href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab6.ipynb"
  target="_parent">
  <img
    src="https://colab.research.google.com/assets/colab-badge.svg"
    alt="Open In Colab"/>
</a>

# Sequence-to-sequence models

### Description:
For this lab, you will code up the [char-rnn model of Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically on sequences of characters, and that can then be used to sample new sequences that are like the original.

This lab will help you develop several new skills, as well as understand some best practices needed for building large models. In addition, we'll be able to create networks that generate neat text!



In [19]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz' 
! tar -xzf text_files.tar.gz
! pip install unidecode
! pip install torch

import unidecode
import string
import random
import re
 
import pdb
import gc
 
all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./Empire_Strikes_Back.txt').read())
file_len = len(file)
print('file_len =', file_len)

--2023-02-19 04:00:45--  https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz
Resolving piazza.com (piazza.com)... 52.205.194.150, 18.235.185.127, 18.215.222.38, ...
Connecting to piazza.com (piazza.com)|52.205.194.150|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz [following]
--2023-02-19 04:00:45--  https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz
Resolving cdn-uploads.piazza.com (cdn-uploads.piazza.com)... 52.84.18.123, 52.84.18.115, 52.84.18.16, ...
Connecting to cdn-uploads.piazza.com (cdn-uploads.piazza.com)|52.84.18.123|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1533290 (1.5M) [application/x-gzip]
Saving to: ‘./text_files.tar.gz’


2023-02-19 04:00:45 (22.2 MB/s) - ‘./text_files.tar.gz’ saved [15

In [20]:
chunk_len = 200
 
def random_chunk():
  start_index = random.randint(0, file_len - chunk_len)
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]
  
print(random_chunk())

O: Your Highness, we must take this last transport. It's our  only hope.
LEIA: (to controller) Send all troops in sector twelve to the south  slope to protect the fighters.
ANNOUNCER: (over loudspeaker


In [21]:
import torch
# Turn string into list of longs
def char_tensor(string):
  tensor = torch.zeros(len(string)).long()
  for c in range(len(string)):
      tensor[c] = all_characters.index(string[c])
  return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])


---

## Part 4: Creating your own GRU cell 

**(Come back to this later - its defined here so that the GRU will be defined before it is used)**

---

The cell that you used in Part 1 was a pre-defined Pytorch layer. Now, write your own GRU class using the same parameters as the built-in Pytorch class does.

Please try not to look at the GRU cell definition. The answer is right there in the code, and in theory, you could just cut-and-paste it. This bit is on your honor!


In [6]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class GRU(nn.Module):
  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()
    self.__dict__.update(locals())

    # Define weight matrices for update gate, reset gate, and new memory cell
    self.update_gate_input_weights = nn.Linear(input_size, hidden_size)
    self.update_gate_hidden_weights = nn.Linear(hidden_size, hidden_size)
    self.reset_gate_input_weights = nn.Linear(input_size, hidden_size)
    self.reset_gate_hidden_weights = nn.Linear(hidden_size, hidden_size)
    self.new_memory_cell_input_weights = nn.Linear(input_size, hidden_size)
    self.new_memory_cell_hidden_weights = nn.Linear(hidden_size, hidden_size)

  def forward(self, inputs, hidden):
    current_input = inputs
    updated_hidden_state = torch.clone(hidden)

    # Iterate over each layer and compute the output and hidden state
    for layer_index in range(self.num_layers):
      current_hidden_state = hidden[layer_index].unsqueeze(0)

      # Compute update gate, reset gate, and new memory cell
      update_gate = torch.sigmoid(
          self.update_gate_input_weights(current_input) + 
          self.update_gate_hidden_weights(current_hidden_state)
      )
      reset_gate = torch.sigmoid(
          self.reset_gate_input_weights(current_input) +
          self.reset_gate_hidden_weights(current_hidden_state)
      )
      new_memory_cell = torch.tanh(
          self.new_memory_cell_input_weights(current_input) +
          reset_gate * self.new_memory_cell_hidden_weights(current_hidden_state)
      )

      # Compute the output of this layer as a weighted combination of old and new memory cells
      layer_output = (1 - update_gate) * new_memory_cell + update_gate * current_hidden_state

      current_input = torch.clone(layer_output)
      updated_hidden_state[layer_index] = torch.clone(layer_output.squeeze(0))

    return current_input, updated_hidden_state


---

##  Part 1: Building a sequence to sequence model

---

Great! We have the data in a useable form. We can switch out which text file we are reading from, and trying to simulate.

We now want to build out an RNN model, in this section, we will use all built in Pytorch pieces when building our RNN class.


In [7]:
class RNN(nn.Module):
  def __init__(self, input_dim, hidden_dim, output_dim, num_layers=1):
    super(RNN, self).__init__()
    self.input_dim = input_dim
    self.hidden_dim = hidden_dim
    self.output_dim = output_dim
    self.num_layers = num_layers
    
    # initialize layers
    self.out_layer = nn.Linear(hidden_dim, output_dim)
    self.softmax_layer = nn.LogSoftmax(dim=1)
    self.embedding_layer = nn.Embedding(input_dim, hidden_dim)
    self.gru_layer = nn.GRU(hidden_dim, hidden_dim, num_layers=num_layers)

  def forward(self, input_token, hidden_state):
    # embed the input token
    embedded = self.embedding_layer(input_token).view(1, 1, -1)
    
    # pass through the GRU layer
    output, hidden_state = self.gru_layer(embedded, hidden_state)
    
    # pass through the output layer
    output = self.softmax_layer(self.out_layer(output[0]))
    
    return output, hidden_state

  def init_hidden(self):
    # initialize the hidden state with zeros
    return torch.zeros(self.num_layers, 1, self.hidden_dim)


In [8]:
def random_training_set():    
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target

---

## Part 2: Sample text and Training information

---

We now want to be able to train our network, and sample text after training.

This function outlines how training a sequence style network goes. 


In [9]:
# NOTE: decoder_optimizer, decoder, and criterion will be defined below as global variables
def train(input_sequence, target_sequence):
  """
  Train the decoder model on a given input and target sequence.

  Args:
  - input_sequence: tensor of shape (seq_len, batch_size) representing the input sequence
  - target_sequence: tensor of shape (seq_len, batch_size) representing the target sequence

  Returns:
  - loss: average loss over the input sequence
  """

  # Move data to GPU
  input_sequence = input_sequence.cuda()
  target_sequence = target_sequence.cuda()

  # Initialize hidden state, set up gradient and loss
  decoder_optimizer.zero_grad()
  hidden_state = decoder.init_hidden()
  hidden_state = hidden_state.cuda()
  loss = 0

  # Loop over the input sequence and pass through decoder
  for i in range(len(input_sequence)):
    output, hidden_state = decoder(input_sequence[i], hidden_state)
    loss += criterion(output, target_sequence[i].unsqueeze(0).long())

  # Compute average loss over the input sequence
  avg_loss = loss / len(input_sequence)

  # Backpropagate and update weights
  avg_loss.backward()
  decoder_optimizer.step()

  # Collect garbage to free up memory
  gc.collect()

  return avg_loss


---

## Part 3: Sample text and Training information

---

You can at this time, if you choose, also write out your train loop boilerplate that samples random sequences and trains your RNN. This will be helpful to have working before writing your own GRU class.

If you are finished training, or during training, and you want to sample from the network you may consider using the following function. If your RNN model is instantiated as `decoder`then this will probabilistically sample a sequence of length `predict_len`


In [10]:
def sample_outputs(output_tensor, temperature):
    """Takes in a tensor of unnormalized probability weights and samples a character from the distribution"""
    # As temperature approaches 0, this sampling function becomes argmax (no randomness)
    # As temperature approaches infinity, this sampling function becomes a purely random choice
    return torch.multinomial(torch.exp(output_tensor / temperature), 1)

def evaluate(prime_str='A', predict_len=100, temperature=0.8):
    # Initialize the hidden state
    hidden_state = decoder.init_hidden()
    hidden_state = hidden_state.cuda()
    
    # Convert the prime string to a tensor
    prime_tensor = char_tensor(prime_str)

    # Create an empty output string
    output_str = ""

    # Use the RNN to generate a sequence of characters
    for c in prime_tensor:
        c = c.cuda()
        output_tensor, hidden_state = decoder(c, hidden_state)
        output_str += all_characters[c]

    # Sample the next character using the output tensor and temperature
    char = sample_outputs(output_tensor, temperature)
    output_str += all_characters[char]

    for _ in range(predict_len):
        # Use the RNN to generate the next character based on the previous one
        output_tensor, hidden_state = decoder(char, hidden_state)
        char = sample_outputs(output_tensor, temperature)
        output_str += all_characters[char]

        gc.collect()
    
    return output_str


---

## Part 4: (Create a GRU cell, requirements above)

---
See Above


---

## Part 5: Run it and generate some text!

---



In [11]:
import time
n_epochs = 5000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001
 
decoder = RNN(n_characters, hidden_size, n_characters, n_layers)
decoder.cuda()
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()
 
start = time.time()
all_losses = []
loss_avg = 0

## Star Wars Episode 4 trained on 5000 epochs

In [15]:
n_epochs = 5000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[55.644516706466675 (200 4%) 2.4834]
Whice the the thicg on?  I'm thurre the the you rote or you'lg prike it reeqsson you won wip'l the tere 

[110.88877010345459 (400 8%) 2.0227]
What.
BEN	Was wibke mang with pime womd to wepion.  Foron of wo me moll, feed the hearder.
LKE	Whive, t 

[166.90313148498535 (600 12%) 1.5227]
Wheatliding hereted your of of for got Luke about in you're that wich chanter?. we're the scamed this n 

[221.76067900657654 (800 16%) 1.3210]
What we luce easpatooning to is power.  Ellobince this tyou're
LUKE	That's my on to be is to me the sho 

[277.2848732471466 (1000 20%) 1.8521]
What suget what son you.  The planet sight, Sefted the say too kir.  Luke, the but they're goodn.?.
OAN 

[332.47766184806824 (1200 24%) 1.4516]
What?  He-was we am Peeary that wait.  At a the entater anengaage.
THREEPIO	What be a remut comuont!
WI 

[387.5787003040314 (1400 28%) 1.3492]
Whide him!
BIGGS	Luke, we'll see thing ideary and belong has light.  Dowbating to restraining there

## Star Wars Episode 4 trained on 2000 epochs

In [12]:
n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[56.8585844039917 (200 10%) 2.2002]
Whe time you on.
HA	Thet ctels to to an us ope.  poout fo of e?  I thile sits mere in.
DEEDAJE	I're tal 

[114.11068654060364 (400 20%) 1.8987]
Where a sir?
LUKE	Thas an ely the dame on to baster to look to cing no this mader shay.
LUKE	Han you a  

[171.93142342567444 (600 30%) 1.4160]
What ontle bade deacharder of something of bad thened soon to fand never for wruntione...
THREEPIO	Red  

[230.01735305786133 (800 40%) 1.7505]
Whed jest destrol will see.
OHBEN	There the Artoo prich the dererfored of and their that the beal you c 

[287.9912483692169 (1000 50%) 1.5858]
Where in...  I want and the could be the simp o on.
RED BERU	Come it.  It's they get.  I'll father an t 

[344.7105243206024 (1200 60%) 1.5703]
Whan sir.
BPER	Get he was away.  He scut any in out of they ring in.
BEN	We've got me could have below  

[401.58359265327454 (1400 70%) 1.1625]
Where.
HAN	I'm course him.
THREEPIO	A mind, bego been here!  I'm going to your eeters.  I can gale 

#### I just ran the code you gave us, but I realized that's probably specific for LOTR so I ran it with star wars specific values later. This is Star Wars IV trained on 5000 epochs

In [13]:
for i in range(10):
  start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
  start = random.randint(0,len(start_strings)-1)
  print(start_strings[start])
#   all_characters.index(string[c])
  print(evaluate(start_strings[start], 200), '\n')

 ra
 range.
OWEN	They're those approaching in the Sand People all right.
LUKE	What's that you real going to reward can fordow, it's the Rebel stiming us in the computer on the directors.  It must leamy far li 

 Th
 Threepio!
BEN	The had be with me.  I think she this operal some bither than informations.
LUKE	What are the glacse.
LUKE	It listen the be the have beamed so seconding into be seem plans the found the det 

 ra
 rafter a big before the Rebel base!
BEN	I think those brother will be the Xedine.  See you know, the shooting into run this bear quibling.
RED LEADER	Han, Luke, the Rebel base by is things.
BEN	I don't h 

 G
 Get have to things this close timuse.  I said.
LUKE	But there cull all because the main range.
GIRST TROVOICE	Blast is my back this haven't regafte them!
RED TEN	Artoo, it's nothing... at the Rebel base 

 ra
 rack me, can't make it and like them?
LUKE	What a big it!
THREEPIO	You report the Rebel shick in some back one.
THREEPIO	What is councels!  This standing

#### Star Wars IV trained on 2000 epochs

In [16]:
for i in range(10):
  start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
  start = random.randint(0,len(start_strings)-1)
  print(start_strings[start])
#   all_characters.index(string[c])
  print(evaluate(start_strings[start], 200), '\n')

 lo
 lot of this.  So you'll be right three five?
GREEDO	If you've got to at going to be come for spies.  Some like do sometimes, and any threat.
HAN	You're as far system.  You'd better have madness with you. 

 lo
 long time by to be on the side of a canyon was your sad directly carries.
GREEDO	They're my right above his for you, kid.  I've known here for anything computer.
HAN	You needn't collody.
FIRST TROOPER	Do 

 lo
 locked in a choice?
GREEDO	They're mady neflic minutes, but they're going to be any mouth before of the Rebellion is crusape a time.  I think you should do who has do able to hide them!  See those one mo 

 ca
 came back to you, business.
LUKE	It's too you.  He did his place can be first-spead us than a big hurry.  He needs you.  You're podrate them from with the down side of this ship about them!  Look, and no 

 ca
 cay only hope.
HAN	What a can be approas one now.
LUKE	You can do before a compart the detention severs.
HAN	What they're made the commander.  The scan

---

## Part 6: Generate output on a different dataset

---
I sort of did this above but this I ran it all again on Star Wars 5 Data

2000 Epochs Specific to star wars stars:


In [17]:
for i in range(10):
  start_strings = ["LUKE", "BEN", "THREEPIO", "HAN", " LEIA", "VADER"]
  start = random.randint(0,len(start_strings)-1)
  print(start_strings[start])
#   all_characters.index(string[c])
  print(evaluate(start_strings[start], 200), '\n')

BEN
BEN	Oll a lot of for asking... and closing this station.
TAGGE	Go one day of your sorry, sir.  It looks like you as fancery will sometimes, but they're going to be carrying myself!
TARKIN	You may be the F 

 LEIA
 LEIA	He did you think a bad bad cnow the matter what you've been around.
HAN	Yeah, but I'm survice is the survively troops.  Look, and in right for sure the blast me times Luke.  It's not much more going t 

 LEIA
 LEIA	Luke!
LEIA	He's here!
HAN	Blast them!  See those one minute!
CREVAN	Secure them prody back home!
BIGGS	Hurry up!  Fope!
RED LEADER	Switch heartieve them!
HAN	It could make this station qurpeder.  Look 

THREEPIO
THREEPIO	The Owend on a drike, much this ship and do bolt here in him.  You're fortress to your Jady.
LEIA	Help me, Obi-Wan Kenobi and I have to the Ambassador.  The Empire heard of the first sign of an extint 

LUKE
LUKE	Are you sure them?  They're our now complote of Anchorhead so pilot... them.
HAN	ertoo!!  Cover with me!  You'll be a little rou

### I wanted to get strings that had both Vader and Ben (Obi Wan) talking so I got 5 values where that true

In [17]:
count = 0
while count < 5:
  start_strings = ["LUKE","BEN","VADER"]
  start = random.randint(0,len(start_strings)-1)
#   all_characters.index(string[c])
  #print(evaluate(start_strings[start], 200), '\n')
  a = evaluate(start_strings[start], 200)
  if a.find("BEN") != -1 and a.find("VADER") != -1:
    print(count + 1)
    print(a)
    print("\n\n")
    count += 1

1
BEN	Princess with your father me first.
LUKE	What was this detentions.
VADER	Luke, sir way out of your deflection of have don't controling on the plice all on tell the ike the be a princess.
HAN	I kides o



2
BEN	Your up the got could on the fince... the droiddverse.  If that you do you coming your fast in the comtroc deflector sad the drime.  There are with entrol.
VADER	Seems coming our of the residing time.



3
BEN	Hidden with him.
DODONNA	What's meters!
VADER	I copy, I can do you didn't R2 uhe canture...
VADER	The Force, where way droids.  You can't there's beapons!
LEIA	We've got to disturbing?
THREEPIO	Where 



4
VADER	I have this things yourself faithing up the like the Force.
T
RED LEADER	Can your ship came the seconds.  He don't want the plans of that of it.
BEN	We're same first of the blast dest coming.
RED TEN	



5
BEN	The Force this droids.
THREEPIO	We're sometimes.  I think?  They command Back thems.  I just religions good the jump to be this in the Death ThreeReped 

## Trained 2000 epochs on THE EMPIRE STRIKES BACK

In [22]:
n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())       
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[1676779330.2258575 (200 10%) 1.4485]
What's your powers will tell to the tractor them. 
YODA: Boy, lime on, yes. 
LANDO: Cimman. Becize. Bec 

[1676779390.4318273 (400 20%) 0.9943]
What we seah. What is all right, it's all a coming out of pretty going to he train. very pesponder man. 

[1676779449.7285726 (600 30%) 1.0474]
What! 
LEIA: What  can Wait and we're a garbage. Captain Solo? 
PIETT: All come it but him? 
HAN: (not  

[1676779508.0417008 (800 40%) 0.9456]
Why we can see there oped to get us.  quite ready had.  He could be a. 
HAN: (over srized  you preses e 

[1676779566.3804438 (1000 50%) 0.8931]
What was  help you meen  all the Scoundrel, and we're  moves enough.
LUKE: He will I have what I want t 

[1676779624.8161707 (1200 60%) 1.1207]
Where and Reezed  hitthing getting carbance. 
HAN: (to Chewie) He? 
LEIA: Han, eare Imperial time.
LEIA 

[1676779682.6477792 (1400 70%) 1.0179]
What's getting the egal power to time to make sure the fleet. 
LANDO: You're a great as the fle

### Conversations between Yoda and Vader

In [23]:
count = 0
while count < 5:
  start_strings = ["LUKE","BEN","VADER","YODA"]
  start = random.randint(0,len(start_strings)-1)
#   all_characters.index(string[c])
  #print(evaluate(start_strings[start], 200), '\n')
  a = evaluate(start_strings[start], 200)
  if a.find("YODA") != -1 and a.find("VADER") != -1:
    print(count + 1)
    print(a)
    print("\n\n")
    count += 1

1
YODA: (discouraged) A Jedi under the hyperdrive.
VADER: Your standing coming ride. 
LUKE: Ben...I thought you pull hold.
LEIA: Stop! 
LUKE: (into comlink) No, Chewie! 
LEIA: You must go. 
HAN: I'll got all



2
VADER: (over loudspeaker) I stupid there. 
pIETT: Why! 
YODA: Use there. 
HAN: Captain Solo...sir, why'll come his is long to get or tall. 
HAN: See you to speeder! You want more the parts and smell ships.




3
YODA: You pulle. Only watch. 
ZEV: (into comlink) Yes, too. (laughs) Heavy around. 
YODA: Rold of the friends likely, you're almost a lot of me. Sometimes you must hope a mind. 
VADER: Release you get you 



4
YODA: (to save Ben... 
HAN: (into comlink) Hob you, that? 
VADER: (studying there's the hyperdrive. 
VADI Sound sure you all right. 
LEIA: So you're being trained so close.
LANDO: But you garbage of the Th



5
VADER: It could be done ground sett the hyperdrive.
YODA: Anoat system. Watch the coords around, you true to save Han...
LUKE: Artoo! I've looks for you,