<a
href="https://colab.research.google.com/github/wingated/cs474_labs_f2019/blob/master/DL_Lab6.ipynb"
  target="_parent">
  <img
    src="https://colab.research.google.com/assets/colab-badge.svg"
    alt="Open In Colab"/>
</a>

# Sequence-to-sequence models with Karpathy's char-rnn

### Description:
For this project, I will code up [Karpathy's char-rnn model](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). This is a recurrent neural network that is trained probabilistically to predict the next character in a sequence of text, and that can then be used to sample new sequences that resemble the original.

### Example Output:
Example final samples are shown below (more detail in the final section of
this writeup), taken after 150 passes through the data.
Please generate about 15 samples for each dataset.

<code>
And ifte thin forgision forward thene over up to a fear not your
And freitions, which is great God. Behold these are the loss sub
And ache with the Lord hath bloes, which was done to the holy Gr
And appeicis arm vinimonahites strong in name, to doth piseling
And miniquithers these words, he commanded order not; neither sa
And min for many would happine even to the earth, to said unto m
And mie first be traditions? Behold, you, because it was a sound
And from tike ended the Lamanites had administered, and I say bi
</code>


In [None]:
! wget -O ./text_files.tar.gz 'https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz'
! tar -xzf text_files.tar.gz
! pip install unidecode
!pip install torch

--2024-02-17 20:23:01--  https://piazza.com/redirect/s3?bucket=uploads&prefix=attach%2Fjlifkda6h0x5bk%2Fhzosotq4zil49m%2Fjn13x09arfeb%2Ftext_files.tar.gz
Resolving piazza.com (piazza.com)... 3.215.61.73, 35.174.203.227, 23.22.156.213, ...
Connecting to piazza.com (piazza.com)|3.215.61.73|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz [following]
--2024-02-17 20:23:01--  https://cdn-uploads.piazza.com/attach/jlifkda6h0x5bk/hzosotq4zil49m/jn13x09arfeb/text_files.tar.gz
Resolving cdn-uploads.piazza.com (cdn-uploads.piazza.com)... 18.164.154.125, 18.164.154.113, 18.164.154.110, ...
Connecting to cdn-uploads.piazza.com (cdn-uploads.piazza.com)|18.164.154.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1533290 (1.5M) [application/x-gzip]
Saving to: ‘./text_files.tar.gz’


2024-02-17 20:23:01 (17.1 MB/s) - ‘./text_files.tar.gz’ saved 

In [None]:
import unidecode
import string
import random
import torch
import torch.nn as nn
import torch.optim as optim

all_characters = string.printable
n_characters = len(all_characters)
file = unidecode.unidecode(open('./text_files/lotr.txt').read())
file_len = len(file)
print('file_len =', file_len)

file_len = 2579888
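`char_tensor` looks each character up with `all_characters.index`, which raises `ValueError` for any character outside `string.printable` (100 characters: digits, ASCII letters, punctuation and whitespace). `unidecode` is what transliterates curly quotes, accented letters and similar characters into ASCII beforehand. A quick coverage check for a loaded text might look like the sketch below; `unsupported_characters` is a hypothetical helper, not part of the lab code.

```python
import string


def unsupported_characters(text, alphabet=string.printable):
    """Return the sorted list of distinct characters in text that are not in alphabet."""
    allowed = set(alphabet)
    return sorted(set(text) - allowed)


print(len(string.printable))              # prints 100
print(unsupported_characters("caf\u00e9"))  # the accented e is not printable ASCII
```

Calling `unsupported_characters(file)` on the transliterated text is expected to return an empty list; any stray ASCII control characters would show up there before they crash `char_tensor`.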


In [None]:
chunk_len = 200

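def random_chunk():
  # random.randint is inclusive, so cap the start to keep chunk_len + 1 characters
  # (chunk_len inputs plus one extra character for the shifted target)
  start_index = random.randint(0, file_len - chunk_len - 1)
  end_index = start_index + chunk_len + 1
  return file[start_index:end_index]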

print(random_chunk())

id Frodo. 'If we make as good going this afternoon as we 
have done this morning, we shall have left the Downs before the Sun sets and 
be jogging on in search of a camping place.' But even as he spoke


In [None]:

# Turn a string into a 1-D LongTensor of indices into all_characters
# (the parameter is named s so it does not shadow the string module)
def char_tensor(s):
  tensor = torch.zeros(len(s)).long()
  for i, c in enumerate(s):
    tensor[i] = all_characters.index(c)
  return tensor

print(char_tensor('abcDEF'))

tensor([10, 11, 12, 39, 40, 41])
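`char_tensor` stores each character's position in `all_characters`, and `evaluate` maps sampled indices back with `all_characters[...]`. The same encode/decode pair can be sketched on plain Python lists; `encode`, `decode` and `char_to_index` are hypothetical names, and the dictionary simply replaces the linear `str.index` scan with a constant-time lookup.

```python
import string

all_characters = string.printable
# Precomputed lookup table: character -> index (same positions as all_characters.index)
char_to_index = {c: i for i, c in enumerate(all_characters)}


def encode(s):
    """String -> list of integer indices, the values char_tensor puts in its tensor."""
    return [char_to_index[c] for c in s]


def decode(indices):
    """List of indices -> string; the inverse of encode."""
    return "".join(all_characters[i] for i in indices)


print(encode("abcDEF"))  # prints [10, 11, 12, 39, 40, 41]
```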


In [None]:
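import torch
import torch.nn as nn
import torch.nn.functional as F


class GRU(nn.Module):
  """Stacked GRU built directly from the gate equations; processes one time step per call."""

  def __init__(self, input_size, hidden_size, num_layers):
    super(GRU, self).__init__()

    self.input_size = input_size
    self.hidden_size = hidden_size
    self.num_layers = num_layers

    self.sigmoid = nn.Sigmoid()
    self.tanh = nn.Tanh()

    # Layer 0 reads the input; every later layer reads the hidden state of the layer below
    in_sizes = [input_size] + [hidden_size] * (num_layers - 1)
    h_sizes = [hidden_size] * num_layers

    # One weight matrix per layer and gate. The biases are the explicit b_* parameters
    # below, so the Linear layers are built without a second, redundant bias.
    def weights(sizes):
      return nn.ModuleList([nn.Linear(n, hidden_size, bias=False) for n in sizes])

    self.lin_ir = weights(in_sizes)
    self.lin_hr = weights(h_sizes)
    self.lin_iz = weights(in_sizes)
    self.lin_hz = weights(h_sizes)
    self.lin_in = weights(in_sizes)
    self.lin_hn = weights(h_sizes)

    # One bias vector per layer (row `layer` belongs to that layer)
    self.b_ir = nn.Parameter(torch.zeros(num_layers, hidden_size))
    self.b_hr = nn.Parameter(torch.zeros(num_layers, hidden_size))
    self.b_iz = nn.Parameter(torch.zeros(num_layers, hidden_size))
    self.b_hz = nn.Parameter(torch.zeros(num_layers, hidden_size))
    self.b_in = nn.Parameter(torch.zeros(num_layers, hidden_size))
    self.b_hn = nn.Parameter(torch.zeros(num_layers, hidden_size))

  def forward(self, inputs, hidden):
    # inputs: (1, 1, input_size); hidden: (num_layers, 1, hidden_size)
    # In the equations below, W*x is a matrix-vector product and ** is the Hadamard
    # (elementwise) product. In the Python code the elementwise product is written
    # with *, because ** means exponentiation in Python.
    x = inputs
    new_hidden = []
    for layer in range(self.num_layers):
      h = hidden[layer:layer + 1]  # this layer's previous state, shape (1, 1, hidden_size)
      # r_t = sigmoid(W_ir*x_t + b_ir + W_hr*h_(t-1) + b_hr)
      r_t = self.sigmoid(self.lin_ir[layer](x) + self.b_ir[layer] + self.lin_hr[layer](h) + self.b_hr[layer])
      # z_t = sigmoid(W_iz*x_t + b_iz + W_hz*h_(t-1) + b_hz)
      z_t = self.sigmoid(self.lin_iz[layer](x) + self.b_iz[layer] + self.lin_hz[layer](h) + self.b_hz[layer])
      # n_t = tanh(W_in*x_t + b_in + r_t**(W_hn*h_(t-1) + b_hn))
      n_t = self.tanh(self.lin_in[layer](x) + self.b_in[layer] + r_t * (self.lin_hn[layer](h) + self.b_hn[layer]))
      # h_t = (1 - z_t)**n_t + z_t**h_(t-1)
      h_t = (1 - z_t) * n_t + z_t * h
      new_hidden.append(h_t)
      x = h_t  # this layer's new state is the next layer's input
    hiddens = torch.cat(new_hidden, dim=0)  # (num_layers, 1, hidden_size)
    outputs = hiddens[-1:]                  # top layer only: (1, 1, hidden_size)
    return outputs, hiddens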
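To check the gate equations by hand, the sketch below implements a single GRU time step for one layer on plain Python lists, with every elementwise (Hadamard) product written out explicitly; as in any Python code, elementwise products use `*` because `**` is exponentiation. The parameter-dictionary layout and the helper names (`gru_step`, `affine`, `zero_params`) are assumptions for illustration. With all weights and biases set to zero, both gates equal sigmoid(0) = 0.5 and the candidate n_t equals tanh(0) = 0, so the new hidden state should be exactly half the old one, which makes an easy hand check.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Compute W v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def gru_step(x, h, p):
    """One GRU time step; p maps names like 'W_ir' and 'b_ir' to weights and biases."""
    # r_t = sigmoid(W_ir x_t + b_ir + W_hr h_(t-1) + b_hr)
    r = [sigmoid(a + c) for a, c in zip(affine(p["W_ir"], x, p["b_ir"]),
                                         affine(p["W_hr"], h, p["b_hr"]))]
    # z_t = sigmoid(W_iz x_t + b_iz + W_hz h_(t-1) + b_hz)
    z = [sigmoid(a + c) for a, c in zip(affine(p["W_iz"], x, p["b_iz"]),
                                         affine(p["W_hz"], h, p["b_hz"]))]
    # n_t = tanh(W_in x_t + b_in + r_t (elementwise) (W_hn h_(t-1) + b_hn))
    hn = affine(p["W_hn"], h, p["b_hn"])
    n = [math.tanh(a + ri * c)
         for a, ri, c in zip(affine(p["W_in"], x, p["b_in"]), r, hn)]
    # h_t = (1 - z_t) n_t + z_t h_(t-1), all elementwise
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def zero_params(input_size, hidden_size):
    """All-zero weights and biases for a hand-checkable test."""
    p = {}
    for gate in ("r", "z", "n"):
        p[f"W_i{gate}"] = [[0.0] * input_size for _ in range(hidden_size)]
        p[f"W_h{gate}"] = [[0.0] * hidden_size for _ in range(hidden_size)]
        p[f"b_i{gate}"] = [0.0] * hidden_size
        p[f"b_h{gate}"] = [0.0] * hidden_size
    return p


print(gru_step([1.0, 2.0, 3.0], [1.0, -2.0], zero_params(3, 2)))  # prints [0.5, -1.0]
```

Pushing the update gate toward 1 (for example a large `b_iz`) makes the step copy the previous hidden state almost unchanged, which is the mechanism that lets a GRU carry information across many characters.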



In [None]:

class Decoder_RNN(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, n_layers=1):
    super(Decoder_RNN, self).__init__()
    self.input_size = input_size
    self.hidden_size = hidden_size
    self.output_size = output_size
    self.n_layers = n_layers
    self.embedding = nn.Embedding(input_size, hidden_size)
    self.gru = GRU(hidden_size, hidden_size, num_layers=self.n_layers)
    self.out = nn.Linear(hidden_size, output_size)


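  def forward(self, input_char, hidden):
    # input_char is a single character index; reshape its embedding to (seq=1, batch=1, hidden_size)
    output = self.embedding(input_char).view(1, 1, -1)
    output = F.relu(output)
    output, hidden = self.gru(output, hidden)
    # Map the top GRU layer's output to one logit per character
    output = self.out(output)
    return output, hidden

  def init_hidden(self):
    # One zero hidden vector per GRU layer: (n_layers, batch=1, hidden_size)
    return torch.zeros(self.n_layers, 1, self.hidden_size)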

In [None]:
def random_training_set():
  chunk = random_chunk()
  inp = char_tensor(chunk[:-1])
  target = char_tensor(chunk[1:])
  return inp, target
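The one-character offset between `inp` and `target` is what turns plain text into a supervised next-character task, and it is why `random_chunk` returns `chunk_len + 1` characters. The same split on an ordinary string (`make_pair` is a hypothetical helper):

```python
def make_pair(chunk):
    """Split a chunk of L + 1 characters into L input characters and L targets."""
    inp = chunk[:-1]     # what the model reads at each step
    target = chunk[1:]   # the character that should follow each input character
    return inp, target


inp, target = make_pair("hobbit")
for a, b in zip(inp, target):
    print(f"{a!r} -> {b!r}")  # first line printed: 'h' -> 'o'
```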

In [None]:
# NOTE: decoder_optimizer, decoder, and criterion will be defined below as global variables
def train(inp, target):
  # Fresh hidden state and zeroed gradients for each training chunk
  decoder_optimizer.zero_grad()
  hidden = decoder.init_hidden()
  loss = 0

  # Feed the chunk one character at a time, accumulating the loss of each next-character prediction
  for char_input, char_tar in zip(inp, target):
    char_dec, hidden = decoder(char_input, hidden)
    char_hat = char_dec.squeeze(0)      # logits, shape (1, n_characters)
    char_truth = char_tar.unsqueeze(0)  # target index, shape (1,)
    loss += criterion(char_hat, char_truth)

  # Backpropagate through the whole chunk, update the weights, and report the mean loss per character
  loss.backward()
  decoder_optimizer.step()
  return loss.item() / len(inp)
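For each character, `nn.CrossEntropyLoss` applies a log-softmax to the logits and takes the negative log-probability of the target index; `train` sums that over the chunk and divides by `len(inp)`. A plain-Python sketch of the same quantity (the function names are hypothetical) also gives a useful reference point: a model that scores all 100 characters equally pays ln(100) ≈ 4.61 nats per character, so the losses printed during training can be read against that baseline.

```python
import math
import string


def cross_entropy(logits, target):
    """-log softmax(logits)[target], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def average_loss(all_logits, targets):
    """Mean per-character loss over a sequence, like train's loss.item() / len(inp)."""
    return sum(cross_entropy(l, t) for l, t in zip(all_logits, targets)) / len(targets)


# A model that gives every character the same score pays ln(vocab size) per character
vocab_size = len(string.printable)  # 100
uniform_loss = cross_entropy([0.0] * vocab_size, 0)
print(f"{uniform_loss:.2f}")  # prints 4.61
```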

In [None]:
def sample_outputs(output, temperature):
    """Takes in a vector of unnormalized log-probabilities (logits) and samples a character index from the distribution"""
    # As temperature approaches 0, this sampling function becomes argmax (no randomness)
    # As temperature approaches infinity, this sampling function becomes a purely random choice
    # softmax gives the same distribution as exp(output / temperature) but cannot overflow
    return torch.multinomial(torch.softmax(output / temperature, dim=-1), 1)

def evaluate(prime_str='A', predict_len=100, temperature=0.8):
  # Fresh hidden state; the returned string starts with the priming text
  hidden = decoder.init_hidden()
  prime_tens = char_tensor(prime_str)
  total_string = prime_str

  # No gradients are needed while sampling
  with torch.no_grad():
    # Warm up the hidden state on every priming character except the last
    for i in range(len(prime_tens) - 1):
      _, hidden = decoder(prime_tens[i], hidden)

    # The last priming character is the first input of the generation loop
    curr_tens = prime_tens[-1]
    for _ in range(predict_len):
      output, hidden = decoder(curr_tens, hidden)
      # output has shape (1, 1, n_characters); flatten it to a vector of logits
      curr_tens = sample_outputs(output.view(-1), temperature).squeeze(0)
      total_string += all_characters[curr_tens.item()]

  return total_string
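The sampling loop has two parts: prime the model on `prime_str`, then repeatedly sample the next character from the temperature-scaled distribution and feed it back in. The sketch below reproduces that loop with a swapped-in character bigram count model instead of the GRU, so it runs without PyTorch; `softmax_with_temperature`, `train_bigram` and `generate` are hypothetical names. Dividing the scores by the temperature before the softmax matches what `sample_outputs` does: small temperatures concentrate almost all probability on the top-scoring character, large ones flatten the distribution toward uniform. A bigram model's only state is its last character, so priming reduces to starting from `prime_str[-1]`; the RNN instead has to run every priming character through the network to build up its hidden state.

```python
import math
import random
from collections import defaultdict


def softmax_with_temperature(logits, temperature):
    """Scale scores by 1/temperature, then normalize them into probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(text):
    """Stand-in model: counts of which character follows which."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    return counts


def generate(counts, vocab, prime_str, predict_len, temperature, rng):
    """Prime on prime_str, then sample predict_len more characters one at a time."""
    current = prime_str[-1]  # the bigram model's entire "hidden state"
    out = prime_str
    for _ in range(predict_len):
        # log(count + 1) as the score, so unseen characters keep a small chance
        logits = [math.log(counts[current][c] + 1) for c in vocab]
        probs = softmax_with_temperature(logits, temperature)
        current = rng.choices(vocab, weights=probs, k=1)[0]
        out += current  # the sampled character becomes the next input
    return out


text = "the cat sat on the mat. the hat is flat."
vocab = sorted(set(text))
model = train_bigram(text)
print(generate(model, vocab, "th", 40, 0.8, random.Random(0)))
```

Lowering the temperature in this sketch tends to make the output repeat the most frequent bigrams, while raising it pushes the output toward random strings over `vocab`, the same trade-off the `sample_outputs` comments describe.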



In [None]:
import time
n_epochs = 1500
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001

decoder = Decoder_RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

start = time.time()
all_losses = []
loss_avg = 0

In [None]:
# n_epochs = 2000
for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

[78.37136936187744 (200 13%) 2.2466]
Wh
Whon thed avo sing wat 
he were an soen the thon the a 
staid nor, 'ord, not his. 

bo Emway nour loon 

[154.08296608924866 (400 26%) 2.0693]
Wh
Whtce and and 
and and the wirken ther the lfime.' 



Theare as the Fill onterled ilf. They 'ind a th 

[229.0678050518036 (600 40%) 2.0405]
Wh
Whast of this for the seling to Gimut the 
break muss a suim at shour not you couth might wat mouthe,  

[303.79496145248413 (800 53%) 2.1669]
Wh
Whe warse he beend as a gusten the stien but of reen to be the gone the maste. To rneyous the mught it 

[378.5814447402954 (1000 66%) 1.7240]
Wh
Whare to of to worred, in at have had the 

mind of the pay sin go light of my tamsed. GBeotely put to 

[453.36096382141113 (1200 80%) 1.6382]
Wh
Whe the had by was Geetwer a as athalf in the asting lan awas past assen sace 
hes, as and sterered tr 

[528.2648463249207 (1400 93%) 1.7955]
Wh
Whher it crise of mort-' hearders ever stame not this leaters. He said Sare turned 


In [None]:
start_strings = [" Th", " wh", " he", " I ", " ca", " G", " lo", " ra"]
for i in range(10):
  # Use a separate name so the global `start` timer is not overwritten
  prime = random.choice(start_strings)
  print(prime)
  print(evaluate(prime, 200), '\n')

 lo
 lo
 loand sloke in bots of the yan dould slone. Ato reampan to noges, start of the 
cat in you hous and the nown leastrods the took-to go like 
have to many. 'Ovey comes can tonge: I't a lippors in as it re 

 G
 G
 Gsaid Fres of is now nou shice the have? 

Sould has dis-to and the ching, just and no starring even saids were the 


Hentiel pindless 
and it sunga of trees lowch the 
of the Shoughs in the away. But 

 ca
 ca
 caound riston in. Bome the borked ares onder cain grims shaw gasting of the all in 
the with 
And like cove that of he had great more to 
Ring stowning of Elves in the birth, and the siepter not of the  

 Th
 Th
 Thhat hil and was 
the ridk the parged that the Blearry prean-the enceed in dark was bed them stome of The that now is in the had said Frodow. Gome them the foring of the Nolling of finght of the Ladbre 

 ra
 ra
 raeeps, fell, to have 
as in the King recho sooding the mong told it he had fount befulies undes. 

'Wen yices had nows and she lookeds bu

In [None]:
import time
n_epochs = 1000
print_every = 200
plot_every = 10
hidden_size = 200
n_layers = 3
lr = 0.001

decoder = Decoder_RNN(n_characters, hidden_size, n_characters, n_layers)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
criterion = nn.CrossEntropyLoss()

start = time.time()
all_losses = []
loss_avg = 0

In [None]:
# n_epochs = 2000

file = unidecode.unidecode(open('./text_files/alma.txt').read())
file_len = len(file)
print('file_len =', file_len)

for epoch in range(1, n_epochs + 1):
  loss_ = train(*random_training_set())
  loss_avg += loss_

  if epoch % print_every == 0:
      print('[%s (%d %d%%) %.4f]' % (time.time() - start, epoch, epoch / n_epochs * 100, loss_))
      print(evaluate('Wh', 100), '\n')

  if epoch % plot_every == 0:
      all_losses.append(loss_avg / plot_every)
      loss_avg = 0

file_len = 466656
[82.60434460639954 (200 20%) 1.9768]
Wh
Wh now Lacreperor wain ngthe wame dould se dorss and amand the sord, as the dey dey thes the min thaov 

[164.73279690742493 (400 40%) 1.7111]
Wh
Whings and they the sall tho kin to to the paest oth of that the dod, to ka the lard touk would in did 

[246.23968195915222 (600 60%) 1.4564]
Wh
Whren of the againt the word the Leren to pered the puses; grous that thou becould not men a mans, and 

[321.7609705924988 (800 80%) 1.7018]
Wh
Whed say bu words arnig: Amalies of the ware ent all but be inty ow the pely, as to bexce that in to t 

[396.7105941772461 (1000 100%) 1.7716]
Wh
Whing these ainsea, the year, and theer of there this were have to were this spopes, and desire lading 



After training on the scripture text from Alma, the model did pretty well. Learning slowed at a loss of around 1.6 and did not get much better than that; I observed this plateau across several runs. The model picks up key words from Alma, such as "Lamanites" (though not in this run's samples), along with basic sentence structure, but it struggles to produce a passage that looks convincing or recognizable as coming from Alma. Pretty cool overall.
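`nn.CrossEntropyLoss` measures loss in nats (natural log), so the plateau around 1.6 can be translated into more interpretable units. In a small conversion sketch (helper names are hypothetical), a mean loss of 1.6 corresponds to a per-character perplexity of e^1.6 ≈ 4.95, roughly as uncertain as a uniform choice among five characters, or about 2.31 bits per character.

```python
import math


def perplexity(loss_nats):
    """Per-character perplexity from a mean cross-entropy measured in nats."""
    return math.exp(loss_nats)


def bits_per_char(loss_nats):
    """Convert a natural-log cross-entropy into bits per character."""
    return loss_nats / math.log(2)


print(f"{perplexity(1.6):.2f}")     # prints 4.95
print(f"{bits_per_char(1.6):.2f}")  # prints 2.31
```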