# Homework 6

In this homework you will be training and using a "char-RNN". This is the name given to a character-level recurrent neural network language model by [this famous blog post by Andrej Karpathy](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). Before you start on the rest of the homework, please give the blog post a read; it's quite good!

I don't expect you to implement the char-RNN from scratch. Andrej's original char-rnn is in Torch (the predecessor to PyTorch, which is not commonly used anymore). Fortunately, there are many other implementations of this model available; for example, there is one (in both MXNet and PyTorch) in chapters 8 and 9 of [the textbook](http://d2l.ai), and another PyTorch one [here](https://github.com/spro/char-rnn.pytorch). **Please use one of these example implementations (or another one that you find) when completing this homework**.

For this homework, please complete the following steps:

1. Download and tokenize the [Shakespeare dataset](http://www.gutenberg.org/files/100/100-0.txt) at a character level. I recommend basing your solution on the following code:
```Python
# Remove non-alphabetical characters, lowercase, and replace whitespace with ' '
raw_dataset = ' '.join(re.sub(r'[^A-Za-z\s]+', '', text).lower().split())
# Maps token index to character
idx_to_char = list(set(raw_dataset))
# Maps character to token index
char_to_idx = dict([(char, i) for i, char in enumerate(idx_to_char)])
# Tokenize the dataset
corpus_indices = [char_to_idx[char] for char in raw_dataset]
```
1. Train a "vanilla" RNN (as described in chapter 8 of [the textbook](http://d2l.ai)) on the Shakespeare dataset. Report the training loss and generate some samples from the model at the end of training.
1. Train a GRU RNN (as described in chapter 9 of [the textbook](http://d2l.ai)) on the Shakespeare dataset. Is the final training loss higher or lower than the vanilla RNN? Are the samples from the model more or less realistic?
1. Find a smaller, simpler dataset than the Shakespeare data (you can find some ideas in Andrej's blog post, but feel free to get creative!) and train either the vanilla or GRU RNN on it instead. Is the final training loss higher or lower than it was for the Shakespeare data?
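
As a quick sanity check for step 1, the sketch below runs character-level tokenization on a short made-up sentence and confirms that the indices decode back to the cleaned text. The helper name `tokenize_chars`, the sample sentence, the pattern `[^A-Za-z\s]+`, and the use of `sorted` to get a reproducible index order are all assumptions of this sketch rather than part of the assignment code.

```python
import re

def tokenize_chars(text):
    """Lowercase, keep only letters, collapse whitespace, and map chars to ids."""
    # Keep letters and whitespace, then collapse every whitespace run to one space.
    cleaned = ' '.join(re.sub(r'[^A-Za-z\s]+', '', text).lower().split())
    # sorted() makes the id assignment reproducible across runs
    # (an assumption of this sketch; a plain set has no fixed order).
    idx_to_char = sorted(set(cleaned))
    char_to_idx = {ch: i for i, ch in enumerate(idx_to_char)}
    indices = [char_to_idx[ch] for ch in cleaned]
    return cleaned, idx_to_char, char_to_idx, indices

# Made-up sample text, used only for illustration.
sample = "To be, or not\nto be: that is the question."
cleaned, idx_to_char, char_to_idx, indices = tokenize_chars(sample)

# Decoding the indices should reproduce the cleaned text exactly.
decoded = ''.join(idx_to_char[i] for i in indices)
print(cleaned)
print(len(idx_to_char), "distinct characters")
print(decoded == cleaned)
```

On the full corpus, the vocabulary can contain at most 27 symbols (26 lowercase letters plus the space), which is the size the models are built with.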

In [None]:
import re
with open('shake.txt') as f:
  text = f.read()

# Keep only letters and spaces (newlines are dropped too, so words split across
# line breaks get joined), lowercase, and collapse runs of spaces into one ' '
raw_dataset = ' '.join(re.sub('[^A-Za-z ]+','', text).lower().split())
# Maps token index to character
idx_to_char = list(set(raw_dataset))
# Maps character to token index
char_to_idx = dict([(char, i) for i, char in enumerate(idx_to_char)])
# Tokenize the dataset
corpus_indices = [char_to_idx[char] for char in raw_dataset]

In [None]:
import math
import random
import time

import torch
import torch.nn as nn
from torch.autograd import Variable
from tqdm import tqdm

# Turning a string into a tensor

def char_tensor(string):
    # Look up each character's index and pack the indices into a 1-D LongTensor
    return torch.tensor([char_to_idx[char] for char in string], dtype=torch.long)

# Readable time elapsed

def time_since(since):
    s = time.time() - since
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

In [None]:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Use Vanilla RNN

In [None]:
class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1):
        super(CharRNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers

        self.encoder = nn.Embedding(input_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, n_layers)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden):
        batch_size = input.size(0)
        encoded = self.encoder(input)
        output, hidden = self.rnn(encoded.view(1, batch_size, -1), hidden)
        output = self.decoder(output.view(batch_size, -1))
        return output, hidden

    def forward2(self, input, hidden):
        encoded = self.encoder(input.view(1, -1))
        output, hidden = self.rnn(encoded.view(1, 1, -1), hidden)
        output = self.decoder(output.view(1, -1))
        return output, hidden

    def init_hidden(self, batch_size):
        return Variable(torch.zeros(self.n_layers, batch_size, self.hidden_size))

In [None]:
def generate(decoder, prime_str='A', predict_len=100, temperature=0.8):
    hidden = decoder.init_hidden(1)
    prime_input = Variable(char_tensor(prime_str).unsqueeze(0))

    hidden = hidden.to(device)
    prime_input = prime_input.to(device)
    predicted = prime_str

    # Use priming string to "build up" hidden state
    for p in range(len(prime_str) - 1):
        _, hidden = decoder(prime_input[:,p], hidden)
        
    inp = prime_input[:,-1]
    
    for p in range(predict_len):
        output, hidden = decoder(inp, hidden)
        
        # Sample from the network as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp()
        top_i = torch.multinomial(output_dist, 1)[0]

        # Add predicted character to string and use as next input
        predicted_char = idx_to_char[top_i]
        predicted += predicted_char
        inp = Variable(char_tensor(predicted_char).unsqueeze(0))
        inp = inp.to(device)

    return predicted
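
The sampling step in `generate` divides the network's output scores by `temperature` and exponentiates them before drawing a character. The sketch below reproduces that idea in plain Python so the effect of the temperature is visible without a trained model. The function names, the three made-up scores, and the max-subtraction for numerical stability are assumptions of this sketch and do not appear in the training code.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Turn raw scores into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtracting the max leaves the probabilities unchanged
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def sample_index(logits, temperature, rng=random):
    """Draw one index with probability proportional to exp(logit / temperature)."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1]  # made-up scores for three characters
for t in (0.5, 0.8, 2.0):
    print(t, [round(p, 3) for p in temperature_probs(logits, t)])
```

Lower temperatures concentrate probability on the highest-scoring character, which makes samples more conservative; higher temperatures flatten the distribution and make samples more varied.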

In [None]:
filename = "shake.txt"
n_epochs=2000
print_every=100
hidden_size=100
n_layers=2
learning_rate=0.01
chunk_len=200
batch_size=100


file = raw_dataset
file_len = len(file)

def random_training_set(chunk_len, batch_size):
    inp = torch.LongTensor(batch_size, chunk_len)
    target = torch.LongTensor(batch_size, chunk_len)
    for bi in range(batch_size):
        # randint is inclusive, so leave room for the chunk_len + 1 characters sliced below
        start_index = random.randint(0, file_len - chunk_len - 1)
        end_index = start_index + chunk_len + 1
        chunk = file[start_index:end_index]
        inp[bi] = char_tensor(chunk[:-1])
        target[bi] = char_tensor(chunk[1:])
    inp = Variable(inp)
    target = Variable(target)
    inp = inp.to(device)
    target = target.to(device)
    return inp, target

def train(inp, target):
    hidden = decoder.init_hidden(batch_size)
    hidden = hidden.to(device)
    decoder.zero_grad()
    loss = 0

    for c in range(chunk_len):
        output, hidden = decoder(inp[:,c], hidden)
        loss += criterion(output.view(batch_size, -1), target[:,c])

    loss.backward()
    decoder_optimizer.step()

    return loss.data / chunk_len

def save():
    save_filename = 'shake1.pt'
    torch.save(decoder, save_filename)
    print('Saved as %s' % save_filename)

# Initialize models and start training

decoder = CharRNN(
    27,
    hidden_size,
    27,
    n_layers,
)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()


decoder.to(device)

start = time.time()
all_losses = []
loss_avg = 0

try:
    print("Training for %d epochs..." % n_epochs)
    for epoch in tqdm(range(1, n_epochs + 1)):
        loss = train(*random_training_set(chunk_len, batch_size))

        if epoch % print_every == 0:
            print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
            print(generate(decoder, 'where', 100, 0.8), '\n')

    print("Saving...")
    save()

except KeyboardInterrupt:
    print("Saving before quit...")
    save()

Training for 2000 epochs...


  5%|▌         | 100/2000 [00:53<16:44,  1.89it/s]

[0m 53s (100 5%) 1.7837]
where wither my prothers of the strocest bose the romed thim in firtuiing and in the good youch aft the l 



 10%|█         | 200/2000 [01:46<16:21,  1.83it/s]

[1m 46s (200 10%) 1.6778]
wheres to the worst so to herelyford great be anmy ful but then and fants deseary and he thy brants sir h 



 15%|█▌        | 300/2000 [02:39<15:16,  1.85it/s]

[2m 39s (300 15%) 1.6509]
where he arm therefore mother look it nor viny the king dead on say and he hair if this lifes kivecleman  



 20%|██        | 400/2000 [03:32<14:34,  1.83it/s]

[3m 32s (400 20%) 1.6097]
where has it friends come and rebelt tlerence but and and laywhich to speaking in your bound are i enence 



 25%|██▌       | 500/2000 [04:24<13:38,  1.83it/s]

[4m 24s (500 25%) 1.6140]
where decetter my heart and what so like a prithee for him what attend the lovely in his fares it of my w 



 30%|███       | 600/2000 [05:17<13:09,  1.77it/s]

[5m 17s (600 30%) 1.6122]
where stand out the cries at their tage base and and her trues this doth into thee sir from the friends b 



 35%|███▌      | 700/2000 [06:09<11:21,  1.91it/s]

[6m 9s (700 35%) 1.5885]
where me he found gone mancation you age his dear not wirt that in thine every sin be subtle my lord wake 



 40%|████      | 800/2000 [07:01<10:42,  1.87it/s]

[7m 1s (800 40%) 1.5761]
where not in those and deliver would make she was then fought in show you i percent and this best of men  



 45%|████▌     | 900/2000 [07:53<09:56,  1.84it/s]

[7m 53s (900 45%) 1.5784]
where reason near the sake alongd me king one sirrowert you revens cry and thrower sorrows with we will b 



 50%|█████     | 1000/2000 [08:46<09:12,  1.81it/s]

[8m 46s (1000 50%) 1.5995]
where king thing and to the suptiture see that the sent forth mine well i he not writees beare but with a 



 55%|█████▌    | 1100/2000 [09:34<07:05,  2.12it/s]

[9m 34s (1100 55%) 1.5627]
where to will no mine lames thy maid the stayd didst that as that will be dead call awhile in you with my 



 60%|██████    | 1200/2000 [10:20<06:41,  1.99it/s]

[10m 20s (1200 60%) 1.5647]
where skils and he sirtreat cheeks read but tain your good and you hope thee prince you less and well wou 



 65%|██████▌   | 1300/2000 [11:15<06:59,  1.67it/s]

[11m 15s (1300 65%) 1.5668]
where was her imonry he head did all suretyce make the married of t but he were wherefore and sportingers 



 70%|███████   | 1400/2000 [12:08<05:13,  1.91it/s]

[12m 8s (1400 70%) 1.5513]
where ever and and a for this paint for your high that isue him make of this master edgarlight of some be 



 75%|███████▌  | 1500/2000 [13:01<04:42,  1.77it/s]

[13m 1s (1500 75%) 1.5483]
where of me fury entor speechisgai with my fall breath that for me me the houses for heaven water there s 



 80%|████████  | 1600/2000 [13:54<03:32,  1.88it/s]

[13m 54s (1600 80%) 1.5658]
where he this pass much so by three but isly feeds is my lord forband honest her love old exit that i hav 



 85%|████████▌ | 1700/2000 [14:48<02:46,  1.80it/s]

[14m 48s (1700 85%) 1.5611]
where not and thy father in from where of place part do with this i shall not we prose and lang and thus  



 90%|█████████ | 1800/2000 [15:42<01:46,  1.88it/s]

[15m 42s (1800 90%) 1.5521]
where and where fair him stay shame rest right this prey beare there poor exeunt of perceive your seporac 



 95%|█████████▌| 1900/2000 [16:36<00:51,  1.93it/s]

[16m 36s (1900 95%) 1.5482]
where i have all sing the her honour of love ill my still bold for her stay to the leave thee with the he 



100%|██████████| 2000/2000 [17:29<00:00,  1.91it/s]

[17m 29s (2000 100%) 1.5546]
where it say wrong poor under to marles but houset her good speed that reports be the grief thou abolut w 

Saving...
Saved as shake1.pt





In [None]:
decoder = torch.load('shake1.pt')
generate(decoder, 'where', 100, 0.8)

'where what i may how the sight what let here drimmy could for the case i don exit and to the light the he'

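After 2000 training iterations (called epochs in the code, although each one processes a single random mini-batch), the training loss is about 1.55 (1.5546 in the final log line).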
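
To put the loss value in context, the sketch below converts the final logged loss into perplexity and bits per character, and compares it with the loss of a model that guesses uniformly over the vocabulary. It relies on `nn.CrossEntropyLoss` reporting the mean loss in nats; the vocabulary size of 27 and the loss of 1.5546 are taken from the cells and log above, and the variable names are chosen only for this illustration.

```python
import math

VOCAB_SIZE = 27       # size passed to CharRNN: 26 lowercase letters plus the space
final_loss = 1.5546   # last logged vanilla RNN loss, in nats per character

uniform_loss = math.log(VOCAB_SIZE)        # loss of a uniform guess over the vocabulary
perplexity = math.exp(final_loss)          # effective number of equally likely choices
bits_per_char = final_loss / math.log(2)   # the same loss expressed in bits

print(f"uniform baseline: {uniform_loss:.3f} nats")
print(f"perplexity:       {perplexity:.2f}")
print(f"bits per char:    {bits_per_char:.2f}")
```

A perplexity well below 27 means the model is, on average, far less uncertain about the next character than a uniform guess would be.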

# Use GRU RNN

In [None]:
class CharRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, model="gru", n_layers=1):
        super(CharRNN, self).__init__()
        self.model = model.lower()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers

        self.encoder = nn.Embedding(input_size, hidden_size)
        if self.model == "gru":
            self.rnn = nn.GRU(hidden_size, hidden_size, n_layers)
        elif self.model == "lstm":
            self.rnn = nn.LSTM(hidden_size, hidden_size, n_layers)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden):
        batch_size = input.size(0)
        encoded = self.encoder(input)
        output, hidden = self.rnn(encoded.view(1, batch_size, -1), hidden)
        output = self.decoder(output.view(batch_size, -1))
        return output, hidden

    def forward2(self, input, hidden):
        encoded = self.encoder(input.view(1, -1))
        output, hidden = self.rnn(encoded.view(1, 1, -1), hidden)
        output = self.decoder(output.view(1, -1))
        return output, hidden

    def init_hidden(self, batch_size):
        if self.model == "lstm":
            return (Variable(torch.zeros(self.n_layers, batch_size, self.hidden_size)),
                    Variable(torch.zeros(self.n_layers, batch_size, self.hidden_size)))
        return Variable(torch.zeros(self.n_layers, batch_size, self.hidden_size))

In [None]:
def generate(decoder, prime_str='A', predict_len=100, temperature=0.8):
    hidden = decoder.init_hidden(1)
    prime_input = Variable(char_tensor(prime_str).unsqueeze(0))

    hidden = hidden.to(device)
    prime_input = prime_input.to(device)
    predicted = prime_str

    # Use priming string to "build up" hidden state
    for p in range(len(prime_str) - 1):
        _, hidden = decoder(prime_input[:,p], hidden)
        
    inp = prime_input[:,-1]
    
    for p in range(predict_len):
        output, hidden = decoder(inp, hidden)
        
        # Sample from the network as a multinomial distribution
        output_dist = output.data.view(-1).div(temperature).exp()
        top_i = torch.multinomial(output_dist, 1)[0]

        # Add predicted character to string and use as next input
        predicted_char = idx_to_char[top_i]
        predicted += predicted_char
        inp = Variable(char_tensor(predicted_char).unsqueeze(0))
        inp = inp.to(device)

    return predicted

In [None]:
filename = "shake.txt"
model="gru"
n_epochs=2000
print_every=100
hidden_size=100
n_layers=2
learning_rate=0.01
chunk_len=200
batch_size=100


file = raw_dataset
file_len = len(file)

def random_training_set(chunk_len, batch_size):
    inp = torch.LongTensor(batch_size, chunk_len)
    target = torch.LongTensor(batch_size, chunk_len)
    for bi in range(batch_size):
        # randint is inclusive, so leave room for the chunk_len + 1 characters sliced below
        start_index = random.randint(0, file_len - chunk_len - 1)
        end_index = start_index + chunk_len + 1
        chunk = file[start_index:end_index]
        inp[bi] = char_tensor(chunk[:-1])
        target[bi] = char_tensor(chunk[1:])
    inp = Variable(inp)
    target = Variable(target)
    inp = inp.to(device)
    target = target.to(device)
    return inp, target

def train(inp, target):
    hidden = decoder.init_hidden(batch_size)
    hidden = hidden.to(device)
    decoder.zero_grad()
    loss = 0

    for c in range(chunk_len):
        output, hidden = decoder(inp[:,c], hidden)
        loss += criterion(output.view(batch_size, -1), target[:,c])

    loss.backward()
    decoder_optimizer.step()

    return loss.data / chunk_len

def save():
    save_filename = 'shake2.pt'
    torch.save(decoder, save_filename)
    print('Saved as %s' % save_filename)

# Initialize models and start training

decoder = CharRNN(
    27,
    hidden_size,
    27,
    model,
    n_layers,
)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()


decoder.to(device)

start = time.time()
all_losses = []
loss_avg = 0

try:
    print("Training for %d epochs..." % n_epochs)
    for epoch in tqdm(range(1, n_epochs + 1)):
        loss = train(*random_training_set(chunk_len, batch_size))

        if epoch % print_every == 0:
            print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
            print(generate(decoder, 'where', 100, 0.8), '\n')

    print("Saving...")
    save()

except KeyboardInterrupt:
    print("Saving before quit...")
    save()

Training for 2000 epochs...


  5%|▌         | 100/2000 [01:33<30:57,  1.02it/s]

[1m 33s (100 5%) 1.7210]
where is what chick not why glowing nother to to to me good both to th notsce from was forture scrick tru 



 10%|█         | 200/2000 [03:07<28:14,  1.06it/s]

[3m 7s (200 10%) 1.5915]
where melest down and men shall be shoil of the soul and the by a mannes to the like of father to dead wh 



 15%|█▌        | 300/2000 [04:43<26:27,  1.07it/s]

[4m 43s (300 15%) 1.5413]
where take our be here he do my such for have so and but she general faith indeed the clown in thou fear  



 20%|██        | 400/2000 [06:18<26:10,  1.02it/s]

[6m 18s (400 20%) 1.5378]
where come withal so and your bear have you a friend my forth which i do so for it is praise thou may tim 



 25%|██▌       | 500/2000 [07:53<23:32,  1.06it/s]

[7m 53s (500 25%) 1.5157]
where is the crampst some off and this base and holy the ring the great thednay indeed a man lest to thei 



 30%|███       | 600/2000 [09:28<23:17,  1.00it/s]

[9m 28s (600 30%) 1.4886]
where my lord shall be megenot dream the valention of mind of man was and from what come of for event how 



 35%|███▌      | 700/2000 [11:03<20:48,  1.04it/s]

[11m 2s (700 35%) 1.4849]
where is my bosomsdid able all gloucester auty eyes fie a will whats the world and means for thee his lif 



 40%|████      | 800/2000 [12:38<19:37,  1.02it/s]

[12m 38s (800 40%) 1.4720]
where sir she not we the basewave yongut to mine eyes if this vice they for a wanton disprisionanishd tim 



 45%|████▌     | 900/2000 [14:13<17:45,  1.03it/s]

[14m 13s (900 45%) 1.4665]
where is the lord you do be that young must a high youth valentine that sufter berowne king be with all h 



 50%|█████     | 1000/2000 [15:48<16:20,  1.02it/s]

[15m 48s (1000 50%) 1.4890]
where nor she has my lordhave she watch that nome flourishesbreaten be a rest the carbuse i have not men  



 55%|█████▌    | 1100/2000 [17:22<14:23,  1.04it/s]

[17m 22s (1100 55%) 1.4913]
where is could be a glass but heard the person the field to else i have seems richlaniusarman it content  



 60%|██████    | 1200/2000 [18:52<11:09,  1.20it/s]

[18m 52s (1200 60%) 1.4830]
where he shall i hate and that the silkd to dient make her she now exit i may they lies with a man by the 



 65%|██████▌   | 1300/2000 [20:13<09:37,  1.21it/s]

[20m 13s (1300 65%) 1.4794]
where the monamuniep of them what thought hast this this charge to you wakings are get the particulat is  



 70%|███████   | 1400/2000 [21:35<08:40,  1.15it/s]

[21m 35s (1400 70%) 1.4707]
where are pervitamine as i swear can levd the lord this lose all jook and bless brutus in my foot and out 



 75%|███████▌  | 1500/2000 [22:57<07:19,  1.14it/s]

[22m 57s (1500 75%) 1.4732]
where is not this good countried and faith for with me to have made his lord in the treason have i am for 



 80%|████████  | 1600/2000 [24:22<05:42,  1.17it/s]

[24m 22s (1600 80%) 1.4679]
whereatthat yhard i say brings my sir sir descendage exit he must cheerful simpleit his colours a pole an 



 85%|████████▌ | 1700/2000 [25:50<04:26,  1.13it/s]

[25m 50s (1700 85%) 1.4811]
where havious world you havethats to him that suffer the cunning to this and shdust andwhose than whom hi 



 90%|█████████ | 1800/2000 [27:17<03:11,  1.04it/s]

[27m 17s (1800 90%) 1.4574]
where being ere the change him and much thou shalt no westmorle it and things to make him i made the heav 



 95%|█████████▌| 1900/2000 [28:44<01:24,  1.19it/s]

[28m 44s (1900 95%) 1.4961]
where no sir of the action what hoit shall be me where is can pursue me and when i be roman thou from my  



100%|██████████| 2000/2000 [30:07<00:00,  1.11it/s]

[30m 7s (2000 100%) 1.4760]
where a selfof the falls from his hand and the duke pedro scived paris pardon that the bosom hearize but  

Saving...
Saved as shake2.pt





In [None]:
decoder = torch.load('shake2.pt')
generate(decoder, 'where', 100, 0.8)

'where i am a man i am sworn up and soldiers for the better fallswe him in her virtues to london that hath'

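After 2000 iterations the training loss is about 1.48 (1.4760 in the final log line), which is lower than the vanilla RNN's final logged loss of 1.5546. With the same hyperparameters, the GRU therefore reaches a lower loss in the same number of iterations, although it is slower in wall-clock time: the full run took about 30 minutes versus about 17.5 minutes for the vanilla RNN.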
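
The runtime cost of the GRU can be read off the two training logs. The sketch below converts the final `time_since` stamps of both runs into seconds per iteration; the helper name `stamp_to_seconds` and the dictionary layout are assumptions made for this illustration, while the stamps themselves are copied from the logs above.

```python
import re

def stamp_to_seconds(stamp):
    """Convert a time_since-style stamp such as '17m 29s' to seconds."""
    match = re.fullmatch(r'(\d+)m (\d+)s', stamp)
    if match is None:
        raise ValueError(f"unexpected stamp: {stamp!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds

N_ITERS = 2000
# Final stamps copied from the vanilla RNN and GRU training logs.
runs = {"vanilla RNN": "17m 29s", "GRU": "30m 7s"}

per_iter = {name: stamp_to_seconds(s) / N_ITERS for name, s in runs.items()}
for name, sec in per_iter.items():
    print(f"{name}: {sec:.3f} s per iteration")
print(f"slowdown: {per_iter['GRU'] / per_iter['vanilla RNN']:.2f}x")
```

These timings come from a single run of each model on whatever hardware the notebook happened to use, so they are only a rough indication.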

# Use a smaller dataset: Paul Graham essays

In [None]:
with open('pg_essay.txt') as f:
  text = f.read()

# Keep only letters and spaces (newlines are dropped too, so words split across
# line breaks get joined), lowercase, and collapse runs of spaces into one ' '
raw_dataset = ' '.join(re.sub('[^A-Za-z ]+','', text).lower().split())
# Maps token index to character
idx_to_char = list(set(raw_dataset))
# Maps character to token index
char_to_idx = dict([(char, i) for i, char in enumerate(idx_to_char)])
# Tokenize the dataset
corpus_indices = [char_to_idx[char] for char in raw_dataset]

In [None]:
filename = "pg_essay.txt"
model="gru"
n_epochs=2000
print_every=100
hidden_size=100
n_layers=2
learning_rate=0.01
chunk_len=200
batch_size=100


file = raw_dataset
file_len = len(file)

def random_training_set(chunk_len, batch_size):
    inp = torch.LongTensor(batch_size, chunk_len)
    target = torch.LongTensor(batch_size, chunk_len)
    for bi in range(batch_size):
        # randint is inclusive, so leave room for the chunk_len + 1 characters sliced below
        start_index = random.randint(0, file_len - chunk_len - 1)
        end_index = start_index + chunk_len + 1
        chunk = file[start_index:end_index]
        inp[bi] = char_tensor(chunk[:-1])
        target[bi] = char_tensor(chunk[1:])
    inp = Variable(inp)
    target = Variable(target)
    inp = inp.to(device)
    target = target.to(device)
    return inp, target

def train(inp, target):
    hidden = decoder.init_hidden(batch_size)
    hidden = hidden.to(device)
    decoder.zero_grad()
    loss = 0

    for c in range(chunk_len):
        output, hidden = decoder(inp[:,c], hidden)
        loss += criterion(output.view(batch_size, -1), target[:,c])

    loss.backward()
    decoder_optimizer.step()

    return loss.data / chunk_len

def save():
    save_filename = 'pg.pt'
    torch.save(decoder, save_filename)
    print('Saved as %s' % save_filename)

# Initialize models and start training

decoder = CharRNN(
    27,
    hidden_size,
    27,
    model,
    n_layers,
)
decoder_optimizer = torch.optim.Adam(decoder.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss()


decoder.to(device)

start = time.time()
all_losses = []
loss_avg = 0

try:
    print("Training for %d epochs..." % n_epochs)
    for epoch in tqdm(range(1, n_epochs + 1)):
        loss = train(*random_training_set(chunk_len, batch_size))

        if epoch % print_every == 0:
            print('[%s (%d %d%%) %.4f]' % (time_since(start), epoch, epoch / n_epochs * 100, loss))
            print(generate(decoder, 'where', 100, 0.8), '\n')

    print("Saving...")
    save()

except KeyboardInterrupt:
    print("Saving before quit...")
    save()

Training for 2000 epochs...


  5%|▌         | 100/2000 [01:31<29:42,  1.07it/s]

[1m 31s (100 5%) 1.5827]
where the seincy looks and their ungeith or one a natence in existing i evice to rights programs in veste 



 10%|█         | 200/2000 [03:03<27:38,  1.09it/s]

[3m 3s (200 10%) 1.4205]
where meeting on the later is this it was they dont iscoop college the day like fand bgsspect but in many 



 15%|█▌        | 300/2000 [04:35<26:52,  1.05it/s]

[4m 35s (300 15%) 1.4184]
whereas i nerrea in the most reas that hard are they were founders were ingoot what supportun al seems to 



 20%|██        | 400/2000 [06:07<25:12,  1.06it/s]

[6m 7s (400 20%) 1.3583]
where as true of the professors in google for the working to partment often a grants than word startup wo 



 25%|██▌       | 500/2000 [07:39<23:29,  1.06it/s]

[7m 38s (500 25%) 1.3389]
where no pay but it are couldi paydes and working it one will tend to something the describers which mone 



 30%|███       | 600/2000 [09:12<19:27,  1.20it/s]

[9m 12s (600 30%) 1.3191]
where the most companies of their schools have or stockbnowerblockquote its trying to advertives its chan 



 35%|███▌      | 700/2000 [10:35<18:06,  1.20it/s]

[10m 35s (700 35%) 1.3279]
where the developers wants for it only a round mas ask to work its in be suck the only applery the idea o 



 40%|████      | 800/2000 [11:59<17:31,  1.14it/s]

[11m 59s (800 40%) 1.2921]
whereor a startup that in the sinds of the bestinclities or with early comes to refut it and more words b 



 45%|████▌     | 900/2000 [13:23<16:12,  1.13it/s]

[13m 23s (900 45%) 1.3235]
where we would be technology they dont have to discodiences the two big companyin the night has writing t 



 50%|█████     | 1000/2000 [14:47<13:45,  1.21it/s]

[14m 47s (1000 50%) 1.3139]
where would we may look a him that that and all the people and since so predict witz is it you so the bal 



 55%|█████▌    | 1100/2000 [16:11<12:44,  1.18it/s]

[16m 11s (1100 55%) 1.3198]
whereer on to get for every investments would be find the increasing it that firms the most im must rous  



 60%|██████    | 1200/2000 [17:34<11:22,  1.17it/s]

[17m 34s (1200 60%) 1.3118]
where we were being a new jobs which means imost or him they have the number of vcs the whole version to  



 65%|██████▌   | 1300/2000 [18:59<09:55,  1.17it/s]

[18m 59s (1300 65%) 1.3189]
where about designers but is almost such about time or trends that approomption in a company down they co 



 70%|███████   | 1400/2000 [20:22<08:27,  1.18it/s]

[20m 22s (1400 70%) 1.3173]
where it works as a company assee a common kind of clust is actually how for little it wouldnt have been  



 75%|███████▌  | 1500/2000 [21:45<06:59,  1.19it/s]

[21m 45s (1500 75%) 1.3163]
where and the more the most compete programmer vcs college is to ahead to conflict in this code indeedebu 



 80%|████████  | 1600/2000 [23:08<05:35,  1.19it/s]

[23m 8s (1600 80%) 1.2970]
where it wouldnt intellectual whole invest like special fatal notes a big funded in date these gotten of  



 85%|████████▌ | 1700/2000 [24:34<04:22,  1.14it/s]

[24m 34s (1700 85%) 1.3183]
where the famous people a buzzeem as well would be take wealth and for a remarket and new big companies o 



 90%|█████████ | 1800/2000 [25:59<02:48,  1.19it/s]

[25m 59s (1800 90%) 1.2434]
where they sad yet someone market befores now i does the worldial things and like quotes who has to live  



 95%|█████████▌| 1900/2000 [27:24<01:33,  1.07it/s]

[27m 24s (1900 95%) 1.2769]
where are always friends with by whatever better in something to say you really best velous of essay thin 



100%|██████████| 2000/2000 [28:48<00:00,  1.16it/s]

[28m 48s (2000 100%) 1.2993]
where now as forexample for matnow of the outcome they came and a company we find on the presocriate to p 

Saving...
Saved as pg.pt





In [None]:
decoder = torch.load('pg.pt')
generate(decoder, 'where', 100, 0.8)

'where a countried to definitely of the many better and when i dont experiment of the otherif a lot profit'

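After 2000 iterations the training loss is about 1.30 (1.2993 in the final log line; the lowest logged value was 1.2434 at iteration 1800), which is lower than the GRU's loss on the Shakespeare data (1.4760 in its final log line). Since this run uses the same GRU model and hyperparameters, the most likely reason for the lower training loss is that the new dataset is much smaller than the Shakespeare text, so the model can fit it more closely in the same number of iterations.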
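
One way to make the dataset-size argument concrete is to count how many characters the training loop samples and divide by the corpus length. The sketch below does this with the hyperparameters used above; the function names are made up for this illustration, and the two corpus lengths are hypothetical placeholders, not measurements of `shake.txt` or `pg_essay.txt`. In the notebook, `len(raw_dataset)` would give the real value for each dataset.

```python
def characters_seen(n_iters, batch_size, chunk_len):
    """Total number of input characters sampled over the whole training run."""
    return n_iters * batch_size * chunk_len

def effective_passes(n_iters, batch_size, chunk_len, corpus_len):
    """Average number of times each corpus character is sampled."""
    return characters_seen(n_iters, batch_size, chunk_len) / corpus_len

# Hyperparameters from the training cells above.
seen = characters_seen(n_iters=2000, batch_size=100, chunk_len=200)
print(f"characters sampled: {seen:,}")

# Hypothetical corpus lengths, chosen only to illustrate the trend.
for label, corpus_len in [("larger corpus", 5_000_000), ("smaller corpus", 1_000_000)]:
    passes = effective_passes(2000, 100, 200, corpus_len)
    print(f"{label}: about {passes:.0f} passes over the data")
```

With a fixed training budget, a smaller corpus is revisited more often, which is consistent with a lower training loss but says nothing about how well the model would do on held-out text.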

# Acknowledgement
The code is adapted from: https://github.com/spro/char-rnn.pytorch

The Paul Graham dataset was downloaded from: https://www.kaggle.com/krsoninikhil/pual-graham-essays