
Elisabeth Kam (etk45) 

I decided to try again with a larger window size of 100 characters, as the author of this exercise suggested, and I also tried a model with three LSTM layers. I had to use a separate notebook file because I ran out of free GPU units and switched to my other account to finish the project.

In [1]:
import torch 
import torch.nn as nn
import torch.optim as optim
import numpy as np
import torch.utils.data as data 

In [2]:
torch.cuda.is_available()

True

In [40]:
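filename = "/sherlock.txt"
# read straight into sh_raw_txt, the name the rest of the notebook uses
with open(filename, 'r', encoding='utf-8') as f:
    sh_raw_txt = f.read()
sh_raw_txt = sh_raw_txt.lower()
sh_raw_txt = sh_raw_txt[:50000]
# build the vocabulary from the lowercased, truncated text
chars = sorted(list(set(sh_raw_txt)))
char_to_int = dict((c, i) for i, c in enumerate(chars))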

In [43]:
sh_raw_txt = sh_raw_txt.lower()
sh_raw_txt = sh_raw_txt[:50000]
chars = sorted(list(set(sh_raw_txt)))
char_to_int = dict((c, i) for i, c in enumerate(chars))

In [44]:
n_chars = len(sh_raw_txt)
n_vocab = len(chars)
print("Total characters: ", n_chars)
print("Total vocab: ", n_vocab)

Total characters:  50000
Total vocab:  44


In [46]:
#prepare the dataset of input to output pairs encoded as integers
char_seq_len = 100 #larger window size 
X_data = []
y_data = []

for i in range(0, n_chars - char_seq_len, 1):
    seq_in = sh_raw_txt[i:i + char_seq_len]
    seq_out = sh_raw_txt[i + char_seq_len]
    X_data.append([char_to_int[char] for char in seq_in])
    y_data.append(char_to_int[seq_out])
    
n_patterns = len(X_data)
print("Total patterns: ", n_patterns)

Total patterns:  49900
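To make the windowing concrete, here is a small sketch of the same loop on a toy string. The names `toy_text`, `toy_len`, and `toy_pairs` are made up for illustration and are not part of the notebook. Each window of `toy_len` characters is paired with the single character that follows it, which is also why 50,000 characters with a window of 100 give 49,900 patterns.

```python
# Toy version of the sliding-window pairing (hypothetical text and window size)
toy_text = "sherlock"
toy_len = 3

toy_pairs = []
for i in range(len(toy_text) - toy_len):
    seq_in = toy_text[i:i + toy_len]   # input window
    seq_out = toy_text[i + toy_len]    # the next character is the target
    toy_pairs.append((seq_in, seq_out))
    print(repr(seq_in), "->", repr(seq_out))

# number of patterns = len(text) - window size
print("Total patterns:", len(toy_pairs))
```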


In [47]:
class bookModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=2, batch_first=True, dropout = 0.2)
        self.dropout = nn.Dropout(0.2) # could try changing dropout values for fun
        self.linear = nn.Linear(256, n_vocab)
    def forward(self, x): 
        x, _ = self.lstm(x)
        # takes only the last output 
        x = x[:, -1, :]
        # produce output 
        x = self.linear(self.dropout(x))
        return x 

In [48]:
X = torch.tensor(X_data, dtype=torch.float32).reshape(n_patterns, char_seq_len, 1)
X = X / float(n_vocab)
y = torch.tensor(y_data)
print(X.shape, y.shape)

torch.Size([49900, 100, 1]) torch.Size([49900])


In [49]:
n_epochs = 50
batch_size = 128 
model = bookModel()
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# print(device)
model.to(device)

optimizer = optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss(reduction="sum")
loader = data.DataLoader(data.TensorDataset(X, y), shuffle = True, batch_size=batch_size)

best_model = None
best_loss = np.inf

for epoch in range(n_epochs):
    model.train()
    for X_batch, y_batch in loader: 
        y_pred = model(X_batch.to(device))
        loss = loss_fn(y_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
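    # Evaluation pass (reuses the training loader, so this is training loss, not held-out validation)
    model.eval()
    loss = 0.0
    with torch.no_grad():
        for X_batch, y_batch in loader:
            y_pred = model(X_batch.to(device))
            loss += loss_fn(y_pred, y_batch.to(device)).item()
        if loss < best_loss:
            best_loss = loss
            # clone the tensors: state_dict() returns references that later training steps overwrite
            best_model = {k: v.detach().clone() for k, v in model.state_dict().items()}
        print("Epoch %d: Cross-entropy: %.3f" % (epoch, loss))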
torch.save([best_model, char_to_int], "single-char2.pth")

Epoch 0: Cross-entropy: 146423.734
Epoch 1: Cross-entropy: 134507.641
Epoch 2: Cross-entropy: 129851.812
Epoch 3: Cross-entropy: 126100.938
Epoch 4: Cross-entropy: 122343.914
Epoch 5: Cross-entropy: 119056.297
Epoch 6: Cross-entropy: 115243.305
Epoch 7: Cross-entropy: 112570.562
Epoch 8: Cross-entropy: 109675.312
Epoch 9: Cross-entropy: 107265.148
Epoch 10: Cross-entropy: 104756.805
Epoch 11: Cross-entropy: 102448.195
Epoch 12: Cross-entropy: 99789.516
Epoch 13: Cross-entropy: 97889.625
Epoch 14: Cross-entropy: 96005.938
Epoch 15: Cross-entropy: 93885.320
Epoch 16: Cross-entropy: 91254.430
Epoch 17: Cross-entropy: 89040.836
Epoch 18: Cross-entropy: 87437.484
Epoch 19: Cross-entropy: 85348.969
Epoch 20: Cross-entropy: 83932.414
Epoch 21: Cross-entropy: 81658.484
Epoch 22: Cross-entropy: 79848.930
Epoch 23: Cross-entropy: 77898.703
Epoch 24: Cross-entropy: 76688.164
Epoch 25: Cross-entropy: 76028.609
Epoch 26: Cross-entropy: 73137.641
Epoch 27: Cross-entropy: 73364.773
Epoch 28: Cross-en
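
The evaluation pass in the loop above iterates over the same `loader` used for training, so the printed loss is a training loss. One possible way to get a held-out number is to reserve the end of the text for validation. The sketch below is only an assumption about how that could be done: `contiguous_split` is a hypothetical helper, and the 10% fraction is arbitrary. It splits by position rather than at random because neighbouring windows overlap by 99 characters, and it leaves a gap of one window so no validation pattern shares characters with a training pattern.

```python
def contiguous_split(n_patterns, window, val_fraction=0.1):
    """Return (train_indices, val_indices) split by position in the text.

    Pattern i uses characters i .. i + window (inputs plus target), so a gap
    of `window` patterns keeps the two ranges from sharing any characters.
    """
    n_val = int(n_patterns * val_fraction)
    val_start = n_patterns - n_val
    train_end = max(0, val_start - window)
    return range(0, train_end), range(val_start, n_patterns)

# Values taken from the notebook output: 49900 patterns, window of 100
train_idx, val_idx = contiguous_split(n_patterns=49900, window=100)
print(len(train_idx), len(val_idx))
```

The two index ranges could then be wrapped with `torch.utils.data.Subset` around the existing `TensorDataset` to build separate training and validation loaders.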

In [59]:
best_model, char_to_int = torch.load("single-char2.pth")
n_vocab = len(char_to_int)
int_to_char = dict((i, c) for c, i in char_to_int.items())
model.load_state_dict(best_model)

<All keys matched successfully>

In [60]:
#generate a prompt here 
file = "/sherlock.txt"
raw_txt2 = open(file, 'r', encoding = 'utf-8').read()
raw_txt2 = raw_txt2.lower()
raw_txt2 = raw_txt2[:50000]
seq_len = 100
start = np.random.randint(0, len(raw_txt2)-seq_len)
prompt = raw_txt2[start:start+seq_len]
pattern = [char_to_int[c] for c in prompt]

In [61]:
model.eval()
print("Prompt:")
print(prompt)
print("Prompt ends here.")
print("\n")
print("Result:")
with torch.no_grad():
  for i in range(1000):
    #format input array of int into pytorch tensor 
    x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
    x = torch.tensor(x, dtype=torch.float32)
    #generate logits as output from the model 
    pred = model(x.to(device))
    #convert logits into one character
    index = int(pred.argmax())
    result = int_to_char[index]
    print(result, end="")
    #append the new character into the prompt for the next iteration
    pattern.append(index)
    pattern = pattern[1:]

print()
print("Done.")

Prompt:
over-precipitance may ruin all."

"and now?" i asked.

"our quest is practically finished. i shall c
Prompt ends here.


Result:
e bole in the matter which he had appalled the most in front of the seiple that i have have the lady of to may her hand, and i have not heard him in the soreet, and i have not seen be oe the lady of his cabres and have a coy of fire, and a gond coon think i have been toinge for the coorer of the street. 
"the court so seruon."

"then they wank to you."

"but you have the han shere i was a lareeine she hing of the seryer. i had been told the lady of his cabres and have a coy of fire, and a gond coon think i have been tooe that i was a wars stice in a sery sitel of his oonmcers. but it is aloosmanion foo the cherr and leuter iis fand and at the house and suiftlar thise oaster of it as once to see me the lady, aut it is and lades of the sort seisle she shoule brongh mot at once to the lett this would be anl arirneting her fard, and i have not seen be an 

I think the model did better with the larger window size: more words are spelled correctly, and the sentences are a little easier to parse.
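
A rough way to put a number on "more words are spelled correctly" is to count how many generated words also appear in the source text. This is only a sketch: `known_word_fraction` is a hypothetical helper, and the two strings are short stand-ins. In the notebook, the reference would be `sh_raw_txt` and the generated text would have to be collected into a string instead of only being printed.

```python
import re

def known_word_fraction(generated, reference):
    """Fraction of words in `generated` that also occur in `reference`."""
    def tokenize(s):
        return re.findall(r"[a-z']+", s.lower())
    vocab = set(tokenize(reference))
    words = tokenize(generated)
    if not words:
        return 0.0
    return sum(w in vocab for w in words) / len(words)

# Stand-in strings for illustration only
reference = "i have not heard him in the street, and i have not seen the lady"
generated = "i have not heard him in the soreet"
print(known_word_fraction(generated, reference))
```

This treats any word found in the source text as correct, so it says nothing about grammar or meaning.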

In [30]:
class book2Model(nn.Module): # same model, but with 3 LSTM layers
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=3, batch_first=True, dropout = 0.2)
        self.dropout = nn.Dropout(0.2) # could try changing dropout values for fun
        self.linear = nn.Linear(256, n_vocab)
    def forward(self, x): 
        x, _ = self.lstm(x)
        # takes only the last output 
        x = x[:, -1, :]
        # produce output 
        x = self.linear(self.dropout(x))
        return x 

In [31]:
n_epochs = 50
batch_size = 128 
model = book2Model()
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# print(device)
model.to(device)

optimizer = optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss(reduction="sum")
loader = data.DataLoader(data.TensorDataset(X, y), shuffle = True, batch_size=batch_size)

best_model = None
best_loss = np.inf

for epoch in range(n_epochs):
    model.train()
    for X_batch, y_batch in loader: 
        y_pred = model(X_batch.to(device))
        loss = loss_fn(y_pred, y_batch.to(device))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
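    # Evaluation pass (reuses the training loader, so this is training loss, not held-out validation)
    model.eval()
    loss = 0.0
    with torch.no_grad():
        for X_batch, y_batch in loader:
            y_pred = model(X_batch.to(device))
            loss += loss_fn(y_pred, y_batch.to(device)).item()
        if loss < best_loss:
            best_loss = loss
            # clone the tensors: state_dict() returns references that later training steps overwrite
            best_model = {k: v.detach().clone() for k, v in model.state_dict().items()}
        print("Epoch %d: Cross-entropy: %.3f" % (epoch, loss))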
torch.save([best_model, char_to_int], "single-char-3.pth")

Epoch 0: Cross-entropy: 150608.031
Epoch 1: Cross-entropy: 132858.734
Epoch 2: Cross-entropy: 126033.406
Epoch 3: Cross-entropy: 119851.484
Epoch 4: Cross-entropy: 114205.070
Epoch 5: Cross-entropy: 109782.203
Epoch 6: Cross-entropy: 105523.758
Epoch 7: Cross-entropy: 101148.367
Epoch 8: Cross-entropy: 98283.305
Epoch 9: Cross-entropy: 94372.414
Epoch 10: Cross-entropy: 91456.328
Epoch 11: Cross-entropy: 89205.617
Epoch 12: Cross-entropy: 85368.188
Epoch 13: Cross-entropy: 81939.625
Epoch 14: Cross-entropy: 79257.695
Epoch 15: Cross-entropy: 76299.867
Epoch 16: Cross-entropy: 73443.766
Epoch 17: Cross-entropy: 72205.406
Epoch 18: Cross-entropy: 68441.062
Epoch 19: Cross-entropy: 65744.578
Epoch 20: Cross-entropy: 63983.066
Epoch 21: Cross-entropy: 61112.594
Epoch 22: Cross-entropy: 58649.578
Epoch 23: Cross-entropy: 56628.828
Epoch 24: Cross-entropy: 54666.328
Epoch 25: Cross-entropy: 51987.824
Epoch 26: Cross-entropy: 50582.551
Epoch 27: Cross-entropy: 48712.469
Epoch 28: Cross-entrop

In [32]:
best_model, char_to_int = torch.load("single-char-3.pth")
n_vocab = len(char_to_int)
int_to_char = dict((i, c) for c, i in char_to_int.items())
model.load_state_dict(best_model)

<All keys matched successfully>

In [38]:
#generate a prompt here 
file3 = "/sherlock.txt"
raw_txt3 = open(file3, 'r', encoding = 'utf-8').read()
raw_txt3 = raw_txt3.lower()
raw_txt3 = raw_txt3[:50000]
seq_len = 100
start = np.random.randint(0, len(raw_txt3)-seq_len)
prompt = raw_txt3[start:start+seq_len]
pattern = [char_to_int[c] for c in prompt]

In [39]:
model.eval()
print("Prompt:")
print(prompt)
print("Prompt ends here.")
print("\n")
print("Result:")
with torch.no_grad():
  for i in range(1000):
    #format input array of int into pytorch tensor 
    x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
    x = torch.tensor(x, dtype=torch.float32)
    #generate logits as output from the model 
    pred = model(x.to(device))
    #convert logits into one character
    index = int(pred.argmax())
    result = int_to_char[index]
    print(result, end="")
    #append the new character into the prompt for the next iteration
    pattern.append(index)
    pattern = pattern[1:]

print()
print("Done.")

Prompt:
 lucky appearance
saved the bridegroom from having to sally out into the streets in
search of a best
Prompt ends here.


Result:
 man. bnd the has the fielt sespous of iis own high-power
lenses, would bolngs has to be to mittle metters to me to be ro a creat belicate that i have made myself clear?"

"i am to be neanly give myst my friend's amazing powers of importance to le to be ro rilence for the pest roint in the conningst of the most singular that the would has sriee and laughed again, in the count von kramm."

"then i should have thought a little more. j had not in the past which he had apparently has been myst be an alieied."

"to i have not seen, bnd i not be bought under hy wiich i evpected. it was a lews in the oart which has been wayled in the morning. and the world has seen, but as a
lover he would have that he will be of the ouher, while a lews then her husband by the ttreet. 

"mre iade!and a sueft little prince of a lettle maneau which had been lauely. yhich ie do

I think this model's performance was a little better. The sentences are clearer, and I can see where the model got confused with some of the vowels.
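
The two training logs can also be compared directly. Because `loss_fn` uses `reduction="sum"` and the evaluation pass covers all 49,900 patterns, dividing a printed value by the pattern count gives an average loss per character. The sketch below uses epoch 27, the last complete line in both logs. The uniform-guess baseline `log(n_vocab)` is added here for scale, and these are still training-set losses because the evaluation pass reuses the training loader.

```python
import math

n_patterns = 49900   # "Total patterns" printed earlier
n_vocab = 44         # "Total vocab" printed earlier

# Summed cross-entropy at epoch 27, copied from the two logs
summed_loss = {"2 layers": 73364.773, "3 layers": 48712.469}

# Loss of guessing uniformly over the 44 characters
uniform_baseline = math.log(n_vocab)

per_char = {}
for name, total in summed_loss.items():
    per_char[name] = total / n_patterns
    perplexity = math.exp(per_char[name])
    print(f"{name}: {per_char[name]:.3f} nats/char, perplexity {perplexity:.2f}")
print(f"uniform baseline: {uniform_baseline:.3f} nats/char")
```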