Name: Mehran Sarmadi

Student ID :

In this exercise, you should develop a character-level RNN language model.

You are free to choose the architecture, but you must use GRUs, not LSTMs. An example architecture is a linear embedding layer (embedding size 64), followed by a 2-layer GRU (hidden size 128, dropout 0.1) and a linear classifier head.

You should generate some example outputs using beam search.

Some parts of the code have been done for you. You need to implement the parts that raise `NotImplementedError`.

Index zero is reserved for the padding token/character. Subtracting one from a token index gives the ASCII code of the character (and maps the padding index to `-1`).
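
The index convention above can be sketched in plain Python. The helper names `encode` and `decode` and the constant `PAD_INDEX` are illustrative assumptions, not part of the provided code; the notebook itself uses `str_to_np(...) + 1` for the same purpose.

```python
PAD_INDEX = 0  # index zero is reserved for padding


def encode(text):
    # Drop non-ASCII characters, then shift every ASCII code up by one
    # so that zero stays free for the padding token.
    return [byte + 1 for byte in text.encode("ascii", errors="ignore")]


def decode(indices):
    # Skip padding, then subtract one to recover the ASCII code.
    return "".join(chr(i - 1) for i in indices if i != PAD_INDEX)


ids = encode("Black Luminary")
print(ids)
print(decode(ids + [PAD_INDEX, PAD_INDEX]))
```

The printed indices should agree with the `input` column shown for the first sample in the Data section.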

The model's classification head should directly predict ASCII characters (256 possibilities). It should not predict any special tokens, such as padding, start or end.

# Bootstrap

# Install

In [16]:
! pip install -U torch datasets numpy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Download the Data

In [17]:
!wget https://files.lilf.ir/Black%20Luminary.txt
# ! ls -lh
# ! realpath *.txt

data_paths = [
    # '/content/Black Luminary.txt',
    "./Black Luminary.txt"
    ]

--2023-05-09 18:48:34--  https://files.lilf.ir/Black%20Luminary.txt
Resolving files.lilf.ir (files.lilf.ir)... 82.102.11.148
Connecting to files.lilf.ir (files.lilf.ir)|82.102.11.148|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3148450 (3.0M) [text/plain]
Saving to: ‘Black Luminary.txt.1’


2023-05-09 18:48:37 (1.86 MB/s) - ‘Black Luminary.txt.1’ saved [3148450/3148450]



# Imports

In [18]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import datasets as D
import numpy as np
import statistics
from pprint import pprint
import jax
import time

print(torch.cuda.is_available())
device = torch.device("cpu")

True


# Utils

In [19]:
class NumpyPrintOptions:
    def __init__(self, **kwargs):
        self.options = kwargs
        self.original_options = np.get_printoptions()

    def __enter__(self):
        np.set_printoptions(**self.options)

    def __exit__(self, exc_type, exc_value, traceback):
        np.set_printoptions(**self.original_options)

class NoTruncationNumpyPrintOptions(NumpyPrintOptions):
    def __init__(self):
        super().__init__(
            threshold=np.inf, 
            linewidth=200, 
            suppress=True, 
            precision=4
        )

def torch_shape_get(input):
    def h_shape_get(x):
        return x.dtype, x.shape

    return jax.tree_map(h_shape_get, input)


def has_nan(tensor):
    return torch.any(torch.isnan(tensor))


class ModelEvalMode:
    def __init__(self, model):
        self.model = model

    def __enter__(self):
        self.model.eval()

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.model.train()


def str_to_np(s, dtype=np.int8):
    s = s.encode('ascii', errors='ignore')
    return np.frombuffer(s, dtype=dtype)

def str_to_onehot(s):
    return np.eye(256)[str_to_np(s)]


def p(message, length=80, symbol="=", newline=True):
    #: Center `message` in a banner of `symbol` characters that is `length` wide.
    num_equals = (length - len(message) - 2) // 2
    output = symbol * num_equals + " " + message + " " + symbol * num_equals
    if len(output) < length:
        output += symbol 
    if newline:
        output += "\n"
    return output

In [20]:
print(str_to_np("hello"))
print(str_to_onehot("hello"))


[104 101 108 108 111]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


# Data

In [21]:
d = D.load_dataset("text", data_files=data_paths, sample_by="paragraph")
print(p("Loaded dataset using Datasets") , d)

d = d['train']
print(p("Train part of dataset"), d)
print(p("sample text"), d[1000])




 DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 18423
    })
})
 Dataset({
    features: ['text'],
    num_rows: 18423
})
 {'text': 'Professor Snape threw him backwards, and Harry stumbled, but just managed to keep standing.'}


In [22]:
dc = d.map(lambda batch: {'input': [str_to_np(t).astype(np.int32) + 1 for t in batch['text']]}, batched=True) #: added one to the char indices to make zero available for the pad token
print(p("Add 'input' column (numerical)"), dc)
print(p("Single sample"), dc[0])

dc = dc.filter(lambda x: (len(x['input']) > 30 and len(x['text'].split()) > 4), batched=False)
dc.set_format("torch", columns=["input",])
print(p("Remove short sentences"), dc)

dc = dc.shuffle()
dcs = dc.train_test_split(test_size=0.2)
print(p("Split train to train and test"), dcs)

dt = dcs['train']



 Dataset({
    features: ['text', 'input'],
    num_rows: 18423
})
 {'text': 'Black Luminary', 'input': [67, 109, 98, 100, 108, 33, 77, 118, 110, 106, 111, 98, 115, 122]}
 Dataset({
    features: ['text', 'input'],
    num_rows: 16371
})
 DatasetDict({
    train: Dataset({
        features: ['text', 'input'],
        num_rows: 13096
    })
    test: Dataset({
        features: ['text', 'input'],
        num_rows: 3275
    })
})


# Model

- [GRU --- PyTorch 2.0 documentation](https://pytorch.org/docs/stable/generated/torch.nn.GRU.html)

- [torch.nn.utils.rnn.pack_sequence --- PyTorch 2.0 documentation](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pack_sequence.html#torch.nn.utils.rnn.pack_sequence) (not necessarily needed)

- [Embedding --- PyTorch 2.0 documentation](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html)

- [torch.nn.utils.rnn.pad_sequence --- PyTorch 2.0 documentation](https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pad_sequence.html)


In [23]:
class Model(nn.Module):
    def __init__(self):
        super().__init__()

        self.embed = nn.Embedding(num_embeddings=257, embedding_dim=64)
        self.GRU = nn.GRU(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
        self.Classifier = nn.Linear(in_features=128, out_features=257)
        
        
    def forward(self, x, hidden=None):
        if isinstance(x, list):
          x = nn.utils.rnn.pad_sequence(x, padding_value=0, batch_first=True) 
          
        x = self.embed(x)           
        x, hidden = self.GRU(x, hidden) 
        x = self.Classifier(x)          
        return x, hidden.detach()  
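
As a sanity check on the size of the model above, its parameter count can be worked out by hand. The sketch below assumes the PyTorch GRU parameter layout (three gates per layer, each with input-to-hidden and hidden-to-hidden weights, plus two bias vectors per layer); the helper name `gru_layer_params` is hypothetical.

```python
def gru_layer_params(input_size, hidden_size):
    # Three gates (reset, update, new), each with input-to-hidden and
    # hidden-to-hidden weights, plus two bias vectors (b_ih and b_hh).
    gates = 3
    weights = gates * hidden_size * (input_size + hidden_size)
    biases = 2 * gates * hidden_size
    return weights + biases


vocab_size, embed_dim, hidden_size = 257, 64, 128

counts = {
    "embed": vocab_size * embed_dim,
    "gru_layer_0": gru_layer_params(embed_dim, hidden_size),
    "gru_layer_1": gru_layer_params(hidden_size, hidden_size),
    "classifier": hidden_size * vocab_size + vocab_size,  # weights + bias
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>12}: {n}")
print(f"{'total':>12}: {total}")
```

If the layout assumption holds, the total should match `sum(p.numel() for p in Model().parameters())`.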

In [24]:
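def shift_left(tensor_list, pad_value=0):
    #: Build next-character targets: drop the first element of each sequence
    #: and append `pad_value` so that each target keeps its input's length.
    shifted_tensors = []
    for tensor in tensor_list:
        tensor = tensor.to(device)
        pad = torch.tensor([pad_value], dtype=tensor.dtype, device=tensor.device)
        shifted_tensors.append(torch.cat([tensor[1:], pad]))

    return shifted_tensors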

# Example usage:
input_ids = [torch.tensor([1, 2, 3, 4]), torch.tensor([5, 6, 7, 8])]
print("Input Ids:")
print(input_ids)

target_ids = shift_left(input_ids)
print("Shifted Left (Target Ids):")
print(target_ids)

Input Ids:
[tensor([1, 2, 3, 4]), tensor([5, 6, 7, 8])]
Shifted Left (Target Ids):
[tensor([2, 3, 4, 0]), tensor([6, 7, 8, 0])]
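
To make the role of the trailing zero concrete: the last position of each sequence has no next character, so its target is the padding index, and the training loss (`ignore_index=0`) skips it. A minimal plain-Python sketch, where `training_pairs` is a hypothetical helper and not part of the notebook:

```python
PAD_INDEX = 0


def training_pairs(input_ids):
    # Target at position t is the input at position t + 1;
    # the final position gets the padding index.
    targets = input_ids[1:] + [PAD_INDEX]
    # Positions whose target is padding contribute nothing to the loss.
    return [(x, y) for x, y in zip(input_ids, targets) if y != PAD_INDEX]


pairs = training_pairs([1, 2, 3, 4])
print(pairs)
```

A sequence of length `n` therefore yields `n - 1` (input, target) pairs that actually count toward the loss.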


# Beam Search Generation

In [25]:
import torch
import heapq

def tensor_to_string(tensor):
    chars = [chr(c) for c in tensor]
    return ''.join(chars)

def tensor_append_scalar(tensor, scalar):
    scalar_tensor = torch.tensor(scalar).view(1)  
    scalar_tensor = scalar_tensor.to(device)

    # Append the scalar to the original tensor
    result = torch.cat((tensor, scalar_tensor), dim=0)
    return result


def generate_next_top_k(model, input_sequence, k):
    logits, _ = model.forward([input_sequence])
    logits = logits[0, -1, :]
    # ic(torch_shape_get(logits))
    
    probabilities = torch.softmax(logits, dim=-1)
    # ic(torch_shape_get(probabilities))

    top_k_values, top_k_indices = torch.topk(probabilities, k)

    # return [(tensor_append_scalar(input_sequence, idx.item() + 1), log_prob.item()) for idx, log_prob in zip(top_k_indices, top_k_values.log())]
    return [(tensor_append_scalar(input_sequence, idx.item()), log_prob.item()) for idx, log_prob in zip(top_k_indices, top_k_values.log())]

def beam_search(model, desired_length, starting_string, k=5):
    with ModelEvalMode(model), torch.no_grad():
      input_sequence = torch.tensor(str_to_np(starting_string).astype(np.int32) + 1, dtype=torch.long)
      input_sequence = input_sequence.to(device)
      # ic(torch_shape_get(input_sequence))
      
      log_prob = 0.0

      beam = [(input_sequence, log_prob)]

      while len(beam[0][0]) < desired_length:
          new_beam = []
          for seq, log_prob in beam:
              next_top_k = generate_next_top_k(model, seq, k)
              new_beam.extend([(new_seq, new_log_prob + log_prob) for new_seq, new_log_prob in next_top_k])

          beam = heapq.nlargest(k, new_beam, key=lambda x: x[1])

      return [tensor_to_string(seq - 1) for seq, _ in beam]
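
The same beam-search loop can be traced by hand on a toy problem. The sketch below replaces the GRU with a made-up lookup table (`TOY_MODEL`, an assumption for illustration only) in which the next-character distribution depends only on the last character. Unlike `generate_next_top_k`, it expands every character rather than only the top `k`, which is affordable with a three-character vocabulary.

```python
import heapq
import math

# Hypothetical stand-in for the model: P(next char | last char).
TOY_MODEL = {
    "a": {"a": 0.1, "b": 0.6, "c": 0.3},
    "b": {"a": 0.5, "b": 0.1, "c": 0.4},
    "c": {"a": 0.9, "b": 0.05, "c": 0.05},
}


def toy_beam_search(start, desired_length, k=2):
    # Each beam entry is (sequence, cumulative log-probability).
    beam = [(start, 0.0)]
    while len(beam[0][0]) < desired_length:
        candidates = []
        for seq, score in beam:
            for ch, prob in TOY_MODEL[seq[-1]].items():
                candidates.append((seq + ch, score + math.log(prob)))
        # Keep only the k highest-scoring extensions.
        beam = heapq.nlargest(k, candidates, key=lambda c: c[1])
    return beam


results = toy_beam_search("a", desired_length=4, k=2)
for seq, score in results:
    print(seq, round(math.exp(score), 4))
```

Because scores are summed log-probabilities, all hypotheses in the beam have the same length at every step, so no length normalization is needed to compare them.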

In [26]:
harry_int = str_to_np("Harry ").astype(np.int32) + 1
harry_str = tensor_to_string(harry_int - 1)
print(harry_int)
print(harry_str, "|")

[ 73  98 115 115 122  33]
Harry  |


In [27]:
def eval_gen(*args, display=999999, **kwargs):
    generated_texts = beam_search(
        *args, **kwargs,
    )

    for idx, text in enumerate(generated_texts):
        if idx >= display:
            break
        
        print(f"Generated text {idx + 1}: {text}")

In [28]:
len_dataset = len(dt)

# Train

In [29]:
#: Training Loop
if torch.cuda.is_available():
    device = 'cuda'
    non_blocking = True
elif True:
    device = 'cpu'
    non_blocking = False
else:
    #: causes NaNs
    device = 'mps'
    non_blocking = False

#: Feel free to edit these hyperparameters or the optimizer
#: You might want to use a learning-rate scheduler, such as
#: [ReduceLROnPlateau --- PyTorch 2.0 documentation](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html)

epochs = 400
batch_size = 4096
learning_rate = 0.01
max_len = 0
trunc_size = 64

m = Model().to(device=device, non_blocking=non_blocking)

m.train()
optimizer = torch.optim.AdamW(m.parameters(), lr=learning_rate)
criterion = nn.CrossEntropyLoss(reduction="sum", ignore_index=0)
counter = 0

for epoch in range(epochs):

    dt = dt.shuffle()
    epoch_loss = 0 
    batch_counter = 0

    for i in range(0, len(dt), batch_size):

        
        batch = dt[i:i + batch_size]
        inputs = batch['input']
        current_batch_size = len(inputs)
        batch_counter += 1

        if max_len > 0:
            inputs = list(map(lambda x: x[:max_len] if len(x) > max_len else x, inputs))

        lens = [len(seq) for seq in inputs]
        current_max_len = max(lens)
        mean_len = statistics.mean(lens)

        targets = shift_left(inputs)
        inputs = nn.utils.rnn.pad_sequence(inputs, padding_value=0, batch_first=True)   
        targets = nn.utils.rnn.pad_sequence(targets, padding_value=0, batch_first=True) 

        hidden = None
        trunc_loss = 0
        
        n_sub = 0 
        #: iterate over the time dimension (dim 1, because the batch is batch_first)
        for trunc in range(0, inputs.size(1), trunc_size):
          
            trunc_out = inputs[:, trunc: trunc + trunc_size]
            trunc_targets = targets[:, trunc: trunc + trunc_size]
            current_trunc_size = trunc_out.size(1)
            
            if current_trunc_size < trunc_size:
              break
              
            n_sub += 1
            trunc_out = trunc_out.to(device)
            trunc_targets = trunc_targets.to(device)
            trunc_outputs, hidden = m(trunc_out, hidden)
            
          
            trunc_outputs = trunc_outputs.contiguous().view(-1, 257) 
            trunc_targets = trunc_targets.contiguous().view(-1)
            loss = criterion(trunc_outputs, trunc_targets)
            trunc_loss += loss.item()
            loss.backward() 
            
        optimizer.step()
        optimizer.zero_grad()
            
        epoch_loss += trunc_loss
        batch_loss = trunc_loss / current_batch_size
        
        
        print(f"batch_loss: {batch_loss:>7f} [{counter:>5d}, epoch={epoch}]")
        counter += 1
        
    l = epoch_loss / len(dt)
    
    
    print(f"Loss: {l:>7f}  [{counter-1:>5d}, epoch={epoch} finished!]")
    if epoch % 15 == 0:
        eval_gen(display=3, model=m, desired_length=100, starting_string="Harry ", k=32)



batch_loss: 1044.320143 [    0, epoch=0]
batch_loss: 945.257664 [    1, epoch=0]
batch_loss: 698.152567 [    2, epoch=0]
batch_loss: 598.620288 [    3, epoch=0]
Loss: 877.567868  [    3, epoch=0 finished!]
Generated text 1: Harry t t t t t t t t t t t t t t t t t  t  t t t t t t  t t  t t  t  t  t  t  t  t  t  t  t  t  t  
Generated text 2: Harry t t t t t t t t t t t t t t t t t  t  t t t t t t t  t t  t  t  t  t  t  t  t  t  t  t  t  t  
Generated text 3: Harry t t t t t t t t t t t t t t t t t  t  t t t t t t  t t  t t  t  t  t  t t  t  t  t t  t  t  t 
batch_loss: 576.735583 [    4, epoch=1]
batch_loss: 578.073069 [    5, epoch=1]
batch_loss: 588.726189 [    6, epoch=1]
batch_loss: 555.000443 [    7, epoch=1]
Loss: 579.563154  [    7, epoch=1 finished!]
batch_loss: 553.358115 [    8, epoch=2]
batch_loss: 547.890641 [    9, epoch=2]
batch_loss: 543.056475 [   10, epoch=2]
batch_loss: 537.080819 [   11, epoch=2]
Loss: 547.421772  [   11, epoch=2 finished!]
batch_loss: 537.181513 [   
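
The inner loop of the training cell processes each padded batch in windows of `trunc_size` time steps (truncated backpropagation through time) and stops at a trailing window shorter than `trunc_size`. The window boundaries can be sketched without any tensors; `chunk_spans` and its `drop_last` flag are hypothetical names used only for this illustration.

```python
def chunk_spans(seq_len, trunc_size, drop_last=True):
    # Return (start, end) index pairs covering the time axis in
    # windows of `trunc_size` steps.
    spans = []
    for start in range(0, seq_len, trunc_size):
        end = min(start + trunc_size, seq_len)
        if drop_last and end - start < trunc_size:
            break  # skip the incomplete trailing window
        spans.append((start, end))
    return spans


spans = chunk_spans(seq_len=200, trunc_size=64)
all_spans = chunk_spans(seq_len=200, trunc_size=64, drop_last=False)
print(spans)
print(all_spans)
```

With `drop_last=True`, the characters in the final partial window never contribute to the loss; whether that trade-off is acceptable depends on how long the paragraphs are relative to `trunc_size`.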

In [30]:
eval_gen(display=3, model=m, desired_length=100, starting_string="Harry ", k=32)

Generated text 1: Harry couldn't help his grandfather had allowed towards Hermione with his grandfather had allowed th
Generated text 2: Harry couldn't help his grandfather had allowed towards Hermione with his grandfather had allowed to
Generated text 3: Harry couldn't help his grandfather had allowed towards Hermione with his grandfather had seemed to 


In [31]:
eval_gen(display=50, model=m, desired_length=250, starting_string="Harry ", k=100)

Generated text 1: Harry looked his grandfather had allowed towards Hermione with his grandfather had allowed that he couldn't help with his grandfather had allowed that he couldn't help with his grandfather. Harry couldn't help his grandfather had allowed through the 
Generated text 2: Harry looked his grandfather had allowed towards Hermione with his grandfather had allowed that he couldn't help with his grandfather had allowed that he couldn't help with his grandfather. Harry couldn't help his grandfather had allowed towards the 
Generated text 3: Harry looked his grandfather had allowed towards Hermione with his grandfather had allowed that he couldn't help with his grandfather had allowed that he couldn't help with his grandfather. Harry couldn't help his grandfather and Harry couldn't help 
Generated text 4: Harry looked his grandfather had allowed towards Hermione with his grandfather had allowed that he couldn't help with his grandfather had allowed that he couldn't help with hi

In [32]:
eval_gen(display=50, model=m, desired_length=250, starting_string="Arcturus ", k=100)

Generated text 1: Arcturus looked his grandfather had allowed towards Hermione with his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione with his grandfather had allowed towards Hermione with a
Generated text 2: Arcturus looked his grandfather had allowed towards Hermione with his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione with his grandfather had allowed towards Hermione. 'But 
Generated text 3: Arcturus stared his grandfather had allowed towards Hermione with his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione with his grandfather had allowed towards Hermione with a
Generated text 4: Arcturus stared his grandfather had allowed towards Hermione with his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowe

In [33]:
eval_gen(display=50, model=m, desired_length=150, starting_string="Draco ", k=100)

Generated text 1: Draco looked his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione with the 
Generated text 2: Draco looked his grandfather had allowed towards Hermione. 'What happened?' asked Hermione, and Harry couldn't help with his grandfather had allowed t
Generated text 3: Draco looked his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione with his 
Generated text 4: Draco looked his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione with her 
Generated text 5: Draco looked his grandfather had allowed towards Hermione. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione. 'That's 
Generated text 6: Draco looked his grandfather had allowed towards Hermione. 'What happened?' she asked with his grandfather had allowed that he couldn't h

In [34]:
eval_gen(display=50, model=m, desired_length=150, starting_string="Harry looked at ", k=100)

Generated text 1: Harry looked at his grandfather. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione. 'What happened?' asked Hermione's 
Generated text 2: Harry looked at his grandfather. 'What happened?' she asked with his grandfather had allowed towards Hermione. 'But you know?' asked Hermione with the
Generated text 3: Harry looked at his grandfather. 'What happened?' she asked with his grandfather had allowed towards Hermione. 'But you know?' she asked, looking her 
Generated text 4: Harry looked at his grandfather. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione. 'What happened?' she asked with a 
Generated text 5: Harry looked at his grandfather. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione. 'What happened?' asked Hermione. '
Generated text 6: Harry looked at his grandfather. 'What happened?' asked Hermione with his grandfather had allowed towards Hermione. 'But you know?' asked