## Homework 4: Neural Language Models (& 🎃 SpOoKy 👻 authors 🧟 data) - Task 3


### Names

---

Names: Jason Cheung, Robert Levin


## Task 3: Feedforward Neural Language Model (80 points)

For this task, you will create and train neural LMs for both your word-based embeddings and your character-based ones. You should write functions when appropriate to avoid excessive copy+pasting.


In [1]:
# import your libraries here

import numpy as np

# if you want fancy progress bars
from tqdm.autonotebook import tqdm

# Remember to restart your kernel if you change the contents of this file!
import neurallm_utils as nutils

import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader, random_split
import torch.optim as optim

# This function gives us nice print-outs of our models.
from torchinfo import summary

  from tqdm.autonotebook import tqdm
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\blevi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\blevi\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


### a) First, encode your text into integers (5 points)


In [2]:
# Edit constants as you would like.
EMBEDDINGS_SIZE = 50
NGRAM = 3
NUM_SEQUENCES_PER_BATCH = 128

TRAIN_FILE = 'spooky_author_train.csv' # The file to train your language model on
OUTPUT_WORDS = 'generated_wordbased.txt' # The file to save your generated sentences for word-based LM
OUTPUT_CHARS = 'generated_charbased.txt' # The file to save your generated sentences for char-based LM

# you can update these file names if you want to depending on how you are exploring 
# hyperparameters
EMBEDDING_SAVE_FILE_WORD = f"spooky_embedding_word_{EMBEDDINGS_SIZE}.model" # The file to save your word embeddings to
EMBEDDING_SAVE_FILE_CHAR = f"spooky_embedding_char_{EMBEDDINGS_SIZE}.model" # The file to save your char embeddings to
MODEL_FILE_WORD = f'spooky_author_model_word_{NGRAM}.pt' # The file to save your trained word-based neural LM to
MODEL_FILE_CHAR = f'spooky_author_model_char_{NGRAM}.pt' # The file to save your trained char-based neural LM to


In [3]:
# load your word vectors that you made in your previous notebook AND 
# use the create_embedder function to make your pytorch embedder

word_embeddings = nutils.load_word2vec(EMBEDDING_SAVE_FILE_WORD)
char_embeddings = nutils.load_word2vec(EMBEDDING_SAVE_FILE_CHAR)
word_embedder = nutils.create_embedder(word_embeddings)
char_embedder = nutils.create_embedder(char_embeddings)

In [26]:
# you'll also need to re-load your text data

word_data = nutils.read_file_spooky(TRAIN_FILE, NGRAM)
char_data = nutils.read_file_spooky(TRAIN_FILE, NGRAM, by_character=True)
print(len(word_data), len(char_data))

19579 19579


In [5]:
# This function is used to vectorize a text corpus. 
# Here, it creates a mapping from word to that word's unique index.

# Hint: use one of the dicts from your embedding function.

def encode_tokens(data: list[list[str]], embedder: torch.nn.Embedding) -> list[list[int]]:
    """
    Replaces each natural-language token with its embedder index.

    e.g. [["<s>", "once", "upon", "a", "time"],
          ["there", "was", "a", ]]
        ->
        [[0, 59, 203, 1, 126],
         [26, 15, 1]]
        (The indices are arbitrary, as they are dependent on your embedder)

    Params:
        data: The corpus
        embedder: An embedder trained on the given data.
    """
    encoded_tokens = []

    for sentence in data:
        encoded_sentence = []
        for token in sentence:
            if token in embedder.token_to_index:
                encoded_sentence.append(embedder.token_to_index[token])
        encoded_tokens.append(encoded_sentence)

    return encoded_tokens

In [6]:
# encode your data from tokens to integers for both word and char embeddings

word_encoded = encode_tokens(word_data, word_embedder)
char_encoded = encode_tokens(char_data, char_embedder)

print(word_encoded[:2])

[[3, 3, 31, 2959, 0, 154, 0, 1405, 26, 43, 308, 2, 7506, 1, 2542, 2, 13, 4789, 14, 20, 8, 88, 192, 55, 4446, 0, 6, 306, 7, 1, 258, 2022, 8, 337, 84, 0, 145, 134, 905, 2, 1, 323, 14, 45, 1452, 5098, 109, 1, 442, 5, 4, 4], [3, 3, 15, 100, 135, 742, 7, 26, 12, 1, 6015, 88, 33, 9, 432, 2388, 5, 4, 4]]


In [7]:
# print out the size of the mappings for each of your embedders.
# these should match the vocab sizes you calculated in Task 2

print(f"Word embedder size: {len(word_embedder.token_to_index)}")
print(f"Char embedder size: {len(char_embedder.token_to_index)}")

Word embedder size: 25374
Char embedder size: 60


### b) Next, prepare the sequences to train your model from text (2 points)


#### Fixed n-gram based sequences

The training samples will be structured in the following format.
Depening on which ngram model we choose, there will be (n-1) tokens
in the input sequence (X) and we will need to predict the nth token (y).

Example: this process however afforded me

Would become:

```
X
[[this,    process]
[process, however]
[however, afforded]]

y
[however,
afforded,
me]
```

Our first step is to generate n-grams like we have always been doing. We'll just do this
on our encoded data instead of the raw text. (Feel free to consult your past HW here).


In [8]:
def generate_ngram_training_samples(encoded: list[list[int]], ngram: int) -> list:
    """
    Takes the **encoded** data (list of lists of ints) and 
    generates the training samples out of it.
    
    Parameters:
        up to you, we've put in what we used
        but you can add/remove as needed
    return: 
    list of lists in the format [[x1, x2, ... , x(n-1), y], ...]
    """
    # if you'd like to use tqdm, you can use it like this:
    # for i in tqdm(range(len(encoded))):

    ngrams = []

    for sentence in encoded:
        for i in range(len(sentence) - ngram + 1):
            ngrams.append(sentence[i:i + ngram])

    return ngrams


In [9]:
# generate your training samples for both word and character data
# print out the first 5 training samples for each
# we have displayed the number of sequences
# to expect for both characters and words
#
# Spooky data by words shoud give 634080 sequences
# [0, 0, 31]
# [0, 31, 2959]
# [31, 2959, 2]
# ...

# Spooky data by character should give 2957553 sequences
# [20, 20, 2]
# [20, 2, 8]
# [2, 8, 6]
# ...

# print out the first 5 training samples for each and make sure that the 
# windows are sliding one word at a time. These should be integers!
# make sure that they map to the correct words in your vocab
# Hint: what word maps to token 0?

word_training_samples = generate_ngram_training_samples(word_encoded, NGRAM)
print(f"Word sequences: {len(word_training_samples)}")
for i in range(5):
    print(word_training_samples[i])

char_training_samples = generate_ngram_training_samples(char_encoded, NGRAM)
print(f"Char sequences: {len(char_training_samples)}")
for i in range(5):
    print(char_training_samples[i])

Word sequences: 634080
[3, 3, 31]
[3, 31, 2959]
[31, 2959, 0]
[2959, 0, 154]
[0, 154, 0]
Char sequences: 2957553
[25, 25, 2]
[25, 2, 8]
[2, 8, 6]
[8, 6, 7]
[6, 7, 0]


### c) Then, split the sequences into X and y and create a DataLoader (10 points)


In [10]:
# Note here that each sequence we've created so far is in the form:
# sequence = [x1, x2, ... , x(n-1), y]
# We still need to separate them into [[x1, x2, ... , x(n-1)], ...], [y1, y2, ...]]
# do that here for both word and character data
# you can write a function to do this if you'd like (not required, might be helpful)

def separate_x_y(training_samples: list) -> tuple:
    X = []
    y = []

    for ngram in training_samples:
        X.append(ngram[:-1])
        y.append(ngram[-1])

    return X, y

word_X, word_Y = separate_x_y(word_training_samples)
char_X, char_Y = separate_x_y(char_training_samples)

# print out the shapes (or lengths to know how many sequences there are and how many
# elements each sub-list has) for word-based to verify that they are correct

# print out the shapes for char-based to verify that they are correct

print(f"Word X shape: ({len(word_X)}, {len(word_X[0])})")
print(f"Word Y length: ({len(word_Y)})")
print(f"Char X shape: ({len(char_X)}, {len(char_X[0])})")
print(f"Char Y length: ({len(char_Y)})")

Word X shape: (634080, 2)
Word Y length: (634080)
Char X shape: (2957553, 2)
Char Y length: (2957553)


In [11]:
def create_dataloaders(X: list, y: list, num_sequences_per_batch: int, 
                       test_pct: float = 0.1, shuffle: bool = True) -> tuple[torch.utils.data.DataLoader]:
    """
    Convert our data into a PyTorch DataLoader.    
    A DataLoader is an object that splits the dataset into batches for training.
    PyTorch docs: 
        https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
        https://pytorch.org/docs/stable/data.html

    Note that you have to first convert your data into a PyTorch DataSet.
    You DO NOT have to implement this yourself, instead you should use a TensorDataset.

    You are in charge of splitting the data into train and test sets based on the given
    test_pct. There are several functions you can use to acheive this!

    The shuffle parameter refers to shuffling the data *in the loader* (look at the docs),
    not whether or not to shuffle the data before splitting it into train and test sets.
    (don't shuffle before splitting)

    Params:
        X: A list of input sequences
        Y: A list of labels
        num_sequences_per_batch: Batch size
        test_pct: The proportion of samples to use in the test set.
        shuffle: INSTRUCTORS ONLY

    Returns:
        One DataLoader for training, and one for testing.
    """
    X_tensor = torch.tensor(X)
    y_tensor = torch.tensor(y)
    dataset = TensorDataset(X_tensor, y_tensor)
    train_dataset, test_dataset = random_split(dataset, [1 - test_pct, test_pct])
    train_loader = DataLoader(train_dataset, batch_size=num_sequences_per_batch, shuffle=shuffle)
    test_loader = DataLoader(test_dataset, batch_size=num_sequences_per_batch, shuffle=shuffle)

    return train_loader, test_loader

### some definitions:

- a single **batch** is the number of sequences that your model will evaluate at once when it learns
- **steps per epoch** is the number of batches that your model will see in a single epoch (one pass through the data)-- your NUM_SEQUENCES_PER_BATCH constant is the batch size--you won't need this for pytorch but it's useful to know


In [12]:
# initialize your dataloaders for both word and character data
# print out the shapes of the first batch to verify that it is 
# correct for both word and character data
# note that your train data and your test data should have the same shapes!
# print enough information to verify that the shapes are correct

word_train_loader, word_test_loader = create_dataloaders(word_X, word_Y, NUM_SEQUENCES_PER_BATCH)
char_train_loader, char_test_loader = create_dataloaders(char_X, char_Y, NUM_SEQUENCES_PER_BATCH)

# Examples:
# Normally you would loop over your dataloader, but we just want to get a single batch to test it out:
# Every time you call next, you advance to the next batch

word_batch = next(iter(word_train_loader))
print(len(word_batch), [len(x) for x in word_batch])

char_batch = next(iter(char_train_loader))
print(len(char_batch), [len(x) for x in char_batch])

2 [128, 128]
2 [128, 128]


### d) Define, train & save your models (25 points)

Write the code to train feedforward neural language models for both word embeddings and character embeddings make sure not to just copy + paste to train your two models (define functions as needed).

Define your model architecture using PyTorch layers and activation functions. When training, use the Adam optimizer (https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) instead of sgd (https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD).

add cells as desired :)

Your FFNN should have the following architecture:

- It should be a two layer neural net (one hidden layer, one output layer)
- It should use ReLU as its activation function

Our biggest piece of advice--make sure that you understand what dimensions each layer needs to be!


In [13]:
# 10 points

class FFNN(nn.Module):
    """
    A class representing our implementation of a Feed-Forward Neural Network.
    You will need to implement two methods:
        - A constructor to set up the architecture and hyperparameters of the model
        - The forward pass
    """
    
    def __init__(self, vocab_size: int, ngram: int, embedding_layer: torch.nn.Embedding, hidden_units=128):
        """
        Initialize a new untrained model. 
        
        You can change these parameters as you would like.
        Once you get a working model, you are encouraged to
        experiment with this constructor to improve performance.
        
        Params:
            vocab_size: The number of words in the vocabulary
            ngram: The value of N for training and prediction.
            embedding_layer: The previously trained embedder. 
            hidden_units: The size of the hidden layer.
        """        
        super().__init__()
        # YOUR CODE HERE
        # we recommend saving the parameters as instance variables
        # so you can access them later as needed
        # (in addition to anything else you need to do here)
        self.vocab_size = vocab_size
        self.ngram = ngram
        self.embedding_layer = embedding_layer
        self.hidden_units = hidden_units
        self.fc1 = nn.Linear((ngram-1) * embedding_layer.embedding_dim, hidden_units)
        self.fc2 = nn.Linear(hidden_units, vocab_size)
        
    
    def forward(self, X: list) -> torch.tensor:
        """
        Compute the forward pass through the network.
        This is not a prediction, and it should not apply softmax.

        Params:
            X: the input data

        Returns:
            The output of the model; i.e. its predictions.
        
        """
        # YOUR CODE HERE
        X = torch.stack([self.embedding_layer(torch.tensor(x)) for x in X], dim=0)
        X = X.view(X.size(0), -1)
        X = torch.relu(self.fc1(X))
        X = self.fc2(X)
        
        return X

In [14]:
# 10 points

def train(dataloader, model, epochs: int = 1, lr: float = 0.001) -> None:
    """
    Our model's training loop.
    Print the cross entropy loss every epoch.
    You should use the Adam optimizer instead of SGD.

    When looking for documentation, try to stay on PyTorch's website.
    This might be a good place to start: https://pytorch.org/tutorials/beginner/introyt/trainingyt.html 
    They should have plenty of tutorials, and we don't want you to get confused from other resources.

    Params:
        dataloader: The training dataloader
        model: The model we wish to train
        epochs: The number of epochs to train for
        lr: Learning rate 
    """
    # YOUR CODE HERE
    # you will need to initialize an optimizer and a loss function, which you should do
    # before the training loop
    model.to(torch.device('cpu'))
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    # print out the epoch number and the current average loss after each epoch
    # you can use tqdm to print out a progress bar
    
    for epoch in range(epochs):
        model.train()
        total_loss = 0

        for X, y in tqdm(dataloader, desc=f"Epoch {epoch + 1}"):
            optimizer.zero_grad()
            
            outputs = model(X)
            
            loss = criterion(outputs, y)
            loss.backward()
            
            optimizer.step()
            
            total_loss += loss.item()
            
        avg_loss = total_loss / len(dataloader)
        tqdm.write(f"Epoch {epoch + 1}, loss: {avg_loss}")
        

For the next part, we're testing our model's functions so we can see if it works.
No need to do this on both the word and character data, just one is fine.


In [15]:
# Create your model
# Print out its architecture (use the imported summary function)

model = FFNN(len(word_embedder.token_to_index), NGRAM, word_embedder)
summary(model)

Layer (type:depth-idx)                   Param #
FFNN                                     --
├─Embedding: 1-1                         (2,537,400)
├─Linear: 1-2                            25,728
├─Linear: 1-3                            3,273,246
Total params: 5,836,374
Trainable params: 3,298,974
Non-trainable params: 2,537,400

In [16]:
# 5 points

# train your models for 1 epoch
# see timing information posted on Canvas!

# re-create your data loader fresh

char_train_loader, char_test_loader = create_dataloaders(char_X, char_Y, NUM_SEQUENCES_PER_BATCH)

# train your model

model = FFNN(len(char_embedder.token_to_index), NGRAM, char_embedder)
train(char_train_loader, model)

Epoch 1:   0%|          | 0/20796 [00:00<?, ?it/s]

  X = torch.stack([self.embedding_layer(torch.tensor(x)) for x in X], dim=0)


Epoch 1, loss: 2.0226176410517938


10. You're reporting the loss after each epoch of training. What is the loss for your model after 1 epoch?

- word or character-based? **Word**
- loss? **5.8**

Loss isn't accuracy, but it does tell us whether or not the model is improving over time. For character-based, loss after one epoch should be ~2.1; for word-based it is ~5.9.


### e) create a full pipeline (13 points)

We've made all the pieces that you'll need for a full pipeline, now let's package everything together nicely.


In [17]:
# 3 points

# make a function that does your full *training* pipeline
# This is essentially pulling the pieces that you've done so far earlier in this 
# notebook into a single function that you can call to train your model


def full_pipeline(data: list[list[str]], word_embeddings_filename: str, 
                batch_size:int = NUM_SEQUENCES_PER_BATCH,
                ngram:int = NGRAM, hidden_units = 128, epochs = 1,
                lr = 0.001, test_pct = 0.1,
                ) -> FFNN:
    """
    Run the entire pipeline from loading embeddings to training.
    You won't use the test set for anything.

    Params:
        data: The raw data to train on, parsed as a list of lists of tokens
        word_embeddings_filename: The filename of the Word2Vec word embeddings
        batch_size: The batch size to use
        hidden_units: The number of hidden units to use
        epochs: The number of epochs to train for
        lr: The learning rate to use
        test_pct: The proportion of samples to use in the test set.

    Returns:
        The trained model.
    """
    embeddings = nutils.load_word2vec(word_embeddings_filename)
    embedder = nutils.create_embedder(embeddings)
    
    encoded = encode_tokens(data, embedder)
    training_samples = generate_ngram_training_samples(encoded, ngram)
    X, y = separate_x_y(training_samples)
    train_loader, _ = create_dataloaders(X, y, batch_size, test_pct)
    
    model = FFNN(len(embedder.token_to_index), ngram, embedder, hidden_units)
    train(train_loader, model, epochs, lr)
    
    return model

In [18]:
# 10 points

# Use your full pipeline to train models on the word data and the character data.
# Feel free to add cells if you'd like to.

# Train your models however you'd like. Play around with number of epochs, learning rate, etc.
# Do whatever you'd like to for exploring hyperparameters.
# You aren't required to hit a certain loss, but you should leave code here that shows
# that you explored effects of changing at least two of the different hyperparameters
# Please don't change the architecture of the model (keep it a 2-layer model with 1 hidden layer)

# You'll likely want to do this exploration AFTER completing your prediction and generation code, so start
# with just training for 1 - 5 epochs with default params.


# Word-based takes Felix's computer 7 - 8 min for 5 epochs with default params running on CPU
# Char-based Felix's computer ~1min 30sec - 2min for 5 epochs with default params running on CPU


word_model = full_pipeline(word_data, EMBEDDING_SAVE_FILE_WORD, epochs=8, lr=0.0001)
char_model = full_pipeline(char_data, EMBEDDING_SAVE_FILE_CHAR, epochs=8, lr=0.0001)



Epoch 1:   0%|          | 0/4459 [00:00<?, ?it/s]

  X = torch.stack([self.embedding_layer(torch.tensor(x)) for x in X], dim=0)


Epoch 1, loss: 6.271387271106871


Epoch 2:   0%|          | 0/4459 [00:00<?, ?it/s]

Epoch 2, loss: 5.622374712765872


Epoch 3:   0%|          | 0/4459 [00:00<?, ?it/s]

Epoch 3, loss: 5.447047889219268


Epoch 4:   0%|          | 0/4459 [00:00<?, ?it/s]

Epoch 4, loss: 5.34396797649848


Epoch 5:   0%|          | 0/4459 [00:00<?, ?it/s]

Epoch 5, loss: 5.2665869573168385


Epoch 6:   0%|          | 0/4459 [00:00<?, ?it/s]

Epoch 6, loss: 5.203503576408296


Epoch 7:   0%|          | 0/4459 [00:00<?, ?it/s]

Epoch 7, loss: 5.149578691518047


Epoch 8:   0%|          | 0/4459 [00:00<?, ?it/s]

Epoch 8, loss: 5.102282924592214


Epoch 1:   0%|          | 0/20796 [00:00<?, ?it/s]

  X = torch.stack([self.embedding_layer(torch.tensor(x)) for x in X], dim=0)


Epoch 1, loss: 2.2016810917824503


Epoch 2:   0%|          | 0/20796 [00:00<?, ?it/s]

Epoch 2, loss: 2.050284528289765


Epoch 3:   0%|          | 0/20796 [00:00<?, ?it/s]

Epoch 3, loss: 2.0178929970261685


Epoch 4:   0%|          | 0/20796 [00:00<?, ?it/s]

Epoch 4, loss: 2.001170437326567


Epoch 5:   0%|          | 0/20796 [00:00<?, ?it/s]

Epoch 5, loss: 1.9904255258805432


Epoch 6:   0%|          | 0/20796 [00:00<?, ?it/s]

Epoch 6, loss: 1.9828955295178083


Epoch 7:   0%|          | 0/20796 [00:00<?, ?it/s]

Epoch 7, loss: 1.9771065505840935


Epoch 8:   0%|          | 0/20796 [00:00<?, ?it/s]

Epoch 8, loss: 1.9725846427636642


In [19]:
# when you're happy with them, save both models
# Feel free to play around with any hyperparameters you'd like

# using torch.save and the model's state_dict
torch.save(word_model.state_dict(), MODEL_FILE_WORD)
torch.save(char_model.state_dict(), MODEL_FILE_CHAR)

### f) Generate Sentences (25 points)

Now that you have trained models, you'll work on the generation piece. Note that because you saved your models, even if you have to re-start your kernel, you should be able to re-load them without having to re-train them again.


In [20]:
# load the models in again with code like:
model = FFNN(len(word_embedder.token_to_index), NGRAM, word_embedder)
model.load_state_dict(torch.load(MODEL_FILE_WORD))
# then switch the model into evaluation mode
model.eval()

  model.load_state_dict(torch.load(MODEL_FILE_WORD))


FFNN(
  (embedding_layer): Embedding(25374, 100)
  (fc1): Linear(in_features=200, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=25374, bias=True)
)

In [21]:
# 10 points 

# Create a function that predicts the next token in a sequence.
def predict(model, input_tokens: list[str]) -> str:
    """
    Get the model's next word prediction for an input.
    This is where you'll use the softmax function!
    Assume that the input tokens do not contain any unknown tokens.

    Params:
        model: Your trained model
        input_tokens: A list of natural-language tokens. Must be length N-1.

    Returns:
        The predicted token (not the predicted index!)
    """
    model.eval()  # Set the model to evaluation mode if you haven't already
    # YOUR CODE HERE

    with torch.no_grad():
        expected_length = NGRAM - 1
        if len(input_tokens) < expected_length:
            input_tokens = ["<s>"] * (expected_length - len(input_tokens)) + input_tokens
        
        input_indices = [word_embedder.token_to_index[token] for token in input_tokens]
        
        input_tensor = torch.tensor([input_indices])
        
        output = model(input_tensor)
        
        probs = torch.softmax(output, dim=-1)
        
        predicted_index = torch.multinomial(probs[0], num_samples=1).item()
        
        predicted_token = word_embedder.index_to_token[predicted_index]
        
    return predicted_token
        

In [22]:
# 10 points

# Generate a sequence from the model until you get an end of sentence token.
def generate(model, seed: list[str], max_tokens: int = None) -> list[str]:
    """
    Use the trained model to generate a sentence.
    This should be somewhat similar to generation for HW2...
    Make sure to use your predict function!

    Params:
        model: Your trained model
        seed: [w_1, w_2, ..., w_(n-1)].
        max_tokens: The maximum number of tokens to generate. When None, should gener
            generate until the end of sentence token is reached.

    Return:
        A list of generated tokens.
    """
    context_length = NGRAM - 1

    while max_tokens is None or max_tokens > 0:
        # Use only the last (n-1) tokens as context
        context = seed[-context_length:]
        
        # Get the next token prediction
        next_token = predict(model, context)
        seed.append(next_token)
        
        # Check for end-of-sentence token. (Assuming "<s>" is your EOS marker.)
        if next_token == "<s>":
            break
        
        if max_tokens is not None:
            max_tokens -= 1

    return seed


In [23]:
# you might want to define some functions to help you format the text nicely
# and/or generate multiple sequences

In [24]:
# 2.5 points

# generate and display ten sequences from both your word model and your character model
# do not include <s> or </s> in your displayed sentences
# make sure that you can read the output easily (i.e. don't just print out a list of tokens)

# For character-based, replace _ with a space

print("Word Model Outputs:")
for i in range(10):
    word_sequence = generate(word_model, ["<s>"], 20)

    filtered_tokens = [token for token in word_sequence if token not in {"<s>", "</s>"}]

    sentence = " ".join(filtered_tokens)
    print(sentence)

print("\nCharacter Model Outputs:")
for i in range(10):
    char_sequence = generate(char_model, ["<s>"], 20)

    filtered_chars = [token for token in char_sequence if token not in {"<s>", "</s>"}]

    sentence = "".join(filtered_chars)

    sentence = sentence.replace("_", " ")
    print(sentence)

  X = torch.stack([self.embedding_layer(torch.tensor(x)) for x in X], dim=0)


Word Model Outputs:
the ground . to ever from the street
as he did therefore i had ever in shuddering no less without than the whole dreams which have wished to
in the last shook mind as i have already soon close , 'of of face small i screamed ye admit
often horses . the figure laboratory into the covers being room .
the ground bred on even to frequent .
there was precisely good strange much much dutch the edge was represented come at the rest dat tissue .
you feel the catastrophe of von of the pallid bewildered .
accompanied good in decomposition ratio of munificent around observation siroc as refused the dogma carved the whole rest , dazzled
the method type journey ; and why towards us , moved , would expect , a willing and machines could
she was to make her edinburgh , one one not , as the brilliancy a quarter green down , the

Character Model Outputs:
aathe,itthetheto,myhad,histhe
a,it,not
to,hisa,thenottheafrom,wasand.with,and.with
to,
aofi
was,my
ininandthe;ofandof,ofiandtoitthe;

In [25]:
# 2.5 points

# Generate 100 example sentences with each model and save them to two files, one sentence per line
# do not include <s> and </s> in your saved sentences (you'll use these sentences in your next task)
# this will produce two files, one for each model
# We've defined the filenames for you at the top of this notebook
# Do not print these sentences here :)

for i in range(100):
    word_sequence = generate(word_model, ["<s>"], 20)
    filtered_words = [token for token in word_sequence if token not in {"<s>", "</s>"}]
    word_sentence = " ".join(filtered_words)
    
    char_sequence = generate(char_model, ["<s>"], 20)
    filtered_chars = [token for token in char_sequence if token not in {"<s>", "</s>"}]
    char_sentence = "".join(filtered_chars)
    char_sentence = char_sentence.replace("_", " ")
    
    with open(OUTPUT_WORDS, "a") as file:
        file.write(word_sentence + "\n")
    with open(OUTPUT_CHARS, "a") as file:
        file.write(char_sentence + "\n")

  X = torch.stack([self.embedding_layer(torch.tensor(x)) for x in X], dim=0)


11. What were the final parameters that you used for your model?

- N: **3**
- embedding size: **50**
- epochs: **8**
- hidden units: **128**
- learning rate: **0.0001**
- training time + system you were running it on (operating system + chip/specs): **8 mins. Windows 11, intel core i7-12650H 2.3GHZ**

  - for pairs, you can either note both partners' training times or just one

- What was the word-based model's final loss? **5.1**
- Character based? **1.97**

If you used different parameters for your word-based and character-based models, note the different parameters clearly.
