# A2: Language Model

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import math

from datasets import load_dataset
from tqdm import tqdm

In [2]:
# Define the device (GPU if available, otherwise CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [3]:
SEED = 1234
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True

## 1. Load data - AG News

The dataset is called AG's News Topic Classification Dataset. It consists of over 1 million news articles categorized into 4 classes: World, Sports, Business, and Sci/Tech. The dataset is used as a benchmark for text classification tasks and was originally gathered from over 2000 news sources. It includes both a training set (120,000 samples) and a testing set (7,600 samples), with 30,000 training samples and 1,900 testing samples per class.

#### **AG's News Topic Classification Dataset** 

- **Source**: The AG's News Topic Classification Dataset contains over 1 million news articles, categorized into 4 classes: World, Sports, Business, and Sci/Tech. It was created by **ComeToMyHead**, an academic news search engine.
  
- **Description**: The dataset includes 120,000 training samples and 7,600 testing samples. It is commonly used for text classification benchmarks.

- **Citation**:  
  Zhang, X., Zhao, J., & LeCun, Y. (2015). "Character-level Convolutional Networks for Text Classification," *Advances in Neural Information Processing Systems 28 (NIPS 2015)*.

- **Dataset Links**:  
  - [Original Dataset](http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html)  
  - [Hugging Face Dataset](https://huggingface.co/datasets/ag_news)


In [4]:
dataset = load_dataset('ag_news')
# Remove 'label' column from both the train and test datasets
dataset = dataset.remove_columns(['label'])

In [5]:
print(dataset['train'][550]['text'])

Genetic Material May Help Make Nano-Devices: Study (Reuters) Reuters - The genetic building blocks that\form the basis for life may also be used to build the tiny\machines of nanotechnology, U.S. researchers said on Thursday.


In [6]:
print(dataset['train'].shape)

(120000, 1)


## 2. Preprocessing

### Tokenizing

Using custom functions to tokenize, since torchtext is not comapatible with the current version of pytorch. It was decided to use custom function for tokenization.

In [7]:
# Simple whitespace-based tokenizer (no subword tokenization)
def simple_tokenize(texts):
    # Tokenize each text in the batch
    return [text.split() for text in texts]

# Tokenize data using simple whitespace tokenization (for batched processing)
def tokenize_data(batch):
    return {'tokens': simple_tokenize(batch['text'])}

In [8]:
# Apply tokenizer to the entire training dataset
tokenized_train = dataset['train'].map(tokenize_data, remove_columns=['text'], batched=True)

# Apply tokenizer to the entire test dataset
tokenized_test = dataset['test'].map(tokenize_data, remove_columns=['text'], batched=True)

# Manually create the validation set from the training set (10% of the training data)
tokenized_valid = tokenized_train.train_test_split(test_size=0.1)['test']


# Verify the tokenization
print(f"First 3 tokenized train examples: {tokenized_train['tokens'][:3]}")
print(f"First 3 tokenized validation examples: {tokenized_valid['tokens'][:3]}")
print(f"First 3 tokenized test examples: {tokenized_test['tokens'][:3]}")

First 3 tokenized train examples: [['Wall', 'St.', 'Bears', 'Claw', 'Back', 'Into', 'the', 'Black', '(Reuters)', 'Reuters', '-', 'Short-sellers,', 'Wall', "Street's", 'dwindling\\band', 'of', 'ultra-cynics,', 'are', 'seeing', 'green', 'again.'], ['Carlyle', 'Looks', 'Toward', 'Commercial', 'Aerospace', '(Reuters)', 'Reuters', '-', 'Private', 'investment', 'firm', 'Carlyle', 'Group,\\which', 'has', 'a', 'reputation', 'for', 'making', 'well-timed', 'and', 'occasionally\\controversial', 'plays', 'in', 'the', 'defense', 'industry,', 'has', 'quietly', 'placed\\its', 'bets', 'on', 'another', 'part', 'of', 'the', 'market.'], ['Oil', 'and', 'Economy', 'Cloud', "Stocks'", 'Outlook', '(Reuters)', 'Reuters', '-', 'Soaring', 'crude', 'prices', 'plus', 'worries\\about', 'the', 'economy', 'and', 'the', 'outlook', 'for', 'earnings', 'are', 'expected', 'to\\hang', 'over', 'the', 'stock', 'market', 'next', 'week', 'during', 'the', 'depth', 'of', 'the\\summer', 'doldrums.']]
First 3 tokenized validation

In [9]:
tokenized_train.shape

(120000, 1)

In [10]:
tokenized_test.shape

(7600, 1)

In [11]:
tokenized_valid.shape

(12000, 1)

In [12]:
print(tokenized_train[550]['tokens'])

['Genetic', 'Material', 'May', 'Help', 'Make', 'Nano-Devices:', 'Study', '(Reuters)', 'Reuters', '-', 'The', 'genetic', 'building', 'blocks', 'that\\form', 'the', 'basis', 'for', 'life', 'may', 'also', 'be', 'used', 'to', 'build', 'the', 'tiny\\machines', 'of', 'nanotechnology,', 'U.S.', 'researchers', 'said', 'on', 'Thursday.']


In [13]:
print(tokenized_test[550]['tokens'])

['Second', 'Prisoner', 'Abuse', 'Report', 'Expected', 'WASHINGTON', '-', 'Inattention', 'to', 'prisoner', 'issues', 'by', 'senior', 'U.S.', 'military', 'leaders', 'in', 'Iraq', 'and', 'at', 'the', 'Pentagon', 'was', 'a', 'key', 'factor', 'in', 'the', 'abuse', 'scandal', 'at', 'Abu', 'Ghraib', 'prison,', 'but', 'there', 'is', 'no', 'evidence', 'they', 'ordered', 'any', 'mistreatment,', 'an', 'independent', 'panel', 'concluded...']


In [14]:
print(tokenized_valid[550]['tokens'])

['STANFORD', 'NOTEBOOK', 'Just', 'go', 'away', 'quietly?', 'Not', 'Stanford', '#39;s', 'angry', '&lt;b&gt;...&lt;/b&gt;', 'Guard', 'Ismail', 'Simpson', 'of', 'Stanford', 'was', 'walking', 'off', 'the', 'field', 'at', 'the', 'end', 'of', 'the', 'game,', 'yelling', 'and', 'gesturing.', 'An', 'assistant', 'coach', 'had', 'an', 'arm', 'around', 'him,', 'pushing', 'him', 'in', 'the', 'direction', 'of', 'the', 'end', 'zone', 'tunnel', 'that', 'leads', 'to', 'the', 'locker', 'room.']


### Numericalizing

We will tell torchtext to add any word that has occurred at least ten times in the dataset to the vocabulary because otherwise it would be too big.  Also we shall make sure to add `unk` and `eos`.

In [15]:
from collections import Counter

# Flatten the list of tokens from the tokenized datasets (train, validation, and test)
flat_tokens = [token for dataset in [tokenized_train, tokenized_valid, tokenized_test]
               for tokens in dataset['tokens'] for token in tokens]

# Count the frequency of each token
token_counts = Counter(flat_tokens)

# Filter tokens by frequency (min_freq = 10)
filtered_tokens = [token for token, count in token_counts.items() if count >= 10]

# Add special tokens <unk> and <eos>
special_tokens = ['<unk>', '<eos>']
vocab_tokens = special_tokens + filtered_tokens

# Create vocabulary (mapping tokens to indices)
vocab = {token: idx for idx, token in enumerate(vocab_tokens)}

# Invert the vocab dictionary to get index-to-token mapping
itos_vocab = {idx: token for token, idx in vocab.items()}

In [16]:
# Print the vocabulary size and some sample tokens
print(f"Vocabulary size: {len(vocab)}")
print(f"Some sample tokens: {list(vocab.keys())[:10]}")

Vocabulary size: 31866
Some sample tokens: ['<unk>', '<eos>', 'Wall', 'St.', 'Bears', 'Back', 'Into', 'the', 'Black', '(Reuters)']


## 3. Prepare the batch loader

### Prepare data

Given "Chaky loves eating at AIT", and "I really love deep learning", and given batch size = 3, we will get three batches of data "Chaky loves eating at", "AIT `<eos>` I really", "love deep learning `<eos>`".  

In [17]:
def get_data(dataset, vocab, batch_size):
    data = []
    for example in dataset:
        if example['tokens']:
            # Append <eos> to the token list (correctly modify tokens)
            tokens = example['tokens'] + ['<eos>']              
            # Convert tokens to indices using vocab, using vocab.get() to handle OOV tokens
            tokens = [vocab.get(token, vocab['<unk>']) for token in tokens]
            # Extend the data with token indices
            data.extend(tokens)
            
    # Convert to a tensor
    data = torch.LongTensor(data).to(device)
    
    # Calculate the number of batches
    num_batches = data.shape[0] // batch_size
    
    # Trim data to fit into complete batches
    data = data[:num_batches * batch_size]
    
    # Reshape data to [batch_size, seq_len]
    data = data.view(batch_size, num_batches)
    
    return data  # [batch_size, seq_len]

In [18]:
batch_size = 128
train_data = get_data(tokenized_train, vocab, batch_size)
valid_data = get_data(tokenized_valid, vocab, batch_size)
test_data  = get_data(tokenized_test,  vocab, batch_size)

In [19]:
train_data.shape

torch.Size([128, 36419])

In [20]:
valid_data.shape

torch.Size([128, 3645])

In [21]:
test_data.shape

torch.Size([128, 2299])

## 4. Modeling 

# LSTM-based Language Model

This class defines an LSTM-based language model using PyTorch's `nn.Module`. It includes the following key components:

- **Embedding Layer**: Converts input tokens into dense vectors of a specified size (`emb_dim`).
- **LSTM Layer**: Processes the embedded input sequence with multiple layers, hidden state dimensions (`hid_dim`), and a dropout rate for regularization.
- **Dropout Layer**: Applied after LSTM to prevent overfitting.
- **Fully Connected Layer**: Maps the LSTM outputs to the vocabulary size for prediction.

### Initialization:
- Weights are initialized using uniform distributions for both the embedding and LSTM layers.
- The `init_weights` method customizes weight initialization.

### Forward Pass:
- The forward method processes the input sequence (`src`) through the embedding, LSTM, and dropout layers, followed by a final fully connected layer to predict the next token in the sequence.

### Hidden State Management:
- The `init_hidden` method initializes the hidden state and cell state for LSTM layers.
- The `detach_hidden` method detaches the hidden states from the computation graph to prevent gradient computation for subsequent batches.


In [22]:
class LSTMLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim, hid_dim, num_layers, dropout_rate):
        super().__init__()
        self.num_layers = num_layers
        self.hid_dim    = hid_dim
        self.emb_dim    = emb_dim
        
        
        self.embedding  = nn.Embedding(vocab_size, emb_dim)
        self.lstm       = nn.LSTM(emb_dim, hid_dim, num_layers=num_layers, dropout=dropout_rate, batch_first=True)
        self.dropout    = nn.Dropout(dropout_rate)
        self.fc         = nn.Linear(hid_dim, vocab_size)
        
        self.init_weights()
    
    def init_weights(self):
        init_range_emb = 0.1
        init_range_other = 1/math.sqrt(self.hid_dim)
        self.embedding.weight.data.uniform_(-init_range_emb, init_range_other)
        self.fc.weight.data.uniform_(-init_range_other, init_range_other)
        self.fc.bias.data.zero_()
        for i in range(self.num_layers):
            self.lstm.all_weights[i][0] = torch.FloatTensor(self.emb_dim,
                self.hid_dim).uniform_(-init_range_other, init_range_other) #We
            self.lstm.all_weights[i][1] = torch.FloatTensor(self.hid_dim,   
                self.hid_dim).uniform_(-init_range_other, init_range_other) #Wh
    
    def init_hidden(self, batch_size, device):
        hidden = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        cell   = torch.zeros(self.num_layers, batch_size, self.hid_dim).to(device)
        return hidden, cell
        
    def detach_hidden(self, hidden):
        hidden, cell = hidden
        hidden = hidden.detach() #not to be used for gradient computation
        cell   = cell.detach()
        return hidden, cell
        
    def forward(self, src, hidden):
        #src: [batch_size, seq len]
        embedding = self.dropout(self.embedding(src)) #harry potter is
        #embedding: [batch-size, seq len, emb dim]
        output, hidden = self.lstm(embedding, hidden)
        #ouput: [batch size, seq len, hid dim]
        #hidden: [num_layers * direction, seq len, hid_dim]
        output = self.dropout(output)
        prediction =self.fc(output)
        #prediction: [batch_size, seq_len, vocab_size]
        return prediction, hidden

## 5. Training 

Follows very basic procedure.  One note is that some of the sequences that will be fed to the model may involve parts from different sequences in the original dataset or be a subset of one (depending on the decoding length). For this reason we will reset the hidden state every epoch, this is like assuming that the next batch of sequences is probably always a follow up on the previous in the original dataset.

In [23]:
vocab_size = len(vocab)
emb_dim = 512                # 400 in the paper
hid_dim = 512               # 1150 in the paper
num_layers = 2                # 3 in the paper
dropout_rate = 0.65             
lr = 1e-3                    

In [24]:
model      = LSTMLanguageModel(vocab_size, emb_dim, hid_dim, num_layers, dropout_rate).to(device)
optimizer  = optim.Adam(model.parameters(), lr=lr)
criterion  = nn.CrossEntropyLoss()
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'The model has {num_params:,} trainable parameters')

The model has 36,865,146 trainable parameters


In [25]:
def get_batch(data, seq_len, idx):
    #data #[batch size, bunch of tokens]
    src    = data[:, idx:idx+seq_len]                   
    target = data[:, idx+1:idx+seq_len+1]  #target simply is ahead of src by 1            
    return src, target

In [26]:
def train(model, data, optimizer, criterion, batch_size, seq_len, clip, device):
    
    epoch_loss = 0
    model.train()
    # drop all batches that are not a multiple of seq_len
    # data #[batch size, seq len]
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches -1) % seq_len]  #we need to -1 because we start at 0
    num_batches = data.shape[-1]
    
    #reset the hidden every epoch
    hidden = model.init_hidden(batch_size, device)
    
    for idx in tqdm(range(0, num_batches - 1, seq_len), desc='Training: ',leave=False):
        optimizer.zero_grad()
        
        #hidden does not need to be in the computational graph for efficiency
        hidden = model.detach_hidden(hidden)

        src, target = get_batch(data, seq_len, idx) #src, target: [batch size, seq len]
        src, target = src.to(device), target.to(device)
        batch_size = src.shape[0]
        prediction, hidden = model(src, hidden)               

        #need to reshape because criterion expects pred to be 2d and target to be 1d
        prediction = prediction.reshape(batch_size * seq_len, -1)  #prediction: [batch size * seq len, vocab size]  
        target = target.reshape(-1)
        loss = criterion(prediction, target)
        
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()
        epoch_loss += loss.item() * seq_len
    return epoch_loss / num_batches

In [27]:
def evaluate(model, data, criterion, batch_size, seq_len, device):
    epoch_loss = 0
    model.eval()
    num_batches = data.shape[-1]
    data = data[:, :num_batches - (num_batches -1) % seq_len]
    num_batches = data.shape[-1]

    hidden = model.init_hidden(batch_size, device)

    with torch.no_grad():
        for idx in range(0, num_batches - 1, seq_len):
            hidden = model.detach_hidden(hidden)
            src, target = get_batch(data, seq_len, idx)
            src, target = src.to(device), target.to(device)
            batch_size= src.shape[0]

            prediction, hidden = model(src, hidden)
            prediction = prediction.reshape(batch_size * seq_len, -1)
            target = target.reshape(-1)

            loss = criterion(prediction, target)
            epoch_loss += loss.item() * seq_len
    return epoch_loss / num_batches

Here we will be using a `ReduceLROnPlateau` learning scheduler which decreases the learning rate by a factor, if the loss don't improve by a certain epoch.

In [28]:
n_epochs = 25
seq_len  = 50 #<----decoding length
clip    = 0.25

lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=0)

best_valid_loss = float('inf')

for epoch in range(n_epochs):
    train_loss = train(model, train_data, optimizer, criterion,  batch_size, seq_len, clip, device)
    valid_loss = evaluate(model, valid_data, criterion, batch_size, seq_len, device)

    lr_scheduler.step(valid_loss)

    if valid_loss < best_valid_loss:
        best_valid_loss = valid_loss
        torch.save(model.state_dict(), 'best-val-lstm_lm.pt')

    print(f'\tTrain Perplexity: {math.exp(train_loss):.3f}')
    print(f'\tValid Perplexity: {math.exp(valid_loss):.3f}')

                                                           

	Train Perplexity: 1188.215
	Valid Perplexity: 553.509


                                                           

	Train Perplexity: 506.114
	Valid Perplexity: 304.066


                                                           

	Train Perplexity: 337.084
	Valid Perplexity: 209.612


                                                           

	Train Perplexity: 264.895
	Valid Perplexity: 167.812


                                                           

	Train Perplexity: 226.492
	Valid Perplexity: 142.989


                                                           

	Train Perplexity: 201.566
	Valid Perplexity: 126.115


                                                           

	Train Perplexity: 183.928
	Valid Perplexity: 114.331


                                                           

	Train Perplexity: 170.931
	Valid Perplexity: 105.307


                                                           

	Train Perplexity: 160.634
	Valid Perplexity: 98.228


                                                           

	Train Perplexity: 152.442
	Valid Perplexity: 92.431


                                                           

	Train Perplexity: 145.823
	Valid Perplexity: 87.813


                                                           

	Train Perplexity: 140.106
	Valid Perplexity: 83.763


                                                           

	Train Perplexity: 135.322
	Valid Perplexity: 80.062


                                                           

	Train Perplexity: 131.229
	Valid Perplexity: 77.121


                                                           

	Train Perplexity: 127.613
	Valid Perplexity: 74.783


                                                           

	Train Perplexity: 124.378
	Valid Perplexity: 72.109


                                                           

	Train Perplexity: 121.695
	Valid Perplexity: 70.242


                                                           

	Train Perplexity: 119.119
	Valid Perplexity: 68.329


                                                           

	Train Perplexity: 116.819
	Valid Perplexity: 66.668


                                                           

	Train Perplexity: 114.826
	Valid Perplexity: 65.364


                                                           

	Train Perplexity: 112.847
	Valid Perplexity: 63.834


                                                           

	Train Perplexity: 111.112
	Valid Perplexity: 62.610


                                                           

	Train Perplexity: 109.650
	Valid Perplexity: 61.325


                                                           

	Train Perplexity: 108.175
	Valid Perplexity: 60.251


                                                           

	Train Perplexity: 106.821
	Valid Perplexity: 59.407


## 6. Testing

In [29]:
model.load_state_dict(torch.load('best-val-lstm_lm.pt',  map_location=device))
test_loss = evaluate(model, test_data, criterion, batch_size, seq_len, device)
print(f'Test Perplexity: {math.exp(test_loss):.3f}')

  model.load_state_dict(torch.load('best-val-lstm_lm.pt',  map_location=device))


Test Perplexity: 89.354


## 7. Real-world inference

Here we take the prompt, tokenize, encode and feed it into the model to get the predictions.  We then apply softmax while specifying that we want the output due to the last word in the sequence which represents the prediction for the next word.  We divide the logits by a temperature value to alter the model’s confidence by adjusting the softmax probability distribution.

Once we have the Softmax distribution, we randomly sample it to make our prediction on the next word. If we get <unk> then we give that another try.  Once we get <eos> we stop predicting.
    
We decode the prediction back to strings last lines.

In [44]:
def generate(prompt, max_seq_len, temperature, model, vocab, itos_vocab, device, seed=None):
    if seed is not None:
        torch.manual_seed(seed)
    model.eval()
    
    # Use your custom tokenizer to tokenize the prompt
    tokens = simple_tokenize([prompt])[0]  # Tokenize the prompt and get the first (only) sentence
    indices = [vocab.get(t, vocab['<unk>']) for t in tokens]  # Convert tokens to indices (using <unk> for unknown tokens)
    
    batch_size = 1
    hidden = model.init_hidden(batch_size, device)
    
    with torch.no_grad():
        for i in range(max_seq_len):
            # Convert the token indices to tensor and move to the device
            src = torch.LongTensor([indices]).to(device)
            prediction, hidden = model(src, hidden)
            
            # Get the probabilities for the last token in the sequence
            probs = torch.softmax(prediction[:, -1] / temperature, dim=-1)
            
            # Sample the next token based on the probabilities
            prediction = torch.multinomial(probs, num_samples=1).item()
            
            # If the prediction is <unk>, sample again
            while prediction == vocab.get('<unk>', -1): 
                prediction = torch.multinomial(probs, num_samples=1).item()

            # If the prediction is <eos>, stop generating
            if prediction == vocab.get('<eos>', -1):    
                break

            # Add the predicted token index to the list for the next iteration
            indices.append(prediction)

    # Decode the generated token indices back to tokens (words)
    tokens = [itos_vocab[i] for i in indices]
    return tokens

#### World News

In [45]:
prompt = 'Climate change is affecting countries like '
max_seq_len = 30
seed = 0

temperatures = [0.5, 0.7, 0.75, 0.8, 1.0]

# Generate text with each temperature value
for temperature in temperatures:
    generation = generate(prompt, max_seq_len, temperature, model, vocab, itos_vocab, device, seed)
    print(f"Temperature {temperature}:\n{' '.join(generation)}\n")

Temperature 0.5:
Climate change is affecting countries like the new

Temperature 0.7:
Climate change is affecting countries like the palm of the new local

Temperature 0.75:
Climate change is affecting countries like the Americans, the White House said yesterday.

Temperature 0.8:
Climate change is affecting countries like the Americans, the palm of the new local

Temperature 1.0:
Climate change is affecting countries like U.S. quot;



#### Sports

In [35]:
prompt = 'Athletes are preparing for the upcoming '
max_seq_len = 30
seed = 0

temperatures = [0.5, 0.7, 0.75, 0.8, 1.0]

# Generate text with each temperature value
for temperature in temperatures:
    generation = generate(prompt, max_seq_len, temperature, model, vocab, itos_vocab, device, seed)
    print(f"Temperature {temperature}:\n{' '.join(generation)}\n")

Temperature 0.5:
Athletes are preparing for the upcoming Olympic Games

Temperature 0.7:
Athletes are preparing for the upcoming U.S. Olympic Committee in Canada.

Temperature 0.75:
Athletes are preparing for the upcoming U.S. Olympic Committee in Canada.

Temperature 0.8:
Athletes are preparing for the upcoming U.S. Olympic title in Athens.

Temperature 1.0:
Athletes are preparing for the upcoming U.S. Open.



#### Business

In [40]:
prompt = 'The stock market is seeing an uptrend due to'
max_seq_len = 30
seed = 0

temperatures = [0.5, 0.7, 0.75, 0.8, 1.0]

# Generate text with each temperature value
for temperature in temperatures:
    generation = generate(prompt, max_seq_len, temperature, model, vocab, itos_vocab, device, seed)
    print(f"Temperature {temperature}:\n{' '.join(generation)}\n")

Temperature 0.5:
The stock market is seeing an <unk> due to the new

Temperature 0.7:
The stock market is seeing an <unk> due to the impact of the global economy.

Temperature 0.75:
The stock market is seeing an <unk> due to the impact of the US oil prices because his growing dollar.

Temperature 0.8:
The stock market is seeing an <unk> due to the impact of the US oil prices because his growing dollar.

Temperature 1.0:
The stock market is seeing an <unk> due to U.S. winter picture prices.



#### Sci/Tech

In [41]:
prompt = 'Scientists have made a breakthrough in '
max_seq_len = 30
seed = 0

temperatures = [0.5, 0.7, 0.75, 0.8, 1.0]

# Generate text with each temperature value
for temperature in temperatures:
    generation = generate(prompt, max_seq_len, temperature, model, vocab, itos_vocab, device, seed)
    print(f"Temperature {temperature}:\n{' '.join(generation)}\n")

Temperature 0.5:
Scientists have made a breakthrough in the evolution of the new species of humans, according to a study released yesterday.

Temperature 0.7:
Scientists have made a breakthrough in the evolution of the new

Temperature 0.75:
Scientists have made a breakthrough in the evolution of the planet that works on his

Temperature 0.8:
Scientists have made a breakthrough in the evolution of the planet that works on his side and

Temperature 1.0:
Scientists have made a breakthrough in the evolution of palm bodies from new spacecraft, researchers said.



#### Exporting 

In [33]:
import pickle

model_file = "lstm_model.pth"
vocab_file = "vocab.pkl"

# Save the model
torch.save(model.state_dict(), model_file)
print(f"Model saved to {model_file}")

# Save the vocabulary
with open(vocab_file, 'wb') as f:
    pickle.dump(vocab, f)
print(f"Vocabulary saved to {vocab_file}")

Model saved to lstm_model.pth
Vocabulary saved to vocab.pkl
