<a href="https://colab.research.google.com/github/STSBIZ/documentation/blob/master/Recipe_Generation_with_Seq2Seq_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


####  Abdul Sabry   
####  Learining how Recipe Generation with Seq2Seq Models


# 1. Introduction

# Recipe Generation Model Training
In this notebook, we train two types of sequence-to-sequence models to convert ingredients into cooking recipes. The models are:
1. A basic sequence-to-sequence model without attention.
2. An enhanced sequence-to-sequence model with attention.

The goal is to explore how the inclusion of attention mechanisms affects the performance and quality of the generated recipes.


# 2. Setup and Imports

In [1]:
# Install necessary libraries if not already installed (uncomment if needed)
# !pip install torch numpy matplotlib nltk

# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import os
import numpy as np
import matplotlib.pyplot as plt


# Ensure that all operations are deterministic on GPU (if used) for reproducibility
torch.manual_seed(0)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(0)

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')


# Check if CUDA is available and set device to GPU if it is
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")


Mounted at /content/drive
Using device: cuda


## Dataset Loading
Here we load the dataset from the provided directories. We assume the data is split into training, development, and testing sets.


In [2]:
train_dir = '/content/drive/MyDrive/Colab Notebooks/Cooking_Dataset/train'
dev_dir = '/content/drive/MyDrive/Colab Notebooks/Cooking_Dataset/dev'
test_dir = '/content/drive/MyDrive/Colab Notebooks/Cooking_Dataset/test'

# Function to load data
def load_data(directory):
    texts = []
    for filename in os.listdir(directory):
        full_path = os.path.join(directory, filename)
       # print(f"Checking file: {full_path}")  # Debug: print the file path being checked
        if os.path.isfile(full_path):
           # print(f"Loading file: {full_path}")  # Debug: confirm the file is being loaded
            with open(full_path, 'r', encoding='utf-8') as file:
                content = file.read()
                if content:  # Check if file is not empty
                    texts.append(content)

    return texts

# Load datasets
train_data = load_data(train_dir)
dev_data = load_data(dev_dir)
test_data = load_data(test_dir)


In [3]:
# Function to print first few entries of the dataset to verify loading
def print_sample_data(data, num_samples=1):
    for i in range(min(num_samples, len(data))):
        print(f"Sample {i+1}:")
        print(data[i][:500])  # Print first 500 characters of each sample
        print("..." + "\n" * 2)  # Add ellipsis and space between samples

# Example usage:
print("Train Data Samples:")
print_sample_data(train_data)

print("Development Data Samples:")
print_sample_data(dev_data)

print("Test Data Samples:")
print_sample_data(test_data)


Train Data Samples:
Sample 1:
Title: basic biscotti
categories:	cookies	italian
servings: 36 servings
ingredients: 4 oz blanched almonds	2 1/2 c  flour	1 3/4 c  granulated sugar	1/4 ts salt	1/4 ts baking soda	3    eggs
spread almonds on a baking sheet and toast them in oven until lightly golden .
let cool .
coarsely chop half the nuts .
butter 2 large baking sheets .
mix flour , sugar , salt and baking soda .
beat in eggs , then whole and chopped nuts .
mix to obtain a firm dough .
knead briefly , then divide dough into 2 pi
...


Development Data Samples:
Sample 1:
Title: hazelnut fudge
categories:	candy
servings: 1 batch
ingredients: 3 c  sugar	1 c  milk	1/2 c  corn syrup	3 oz unsweetened chocolate	1 c  butter	2 ts vanilla	1 c  oregon hazelnuts
cook sugar , milk , corn syrup and butter to 238 .
pour into mixing bowl ;
add vanilla ;
cool 15 minutes .
beat until thick .
stir in nuts and pour into buttered pan .
hazelnut industry and the hazelnut marketing board
END RECIPE

Title: hazel

## Data Preprocessing
Data preprocessing is crucial for transforming raw text data into a structured format suitable for training machine learning models. This section includes:
1. **Tokenization**: Converting sentences into words or meaningful tokens.
2. **Vocabulary Building**: Creating a comprehensive list of unique words used across the training dataset.
3. **Text to Sequence Conversion**: Transforming textual data into numerical form using the vocabulary index.


In [4]:
# Sample data loading and preprocessing function
def load_and_preprocess_data(filepath):
    with open(filepath, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    ingredients = []
    recipes = []
    for line in lines:
        if 'ingredients:' in line:
            ingredients.append(line.split('ingredients:')[1].strip())
        elif 'recipe:' in line:
            recipe_content = line.split('recipe:')[1].strip().replace('END RECIPE', '').strip()
            recipes.append(recipe_content)

    return ingredients, recipes


 Building the Language Vocabularies
Task: Construct vocabularies for both the ingredients and recipes which will include conversion from text to indices necessary for model processing.

In [5]:
class Vocabulary:
    def __init__(self):
        self.word2index = {}
        self.index2word = {0: "<sos>", 1: "<eos>", 2: "<pad>"}
        self.word2count = {}
        self.n_words = 3  # Start counting from 3 to account for SOS, EOS, PAD

    def add_sentence(self, sentence):
        for word in sentence.split(' '):
            self.add_word(word)

    def add_word(self, word):
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.index2word[self.n_words] = word
            self.word2count[word] = 1
            self.n_words += 1
        else:
            self.word2count[word] += 1


Model 1: Seq2Seq without Attention

In [6]:
class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(EncoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.lstm(embedded, hidden)
        return output, hidden

    def init_hidden(self):
        return (torch.zeros(1, 1, self.hidden_size), torch.zeros(1, 1, self.hidden_size))

class DecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(DecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        output = self.embedding(input).view(1, 1, -1)
        output = F.relu(output)
        output, hidden = self.lstm(output, hidden)
        output = self.softmax(self.out(output[0]))
        return output, hidden

    def init_hidden(self):
        return (torch.zeros(1, 1, self.hidden_size), torch.zeros(1, 1, self.hidden_size))


In [8]:
def load_data_from_directory(directory_path):
    """ Loads data from all files in the specified directory and extracts relevant content. """
    ingredients_list = []
    recipes_list = []
    for subdir, _, files in os.walk(directory_path):
        for file in files:
            if file.endswith('.txt'):
                file_path = os.path.join(subdir, file)
                with open(file_path, 'r', encoding='utf-8') as f:
                    text = f.read()
                ingredients, recipe = extract_ingredients_and_recipe(text)
                if ingredients and recipe:
                    ingredients_list.append(ingredients)
                    recipes_list.append(recipe)
    return ingredients_list, recipes_list

# Specify the paths to your dataset directories
train_dir = '/content/drive/MyDrive/Colab Notebooks/Cooking_Dataset/train'
dev_dir = '/content/drive/MyDrive/Colab Notebooks/Cooking_Dataset/dev'
test_dir = '/content/drive/MyDrive/Colab Notebooks/Cooking_Dataset/test'

# Load the datasets
train_ingredients, train_recipes = load_data_from_directory(train_dir)
dev_ingredients, dev_recipes = load_data_from_directory(dev_dir)
test_ingredients, test_recipes = load_data_from_directory(test_dir)


NameError: name 'extract_ingredients_and_recipe' is not defined

## Data Preprocessing

Before training our models, it's crucial to convert the raw text data into a format that can be efficiently processed by the neural network. This involves:
- **Tokenization**: Converting text into sequences of integers.
- **Building Vocabulary**: Creating a mapping from words to unique indices.
- **Padding**: Standardizing the length of each sequence to allow batching of data.

We will preprocess both the ingredients and recipe texts separately to maintain their distinct characteristics.


In [9]:
# Sample data loading and preprocessing function
def load_and_preprocess_data(filepath):
    with open(filepath, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    ingredients = []
    recipes = []
    for line in lines:
        if 'ingredients:' in line:
            ingredients.append(line.split('ingredients:')[1].strip())
        elif 'recipe:' in line:
            recipe_content = line.split('recipe:')[1].strip().replace('END RECIPE', '').strip()
            recipes.append(recipe_content)

    return ingredients, recipes



## Model Architecture

We implement a Sequence to Sequence (Seq2Seq) model, which is composed of two main components:

1. **Encoder**: Processes the input sequence and compresses the information into a context vector.
2. **Attention Mechanism** (optional): Helps the decoder focus on relevant parts of the input sequence for better translation accuracy.
3. **Decoder**: Generates the output sequence from the context vector.

Both the encoder and decoder are based on the LSTM architecture, which is suitable for handling sequences due to its ability to maintain long-term dependencies.


In [10]:
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.lstm(embedded, hidden)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)

# Assuming the decoder follows a similar pattern, with potential additions for attention


### Decoder with Attention

The attention mechanism allows the decoder to focus on different parts of the input sequence for each step of the output sequence. This is particularly useful in tasks like translation, where different parts of the input are relevant at different stages of decoding.


In [11]:
class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, dropout_p=0.1, max_length=10):
        super(AttnDecoderRNN, self).__init__()
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.dropout_p = dropout_p
        self.max_length = max_length

        self.embedding = nn.Embedding(self.output_size, self.hidden_size)
        self.attn = nn.Linear(self.hidden_size * 2, self.max_length)
        self.attn_combine = nn.Linear(self.hidden_size * 2, self.hidden_size)
        self.dropout = nn.Dropout(self.dropout_p)
        self.lstm = nn.LSTM(self.hidden_size, self.hidden_size)
        self.out = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)

        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))

        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)

        output = F.relu(output)
        output, hidden = self.lstm(output, hidden)

        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights

    def initHidden(self):
        return torch.zeros(1, 1, self.hidden_size, device=device)


## Training Configuration

Set up the optimizer, loss function, and define the training loop. We will use the Cross-Entropy Loss and the Adam optimizer for training our Seq2Seq model.


In [12]:
# Define the maximum length of the sequences your model will handle
MAX_LENGTH = 10  # You can adjust this based on your specific dataset or requirements


In [13]:
def train(input_tensor, target_tensor, encoder, decoder, encoder_optimizer, decoder_optimizer, criterion):
    max_length = MAX_LENGTH  # Use the global variable
    encoder_hidden = encoder.initHidden()

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    input_length = input_tensor.size(0)
    target_length = target_tensor.size(0)

    encoder_outputs = torch.zeros(max_length, encoder.hidden_size, device=device)

    loss = 0

    for ei in range(input_length):
        encoder_output, encoder_hidden = encoder(input_tensor[ei], encoder_hidden)
        encoder_outputs[ei] = encoder_output[0, 0]

    decoder_input = torch.tensor([[SOS_token]], device=device)
    decoder_hidden = encoder_hidden

    use_teacher_forcing = True if random.random() < 0.5 else False

    if use_teacher_forcing:
        for di in range(target_length):
            decoder_output, decoder_hidden, _ = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            loss += criterion(decoder_output, target_tensor[di])
            decoder_input = target_tensor[di]  # Teacher forcing

    else:
        for di in range(target_length):
            decoder_output, decoder_hidden, _ = decoder(
                decoder_input, decoder_hidden, encoder_outputs)
            topv, topi = decoder_output.topk(1)
            decoder_input = topi.squeeze().detach()

            loss += criterion(decoder_output, target_tensor[di])
            if decoder_input.item() == EOS_token:
                break

    loss.backward()

    encoder_optimizer.step()
    decoder_optimizer.step()

    return loss.item() / target_length


## Training and Evaluation

### Training Loop
The training loop will run for a predefined number of epochs. During each epoch, the model will train on the entire training dataset and then validate its performance on the development set. This allows us to monitor the model's ability to generalize to new data and adjust training parameters accordingly.

### Evaluation Metrics
We will use metrics such as the loss and potentially the BLEU score for more insightful evaluation of the translation quality. These metrics will help us understand how well the model is performing and where it might need adjustments.


In [14]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

def create_tokenizer(lines):
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

def encode_sequences(tokenizer, length, lines):
    # Tokenize the lines
    seq = tokenizer.texts_to_sequences(lines)
    # Pad the sequences with zeros
    seq = pad_sequences(seq, maxlen=length, padding='post')
    return seq

# Choose a consistent sequence length based on your data analysis
max_length_ingredients = 50
max_length_recipes = 150

# Create tokenizer
ingredients_tokenizer = create_tokenizer(train_ingredients)
recipes_tokenizer = create_tokenizer(train_recipes)

# Prepare training, validation, and test data
trainX = encode_sequences(ingredients_tokenizer, max_length_ingredients, train_ingredients)
trainY = encode_sequences(recipes_tokenizer, max_length_recipes, train_recipes)
devX = encode_sequences(ingredients_tokenizer, max_length_ingredients, dev_ingredients)
devY = encode_sequences(recipes_tokenizer, max_length_recipes, dev_recipes)
testX = encode_sequences(ingredients_tokenizer, max_length_ingredients, test_ingredients)
testY = encode_sequences(recipes_tokenizer, max_length_recipes, test_recipes)


NameError: name 'train_ingredients' is not defined

In [15]:
ingredients_tokenizer

NameError: name 'ingredients_tokenizer' is not defined

 ## Model Building

In this section, we will define two types of Seq2Seq models:
1. **Seq2Seq Model without Attention**: This traditional model architecture uses an encoder to compress all input information into a context vector, which the decoder then uses to reconstruct the output sequence.
2. **Seq2Seq Model with Attention**: This enhanced model architecture allows the decoder to focus on different parts of the input sequence during the decoding process, which helps in handling long input sequences effectively and improves model performance by retaining more contextual information.


In [16]:
class DecoderRNN(nn.Module):
    def __init__(self, output_dim, emb_dim, hidden_dim, dropout):
        super(DecoderRNN, self).__init__()
        self.output_dim = output_dim
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc_out = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input, hidden, cell):
        input = input.unsqueeze(1)  # Adjusting for batch_first=True in LSTM
        embedded = self.dropout(self.embedding(input))
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        prediction = self.fc_out(output.squeeze(1))
        return prediction, hidden, cell


class DecoderRNN(nn.Module):
    def __init__(self, output_dim, emb_dim, hidden_dim, dropout):
        super(DecoderRNN, self).__init__()
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc_out = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input, hidden, cell):
        input = input.unsqueeze(1)
        embedded = self.dropout(self.embedding(input))
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        prediction = self.fc_out(output.squeeze(1))
        return prediction, hidden, cell

class AttnDecoderRNN(nn.Module):
    def __init__(self, output_dim, emb_dim, hidden_dim, dropout):
        super(AttnDecoderRNN, self).__init__()
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.attn = nn.Linear(hidden_dim * 2, hidden_dim)
        self.attn_combine = nn.Linear(hidden_dim * 2, emb_dim)
        self.dropout = nn.Dropout(dropout)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc_out = nn.Linear(hidden_dim, output_dim)

    def forward(self, input, hidden, encoder_outputs):
        input = input.unsqueeze(1)
        embedded = self.dropout(self.embedding(input))
        attn_weights = torch.softmax(self.attn(torch.cat((embedded, hidden), dim=2)), dim=2)
        attn_applied = torch.bmm(attn_weights, encoder_outputs)
        output = self.attn_combine(torch.cat((embedded, attn_applied), dim=2))
        output, (hidden, _) = self.lstm(output, (hidden, cell))
        prediction = self.fc_out(output.squeeze(1))
        return prediction, hidden, attn_weights


## Training Setup

For training our Seq2Seq models, we'll use the following configuration:
- **Optimizer**: Adam, with a learning rate of 0.001. Adam is chosen for its adaptive learning rate properties, which makes it suitable for this task.
- **Loss Function**: CrossEntropyLoss, which is standard for classification tasks.
- **Hyperparameters**:
  - **Learning Rate**: 0.001
  - **Hidden Dimensions**: 256
  - **Embedding Dimensions**: 256
  - **Dropout**: 0.1, to help prevent overfitting
  - **Epochs**: 10
  - **Batch Size**: 32, a standard size that balances speed and performance.
  - **Clip**: 1, to prevent gradient explosion.


In [17]:
def train(model, iterator, optimizer, criterion, clip):
    model.train()
    epoch_loss = 0

    for i, (src, trg) in enumerate(iterator):
        src, trg = src.to(device), trg.to(device)
        optimizer.zero_grad()

        output = model(src, trg[:-1])

        # Assume output shape is [trg_len, batch_size, output_dim]
        output_dim = output.shape[-1]  # This should correctly fetch output_dim

        output = output[1:].view(-1, output_dim)
        trg = trg[1:].view(-1)

        loss = criterion(output, trg)
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()

        epoch_loss += loss.item()

    return epoch_loss / len(iterator)


In [18]:
print("Decoder output dimension:", model.decoder.output_dim)


NameError: name 'model' is not defined

## Execute Training and Monitor Progress

Now that our models are set up and the training configurations are defined, we'll execute the training loops. We will train the models for a predetermined number of epochs and monitor the training progress. Additionally, we'll evaluate the performance on the development set after each epoch to monitor how well our models generalize over time.


In [19]:
# Example: Load and prepare your dataset
# This is a placeholder - you'll need to adapt this according to your actual data source and format.

# Simulated data: Lists of indices representing words in a vocabulary
ingredients_data = [torch.randint(0, 1000, (10,)).tolist() for _ in range(100)]  # 100 samples, each is a list of indices
recipes_data = [torch.randint(0, 1000, (15,)).tolist() for _ in range(100)]      # Each recipe can have a different length

# Ensure data is appropriate for your application and matches the expected input format for RecipeDataset.


In [20]:
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence

class RecipeDataset(Dataset):
    def __init__(self, ingredients, recipes):
        self.ingredients = [torch.tensor(ing) for ing in ingredients]
        self.recipes = [torch.tensor(rec) for rec in recipes]

    def __len__(self):
        return len(self.ingredients)

    def __getitem__(self, idx):
        return self.ingredients[idx], self.recipes[idx]

    @staticmethod
    def collate_fn(batch):
        ingredients, recipes = zip(*batch)
        ingredients_padded = pad_sequence(ingredients, batch_first=True, padding_value=0)  # Assuming 0 is PAD_IDX
        recipes_padded = pad_sequence(recipes, batch_first=True, padding_value=0)
        return ingredients_padded, recipes_padded


In [21]:
# Instantiate the dataset with your data
dataset = RecipeDataset(ingredients_data, recipes_data)

# Create a DataLoader
train_iterator = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=RecipeDataset.collate_fn)


## Implement Training and Evaluation Loops

In this section, we'll implement the training and evaluation loops for our models. We'll train the models using the training dataset and periodically evaluate them on the development set to monitor performance and adjust our training strategy as needed. This approach helps in understanding the model's learning over time and ensuring it generalizes well to new, unseen data.


In [22]:
def train(model, iterator, optimizer, criterion, clip):
    model.train()
    epoch_loss = 0

    for i, (src, trg) in enumerate(iterator):
        src, trg = src.to(device), trg.to(device)
        optimizer.zero_grad()

        output = model(src, trg[:-1])  # Assumes the model is set up for input-output offset
        output_dim = output.shape[-1]
        output = output[1:].contiguous().view(-1, output_dim)
        trg = trg[1:].contiguous().view(-1)

        loss = criterion(output, trg)
        loss.backward()

        torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()

        epoch_loss += loss.item()

    return epoch_loss / len(iterator)

def evaluate(model, iterator, criterion):
    model.eval()
    epoch_loss = 0

    with torch.no_grad():
        for i, (src, trg) in enumerate(iterator):
            src, trg = src.to(device), trg.to(device)
            output = model(src, trg[:-1])  # Forward pass without teacher forcing
            output_dim = output.shape[-1]
            output = output[1:].contiguous().view(-1, output_dim)
            trg = trg[1:].contiguous().view(-1)

            loss = criterion(output, trg)
            epoch_loss += loss.item()

    return epoch_loss / len(iterator)

# Train and evaluate the model
for epoch in range(NUM_EPOCHS):
    train_loss = train(model, train_iterator, optimizer, criterion, 1)  # Assume gradient clipping value is 1
    valid_loss = evaluate(model, dev_iterator, criterion)

    print(f'Epoch: {epoch + 1}')
    print(f'\tTrain Loss: {train_loss:.3f} | Train PPL: {math.exp(train_loss):7.3f}')
    print(f'\tValid Loss: {valid_loss:.3f} | Valid PPL: {math.exp(valid_loss):7.3f}')


NameError: name 'NUM_EPOCHS' is not defined

## Reporting and Visualization of Training Outcomes

Upon completing the training and evaluation cycles, it's important to report and visualize the outcomes. We'll plot the training and validation losses to identify patterns such as overfitting or underfitting and discuss any potential improvements or adjustments needed.


In [23]:
import matplotlib.pyplot as plt

# Assume train_losses and valid_losses are collected during the epochs
plt.figure(figsize=(8, 5))
plt.plot(train_losses, label='Training loss')
plt.plot(valid_losses, label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Validation Loss Over Epochs')
plt.legend()
plt.show()


NameError: name 'train_losses' is not defined

<Figure size 800x500 with 0 Axes>