
In [5]:

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from torch.utils.data import Dataset, DataLoader
from transformers import Trainer, TrainingArguments

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Define example text data
text_data = """Once upon a time, there was a little girl named Red Riding Hood. She loved to visit her grandmother, who lived in the woods. One day, her mother asked her to take a basket of goodies to her grandmother. On her way through the woods, she met a big bad wolf who wanted to eat her."""

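# GPT-2 has no padding token by default, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token

# Tokenize the text, truncating or padding it to exactly 512 tokens
tokens = tokenizer(text_data, return_tensors="pt", max_length=512, truncation=True, padding="max_length")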


class TextDataset(Dataset):
    def __init__(self, tokens):
        self.tokens = tokens

    def __len__(self):
        return self.tokens["input_ids"].size(0)

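    def __getitem__(self, idx):
        # For causal language modeling the labels are the input IDs themselves;
        # the model shifts them internally to predict the next token.
        # Padded positions are included, so they also contribute to the loss.
        return {
            "input_ids": self.tokens["input_ids"][idx],
            "attention_mask": self.tokens["attention_mask"][idx],
            "labels": self.tokens["input_ids"][idx]
        }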

# Load data into the dataset
dataset = TextDataset(tokens)
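
The cell above pads the single training example to 512 tokens and reuses `input_ids` as `labels`, so the padded positions are scored by the loss as well. A common alternative is to set the label at each padded position to -100, which the Hugging Face causal language modeling loss ignores. The sketch below illustrates the idea with plain Python lists; `pad_and_mask` and its parameters are hypothetical names for illustration, not part of this notebook or of `transformers`.

```python
IGNORE_INDEX = -100  # label value skipped by the loss in Hugging Face causal LMs


def pad_and_mask(token_ids, max_length, pad_id, ignore_index=IGNORE_INDEX):
    """Pad or truncate token_ids to max_length and build attention_mask and labels.

    Hypothetical helper, for illustration only.
    """
    ids = list(token_ids[:max_length])
    n_real = len(ids)
    n_pad = max_length - n_real
    input_ids = ids + [pad_id] * n_pad
    attention_mask = [1] * n_real + [0] * n_pad
    # Real tokens keep their ID as the label; padded positions are ignored
    labels = ids + [ignore_index] * n_pad
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Arbitrary token IDs; 50256 is GPT-2's end-of-sequence ID, used here as padding
example = pad_and_mask([11, 22, 33, 44], max_length=6, pad_id=50256)
print(example["labels"])
```

With tensors, the same effect comes from cloning `input_ids` and writing -100 wherever `attention_mask` is 0. Masking this way would also stop the model from being trained to emit the end-of-sequence token after the story, so the losses and generated text would likely differ from those recorded in this notebook.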


In [8]:

import os

# Turn off Weights & Biases logging
os.environ["WANDB_DISABLED"] = "true"


def train_model(epochs):
    # Define training arguments
    training_args = TrainingArguments(
        output_dir="./results",
        overwrite_output_dir=True,
        num_train_epochs=epochs,
        per_device_train_batch_size=1,
        save_steps=10,
        save_total_limit=1,
        logging_dir="./logs",
        logging_steps=10
    )

    # Trainer to manage the training loop
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset
    )

    # Train the model
    trainer.train()

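# Train for 20, 60, and 70 epochs in turn. The same `model` object is reused,
# so each run continues from the previous one rather than from pretrained GPT-2.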
for epochs in [20, 60, 70]:
    print(f"\nTraining model with {epochs} epochs...")
    train_model(epochs)



Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).



Training model with 20 epochs...


Step,Training Loss
10,1.8766
20,0.1518



Training model with 60 epochs...


Step,Training Loss
10,0.04
20,0.0066
30,0.0064
40,0.0019
50,0.0022
60,0.0023



Training model with 70 epochs...


Step,Training Loss
10,0.0046
20,0.0028
30,0.0037
40,0.0011
50,0.0014
60,0.001
70,0.0029
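
The step numbers in the logs above follow from the training setup: the dataset holds one example and the batch size is 1, so each epoch is a single optimizer step, and `logging_steps=10` reports the loss every 10 epochs. Because `train_model` reuses the same `model` object, the three runs also accumulate. A minimal sketch of that arithmetic, assuming a single device and no gradient accumulation (the function name is hypothetical):

```python
import math


def steps_for_run(num_examples, batch_size, epochs):
    """Optimizer steps in one run (hypothetical helper).

    Assumes a single device and no gradient accumulation.
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


cumulative = 0
for epochs in [20, 60, 70]:
    steps = steps_for_run(num_examples=1, batch_size=1, epochs=epochs)
    cumulative += steps
    print(f"{epochs} epochs -> {steps} steps this run, {cumulative} steps in total")
```

This is consistent with the 60-epoch run starting at a loss of 0.04 rather than near 1.88: it resumes from weights that had already been trained for 20 epochs.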


In [9]:

def generate_text(seed_text):
    # Tokenize the prompt; the tokenizer returns input_ids and a matching attention_mask.
    # Move both to the model's device, since Trainer may have placed the model on a GPU.
    inputs = tokenizer(seed_text, return_tensors="pt").to(model.device)
    model.config.pad_token_id = model.config.eos_token_id

    # Generate text. Note: top_k and top_p only take effect when do_sample=True;
    # as written, decoding is greedy and therefore deterministic.
    output = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=100,
        num_return_sequences=1,
        no_repeat_ngram_size=2,
        top_k=50,
        top_p=0.95
    )
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example generation
seed_text = "Once upon a time"
generated_text = generate_text(seed_text)
print("\nGenerated Text:", generated_text)


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Generated Text: Once upon a time, there was a little girl named Red Riding Hood. She loved to visit her grandmother, who lived in the woods. One day, her mother asked her to take a basket of goodies to her aunt. On her way through the forest, she met a big bad wolf who wanted to eat her.
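
The generated text matches the training passage except for "aunt" and "forest", where the original has "grandmother" and "woods". A likely cause is `no_repeat_ngram_size=2`, which prevents any two-token sequence from appearing twice in the output: "her grandmother" and "the woods" had already been generated, so those continuations were blocked. The sketch below illustrates the rule on whole words rather than GPT-2's subword tokens; `banned_next_tokens` is a hypothetical helper, not the `transformers` implementation.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already present in `tokens`.

    Hypothetical helper illustrating the no-repeat n-gram rule on a word list.
    """
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        # If an earlier n-gram starts with the same prefix, its last token is banned
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


generated = "she loved to visit her grandmother , to her".split()
print(banned_next_tokens(generated, ngram_size=2))
```

Here the output so far ends in "her", and "her grandmother" has already occurred, so "grandmother" is excluded as the next word.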
