In [None]:
!pip install transformers datasets --quiet


We install the transformers library, which provides GPT-2, and the datasets library for loading and managing training data.

In [None]:
import os
import torch
from datasets import load_dataset
from transformers import (
    GPT2Tokenizer, GPT2LMHeadModel,
    DataCollatorForLanguageModeling,
    Trainer, TrainingArguments)


We import the essential tools:
GPT2Tokenizer and GPT2LMHeadModel for tokenization and text generation
Trainer, TrainingArguments, and DataCollatorForLanguageModeling for model training (training is configured below but not actually run in this notebook)
load_dataset to read our custom text data

In [None]:
os.environ["WANDB_DISABLED"] = "true"


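We disable Weights & Biases logging so that no W&B account or login is needed. Note that this environment variable is deprecated (see the warning printed after the TrainingArguments cell); passing report_to="none" to TrainingArguments is the replacement that the warning suggests.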

In [None]:
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")


We load the pre-trained GPT-2 model and tokenizer. Since GPT-2 has no padding token by default, we set the padding token to the end-of-sequence token.
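
To illustrate why a padding token is needed at all, here is a minimal sketch in plain Python (no transformers calls) of what padding a batch involves. The helper pad_batch and the token ids in the example are made up for illustration; 50256 is the id of GPT-2's end-of-sequence token, which doubles as the pad id after the assignment above.

```python
GPT2_EOS_ID = 50256  # id of "<|endoftext|>" in the GPT-2 vocabulary


def pad_batch(sequences, pad_id=GPT2_EOS_ID):
    """Right-pad lists of token ids to a common length (hypothetical helper).

    Returns the padded ids plus an attention mask with 1 for real tokens
    and 0 for padding, so the model can ignore the padded positions.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Two sequences of different lengths (made-up token ids).
batch = [[10, 11, 12, 13], [20, 21]]
ids, mask = pad_batch(batch)
print(ids)
print(mask)
```

Because the pad id and the end-of-sequence id are the same number, the ids alone cannot tell padding apart from a genuine end of text; only the attention mask carries that distinction.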

In [None]:
lyrics_file = "lyrics.txt"
if not os.path.exists(lyrics_file):
    sample_lyrics = [
        "You're the one that I want\n",
        "Hello from the other side\n",
        "Cause baby you're a firework\n",
        "Let it go, let it go\n",
        "We will, we will rock you\n"
    ]
    with open(lyrics_file, "w", encoding="utf-8") as f:
        f.writelines(sample_lyrics)


If lyrics.txt does not already exist, we create it with five well-known English song lines. This serves as a small sample dataset.

In [None]:
dataset = load_dataset("text", data_files={"train": lyrics_file})


We load the lyrics text file as a Hugging Face dataset, with one example per line of the file, so we can tokenize and preprocess it.

In [None]:
def tokenize_text(example):
    # truncation=True cuts any line longer than the tokenizer's maximum length (1024 tokens for GPT-2)
    return tokenizer(example["text"], truncation=True)

tokenized_data = dataset.map(tokenize_text, batched=True)


We tokenize the text into numerical input IDs using GPT-2’s tokenizer. Tokenization is required to convert human-readable text into a format the model can understand.

In [None]:
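training_args = TrainingArguments(
    output_dir="./gpt2-lyrics-output",
    overwrite_output_dir=True,
    num_train_epochs=10,
    per_device_train_batch_size=2,
    save_steps=100,        # write a checkpoint every 100 steps
    logging_steps=10,      # log the loss every 10 steps
    save_total_limit=1,    # keep only the most recent checkpoint
    prediction_loss_only=True,
    report_to="none"       # non-deprecated way to switch off W&B and other logging integrations
)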


Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).


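This sets up the training configuration: 10 epochs, a batch size of 2 per device, a log entry every 10 steps, and a checkpoint every 100 steps with only the most recent one kept. Training is not actually run in this notebook, so the model used in the following cells is the unmodified pre-trained GPT-2; the step is included for completeness in case training is added later.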
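
If training were added, the objects defined so far could be wired together roughly as follows. This is only a sketch, not part of the original notebook: build_trainer is a hypothetical helper name, and it assumes the model, tokenizer, tokenized_data, and training_args objects from the cells above.

```python
def build_trainer(model, tokenizer, train_dataset, training_args):
    """Wire the notebook's objects into a Trainer (hypothetical helper)."""
    # Imported inside the function so that defining the helper does not
    # itself require transformers to be importable.
    from transformers import DataCollatorForLanguageModeling, Trainer

    # mlm=False selects causal language modeling: labels are a copy of
    # input_ids, with padding positions set to -100 so the loss skips them.
    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        data_collator=data_collator,
    )


# Usage in this notebook (not executed here):
# trainer = build_trainer(model, tokenizer, tokenized_data["train"], training_args)
# trainer.train()
```

With only five short lines of text, such a run would mainly demonstrate the mechanics; a much larger lyrics file would be needed for the model's style to change noticeably.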

In [None]:
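def generate_lyrics(prompt, max_new_tokens=100):
    # Calling the tokenizer returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        input_ids=inputs["input_ids"],
        # Pass the mask explicitly: pad and EOS share an id, so it cannot be inferred reliably
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        temperature=1.0,
        top_k=50,
        top_p=0.95,
        repetition_penalty=1.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id
    )
    # output[0] holds the prompt tokens followed by the generated tokens
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text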


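This function takes a song prompt and generates a continuation using the GPT-2 model. Because do_sample=True, the output differs from run to run.

temperature, top_k, and top_p control randomness and diversity: temperature=1.0 leaves the model's probabilities unscaled, top_k=50 restricts sampling to the 50 most likely tokens, and top_p=0.95 further restricts it to the smallest set of tokens whose combined probability reaches 0.95

repetition_penalty=1.2 lowers the scores of tokens that already appear in the sequence, which reduces repetition

max_new_tokens is the maximum number of new tokens to generate (generation stops earlier if the end-of-sequence token is produced)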
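
To make these parameters concrete, here is a minimal plain-Python sketch of how they can reshape the scores for the next token before one is sampled. It is an illustration over a five-token toy vocabulary, not the library's implementation; all function names and the toy logits are made up.

```python
import math

NEG_INF = float("-inf")


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Lower the scores of tokens that were already generated (penalty > 1)."""
    out = list(logits)
    for i in set(seen_ids):
        # Dividing a positive score or multiplying a negative one both lower it.
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def softmax(logits):
    """Turn scores into probabilities; masked (-inf) entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(logits, k):
    """Keep the k highest scores and mask the rest."""
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else NEG_INF for x in logits]


def top_p_filter(logits, p):
    """Keep the smallest set of tokens whose probability mass reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [x if i in keep else NEG_INF for i, x in enumerate(logits)]


def next_token_distribution(logits, seen_ids, temperature=1.0, top_k=50,
                            top_p=0.95, repetition_penalty=1.2):
    """Apply penalty, temperature, top-k, and top-p, then normalize."""
    logits = apply_repetition_penalty(logits, seen_ids, repetition_penalty)
    logits = [x / temperature for x in logits]
    logits = top_k_filter(logits, min(top_k, len(logits)))
    logits = top_p_filter(logits, top_p)
    return softmax(logits)


# Toy scores for a five-token vocabulary; token 0 was already generated.
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = next_token_distribution(toy_logits, seen_ids=[0], top_k=3, top_p=0.9)
print([round(p, 3) for p in probs])
```

A token is then drawn at random from the resulting distribution. Lowering top_k or top_p shrinks the set of candidates, while a temperature below 1.0 would sharpen the distribution toward the most likely tokens.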

In [None]:
model.save_pretrained("gpt2-lyrics-model")
tokenizer.save_pretrained("gpt2-lyrics-model")


('gpt2-lyrics-model/tokenizer_config.json',
 'gpt2-lyrics-model/special_tokens_map.json',
 'gpt2-lyrics-model/vocab.json',
 'gpt2-lyrics-model/merges.txt',
 'gpt2-lyrics-model/added_tokens.json')

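We save the current model and tokenizer locally so they can be reloaded later without re-downloading (or, once training is added, without re-training). Because no training has been run, the saved weights are identical to the pre-trained GPT-2 weights.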
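
Reloading from that directory could look like the sketch below. The helper name load_saved_model is hypothetical; from_pretrained accepts a local directory in the same way it accepts a model name such as "gpt2".

```python
def load_saved_model(model_dir="gpt2-lyrics-model"):
    """Reload the model and tokenizer written by save_pretrained (hypothetical helper)."""
    # Imported inside the function so that defining the helper does not
    # itself require transformers to be importable.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer


# Usage (requires the directory saved above to exist):
# model, tokenizer = load_saved_model()
```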

In [None]:
user_prompt = input(" Enter your song prompt: ")
lyrics = generate_lyrics(user_prompt)
print("\n Generated Lyrics:")
print(lyrics)


 Enter your song prompt: Under the moonlight, we dance alone

 Generated Lyrics:
Under the moonlight, we dance alone. We feel that in our hearts it is necessary for us to live on Earth and have an emotional life with a loving partner; after all this time has passed I am happy because God gave me there-for free from hunger but now he keeps feeding every drop of food off my plate." In other words:
It's been 3 weeks since her husband left Hawaii — she can't walk (even though they are pretty far along), so when does another illness catch up? She still knows what will


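Takes a prompt from the user and generates a continuation using the GPT-2 model. The prompt can be a single line, as in the run above, or the opening of a verse such as:
“Under the moonlight, we dance alone
The stars above are shining bright
I feel your hand, I hear your tone
Together we escape the night...”
Note that input() reads a single line, so a multi-line prompt like this has to be typed on one line or passed to generate_lyrics directly. Because the model has not been fine-tuned on lyrics, the continuation tends to read like general prose rather than song lyrics, as the sample output shows.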