# 🎶 Song Lyrics Generation with GPT-2 and a GPT class

This notebook fine-tunes a GPT-2 language model to generate original song lyrics in the style of specific artists (e.g., Blink-182, Weird Al Yankovic) using their actual lyrics as training data.

---

## 🔧 What This Notebook Does

1. **Preprocesses Lyrics**  
   - Cleans and formats raw lyrics (removes headers, symbols, excess whitespace, etc.)

2. **Fine-Tunes GPT-2**  
   - Uses Hugging Face Transformers to fine-tune GPT-2 on cleaned lyrics  
   - Training parameters like number of epochs and batch size are adjustable

3. **Generates New Lyrics**  
   - Prompts the model with a custom lyric or phrase  
   - Outputs new lyrics in the trained artist's style  
   - Optionally formats output into sections (e.g., `[Verse]`, `[Chorus]`)

4. **Supports Multiple Artists**  
   - Easily swap in different lyric datasets (e.g., Blink-182, Weird Al, etc.)  
   - Modular structure for scraping, preprocessing, training, and generation

---

## 📦 Requirements

- `transformers`  
- `datasets`  
- `torch`  

---

💡 It is not too difficult to generate songs from other artists.


We start by importing the necessary libraries.

In [2]:
%%capture
!pip install datasets

In [3]:
import requests
import time
import os
import re
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import torch
import torch.nn as nn
import random
import datasets
from datasets import Dataset
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

We also log into Hugging Face.

In [4]:
from huggingface_hub import notebook_login

notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

We then create a function to get lyrics through [lyrics.ovh](https://lyrics.ovh/).

In [5]:
def get_lyrics(artist, title):
    """
    Fetch lyrics for a given song using the lyrics.ovh API.

    Parameters
    ----------
    artist : str
        The name of the artist (e.g., "Weird Al Yankovic").
    title : str
        The title of the song (e.g., "Amish Paradise").

    Returns
    -------
    str or None
        The song lyrics as a string if found; otherwise, None.
    """
    # Construct the API URL with artist and title
    url = f"https://api.lyrics.ovh/v1/{artist}/{title}"

    # Make a GET request to the API
    res = requests.get(url)

    # If successful, extract lyrics from JSON
    if res.status_code == 200:
        lyrics = res.json().get("lyrics", None)
        return lyrics

    # Return None if lyrics not found or request failed
    return None

# Blink 182 Song Generator

Our first example will generate songs from [Blink 182](https://en.wikipedia.org/wiki/Blink-182). We first provided a list of all known songs.

In [None]:
blink_182_songs = [
    "All the Small Things",
    "What's My Age Again?",
    "I Miss You",
    "Dammit",
    "First Date",
    "The Rock Show",
    "Feeling This",
    "Adam's Song",
    "Stay Together for the Kids",
    "Down",
    "Man Overboard",
    "Josie",
    "Aliens Exist",
    "Anthem Part Two",
    "Reckless Abandon",
    "Dumpweed",
    "Not Now",
    "Always",
    "Bored to Death",
    "Ghost on the Dance Floor",
    "Up All Night",
    "After Midnight",
    "Darkside",
    "Carousel",
    "M+M's",
    "Pathetic",
    "Stockholm Syndrome",
    "Violence",
    "Asthenia",
    "Go",
    "Another Girl Another Planet",
    "Natives",
    "Wishing Well",
    "Kaleidoscope",
    "Hearts All Gone",
    "Even If She Falls",
    "She's Out of Her Mind",
    "Los Angeles",
    "Sober",
    "Home Is Such a Lonely Place",
    "Kings of the Weekend",
    "Rabbit Hole",
    "San Diego",
    "Built This Pool",
    "No Future",
    "Teenage Satellites",
    "Left Alone",
    "Bottom of the Ocean",
    "Long Lost Feeling",
    "Wildfire",
    "6/8",
    "Parking Lot",
    "Misery",
    "Good Old Days",
    "Don't Mean Anything",
    "Hey I'm Sorry",
    "Last Train Home",
    "California",
    "The Only Thing That Matters",
    "Brohemian Rhapsody",
    "Don't Leave Me",
    "Happy Holidays, You Bastard",
    "Story of a Lonely Guy",
    "Give Me One Good Reason",
    "Please Take Me Home",
    "The Party Song",
    "Online Songs",
    "Shut Up",
    "Roller Coaster",
    "Time",
    "Degenerate",
    "Lemmings",
    "Waggy",
    "Enthused",
    "Emo",
    "Apple Shampoo",
    "Untitled",
    "Voyeur",
    "I'm Sorry",
    "Fentoozler",
    "Romeo and Rebecca",
    "Ben Wah Balls",
    "Strings",
    "Toast and Bananas",
    "The Girl Next Door",
    "Sometimes",
    "TV",
    "Depends",
    "21 Days",
    "Does My Breath Smell?",
    "Cacophony",
    "Zulu",
    "Red Skies",
    "Marlboro Man",
    "The Family Next Door",
    "Transvestite",
    "Time to Break Up"
]


We then write the text and title of those songs into a text file.

In [None]:
output_path = os.path.abspath("blink_182_lyrics.txt")
print(f"Writing lyrics to: {output_path}")

with open("blink_182_lyrics.txt", "w", encoding="utf-8") as f:
    for title in blink_182_songs:
        print(f"Fetching: {title}")
        lyrics = get_lyrics("Blink 182", title)

        if lyrics:
            f.write(f"### {title} ###\n{lyrics}\n\n")
            print(f"✅ Wrote: {title}")
        else:
            f.write(f"### {title} ###\nLyrics not found.\n\n")
            print(f"❌ Not found: {title}")
        time.sleep(1)


Writing lyrics to: /content/blink_182_lyrics.txt
Fetching: All the Small Things
✅ Wrote: All the Small Things
Fetching: What's My Age Again?
✅ Wrote: What's My Age Again?
Fetching: I Miss You
✅ Wrote: I Miss You
Fetching: Dammit
✅ Wrote: Dammit
Fetching: First Date
✅ Wrote: First Date
Fetching: The Rock Show
✅ Wrote: The Rock Show
Fetching: Feeling This
✅ Wrote: Feeling This
Fetching: Adam's Song
✅ Wrote: Adam's Song
Fetching: Stay Together for the Kids
✅ Wrote: Stay Together for the Kids
Fetching: Down
✅ Wrote: Down
Fetching: Man Overboard
✅ Wrote: Man Overboard
Fetching: Josie
✅ Wrote: Josie
Fetching: Aliens Exist
✅ Wrote: Aliens Exist
Fetching: Anthem Part Two
❌ Not found: Anthem Part Two
Fetching: Reckless Abandon
✅ Wrote: Reckless Abandon
Fetching: Dumpweed
✅ Wrote: Dumpweed
Fetching: Not Now
✅ Wrote: Not Now
Fetching: Always
✅ Wrote: Always
Fetching: Bored to Death
✅ Wrote: Bored to Death
Fetching: Ghost on the Dance Floor
✅ Wrote: Ghost on the Dance Floor
Fetching: Up All Night


Next, we preprocessed the raw lyrics and use the Hugging Face transformers library to fine-tune GPT-2.

In [6]:
def preprocess_lyrics(lyrics):
    """
    Clean and normalize song lyrics for training.

    This function removes section headers (e.g., [Chorus]),
    extra newlines, and punctuation (except apostrophes),
    and converts all text to lowercase.

    Parameters
    ----------
    lyrics : str
        The raw lyrics text to preprocess.

    Returns
    -------
    str
        A cleaned, normalized string of lyrics.
    """
    lyrics = re.sub(r'\[.*?\]', '', lyrics)  # Remove [Verse], [Chorus], etc.
    lyrics = re.sub(r'\n{2,}', '\n', lyrics)  # Replace multiple newlines with a single newline
    lyrics = re.sub(r'[^\w\s\']', '', lyrics)  # Remove punctuation except apostrophes
    lyrics = lyrics.lower()  # Convert all text to lowercase
    return lyrics


def fine_tune_gpt2(lyrics_text, output_dir):
    """
    Fine-tune a GPT-2 language model on provided song lyrics.

    This function tokenizes the input lyrics, creates a dataset,
    and uses Hugging Face's Trainer API to fine-tune a GPT-2 model.
    The model and tokenizer are saved to the specified output directory.

    Parameters
    ----------
    lyrics_text : str
        The preprocessed lyrics text used for training.

    output_dir : str
        Path to the directory where the fine-tuned model will be saved.

    Returns
    -------
    None
    """
    # Load GPT-2 tokenizer and set pad token
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token

    # Split lyrics into chunks (each chunk is a training sample)
    chunks = lyrics_text.strip().split("\n")
    chunks = [c.strip() for c in chunks if len(c.strip()) > 20]

    # Tokenize the text chunks into input tensors
    encodings = tokenizer(chunks, return_tensors="pt", padding="max_length", truncation=True, max_length=128)

    # Convert encodings to Hugging Face Dataset
    dataset = Dataset.from_dict({
        "input_ids": encodings["input_ids"],
        "attention_mask": encodings["attention_mask"]
    })

    # Load base GPT-2 model
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Setup data collator for causal language modeling (no masking)
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False  # Masked LM is for BERT-like models; GPT uses causal LM
    )

    # Define training configuration
    training_args = TrainingArguments(
        output_dir=output_dir,
        overwrite_output_dir=True,
        num_train_epochs=10,
        per_device_train_batch_size=4,
        save_steps=500,
        save_total_limit=2,
        logging_dir="./logs",
        logging_steps=100,
        report_to="none"  # Disable default logging to Weights and Biases or TensorBoard
    )

    # Initialize Hugging Face Trainer for model training
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset,
        data_collator=data_collator
    )

    # Run training
    trainer.train()

    # Save the fine-tuned model to the output directory
    trainer.save_model(output_dir)
    print(f"✅ Model saved to: {output_dir}")

The commands below fine tune the GPT-2 model on the preprocessed text.

In [None]:
# Load the file content
with open("blink_182_lyrics.txt", "r", encoding="utf-8") as f:
    blink_lyrics_raw = f.read()

# Preprocess the text
blink_lyrics_clean = preprocess_lyrics(blink_lyrics_raw)

# Fine-tune GPT-2
fine_tune_gpt2(blink_lyrics_clean, "blink_gpt2_model")

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step,Training Loss
100,3.8945
200,3.6534
300,3.4227
400,3.3683
500,3.1883
600,3.1028
700,2.5303
800,2.3937
900,2.2465
1000,2.3446


✅ Model saved to: blink_gpt2_model


We then create multiple helper functions in an attempt to mimic the style of Blink 182.

In [14]:
def split_into_lyric_lines(text):
    """
    Split raw generated text into structured lyric lines.

    This function uses sentence-ending punctuation to break the text into
    lines. If a sentence is too long, it is further split into smaller
    chunks of about 10 words each.

    Parameters
    ----------
    text : str
        Raw text output from the language model.

    Returns
    -------
    list of str
        A list of cleaned and trimmed lyric lines.
    """
    # Split text on sentence boundaries (periods, question marks, exclamations)
    lines = re.split(r'(?<=[.!?])\s+', text)
    result = []

    # Further chunk long lines into smaller pieces
    for line in lines:
        if len(line.split()) > 14:
            chunks = line.split()
            for i in range(0, len(chunks), 10):
                result.append(" ".join(chunks[i:i+10]))
        else:
            result.append(line.strip())

    # Return non-empty lines, trimmed of whitespace
    return [l.strip() for l in result if l.strip()]


def clean_lyrics_lines(lines):
    """
    Remove any duplicated section headers from the list of lyrics.

    Parameters
    ----------
    lines : list of str
        List of lyric lines possibly containing section headers like [Verse].

    Returns
    -------
    list of str
        Cleaned lines without section header duplicates.
    """
    return [line for line in lines if not re.match(r'\[.*?\]', line.strip())]


def format_as_blink_song(raw_text, lines_per_block=(4, 6)):
    """
    Format raw text into a Blink-182 style song with section headers.

    The song is divided into typical Blink-style sections like Verse,
    Chorus, Bridge, and Outro, using a set number of lines per section.

    Parameters
    ----------
    raw_text : str
        Raw generated lyrics text from the language model.
    lines_per_block : tuple of int, optional
        Range (min, max) for how many lines appear in each section.

    Returns
    -------
    str
        Formatted song text with section headers and lyric lines.
    """
    lines = split_into_lyric_lines(raw_text)
    lines = clean_lyrics_lines(lines)

    # Define a typical Blink-182 song structure
    section_template = ["[Verse 1]", "[Chorus]", "[Verse 2]", "[Chorus]", "[Bridge]", "[Chorus]", "[Outro]"]
    output = []

    i = 0
    for section in section_template:
        block_size = random.randint(*lines_per_block)
        block_lines = lines[i:i+block_size]

        if not block_lines:
            break

        output.append(section)
        output.extend(block_lines)
        output.append("")  # Add empty line between sections
        i += block_size

    return "\n".join(output)


def trim_repetition(text, word="yeah", limit=8):
    """
    Trim long sequences of repeating words in generated text.

    This helps prevent runaway loops like 'yeah yeah yeah yeah...'.

    Parameters
    ----------
    text : str
        The raw generated lyrics text.
    word : str
        The word to limit repetitions of.
    limit : int
        Maximum number of allowed repeated instances.

    Returns
    -------
    str
        The cleaned text with repeated words trimmed.
    """
    pattern = r"\b(" + word + r"\s*){" + str(limit) + r",}"
    return re.sub(pattern, f"{word} " * 2, text, flags=re.IGNORECASE)


def generate_blink_song(prompt="i never thought that this would hurt", model_dir="blink_gpt2_model",
                        output_file="generated_blink_song.txt", max_length=300, temperature=0.9):
    """
    Generate a Blink-182 style song from a GPT-2 model.

    This function uses a fine-tuned GPT-2 model to generate lyrics based on a given
    prompt. It formats the output into a structured pop-punk song and saves it to a file.

    Parameters
    ----------
    prompt : str, optional
        Initial seed text to start the song.
    model_dir : str
        Directory where the fine-tuned model is stored.
    output_file : str
        File path to save the generated lyrics.
    max_length : int, optional
        Maximum token length for the generated output.
    temperature : float, optional
        Sampling temperature for creativity. Higher = more random.

    Returns
    -------
    None
    """
    # Load model and tokenizer from the specified directory
    tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    model.eval()

    if torch.cuda.is_available():
        model.to("cuda")

    # Tokenize the prompt
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    if torch.cuda.is_available():
        input_ids = input_ids.to("cuda")
        attention_mask = attention_mask.to("cuda")

    # Generate text from the model
    with torch.no_grad():
        output = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_length=max_length,
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=temperature,
            repetition_penalty=1.2,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode and post-process the output
    raw_text = tokenizer.decode(output[0], skip_special_tokens=True)
    clean_text = trim_repetition(raw_text)
    formatted_song = format_as_blink_song(clean_text)

    # Save to file
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(formatted_song)

    print(f"✅ Blink-182-style song saved to '{output_file}'\n")
    print("🎤 The song:\n")
    print(formatted_song)

We then generate two Blink 182 songs.

In [None]:
generate_blink_song(
    prompt="I spoke with my best friend about my new song",
    model_dir="blink_gpt2_model",
    output_file="my_blink_song.txt",
    max_length=500,
    temperature=0.8
)

✅ Blink-182-style song saved to 'my_blink_song.txt'

🎤 The song:

[Verse 1]
I spoke with my best friend about my new song
'cause it's called Young Thug and he just doesn't listen
to me anymore cause i hate his music now and
then i saw the lyrics on his album cover ooh
oh yeah thats what happened when we fell in love
but that shit never ends even if she falls in

[Chorus]
love you still keep hating me forever and afterand its
over there is nothing left for me to grow up
about it no future at all i donhope this won
change your heart only thing will be rightthis hurts like
hell every time i start crying why can people ignore
me because of my mistakes did you hear those words

[Verse 2]
spoken by everyone around me how could you not understand
id rather act different so instead i'm a jerk i'm
a punk rock star everything has gone wrong always trying
hard enough to impress girls who look down on me

[Chorus]
they try too hard to please me as well im
getting older than anyone else their brain

In [None]:
generate_blink_song(
    prompt="I like to hang out with my friends",
    model_dir="blink_gpt2_model",
    output_file="my_blink_song_2.txt",
    max_length=500,
    temperature=0.4
)

✅ Blink-182-style song saved to 'my_blink_song_2.txt'

🎤 The song:

[Verse 1]
I like to hang out with my friends who are
more prodigal than me so i could have some fun
at the risk of sounding rude towards you if you'd
tell me what you wanted to know about me i

[Chorus]
guess you best be on your way home from school
anyway cause that's where i grew up and now im
living life is better then it was when i was
in a bar fight maybe its just another night alone
for sure but thats all id ever hoped for oh

[Verse 2]
yeah well ill meet these guys once again why can't
we go date night let's make this last forever rather
than waste time together how did we get here teenage
haze tonight got caught urinating on his pants they dragged
him down to the edge of townshe said dont listen

[Chorus]
to me son i'm not listening to you no future
there will be no future there won every time i
grow up feeling scared of what you think of me
oh wait until i'm older then i'm too scared to

[Bridge]
move cause to

# Weird Al Song Generator

We now analyze songs from [Weird Al](https://en.wikipedia.org/wiki/%22Weird_Al%22_Yankovic).

In [None]:
weird_al_songs = [
    "Achy Breaky Song",
    "Airline Amy",
    "Albuquerque",
    "All About the Pentiums",
    "Amish Paradise",
    "Another One Rides the Bus",
    "Attack of the Radioactive Hamsters from a Planet near Mars",
    "Bedrock Anthem",
    "Biggest Ball of Twine in Minnesota",
    "Bob",
    "Bohemian Polka",
    "Buckingham Blues",
    "Callin' in Sick",
    "Canadian Idiot",
    "Cavity Search",
    "Christmas at Ground Zero",
    "Close but No Cigar",
    "CNR",
    "Craigslist",
    "Dare to Be Stupid",
    "Don't Download This Song",
    "Do I Creep You Out",
    "Eat It",
    "eBay",
    "Everything You Know Is Wrong",
    "Fat",
    "First World Problems",
    "Foil",
    "Frank's 2000' TV",
    "Genius in France",
    "Germs",
    "Good Enough for Now",
    "Grapefruit Diet",
    "Gump",
    "Handy",
    "Hardware Store",
    "Headline News",
    "I Can't Watch This",
    "I Love Rocky Road",
    "I Lost on Jeopardy",
    "I Remember Larry",
    "I Think I'm a Clone Now",
    "I Want a New Duck",
    "I'll Sue Ya",
    "If That Isn't Love",
    "Inactive",
    "It's All About the Pentiums",
    "Jackson Park Express",
    "Jerry Springer",
    "Jurassic Park",
    "King of Suede",
    "Lasagna",
    "Let Me Be Your Hog",
    "Like a Surgeon",
    "Livin' in the Fridge",
    "Living with a Hernia",
    "Midnight Star",
    "Money for Nothing/Beverly Hillbillies*",
    "Mr. Frump in the Iron Lung",
    "My Baby's in Love with Eddie Vedder",
    "My Bologna",
    "My Own Eyes",
    "Nature Trail to Hell",
    "One More Minute",
    "Pancreas",
    "Party at the Leper Colony",
    "Party in the CIA",
    "Perform This Way",
    "Polka Face",
    "Polka Your Eyes Out",
    "Polkas on 45",
    "Pretty Fly for a Rabbi",
    "Ricky",
    "She Drives Like Crazy",
    "Skipper Dan",
    "Smells Like Nirvana",
    "Spam",
    "Sports Song",
    "Stop Forwarding That Crap to Me",
    "Stuck in a Closet with Vanna White",
    "Such a Groovy Guy",
    "Taco Grande",
    "Tacky",
    "Talk Soup",
    "The Alternative Polka",
    "The Biggest Ball of Twine in Minnesota",
    "The Check's in the Mail",
    "The Hamilton Polka",
    "The Night Santa Went Crazy",
    "The Plumbing Song",
    "The Saga Begins",
    "The White Stuff",
    "This Is the Life",
    "Traffic Jam",
    "Trapped in the Drive-Thru",
    "Trigger Happy",
    "Twister",
    "UHF",
    "Velvet Elvis",
    "Virus Alert",
    "Wanna B Ur Lovr",
    "Weasel Stomping Day",
    "Whatever You Like",
    "When I Was Your Age",
    "White & Nerdy",
    "Why Does This Always Happen to Me?",
    "Word Crimes",
    "You Don't Love Me Anymore",
    "Young, Dumb & Ugly",
    "Your Horoscope for Today",
    "Yoda"
]

We also write the text and title of those songs into a text file.

In [None]:
output_path = os.path.abspath("weird_al_lyrics.txt")
print(f"Writing lyrics to: {output_path}")

with open("weird_al_lyrics.txt", "w", encoding="utf-8") as f:
    for title in weird_al_songs:
        print(f"Fetching: {title}")
        lyrics = get_lyrics("Weird Al Yankovic", title)

        if lyrics:
            f.write(f"### {title} ###\n{lyrics}\n\n")
            print(f"✅ Wrote: {title}")
        else:
            f.write(f"### {title} ###\nLyrics not found.\n\n")
            print(f"❌ Not found: {title}")
        time.sleep(1)

Writing lyrics to: /content/weird_al_lyrics.txt
Fetching: Achy Breaky Song
✅ Wrote: Achy Breaky Song
Fetching: Airline Amy
✅ Wrote: Airline Amy
Fetching: Albuquerque
✅ Wrote: Albuquerque
Fetching: All About the Pentiums
❌ Not found: All About the Pentiums
Fetching: Amish Paradise
✅ Wrote: Amish Paradise
Fetching: Another One Rides the Bus
✅ Wrote: Another One Rides the Bus
Fetching: Attack of the Radioactive Hamsters from a Planet near Mars
✅ Wrote: Attack of the Radioactive Hamsters from a Planet near Mars
Fetching: Bedrock Anthem
❌ Not found: Bedrock Anthem
Fetching: Biggest Ball of Twine in Minnesota
❌ Not found: Biggest Ball of Twine in Minnesota
Fetching: Bob
✅ Wrote: Bob
Fetching: Bohemian Polka
❌ Not found: Bohemian Polka
Fetching: Buckingham Blues
✅ Wrote: Buckingham Blues
Fetching: Callin' in Sick
❌ Not found: Callin' in Sick
Fetching: Canadian Idiot
✅ Wrote: Canadian Idiot
Fetching: Cavity Search
❌ Not found: Cavity Search
Fetching: Christmas at Ground Zero
✅ Wrote: Christmas

We use the previous defined functions to preprocessed the raw lyrics and use the Hugging Face transformers library to fine-tune GPT-2.

In [None]:
# Load the file content
with open("weird_al_lyrics.txt", "r", encoding="utf-8") as f:
    weird_al_lyrics_raw = f.read()

# Preprocess the text
weird_al_lyrics_clean = preprocess_lyrics(weird_al_lyrics_raw)

# Fine-tune GPT-2
fine_tune_gpt2(weird_al_lyrics_clean, "weird_al_gpt2_model")

Step,Training Loss
100,4.2159
200,4.2061
300,4.0978
400,3.7707
500,3.8004
600,3.6859
700,3.6799
800,3.7223
900,3.5897
1000,3.5786


✅ Model saved to: weird_al_gpt2_model


We will make some custom functions to generate Weird Al songs.

In [15]:
def format_as_weird_al_song(raw_text, lines_per_block=(3, 5)):
    """
    Format generated text into a Weird Al-style song structure.

    This function breaks the raw generated text into structured sections
    commonly found in Weird Al-style songs, such as "[Weird Fact]" or
    "[Punchline]", using line blocks of variable length.

    Parameters
    ----------
    raw_text : str
        The raw output from the language model.

    lines_per_block : tuple of int, optional
        A range (min, max) for how many lines appear in each song section.
        Defaults to (3, 5).

    Returns
    -------
    str
        The formatted Weird Al-style song with section headers.
    """
    # Split the text into structured lines and remove duplicate section labels
    lines = split_into_lyric_lines(raw_text)
    lines = clean_lyrics_lines(lines)

    # Weird Al-style section labels with comedic/narrative flair
    section_template = ["[Verse 1]", "[Verse 2]", "[Weird Fact]", "[Bridge]", "[Punchline]", "[Outro]"]

    output = []
    i = 0

    # Loop through each song section and assign a random number of lines
    for section in section_template:
        block_size = random.randint(*lines_per_block)
        block_lines = lines[i:i+block_size]

        if not block_lines:
            break

        output.append(section)
        output.extend(block_lines)
        output.append("")  # Add space between sections
        i += block_size

    return "\n".join(output)


def trim_repeated_words(text, limit=4):
    """
    Detect and trim excessive repeated short words in text.

    This function scans the input for any short word (2–5 characters) that
    repeats consecutively beyond a given limit, and reduces it to two instances.

    Parameters
    ----------
    text : str
        The text (typically model-generated lyrics) to be cleaned.

    limit : int, optional
        The number of consecutive repetitions allowed before trimming.
        Default is 4.

    Returns
    -------
    str
        Cleaned text with excessive short word repetitions trimmed.
    """
    # Match any 2–5 letter word repeated more than `limit` times
    pattern = r"\b(\w{2,5})(\s+\1){" + str(limit) + r",}\b"

    # Replace with just two repetitions
    return re.sub(pattern, lambda m: f"{m.group(1)} {m.group(1)}", text, flags=re.IGNORECASE)



def generate_weird_al_song(prompt="white and nerdy", model_dir="weird_al_gpt2_model",
                           output_file="generated_weird_al_song.txt", max_length=400, temperature=1.0):
    """
    Generate a Weird Al-style song using a fine-tuned GPT-2 model.

    This function takes an initial prompt and produces a formatted comedic
    song by generating text from a fine-tuned GPT-2 model and structuring
    the output into custom song sections.

    Parameters
    ----------
    prompt : str, optional
        Initial seed phrase to guide the song generation. Default is
        "white and nerdy".

    model_dir : str
        Path to the fine-tuned GPT-2 model directory.

    output_file : str
        Destination file path to save the generated song.

    max_length : int, optional
        Maximum number of tokens to generate. Default is 400.

    temperature : float, optional
        Sampling temperature for text generation. Higher values produce
        more randomness. Default is 1.0.

    Returns
    -------
    None
    """
    # Load model and tokenizer from fine-tuned directory
    tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
    model = GPT2LMHeadModel.from_pretrained(model_dir)
    model.eval()

    # Move model to GPU if available
    if torch.cuda.is_available():
        model.to("cuda")

    # Tokenize the prompt input
    inputs = tokenizer(prompt, return_tensors="pt")
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    if torch.cuda.is_available():
        input_ids = input_ids.to("cuda")
        attention_mask = attention_mask.to("cuda")

    # Generate song lyrics from the model
    with torch.no_grad():
        output = model.generate(
            input_ids=input_ids,
            attention_mask=attention_mask,
            max_length=max_length,
            do_sample=True,
            top_k=50,
            top_p=0.95,
            temperature=temperature,
            repetition_penalty=1.5,
            num_return_sequences=1,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode generated tokens into text and clean up any excessive repetition
    raw_text = tokenizer.decode(output[0], skip_special_tokens=True)
    clean_text = trim_repetition(raw_text, limit=1)
    clean_text = trim_repeated_words(clean_text, limit=1)
    formatted_song = format_as_weird_al_song(raw_text)

    # Save the song to file
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(formatted_song)

    print(f"✅ Weird Al-style song saved to '{output_file}'\n")
    print("🎤 The song:\n")
    print(formatted_song)

And then generate three Weird Al songs.

In [None]:
generate_weird_al_song(
    prompt="I like earl grey tea and minesweeper",
    model_dir="weird_al_gpt2_model",
    output_file="my_weird_al_song.txt",
    max_length=500,
    temperature=0.5
)

✅ Weird Al-style song saved to 'my_weird_al_song.txt'

🎤 The song:

[Verse 1]
I like earl grey tea and minesweeper hamster bells ooh
hoo hoonie helicopters yah i like zelda's song 'cause i
love to dance like a leper colony on st trope

[Verse 2]
oh yeah ethel gone well heh heh sheh baby don't
you know it lucy too much time in the sun
when my family is playing jolly oldies game of ping

[Weird Fact]
pong at the beach ooow woo hoot wangee hey ho
hoo hooray for that eddie vedder mny buddy now you
go out with me haw tullys day and night like
your parents gonna kick us down the stairs ah right
before dawn today they wanna tear us apart oh no

[Bridge]
we can never play this game of ping pols again
let them rip our brains out into ribbons just so
badly they won' break their necks cause they found some
new way ta beat the game of hockey yeah what

[Punchline]
do ya want darlin' dingbat man back home runnin over
his head twice as hard as nails but he still
gettin' gold star for that very first

In [None]:
generate_weird_al_song(
    prompt="I never wear buttons but I got a cool hat",
    model_dir="weird_al_gpt2_model",
    output_file="my_weird_al_song_2.txt",
    max_length=500,
    temperature=0.7
)

✅ Weird Al-style song saved to 'my_weird_al_song_2.txt'

🎤 The song:

[Verse 1]
I never wear buttons but I got a cool hat
that said twine baller even if you didn't win the
grand prize just a year ago yeah it's in my
national library at china i guess that ties me to

[Verse 2]
nature trailblazer forever ooh hoo hoot hah hooray it makes
me kinda wanna runnin' yeshope somebody tries ta loot my
hard drive this week hey yoda ya know what else
am i supposed not to do lucy too luke don't
want no crummy music for you like some twisted band

[Weird Fact]
of madonna gettin' paid for their lousy shows mr frump
is an old man with a bad case eighty pounds
body odor and he talks smack dab dabba every night

[Bridge]
till dawn 'till dawn come day dawn finally i'm gonna
be a star witness on camera prove they're nazi all
right now remember when they put up those giant display
gates cause we lost our tails yesterday afternoon gavin' jack

[Punchline]
flash and everybody was jerry springering ricky huh w

In [None]:
generate_weird_al_song(
    prompt="Our stats are thoroughly impressive",
    model_dir="weird_al_gpt2_model",
    output_file="my_weird_al_song_3.txt",
    max_length=500,
    temperature=0.6
)

✅ Weird Al-style song saved to 'my_weird_al_song_3.txt'

🎤 The song:

[Verse 1]
Our stats are thoroughly impressive we're second only to mighty
hot sauce champion mizouse nazi favorito by a hefty 3x
gold mine worth of diamonds and an icecold keg from

[Verse 2]
our neighbor's pet store oh yeah you should check it
out don't forget that we've got tons of fun in
return for your hardy stare after dinner tonight at the
steakhouse forty miles wide wide wide wide wide wide hole

[Weird Fact]
in the ground next door is completely covered with dental
floss and cracked glass shards everywhere i go see 'em
playin' hockey on thirty percent off sale now right here
in this lousy windhampton hamlin state half as much money

[Bridge]
as any other business in town makes in excessofpremiums today
thanks so very highly ladenly diane why you gotta pay
for what little junk they sell us on ebay well
you know they really suck no matter how good their

[Punchline]
food grade cling wrap isn drivin' competition

# Green Day Song Generator

As a third example, we generate songs from the band [Green Day](https://en.wikipedia.org/wiki/Green_Day).

In [1]:
green_day_songs = [
    "21 Guns",
    "2000 Light Years Away",
    "80",
    "86",
    "American Idiot",
    "Android",
    "Are We the Waiting",
    "Armatage Shanks",
    "At the Library",
    "Basket Case",
    "Before the Lobotomy",
    "Blood, Sex and Booze",
    "Boulevard of Broken Dreams",
    "Brain Stew",
    "Brat",
    "Christie Road",
    "Church on Sunday",
    "Coming Clean",
    "Deadbeat Holiday",
    "Disappearing Boy",
    "Dominated Love Slave",
    "Don't Leave Me",
    "East Jesus Nowhere",
    "Emenius Sleepus",
    "Extraordinary Girl",
    "F.O.D.",
    "Fashion Victim",
    "Favorite Son",
    "Geek Stink Breath",
    "Give Me Novacaine",
    "Good Riddance (Time of Your Life)",
    "Green Day",
    "Haushinka",
    "Hitchin' a Ride",
    "Hold On",
    "Holiday",
    "Homecoming",
    "Horseshoes and Handgrenades",
    "I Fought the Law",
    "I Want to Be Alone",
    "I Want to Be on TV",
    "In the End",
    "J.A.R. (Jason Andrew Relva)",
    "Jackass",
    "Jaded",
    "Jesus of Suburbia",
    "King for a Day",
    "Know Your Enemy",
    "Last Night on Earth",
    "Last of the American Girls",
    "Letterbomb",
    "Longview",
    "Macy's Day Parade",
    "Maria",
    "Minority",
    "Misery",
    "Nice Guys Finish Last",
    "No One Knows",
    "Nuclear Family",
    "One for the Razorbacks",
    "Only of You",
    "Panic Song",
    "Paper Lanterns",
    "Peacemaker",
    "Platypus (I Hate You)",
    "Poprocks & Coke",
    "Prosthetic Head",
    "Pulling Teeth",
    "Redundant",
    "Restless Heart Syndrome",
    "Revolution Radio",
    "Road to Acceptance",
    "Scattered",
    "She",
    "St. Jimmy",
    "Still Breathing",
    "Stuck with Me",
    "Stuart and the Ave.",
    "Take Back",
    "The Forgotten",
    "The Grouch",
    "The Judge's Daughter",
    "The One I Want",
    "The Static Age",
    "The Time of Your Life (Good Riddance)",
    "Tight Wad Hill",
    "Troublemaker",
    "Uptight",
    "Waiting",
    "Wake Me Up When September Ends",
    "Walking Alone",
    "Walking Contradiction",
    "Warning",
    "Welcome to Paradise",
    "Westbound Sign",
    "Whatsername",
    "When I Come Around",
    "Who Wrote Holden Caulfield?",
    "Why Do You Want Him?",
    "Worry Rock",
    "X-Kid",
    "You Lied"
]

We then write the text and title of those songs into a text file.

In [9]:
output_path = os.path.abspath("green_day_lyrics.txt")
print(f"Writing lyrics to: {output_path}")

with open("green_day_lyrics.txt", "w", encoding="utf-8") as f:
    for title in green_day_songs:
        print(f"Fetching: {title}")
        lyrics = get_lyrics("Green Day", title)

        if lyrics:
            f.write(f"### {title} ###\n{lyrics}\n\n")
            print(f"✅ Wrote: {title}")
        else:
            f.write(f"### {title} ###\nLyrics not found.\n\n")
            print(f"❌ Not found: {title}")
        time.sleep(1)

Writing lyrics to: /content/green_day_lyrics.txt
Fetching: 21 Guns
✅ Wrote: 21 Guns
Fetching: 2000 Light Years Away
✅ Wrote: 2000 Light Years Away
Fetching: 80
✅ Wrote: 80
Fetching: 86
✅ Wrote: 86
Fetching: American Idiot
✅ Wrote: American Idiot
Fetching: Android
✅ Wrote: Android
Fetching: Are We the Waiting
✅ Wrote: Are We the Waiting
Fetching: Armatage Shanks
✅ Wrote: Armatage Shanks
Fetching: At the Library
✅ Wrote: At the Library
Fetching: Basket Case
✅ Wrote: Basket Case
Fetching: Before the Lobotomy
✅ Wrote: Before the Lobotomy
Fetching: Blood, Sex and Booze
❌ Not found: Blood, Sex and Booze
Fetching: Boulevard of Broken Dreams
✅ Wrote: Boulevard of Broken Dreams
Fetching: Brain Stew
✅ Wrote: Brain Stew
Fetching: Brat
✅ Wrote: Brat
Fetching: Christie Road
✅ Wrote: Christie Road
Fetching: Church on Sunday
✅ Wrote: Church on Sunday
Fetching: Coming Clean
✅ Wrote: Coming Clean
Fetching: Deadbeat Holiday
✅ Wrote: Deadbeat Holiday
Fetching: Disappearing Boy
✅ Wrote: Disappearing Boy
F

We use the previous defined functions to preprocessed the raw lyrics and use the Hugging Face transformers library to fine-tune GPT-2.

In [11]:
# Load the file content
with open("green_day_lyrics.txt", "r", encoding="utf-8") as f:
    green_day_lyrics_raw = f.read()

# Preprocess the text
green_day_lyrics_clean = preprocess_lyrics(green_day_lyrics_raw)

# Fine-tune GPT-2
fine_tune_gpt2(green_day_lyrics_clean, "green_day_gpt2_model")

Step,Training Loss
100,4.0949
200,3.7018
300,3.5175
400,3.4868
500,3.2147
600,3.2617
700,2.6859
800,2.3879
900,2.3455
1000,2.3617


✅ Model saved to: green_day_gpt2_model


As before, we make some custom functions to generate Green Day songs.

In [24]:
def format_as_green_day_song(raw_text, lines_per_block=(4, 6)):
    """
    Format generated text into a structured Green Day-style song.

    Cleans and splits raw model output into sections mimicking a Green Day track:
    Verse, Chorus, Bridge, Outro. Automatically trims broken phrases and uses
    variable-length blocks to mimic human-written structure.

    Parameters
    ----------
    raw_text : str
        Unformatted text output from the language model.

    lines_per_block : tuple of int, optional
        The (min, max) range of line counts per section block. Default is (4, 6).

    Returns
    -------
    str
        The cleaned and formatted lyrics with labeled song sections.
    """
    lines = split_into_lyric_lines(raw_text)
    lines = clean_lyrics_lines(lines)

    # Remove junk lines (e.g., single-word lines or malformed tokens)
    lines = [
        l for l in lines
        if len(l.split()) >= 3 and not re.search(r"[^\w\s']", l)
    ]

    section_template = [
        "[Verse 1]", "[Chorus]", "[Verse 2]",
        "[Chorus]", "[Bridge]", "[Chorus]", "[Outro]"
    ]

    output = []
    i = 0

    for section in section_template:
        block_size = random.randint(*lines_per_block)
        block_lines = lines[i:i + block_size]

        # If there aren’t enough lines left, finish early
        if len(block_lines) < 2:
            break

        output.append(section)
        output.extend(block_lines)
        output.append("")  # Add blank line between sections
        i += block_size

    return "\n".join(output)

def generate_green_day_song(prompt="i walk this lonely road", model_dir="green_day_gpt2_model",
                          output_file="generated_green_day_song.txt", max_length=400, temperature=0.95):
  """
  Generate a Green Day-style song using a fine-tuned GPT-2 model.

  This function takes a prompt and generates structured lyrics in the
  style of Green Day. The lyrics are formatted into song sections and
  repetitive phrases are trimmed.

  Parameters
  ----------
  prompt : str, optional
      A seed phrase to guide the song generation. Default is
      "i walk this lonely road".

  model_dir : str
      Directory path to the fine-tuned Green Day GPT-2 model.

  output_file : str
      File path where the generated song will be saved.

  max_length : int, optional
      Maximum number of tokens to generate. Default is 400.

  temperature : float, optional
      Sampling temperature. Higher values result in more randomness.
      Default is 0.95.

  Returns
  -------
  None
  """
  # Load the tokenizer and model
  tokenizer = GPT2Tokenizer.from_pretrained(model_dir)
  model = GPT2LMHeadModel.from_pretrained(model_dir)
  model.eval()

  if torch.cuda.is_available():
      model.to("cuda")

  # Tokenize input prompt
  inputs = tokenizer(prompt, return_tensors="pt")
  input_ids = inputs["input_ids"]
  attention_mask = inputs["attention_mask"]

  if torch.cuda.is_available():
      input_ids = input_ids.to("cuda")
      attention_mask = attention_mask.to("cuda")

  # Generate lyrics
  with torch.no_grad():
      output = model.generate(
          input_ids=input_ids,
          attention_mask=attention_mask,
          max_length=max_length,
          do_sample=True,
          top_k=50,
          top_p=0.95,
          temperature=temperature,
          repetition_penalty=1.5,
          num_return_sequences=1,
          pad_token_id=tokenizer.eos_token_id
      )

  # Clean up the output
  raw_text = tokenizer.decode(output[0], skip_special_tokens=True)
  clean_text = trim_repetition(raw_text, limit=1)
  clean_text = trim_repeated_words(clean_text, limit=1)
  formatted_song = format_as_green_day_song(clean_text)

  # Save to file
  with open(output_file, "w", encoding="utf-8") as f:
      f.write(formatted_song)

  print(f"✅ Green Day-style song saved to '{output_file}'\n")
  print("🎤 The song:\n")
  print(formatted_song)

We then create two new Green Day songs.

In [34]:
generate_green_day_song(
    prompt="In the summertime",
    model_dir="green_day_gpt2_model",
    output_file="my_green_day_song.txt",
    max_length=500,
    temperature=0.85
)

✅ Green Day-style song saved to 'my_green_day_song.txt'

🎤 The song:

[Verse 1]
In the summertime revolution dance on eggshells in my old
stomping ground yet i can't seem to get up and
walk away from you it seems like forever ago just
for a bit of fun again daddy threw me out
here's some loose ends so don your not in shape

[Chorus]
now momma ain gotta clean house hey dad got somethin'
going on but no one likes ya mother damn well
did because she love wasin' around before he went crazy
yeah yeah what do you think about momma getting mad

[Verse 2]
at daddy whatsername is really all that i need right
now isnthe last moment thats gonna happen cause everything belongs
toyou alright boy am i retarded son dont know how
man made this decision okay maybe if you wanna leave
me alone then why ohwhy should anyone care baby look
into my eyesight as they snotfilled their brains up with

[Chorus]
lies once again young man learned to live by his
own rules good riddance time new york city lights come

In [27]:
generate_green_day_song(
    prompt="it is no longer cold outside",
    model_dir="green_day_gpt2_model",
    output_file="my_green_day_song_2.txt",
    max_length=500,
    temperature=0.4
)

✅ Green Day-style song saved to 'my_green_day_song_2.txt'

🎤 The song:

[Verse 1]
it is no longer cold outside and it's now your
favorite haunt with its resident assassin known as the assassin
queen bee st jimmy hey yeah yeah i'm a nag
shitface yeah yeah i walk alone sometimes even get drunk
on my knees before going away maybe hitch a ride
to someplace nice gettin' hot again daddy ain't got none

[Chorus]
right don you're not so clean looking but he's still
getting lazy oh well thats what we used to call
our lives just like dad was hey whatsername whatsername whatsername
whatsitter whatsittingername whatsingle allthepeople that you know isnt really his

[Verse 2]
name dont you exist because of you nobody knows where
maria went wrong hey she disappeared in 2000 miles an
hour ago why didshe go nowhere fast enough for christie
roadster hey lookin sharp man someting isgethernowhere something else entirely
different now pathetically left without any trace due diligence nothing

[Chorus]
new

# Taylor Swift Songs

We will now create some Taylor Swift songs through our own GPT model. Let's define a GPT class using all of the material we learned so far.

In [None]:
class GPT(nn.Module):
    """
    A GPT-like transformer model.

    Parameters
    ----------
    vocab_size : int
        The size of the vocabulary.
    context_length : int
        The length of the input context.
    model_dim : int
        The dimensionality of the model.
    num_blocks : int
        The number of transformer blocks.
    num_heads : int
        The number of attention heads.
    """
    class TransformerBlock(nn.Module):
        """
        A single transformer block consisting of multi-headed self-attention
        and a feedforward neural network.

        Parameters
        ----------
        model_dim : int
            The dimensionality of the model.
        num_heads : int
            The number of attention heads.
        """
        class MultiHeadedSelfAttention(nn.Module):
            """
            Multi-headed self-attention mechanism.

            Parameters
            ----------
            model_dim : int
                The dimensionality of the model.
            num_heads : int
                The number of attention heads.
            """
            class SingleHeadAttention(nn.Module):
                """
                Single head attention mechanism.

                Parameters
                ----------
                model_dim : int
                    The dimensionality of the model.
                head_size : int
                    The size of each attention head.
                """
                def __init__(self, model_dim: int, head_size: int):
                    super().__init__()
                    self.key_layer = nn.Linear(model_dim, head_size, bias=False)
                    self.query_layer = nn.Linear(model_dim, head_size, bias=False)
                    self.value_layer = nn.Linear(model_dim, head_size, bias=False)

                def forward(self, embedded):
                    """
                    Forward pass for single-head self-attention.

                    Parameters
                    ----------
                    embedded : torch.Tensor
                        The input tensor of shape (batch_size, context_length, model_dim).

                    Returns
                    -------
                    torch.Tensor
                        The attention-weighted values.
                    """
                    k = self.key_layer(embedded)
                    q = self.query_layer(embedded)
                    v = self.value_layer(embedded)

                    scores = q @ torch.transpose(k, 1, 2)  # Compute attention scores
                    context_length, attention_dim = k.shape[1], k.shape[2]
                    scores = scores / (attention_dim ** 0.5)  # Scale scores

                    # Create a lower triangular mask for causal attention
                    lower_triangular = torch.tril(torch.ones(context_length, context_length))
                    mask = (lower_triangular == 0).to(device)
                    scores = scores.masked_fill(mask, float('-inf'))
                    scores = nn.functional.softmax(scores, dim=2)

                    return scores @ v  # Weighted sum of values

            def __init__(self, model_dim: int, num_heads: int):
                super().__init__()
                self.attention_heads = nn.ModuleList()
                for _ in range(num_heads):
                    self.attention_heads.append(self.SingleHeadAttention(model_dim, model_dim // num_heads))
                self.compute = nn.Linear(model_dim, model_dim)
                self.dropout = nn.Dropout(0.2)

            def forward(self, embedded):
                """
                Forward pass for multi-headed self-attention.

                Parameters
                ----------
                embedded : torch.Tensor
                    The input tensor of shape (batch_size, context_length, model_dim).

                Returns
                -------
                torch.Tensor
                    The output tensor after multi-headed attention.
                """
                head_outputs = [head(embedded) for head in self.attention_heads]
                concatenated = torch.cat(head_outputs, dim=2)
                return self.dropout(self.compute(concatenated))

        class VanillaNeuralNetwork(nn.Module):
            """
            A simple feedforward neural network used within the transformer block.

            Parameters
            ----------
            model_dim : int
                The dimensionality of the model.
            """
            def __init__(self, model_dim: int):
                super().__init__()
                self.first_linear_layer = nn.Linear(model_dim, model_dim * 4)
                self.relu = nn.ReLU()
                self.second_linear_layer = nn.Linear(model_dim * 4, model_dim)
                self.dropout = nn.Dropout(0.2)

            def forward(self, x):
                """
                Forward pass for the feedforward network.

                Parameters
                ----------
                x : torch.Tensor
                    Input tensor.

                Returns
                -------
                torch.Tensor
                    Output tensor.
                """
                return self.dropout(self.second_linear_layer(self.relu(self.first_linear_layer(x))))

        def __init__(self, model_dim: int, num_heads: int):
            super().__init__()
            self.mhsa = self.MultiHeadedSelfAttention(model_dim, num_heads)
            self.vanilla_nn = self.VanillaNeuralNetwork(model_dim)
            self.layer_norm_one = nn.LayerNorm(model_dim)
            self.layer_norm_two = nn.LayerNorm(model_dim)

        def forward(self, embedded):
            """
            Forward pass for the transformer block.

            Parameters
            ----------
            embedded : torch.Tensor
                Input tensor.

            Returns
            -------
            torch.Tensor
                Processed tensor.
            """
            embedded = embedded + self.mhsa(self.layer_norm_one(embedded))  # Skip connection
            embedded = embedded + self.vanilla_nn(self.layer_norm_two(embedded))  # Another skip connection
            return embedded

    def __init__(self, vocab_size: int, context_length: int, model_dim: int, num_blocks: int, num_heads: int):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, model_dim)
        self.pos_embedding = nn.Embedding(context_length, model_dim)
        self.transformer_blocks = nn.Sequential(*[self.TransformerBlock(model_dim, num_heads) for _ in range(num_blocks)])
        self.layer_norm_three = nn.LayerNorm(model_dim)
        self.vocab_projection = nn.Linear(model_dim, vocab_size)

    def forward(self, context):
        """
        Forward pass for the GPT model.

        Parameters
        ----------
        context : torch.Tensor
            Input tensor of token indices.

        Returns
        -------
        torch.Tensor
            The logits for the next token prediction.
        """
        embedded = self.token_embedding(context)
        context_length = context.shape[1]
        positions = torch.arange(context_length).to(device)
        embedded = embedded + self.pos_embedding(positions)

        raw_output = self.vocab_projection(self.layer_norm_three(self.transformer_blocks(embedded)))
        return raw_output

We then load and preprocess lyric data.

In [None]:
# Load the file content
with open('/content/TaylorLyrics.txt', 'r', encoding='utf-8') as f:
    lyrics = f.read()

Next, we a create character-level vocabulary.

In [None]:
unique_chars = sorted(set(lyrics))
char_to_int = {ch: i for i, ch in enumerate(unique_chars)}
int_to_char = {i: ch for ch, i in char_to_int.items()}

And encode the lyrics data.

In [None]:
encoded_lyrics = [char_to_int[ch] for ch in lyrics]

We also prepare input-target sequences.

In [None]:
def create_sequences(data, seq_length):
    """Generates input-target sequences from a dataset for training.

    Parameters
    ----------
    data : list or numpy.ndarray
        Input data from which sequences are generated.
    seq_length : int
        Length of each input sequence.

    Returns
    -------
    tuple of torch.Tensor
        A tuple (inputs, targets), where:
        - inputs: Tensor of shape (num_samples, seq_length) representing input sequences.
        - targets: Tensor of shape (num_samples, seq_length) representing target sequences.

    Notes
    -----
    - Each input sequence consists of `seq_length` consecutive elements from the input data.
    - Each target sequence is the corresponding next `seq_length` elements, offset by one.
    - Useful for training sequence models like RNNs or Transformers.
    """
    inputs, targets = [], []  # Initialize lists to store input and target sequences.

    # Iterate through the data to extract sequences.
    for i in range(len(data) - seq_length):
        inputs.append(data[i:i + seq_length])  # Input sequence of length `seq_length`.
        targets.append(data[i + 1:i + seq_length + 1])  # Corresponding target sequence.

    # Convert lists to tensors for use with PyTorch models.
    return torch.tensor(inputs), torch.tensor(targets)

seq_length = 128
X_train, y_train = create_sequences(encoded_lyrics, seq_length)

In [None]:
print(f' Number of sequences in training data: {len(X_train)}')

 Number of sequences in training data: 278642


And train the model by using a subset of all available song text.

In [None]:
subset_size = 10_000
X_train, y_train = X_train[:subset_size], y_train[:subset_size]

# Batch size and training epochs
batch_size = 128
epochs = 100

# Initialize model and optimizer
model = GPT(vocab_size=len(unique_chars), context_length=128, model_dim=252, num_blocks=6, num_heads=6).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

# AMP API
scaler = torch.amp.GradScaler()

# Training loop with batching and mixed precision
for epoch in range(epochs):
    model.train()
    total_loss = 0

    for i in range(0, len(X_train), batch_size):
        context = X_train[i:i + batch_size].to(device)
        target = y_train[i:i + batch_size].to(device)

        optimizer.zero_grad()

        with torch.amp.autocast("cuda"):
            output = model(context)
            loss = criterion(output.view(-1, len(unique_chars)), target.view(-1))

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

        total_loss += loss.item()

    print(f"Epoch {epoch + 1}, Loss: {total_loss / (len(X_train) // batch_size)}")

# Save the improved model
torch.save(model.state_dict(), 'taylor_swift_tuned_weights.pt')

Epoch 1, Loss: 2.8765149208215566
Epoch 2, Loss: 2.5246675381293664
Epoch 3, Loss: 2.461440086364746
Epoch 4, Loss: 2.4223521397664
Epoch 5, Loss: 2.3872444262871375
Epoch 6, Loss: 2.344626871439127
Epoch 7, Loss: 2.2772603310071506
Epoch 8, Loss: 2.1744229961664248
Epoch 9, Loss: 2.058120913994618
Epoch 10, Loss: 1.9431725113819807
Epoch 11, Loss: 1.827827050135686
Epoch 12, Loss: 1.7117471175316052
Epoch 13, Loss: 1.5888461669286091
Epoch 14, Loss: 1.4607404898374508
Epoch 15, Loss: 1.3378432866854546
Epoch 16, Loss: 1.2253655997606425
Epoch 17, Loss: 1.1202915494258587
Epoch 18, Loss: 1.0049482954618258
Epoch 19, Loss: 0.8862171387061094
Epoch 20, Loss: 0.7867188285558652
Epoch 21, Loss: 0.6935804948592798
Epoch 22, Loss: 0.6094872997357295
Epoch 23, Loss: 0.5407178201354467
Epoch 24, Loss: 0.49186816868873745
Epoch 25, Loss: 0.43516856279128635
Epoch 26, Loss: 0.37898659591491407
Epoch 27, Loss: 0.33241895825052875
Epoch 28, Loss: 0.29611976482929325
Epoch 29, Loss: 0.2567835998458

Now we will create a generate lyrics function and use it to write Taylor Swift songs.

In [None]:
def generate_lyrics(model, new_chars, context, context_length, int_to_char, temperature=1.0):
    """Generates lyrics using a trained character-level language model.

    Parameters
    ----------
    model : torch.nn.Module
        The trained language model for character generation.
    new_chars : int
        Number of new characters to generate.
    context : torch.Tensor
        Input tensor representing the initial context (shape: [1, sequence_length]).
    context_length : int
        Maximum length of context to retain during generation.
    int_to_char : dict
        Mapping from integer indices to characters.
    temperature : float, optional
        Sampling temperature controlling randomness. Lower values make predictions more deterministic,
        while higher values increase diversity. Default is 1.0.

    Returns
    -------
    str
        Generated lyrics as a string.

    Notes
    -----
    - The function performs autoregressive generation by sampling one character at a time.
    - Uses softmax with a temperature parameter to control the randomness of predictions.
    - Context is truncated to `context_length` to prevent memory overflow.
    """
    model.eval()  # Set the model to evaluation mode (disables dropout, etc.).
    res = []  # Store generated characters.

    with torch.no_grad():  # Disable gradient computation for faster inference.
        for _ in range(new_chars):
            # Keep only the last `context_length` characters.
            if context.shape[1] > context_length:
                context = context[:, -context_length:]

            # Forward pass: generate model output (logits).
            output = model(context)

            # Extract logits for the last time step and apply temperature scaling.
            logits = output[:, -1, :] / temperature

            # Convert logits to probabilities using softmax.
            probs = torch.nn.functional.softmax(logits, dim=-1)

            # Sample the next character index from the probability distribution.
            next_char = torch.multinomial(probs, 1)

            # Append the new character to the context.
            context = torch.cat((context, next_char), dim=-1)

            # Map the character index to its corresponding character and store it.
            res.append(int_to_char[next_char.item()])

    return ''.join(res)  # Return the generated lyrics as a string.

In [None]:
seed_text = "On a park bench"
start_context = torch.tensor([[char_to_int[c] for c in seed_text]], dtype=torch.int64).to(device)

new_lyrics = generate_lyrics(model, new_chars=2000, context=start_context, context_length=128, int_to_char=int_to_char, temperature=1.0)
print(new_lyrics)

 he's all the song
The the scars you ming my heart on my sleeve
Feeling lucky today, got the sunshine
Could you tell me what more do I need
And tomorrow's just a mystery, oh yeah
But that's ok

[Chorus]

Maybe I'm just a girl on a mission
But I'm ready to fly

I'm alone, on my own, and that's all I know
I'll be strong, I'll be wrong, oh but life goes on
Oh, I'm just a girl, trying to find a place in this world

Got the radio on, my old blue jeans
And I'm wearing my heart on my sleeve
Feeling lucky today, got the sunshine
Could you tell me what more do I need
And tomorrow's just a mystery, oh yeah
But that's ok

[Chorus]

Maybe I'm just a girl on a mission
But I'm ready to fly

I'm alone, on my own, and that's all I know
I'll be strong, I'll be wrong, oh but life goes on
Oh I'm alone, on my own, and that's all I know
Oh I'm just a girl, trying to find a place in this world

Oh I'm just a girl
Oh I'm just a girl, oh, oh,
Oh I'm just a girl

You have a way of coming easily to me
And when 

In [None]:
seed_text = "In the middle of town"
start_context = torch.tensor([[char_to_int[c] for c in seed_text]], dtype=torch.int64).to(device)

new_lyrics = generate_lyrics(model, new_chars=4000, context=start_context, context_length=128, int_to_char=int_to_char, temperature=0.8)
print(new_lyrics)

, and that's all I know
Oh I'm just a girl, trying to find a place in this world

Oh I'm just a girl
Oh I'm just a girl, oh, oh,
Oh I'm just a girl

You have a way of coming easily to me
And when you take, you take the very best of me
So I start a fight 'cause I need to feel something
And you do what you want 'cause I'm not what you wanted

Oh, what a shame, what a rainy ending given to a perfect day
Just walk away, no use defending words that you will never say
And now that I'm sitting here thinking it through
I've never been anywhere cold as you

You put up walls and paint them all a shade of gray
And I stood there loving you and wished them all away
And you come away with a great little story
Of a mess of a dreamer with the nerve to adore you

Oh, what a shame, what a rainy ending given to a perfect day
So, just walk away, no use defending words that you will never say
And now that I'm sitting here thinking it through
I've never been anywhere cold as you

You put up walls and paint 

In [None]:
seed_text = "I found the perfect pair of shoes"
start_context = torch.tensor([[char_to_int[c] for c in seed_text]], dtype=torch.int64).to(device)

new_lyrics = generate_lyrics(model, new_chars=4000, context=start_context, context_length=128, int_to_char=int_to_char, temperature=0.8)
print(new_lyrics)

s
And I'm just a girl, oh, oh,
Oh I'm just a girl

You have a way of coming easily to me
And when you take, you take the very best of me
So I start a fight 'cause I need to feel something
And you do what you want 'cause I'm not what you wanted

Oh, what a shame, what a rainy ending given to a perfect day
Just walk away, no use defending words that you will never say
And now that I'm sitting here thinking it through
I've never been anywhere cold as you

You put up walls and paint them all a shade of gray
And I stood there loving you and wished them all away
And you come away with a great little story
Of a mess of a dreamer with the nerve to adore you

Oh, what a shame, what a rainy ending given to a perfect day
So, just walk away, no use defending words that you will never say
And now that I'm sitting here thinking it through
I've never been anywhere cold as you

You put up walls and paint them all a shade of gray
And I stood there loving you and wished them all away
And you come away w