# Lyric Generation

This notebook generates lyrics with the model trained in the previous notebooks. It takes a prompt from the user and generates a continuation of it.

## Load the model

Import the necessary libraries.

In [1]:
import torch
import numpy as np
from torch import nn
from transformers import GPT2Tokenizer, GPT2Config, GPT2Model, GPT2PreTrainedModel
from torch.optim import AdamW
from datasets import load_dataset
from tqdm import tqdm
from torch.nn import functional as F
import pandas as pd

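# is_available() checks that MPS can actually be used; is_built() only checks that PyTorch was compiled with MPS support
device = 'cuda' if torch.cuda.is_available() else 'mps' if torch.backends.mps.is_available() else 'cpu'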



Define the model class and load the trained weights.

In [2]:
class GPT2_Model(GPT2PreTrainedModel):

    def __init__(self, config):

        super().__init__(config)

        self.transformer = GPT2Model.from_pretrained('gpt2')
        tokenizer = GPT2Tokenizer.from_pretrained('gpt2', pad_token='<|pad|>')

        # this is necessary since we add a new unique token for pad_token
        self.transformer.resize_token_embeddings(len(tokenizer))

        self.lm_head = nn.Linear(config.n_embd, len(tokenizer), bias=False)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):

        x = self.transformer(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)[0]
        x = self.lm_head(x)

        return x

In [3]:
# Load the model
configuration = GPT2Config()
gpt_model = GPT2_Model(configuration).to(device)
# map_location lets a checkpoint saved on another device load on this one
gpt_model.load_state_dict(torch.load('GPT-Trained-Model', map_location=device))
gpt_model.eval()
# Load the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2', pad_token='<|pad|>')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Define the generation function. The model samples probable continuations of the user's input. You can play with the top_k and top_p parameters to change the model's outputs. Raising top_k or top_p lets more low-probability tokens into the candidate pool, so the model outputs more "out there" word combinations; lowering them makes the output more conservative.

In [4]:
# Define the generation function
@torch.no_grad()
def generate(idx, max_new_tokens, context_size, tokenizer, model, top_k=10, top_p=0.95):
    # note: assumes a batch of one prompt, i.e. idx has shape (1, T)
    for _ in range(max_new_tokens):
        # stop once the model has emitted the end-of-sequence token
        if idx[:, -1].item() == tokenizer.eos_token_id:
            break
        # crop idx to the last context_size tokens
        idx_cond = idx[:, -context_size:]
        # get the predictions
        logits = model(idx_cond)
        # focus only on the last time step
        logits = logits[:, -1, :]
        # apply softmax to get probabilities
        probs = F.softmax(logits, dim=-1)
        # sort probabilities in descending order
        sorted_probs, indices = torch.sort(probs, descending=True)
        # compute cumsum of probabilities
        probs_cumsum = torch.cumsum(sorted_probs, dim=-1)
        # top_p: keep the smallest prefix whose cumulative probability reaches top_p
        n_keep = int((probs_cumsum < top_p).sum().item()) + 1
        # top_k: keep at most top_k of those tokens
        n_keep = min(n_keep, top_k)
        sorted_probs, indices = sorted_probs[:, :n_keep], indices[:, :n_keep]
        # renormalize the kept probabilities so they sum to 1
        # (a second softmax here would flatten the distribution)
        sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
        # sample a position from the truncated distribution and map it back to a token id
        idx_next = indices.gather(-1, torch.multinomial(sorted_probs, num_samples=1))
        # append new token ids
        idx = torch.cat((idx, idx_next), dim=1)

    return idx
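
To see what the top_p and top_k truncation does on its own, here is a minimal sketch on a hand-written toy distribution. It is plain Python rather than PyTorch so that it runs without the trained model. The helper name `filter_top_k_top_p` and the toy probabilities are illustrative assumptions, not part of the notebook, and the sketch renormalizes the surviving probabilities by dividing by their sum.

```python
import random

def filter_top_k_top_p(probs, top_k=10, top_p=0.95):
    """Hypothetical helper: return (token_ids, renormalized_probs) after top_p, then top_k truncation."""
    # rank token ids by probability, highest first
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top_p: keep tokens until the cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    # top_k: keep at most top_k of the surviving tokens
    kept = kept[:top_k]
    # renormalize so the kept probabilities sum to 1
    total = sum(probs[i] for i in kept)
    return kept, [probs[i] / total for i in kept]

# toy next-token distribution over a 5-token vocabulary (made up for illustration)
toy_probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]

ids, weights = filter_top_k_top_p(toy_probs, top_k=3, top_p=0.8)
print(ids)  # [0, 1, 2]

# sample one token id from the truncated distribution
next_token = random.choices(ids, weights=weights, k=1)[0]
print(next_token)
```

With a lower top_p (say 0.4) only token 0 survives, so sampling becomes deterministic; with a higher top_p and top_k, rarer tokens stay in the pool.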

## Lyric Generation

After all the preliminaries, it is time to generate some lyrics! You can change the prompt variable to a string that the model will try to continue.

In [5]:
gpt_model.eval()

prompt = "i stand up"
generated = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
generated = generated.to(device)

sample_outputs = generate(generated,
                         max_new_tokens=200,
                         context_size=400,
                         tokenizer=tokenizer,
                         model=gpt_model,
                         top_k=10,
                         top_p=0.95)

for sample_output in sample_outputs:
    print(tokenizer.decode(sample_output, skip_special_tokens=True))

i stand up. cham spreadrefart.2 hill country woman pind, p sit up we fair there steady singin' but no funky park folks walk keep tonight avenue blues i give myself long've it been blond hair high hair floet john pap ah man i dream good boy i wish i was cold ain't got the heart i dream good boy i wish i was cold i'm old kav're i wish i was cold i'm old kbetter live christ i wish i was cold chalt live christ i wish i was coldy rock i wish i was cool listen me wallc differentе so much  [ blow who said destroy yourself do not move open go open anyone dare fleetwood mac liveget tickets as low as $38[ blow who say goodbye away live frank liv freeze that seems y'all make yourself good boy i wish i was cold i'm old k dollar, makes me world american i wish i was cold i'm old k accept angry choices i wish i was


If you care about song structure, you can start the prompt with _[verse]_ (or _[chorus]_ if you are feeling more adventurous) so that the model has a higher chance of outputting more structured lyrics.
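
If you build prompts often, a small helper can add the section tag for you. This is only a sketch: `build_prompt` is a hypothetical name, and lowercasing the prompt is an assumption based on the all-lowercase prompts and samples in this notebook.

```python
def build_prompt(text, section=None):
    """Hypothetical helper: prefix `text` with a bracketed section tag such as [verse]."""
    # assumption: prompts are lowercase, like the examples in this notebook
    text = text.strip().lower()
    if section is None:
        return text
    return f"[{section.strip().lower()}] {text}"

print(build_prompt("I stand up", section="verse"))  # [verse] i stand up
```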

In [6]:
gpt_model.eval()

prompt = "[verse] i stand up"
generated = torch.tensor(tokenizer.encode(prompt)).unsqueeze(0)
generated = generated.to(device)

sample_outputs = generate(generated,
                         max_new_tokens=200,
                         context_size=400,
                         tokenizer=tokenizer,
                         model=gpt_model,
                         top_k=10,
                         top_p=0.95)

for sample_output in sample_outputs:
    print(tokenizer.decode(sample_output, skip_special_tokens=True))

[verse] i stand up early 'cause i like living in or around the garden gin for my raincoat no more it's at home in a sunday's air because i ain't no fighter a cousin he came home on a jethead that's fine i hear him smoky 'cause he can't control his music  [verse] he grown up, woke up in the pan though he won't light the horse, he's no good although he's having a lot to confess he's no beast, he's sorry to heal he comes home, feeling like japanese water he's laughing in the ice because he can't control his music  [chorus] he'll say he's cute and he'll show you plenty and he won't cry when you leave his door on a fling i might picking, gonna watch you run  [verse] some folks get up singing, some folks get in south rock there's no sensation till you reach the border until you've much created the fruit lable thistle


If you care about formatting, you can output your generated songs using the following function. Note that this only works with songs that contain bracketed section tags such as [verse] or [chorus].

In [7]:
import re

# Define the capitalization function
def capitalize_string(input_string):
    # Capitalize the first letter after each blank line ("\n\n")
    result = re.sub(r'\n\n\s*([a-zA-Z])', lambda x: '\n\n' + x.group(1).upper(), input_string)
    # Capitalize every word inside brackets ([ ])
    result = re.sub(r'\[([^\]]*)\]', lambda x: '[' + ' '.join(word.capitalize() for word in x.group(1).split()) + ']', result)
    # Capitalize every standalone "i"
    result = re.sub(r'\bi\b', 'I', result)
    return result

# Format the string
def format_string(input_string):
    # Drop existing newlines, then put each bracketed tag in its own paragraph
    result = input_string.replace('\n', '').replace(']', ']\n\n').replace('[', '\n\n[')
    # Capitalize for good measure
    result = capitalize_string(result)
    return result

In [8]:
# Decode the sequence generated in the previous cell (the batch holds a single sample)
lyric = tokenizer.decode(sample_outputs[0], skip_special_tokens=True)
lyric = format_string(lyric)
print(lyric)



[Verse]

I stand up early 'cause I like living in or around the garden gin for my raincoat no more it's at home in a sunday's air because I ain't no fighter a cousin he came home on a jethead that's fine I hear him smoky 'cause he can't control his music  

[Verse]

He grown up, woke up in the pan though he won't light the horse, he's no good although he's having a lot to confess he's no beast, he's sorry to heal he comes home, feeling like japanese water he's laughing in the ice because he can't control his music  

[Chorus]

He'll say he's cute and he'll show you plenty and he won't cry when you leave his door on a fling I might picking, gonna watch you run  

[Verse]

Some folks get up singing, some folks get in south rock there's no sensation till you reach the border until you've much created the fruit lable thistle


It looks more and more like actual lyrics! I do not insert a line break after each lyric line within a section, because where lines should break depends on the song. Leaving the sections unbroken gives you more freedom to decide how to structure your song.
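
If you would rather start from rough line breaks, one option is to wrap each section at a fixed width. This is a minimal sketch and a crude stand-in for real lyric lines. `wrap_sections` is a hypothetical helper and the width of 40 characters is an arbitrary assumption. It expects text in the shape produced by `format_string`, with sections separated by blank lines.

```python
import textwrap

def wrap_sections(lyric, width=40):
    """Hypothetical helper: wrap each section body of a formatted lyric to `width` characters."""
    wrapped = []
    for block in lyric.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("[") and block.endswith("]"):
            # leave section tags such as [Verse] on their own line
            wrapped.append(block)
        else:
            # break the section body at word boundaries
            wrapped.append(textwrap.fill(block, width=width))
    return "\n\n".join(wrapped)

example = "\n\n[Verse]\n\nI stand up early 'cause I like living in or around the garden"
print(wrap_sections(example, width=20))
```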

I hope this model works for you! There's a lot to explore by generating lyrics with this method. While this generator is far from perfect, it can give you a great starting point for your next composition. If you come up with anything, please send it my way so I can give it a spin!