[Using Deep Learning and Tensorflow to generate new song lyrics in the style of Weezer](https://pieriantraining.com/tensorflow-weezer-lyrics/)

[Genius API](https://docs.genius.com/)

In [1]:
from lyricsgenius import Genius
import numpy as np
import json
import tensorflow as tf


In [4]:
text = ''
# This will remove special characters for songs that are not in english
special_chars = set([ 'ँ', 'आ', 'ए', 'क', 'ग', 'ज', 'त',
       'द', 'ध', 'न', 'प', 'ब', 'म', 'य', 'र', 'व', 'श', 'ह', 'ा', 'ि',
       'ी', 'ू', 'े', 'ै', 'ो', '्', '\u2005', '\u200c', '—', '‘', '’',
       '\u205f', '느', '사', '어', '이', '제', '죠', '품', '회','\u0435','\xa0', '\u2019'])

artist_name = 'Beatles'
max_songs = 20
lyrics_filename = f'{artist_name}{max_songs}.json'
lyrics_text_filename = f'{artist_name}{max_songs}_Song_Lyrics.txt'

In [5]:
# Paste your Genius API access token at the prompt.
# Do not hard-code the token in the notebook; the token may change.
token = input("Genius API token: ")

genius = Genius(token)
artist = genius.search_artist(artist_name, max_songs=max_songs)
artist.save_lyrics(lyrics_filename)


Searching for songs by Beatles...

Changing artist name to 'The Beatles'
Song 1: "Yesterday"
Song 2: "Let It Be"


Timeout: Request timed out:
HTTPSConnectionPool(host='genius.com', port=443): Read timed out. (read timeout=5)

In [45]:
#print(artist.songs[1].lyrics)

In [17]:
lyrics = 'I look just like Buddy Holly'
tokens = lyrics.split()
tokens

['I', 'look', 'just', 'like', 'Buddy', 'Holly']

In [50]:
text = ""
for song in artist.songs:
    chars = set(song.lyrics)
    # Keep only songs that contain none of the special characters
    if chars.isdisjoint(special_chars):
        text += song.lyrics
        text += "\r\r"

# Write the combined lyrics using the filename defined earlier
with open(lyrics_text_filename, "w", encoding="utf-8") as f:
    f.write(text)


## Text Processing
A neural network can’t take in raw string data, so we need to assign a number to each character. Let’s create two lookups: one that goes from character to numeric index, and one that goes from numeric index back to character.

In [3]:
#
# Methods to get songs and lyrics from JSON file
#
import sys
def get_songs(filename:str) -> dict:
    '''Get titles and lyrics from an artist JSON file
        
        Arguments:
            filename - the name of an Artist JSON file created by genius.save_lyrics()
        Returns:
            A dict where the keys are the song title, and the value is the lyrics
    '''
    songs_dict = {}
    # The with-statement closes the file, so no explicit close() is needed
    with open(filename) as fp:
        artist_dict = json.load(fp)
        if artist_dict is not None and 'songs' in artist_dict:
            for song_ in artist_dict['songs']:
                songs_dict[song_['title']] = song_['lyrics']
    return songs_dict

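def get_lyrics_for(songs_dict:dict, title:str, strip_title=False, strip_markers=False, strip_embed=True) -> str:
    '''Return the lyrics for a title, or None if the title is not in songs_dict.
        The strip_* flags are accepted but not yet used.
    '''
    lyrics = None
    if title in songs_dict:
        lyrics = songs_dict[title]
    return lyrics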

def get_lyrics(json_filename:str) -> str:
    '''Gets the lyrics from an artist JSON file
        Arguments:
            json_filename - the name of an Artist JSON file created by genius.save_lyrics()
        Returns: the lyrics as a text string 
    '''
    songs_dict = get_songs(json_filename)
    text = ""
    for title in songs_dict.keys():
        lyrics = songs_dict[title]
        chars =set(lyrics)  # unique characters
        sc = chars.intersection(special_chars)
        if len(sc) > 0:
            for c in sc:
                lyrics = lyrics.replace(c, '')
        text += lyrics
        text += "\r\r"
    return text
    
def save_lyrics(json_filename:str, out_filename:str) -> str:
    '''Save the lyrics from an artist JSON file to a text file
        Arguments:
            json_filename - the name of an Artist JSON file created by genius.save_lyrics()
            out_filename - output filename
        Returns: the text string saved
    '''
    text = get_lyrics(json_filename)
    with open(out_filename, "w", encoding="utf-8") as f:
        try:
            f.write(text)
        except UnicodeError as ue:
            # Report the offending character range on stderr
            print(f'UnicodeError on chars {ue.start} to {ue.end-1}', file=sys.stderr)
            print(f' "{text[ue.start:ue.end]}"')
            print(ue)

    return text

In [5]:
wd = get_songs('resources/text/Weezer200.json')

In [4]:
#text = save_lyrics('resources/text/Weezer200.json', 'resources/text/Weezer200_Song_Lyrics.txt')
text = get_lyrics('resources/text/Weezer200.json')

In [36]:
#
# get saved lyrics
#
with open('resources/text/Weezer200_Song_Lyrics.txt', 'r', encoding="utf-8") as f:
    text = f.read()
#text

In [37]:
# The unique characters in the file
vocab = sorted(set(text))
char_to_ind = {u:i for i, u in enumerate(vocab)}
ind_to_char = np.array(vocab)
# encode chars to numerics
encoded_text = np.array([char_to_ind[c] for c in text])


In [38]:
sample = text[:20]
print(sample)
encoded_text[:20]


Say It Aint So Lyric


array([45, 54, 78,  1, 35, 73,  1, 27, 62, 67, 73,  1, 45, 68,  1, 38, 78,
       71, 62, 56])
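
As a quick sanity check on the encoding idea, here is a minimal sketch that builds the same two lookups on a toy string and confirms that decoding reverses encoding. The names prefixed with `toy_` are made up for this illustration, and a plain list stands in for the NumPy array used in the notebook.

```python
# Toy text standing in for the lyrics file (assumption: any string works here).
toy_text = "say it aint so"

# Same construction as in the notebook, without NumPy.
toy_vocab = sorted(set(toy_text))
toy_char_to_ind = {ch: i for i, ch in enumerate(toy_vocab)}
toy_ind_to_char = toy_vocab  # a list works as an index -> character lookup

# Encode every character, then map the indices back to characters.
encoded = [toy_char_to_ind[ch] for ch in toy_text]
decoded = "".join(toy_ind_to_char[i] for i in encoded)

print(toy_vocab)
print(encoded)
print(decoded == toy_text)
```

Because the vocabulary is sorted, the indices are stable between runs as long as the text contains the same set of characters.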

## Creating Batches

Overall, we want the model to predict the most probable next character given a historical sequence of characters. It’s up to us (the user) to choose how long that historical sequence is. If the sequence is too short, we don’t have enough information (e.g. given only the letter “a”, what is the next character?). If it is too long, training takes longer and the model will most likely overfit to characters that are irrelevant to characters farther out.

While there is no single correct sequence length, you should consider the text itself, how long normal phrases are in it, and a reasonable idea of which characters/words are relevant to each other.

This is similar to a Markov chain, where the length of the historical sequence is the order of the chain.
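
To make the Markov chain comparison concrete, here is a small sketch of an order-k character model: it counts which character follows each k-character context. This is only an illustration of the analogy, not part of the notebook's pipeline; `build_markov_table` and the toy string are hypothetical.

```python
from collections import Counter, defaultdict

def build_markov_table(text, order):
    """Count which character follows each context of `order` characters."""
    table = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        next_char = text[i + order]
        table[context][next_char] += 1
    return table

toy = "oh-ho oh-ho oh-ho-ho"
table = build_markov_table(toy, order=2)

# After "oh" the toy text always continues with "-";
# after "ho" it continues with a space or with "-".
print(dict(table["oh"]))
print(dict(table["ho"]))
```

A longer context (higher order) makes each prediction more specific but leaves fewer examples per context, which mirrors the sequence-length trade-off described above.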

In [2]:
print(tf.config.list_physical_devices())


[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


In [41]:
seq_len = 120
total_num_seq = len(text)//(seq_len+1)
total_num_seq

2072
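
The numbers in this notebook can be cross-checked with a little arithmetic. The sketch below uses the values printed in the cells (`seq_len = 120`, 2072 sequences) together with the batch size of 128 chosen later, and assumes incomplete chunks and batches are dropped, as the `drop_remainder=True` calls do.

```python
seq_len = 120
total_num_seq = 2072   # value printed by the cell above
batch_size = 128       # value chosen later in the notebook

# Each chunk holds seq_len + 1 characters (input plus one shifted target).
chunk_len = seq_len + 1

# Range of text lengths consistent with len(text) // chunk_len == 2072.
min_chars = total_num_seq * chunk_len
max_chars = min_chars + chunk_len - 1

# Full batches per epoch, and sequences left over when the remainder is dropped.
steps_per_epoch = total_num_seq // batch_size
unused_sequences = total_num_seq - steps_per_epoch * batch_size

print(min_chars, max_chars)
print(steps_per_epoch, unused_sequences)
```

The 16 steps per epoch agree with the `2/16` progress indicator shown in the training output.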

In [42]:
# Create Training Sequences
char_dataset = tf.data.Dataset.from_tensor_slices(encoded_text)

#for i in char_dataset.take(500):
#    print(ind_to_char[i.numpy()])

The **batch** method converts these individual characters into sequences we can feed in as a batch. We use seq_len+1 so that each chunk can be split into an input sequence and a target sequence of seq_len characters each, offset by one character. Here is what drop_remainder means:

**drop_remainder**: (Optional.) A tf.bool scalar tf.Tensor, representing whether the last batch should be dropped in the case it has fewer than batch_size elements; the default behavior is not to drop the smaller batch.
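
The effect of `drop_remainder` is easy to see on a short list. The following is a plain-Python stand-in for the batching behaviour described above; `batch_list` is a hypothetical helper written for this illustration, not a TensorFlow function.

```python
def batch_list(items, size, drop_remainder=False):
    """Group items into consecutive chunks of `size` elements."""
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    # Optionally discard a final chunk that is shorter than `size`.
    if drop_remainder and chunks and len(chunks[-1]) < size:
        chunks.pop()
    return chunks

encoded = list(range(10))  # toy stand-in for encoded_text
seq_len = 3                # toy value; the notebook uses 120

kept = batch_list(encoded, seq_len + 1)
dropped = batch_list(encoded, seq_len + 1, drop_remainder=True)

print(kept)
print(dropped)
```

Dropping the short final chunk keeps every sequence the same length, which is what allows the later cells to report a fixed shape of `(121,)`.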

In [43]:
sequences = char_dataset.batch(seq_len+1, drop_remainder=True)

Now that we have our sequences, we will perform the following steps for each one to create our target text sequences:

1. Grab the input text sequence
2. Assign the target text sequence as the input text sequence shifted by one step forward
3. Group them together as a tuple



In [44]:
sequences

<BatchDataset element_spec=TensorSpec(shape=(121,), dtype=tf.int32, name=None)>

In [45]:
def create_seq_targets(seq):
    input_txt = seq[:-1]
    target_txt = seq[1:]
    return input_txt, target_txt

input_txt, target_txt = create_seq_targets("abcdef")
print(input_txt, target_txt)

abcde bcdef


In [46]:
dataset = sequences.map(create_seq_targets)

In [47]:
dataset

<MapDataset element_spec=(TensorSpec(shape=(120,), dtype=tf.int32, name=None), TensorSpec(shape=(120,), dtype=tf.int32, name=None))>

In [48]:
dataset.take(1)

<TakeDataset element_spec=(TensorSpec(shape=(120,), dtype=tf.int32, name=None), TensorSpec(shape=(120,), dtype=tf.int32, name=None))>

In [49]:
for input_txt, target_txt in  dataset.take(1):
    print(input_txt.numpy())
    print(''.join(ind_to_char[input_txt.numpy()]))
    print('\n')
    print(target_txt.numpy())
    # There is an extra whitespace!
    print(''.join(ind_to_char[target_txt.numpy()]))

[45 54 78  1 35 73  1 27 62 67 73  1 45 68  1 38 78 71 62 56 72 52 35 67
 73 71 68 53  0 41 61 10  1 78 58 54 61  0 27 65 71 62 60 61 73  0  0 52
 48 58 71 72 58  1 15 53  0 45 68 66 58 55 68 57 78  6 72  1 34 58 62 67
 58  0 35 72  1 56 71 68 76 57 62 67 60  1 66 78  1 62 56 58 55 68 77  0
 45 68 66 58 55 68 57 78  6 72  1 56 68 65 57  1 68 67 58  0 35 72  1 60]
Say It Aint So Lyrics[Intro]
Oh, yeah
Alright

[Verse 1]
Somebody's Heine
Is crowding my icebox
Somebody's cold one
Is g


[54 78  1 35 73  1 27 62 67 73  1 45 68  1 38 78 71 62 56 72 52 35 67 73
 71 68 53  0 41 61 10  1 78 58 54 61  0 27 65 71 62 60 61 73  0  0 52 48
 58 71 72 58  1 15 53  0 45 68 66 58 55 68 57 78  6 72  1 34 58 62 67 58
  0 35 72  1 56 71 68 76 57 62 67 60  1 66 78  1 62 56 58 55 68 77  0 45
 68 66 58 55 68 57 78  6 72  1 56 68 65 57  1 68 67 58  0 35 72  1 60 62]
ay It Aint So Lyrics[Intro]
Oh, yeah
Alright

[Verse 1]
Somebody's Heine
Is crowding my icebox
Somebody's cold one
Is gi


### Creating Training Batches
Now that we have the actual sequences, we will create the batches. We want to shuffle the sequences into a random order so the model doesn’t overfit to any one section of the text and can instead generate characters given any seed text.

In [50]:
# Batch size
batch_size = 128

# Buffer size to shuffle the dataset so it doesn't attempt to shuffle
# the entire sequence in memory. Instead, it maintains a buffer in which it shuffles elements
buffer_size = 10000

dataset = dataset.shuffle(buffer_size).batch(batch_size, drop_remainder=True)
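
The buffer idea in the comment above can be sketched in plain Python. This is a simplified model of buffer-based shuffling, not TensorFlow's actual implementation; `buffered_shuffle` and its `seed` parameter are hypothetical names for this illustration.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in a randomized order using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        # Fill the buffer before emitting anything.
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer

result = list(buffered_shuffle(range(20), buffer_size=5, seed=0))
print(result)
```

With a small buffer, an element can only move a limited distance toward the front of the stream. In this notebook the buffer (10000) is larger than the number of sequences (2072), so every sequence fits in the buffer and the order is fully randomized.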

In [51]:
dataset

<BatchDataset element_spec=(TensorSpec(shape=(128, 120), dtype=tf.int32, name=None), TensorSpec(shape=(128, 120), dtype=tf.int32, name=None))>

## Creating the Model

We will use a recurrent model in the style of an [LSTM (Long short-term memory)](https://en.wikipedia.org/wiki/Long_short-term_memory) network with a few extra features, starting with an embedding layer. The architecture is based on [DeepMoji](https://deepmoji.mit.edu/), which uses two LSTM layers; the original source code can be found in [GitHub - bfelbo/DeepMoji](https://github.com/bfelbo/DeepMoji). (Note: DeepMoji is a state-of-the-art deep learning model for analyzing sentiment, emotion, sarcasm, etc.) The code in this notebook uses a single GRU layer as the recurrent layer.

The embedding layer will serve as the input layer. It is essentially a lookup table that maps the numeric index of each character to a vector with “embedding dim” dimensions. As you can imagine, the larger this embedding size, the more complex the training. This is similar to the idea behind word2vec, where words are mapped to some n-dimensional space. Embedding before feeding straight into the LSTM usually leads to more realistic results.
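
The lookup-table view of an embedding can be sketched without any deep learning library. The table below is filled with small random numbers purely for illustration (in a real model the values are learned during training), and `embed` is a hypothetical helper.

```python
import random

vocab_size = 5   # toy value; the notebook's vocabulary has 82 characters
embed_dim = 3    # toy value; the notebook uses 64

rng = random.Random(0)
# One row of embed_dim floats per character index.
embedding_table = [[rng.uniform(-0.05, 0.05) for _ in range(embed_dim)]
                   for _ in range(vocab_size)]

def embed(indices):
    """Replace each integer index with its row of the table."""
    return [embedding_table[i] for i in indices]

encoded = [2, 0, 2, 4]
vectors = embed(encoded)

print(len(vectors), len(vectors[0]))  # sequence length and embed_dim
print(vectors[0] == vectors[2])       # the same index gives the same vector
```

The table holds `vocab_size * embed_dim` numbers, so for 82 characters and 64 dimensions that is 5248 trainable values.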

From Wikipedia: Long short-term memory (LSTM) is an [artificial neural network (ANN)](https://en.wikipedia.org/wiki/Artificial_neural_network) used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a [recurrent neural network (RNN)](https://en.wikipedia.org/wiki/Recurrent_neural_network) can process not only single data points (such as images), but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, machine translation, robot control, video games, and healthcare. LSTM has become the most cited neural network of the 20th century.


In [52]:
# Length of the vocabulary in chars
vocab_size = len(vocab)
print(f'vocab_size: {vocab_size}')

# The embedding dimension
embed_dim = 64

# Number of RNN units
rnn_neurons = 1026

# create a function that easily adapts to different variables as shown above.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM,Dense,Embedding,Dropout,GRU


vocab_size: 82


### Setting up Loss Function

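For our loss we will use [sparse categorical crossentropy](https://keras.io/api/losses/probabilistic_losses/#sparse_categorical_crossentropy-function), which we can import from Keras. We will also set from_logits=True, because the final Dense layer has no softmax activation and therefore outputs raw logits rather than probabilities.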
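
To show what "from logits" means, here is a minimal pure-Python sketch of the calculation for a single prediction: the logits are first turned into probabilities with a softmax, and the loss is the negative log of the probability assigned to the true class. The helper names are hypothetical and this is not the Keras implementation.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_ce_from_probs(true_index, probs):
    """Loss when the predictions are already probabilities."""
    return -math.log(probs[true_index])

def sparse_ce_from_logits(true_index, logits):
    """Loss when the predictions are raw scores."""
    return sparse_ce_from_probs(true_index, softmax(logits))

logits = [2.0, 1.0, 0.1]
print(sum(logits))                       # raw scores need not sum to 1
print(softmax(logits))
print(sparse_ce_from_logits(0, logits))

# A model that scores all 82 characters equally has a loss of ln(82).
uniform_loss = sparse_ce_from_logits(0, [0.0] * 82)
print(uniform_loss)
```

The uniform case gives a useful reference point: a loss near ln(82), roughly 4.41, corresponds to guessing uniformly over an 82-character vocabulary.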


In [53]:
from tensorflow.keras.losses import sparse_categorical_crossentropy

In [54]:
help(sparse_categorical_crossentropy)

Help on function sparse_categorical_crossentropy in module keras.losses:

sparse_categorical_crossentropy(y_true, y_pred, from_logits=False, axis=-1)
    Computes the sparse categorical crossentropy loss.
    
    Standalone usage:
    
    >>> y_true = [1, 2]
    >>> y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]]
    >>> loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)
    >>> assert loss.shape == (2,)
    >>> loss.numpy()
    array([0.0513, 2.303], dtype=float32)
    
    Args:
      y_true: Ground truth values.
      y_pred: The predicted values.
      from_logits: Whether `y_pred` is expected to be a logits tensor. By default,
        we assume that `y_pred` encodes a probability distribution.
      axis: Defaults to -1. The dimension along which the entropy is
        computed.
    
    Returns:
      Sparse categorical crossentropy loss value.



In [55]:
def sparse_cat_loss(y_true, y_pred):
    return sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

def create_model(vocab_size, embed_dim, rnn_neurons, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size, embed_dim,batch_input_shape=[batch_size, None]))
    model.add(GRU(rnn_neurons,return_sequences=True,stateful=True,recurrent_initializer='glorot_uniform'))
    # Final Dense Layer to Predict
    model.add(Dense(vocab_size))
    model.compile(optimizer='adam', loss=sparse_cat_loss) 
    return model


In [56]:
model = create_model(
  vocab_size = vocab_size,
  embed_dim=embed_dim,
  rnn_neurons=rnn_neurons,
  batch_size=batch_size)

In [57]:
model.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (128, None, 64)           5248      
                                                                 
 gru_1 (GRU)                 (128, None, 1026)         3361176   
                                                                 
 dense_1 (Dense)             (128, None, 82)           84214     
                                                                 
Total params: 3,450,638
Trainable params: 3,450,638
Non-trainable params: 0
_________________________________________________________________
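
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the GRU layer uses the Keras default `reset_after=True`, which gives each of the three gates two bias vectors; the other formulas are the usual weights-plus-bias counts.

```python
vocab_size, embed_dim, rnn_neurons = 82, 64, 1026

# Embedding: one vector of embed_dim values per character.
embedding_params = vocab_size * embed_dim

# GRU: 3 gates, each with input weights, recurrent weights and
# two bias vectors (assumes the Keras default reset_after=True).
gru_params = 3 * (rnn_neurons * embed_dim
                  + rnn_neurons * rnn_neurons
                  + 2 * rnn_neurons)

# Dense: one weight per (unit, character) pair plus one bias per character.
dense_params = rnn_neurons * vocab_size + vocab_size

total_params = embedding_params + gru_params + dense_params
print(embedding_params, gru_params, dense_params, total_params)
```

These values match the `Param #` column above, and they show that almost all of the model's capacity sits in the recurrent layer.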


## Training the Model
Let’s make sure everything is ok with our model before we spend too much time training! Let’s pass in a batch to confirm the model currently predicts random characters without any training.

In [58]:
for input_example_batch, target_example_batch in dataset.take(1):

  # Predict off some random batch
  example_batch_predictions = model(input_example_batch)

  # Display the dimensions of the predictions
  print(example_batch_predictions.shape, " <=== (batch_size, sequence_length, vocab_size)")

(128, 120, 82)  <=== (batch_size, sequence_length, vocab_size)


In [59]:
# example_batch_predictions
sampled_indices = tf.random.categorical(example_batch_predictions[0], num_samples=1)

# Reformat so the result is not a list of lists
sampled_indices = tf.squeeze(sampled_indices, axis=-1).numpy()
sampled_indices

array([70,  9, 16, 12, 28, 29, 43, 39, 23, 70,  0, 48, 37,  7, 56, 31, 71,
       78, 24, 41, 72, 32, 58, 65, 50, 63, 16, 61, 63, 30,  5, 71, 29, 72,
       74,  4, 22, 40, 59, 30, 74, 27, 75, 78, 17, 17, 81, 53, 52, 34, 37,
       29, 58, 79, 17, 21, 52, 40, 24,  6, 70, 58, 34, 41, 35, 36, 34, 45,
       77, 32,  0, 29, 56, 62, 52, 45, 59, 58, 22, 47, 22, 56, 79, 76, 44,
       29, 53,  7, 19, 14, 75, 77, 69, 11, 13, 64, 73, 78, 40, 48, 16, 74,
       57, 43, 80,  2, 29, 60, 78, 33, 77, 46, 26, 33, 19, 56, 44, 80, 28,
       33], dtype=int64)

In [60]:
print("Given the input seq: \n")
print("".join(ind_to_char[input_example_batch[0]]))
print('\n')
print("Next Char Predictions: \n")
print("".join(ind_to_char[sampled_indices ]))



Given the input seq: 

Uh-huh

[Outro]
(This is beginning to hurt) Oh-ho
(This is beginning to hurt) Oh-ho
(This is beginning to hurt) Oh-ho-ho


Next Char Predictions: 

q*2.BCQM9q
VK(cEry:OsFelYj2hjD&rCsu$8NfDuAvy33ó][HKCez37[N:'qeHOIJHSxF
Cci[Sfe8U8czwRC](50vxp-/ktyNV2udQ¡!CgyGxT?G5cR¡BG


Alright, it looks like everything’s working. We just need to train the network to learn from our small dataset. Let’s train it!

In [3]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  0


In [None]:
epochs = 100
model.fit(dataset,epochs=epochs)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
 2/16 [==>...........................] - ETA: 1:15 - loss: 3.1588

## Generating Text
Currently our model expects batches of 128 sequences at a time. To generate text from a single seed, we create a new model with batch_size=1, load our saved model’s weights into it, and then call .build() on the model:

In [None]:
model.save('weezer_gen.h5') 

In [None]:
from tensorflow.keras.models import load_model

In [None]:
model = create_model(vocab_size, embed_dim, rnn_neurons, batch_size=1)

model.load_weights('weezer_gen.h5')

model.build(tf.TensorShape([1, None]))

model.summary()

In [None]:
def generate_text(model, start_seed, gen_size=100, temp=1.0):
    
    '''
    model: Trained model used to generate text
    start_seed: Initial seed text in string form
    gen_size: Number of characters to generate
    temp: Sampling temperature; the logits are divided by this value

    Basic idea behind this function is to take in some seed text, format it so
    that it is in the correct shape for our network, then loop the sequence as
    we keep adding our own predicted characters. Similar to our work in the RNN
    time series problems.
    '''

    # Number of characters to generate
    num_generate = gen_size

    # Vectorizing starting seed text
    input_eval = [char_to_ind[s] for s in start_seed]

    # Expand to match batch format shape
    input_eval = tf.expand_dims(input_eval, 0)

    # Empty list to hold resulting generated text
    text_generated = []

    # Temperature affects randomness in our resulting text.
    # The term is derived from entropy/thermodynamics.
    # The temperature is used to affect the probability of the next characters.
    # Lower temperature == less surprising / more expected
    # Higher temperature == more surprising / less expected

    temperature = temp

    # Here batch size == 1
    model.reset_states()

    for i in range(num_generate):

        # Generate Predictions
        predictions = model(input_eval)

        # Remove the batch shape dimension
        predictions = tf.squeeze(predictions, 0)

        # Use a categorical distribution to select the next character
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

        # Pass the predicted character as the next input
        input_eval = tf.expand_dims([predicted_id], 0)

        # Transform back to character letter
        text_generated.append(ind_to_char[predicted_id])

    return (start_seed + ''.join(text_generated))

In [None]:
print(generate_text(model,"Hey",gen_size=1000))
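
The effect of the `temp` argument can be seen by applying it to a small set of made-up logits. This pure-Python sketch only illustrates the scaling step; `softmax_with_temperature` is a hypothetical helper, and the logits are invented for the example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # invented scores for three candidate characters

cold = softmax_with_temperature(logits, 0.5)
neutral = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 2.0)

for name, probs in (("0.5", cold), ("1.0", neutral), ("2.0", hot)):
    print(name, [round(p, 3) for p in probs])
```

A temperature below 1 concentrates probability on the top-scoring character, so the generated text is more predictable. A temperature above 1 flattens the distribution, so less likely characters are sampled more often.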

Now you may not be very impressed by the results, but take a closer look at the output and remember that the model is predicting character by character! This means it’s actually starting to learn the structure of a song. You can see it begin to learn concepts of structure by utilizing whitespace and markers such as **\[Verse\]**, **\[Bridge\]**, and **\[Chorus\]**, which is incredible given how small the dataset is. We also see some evidence of overfitting as the model begins to just duplicate existing song lyrics (this is only something you can tell if you’re quite the Weezer fan and recognize the lyrics of multiple songs starting to merge).

Try taking this further by building out an even larger data set that includes more bands, or play around with the model hyperparameters or training epochs!
