# Hi!
For this project, I decided to take neural nets a step further by using a generative recurrent neural net (RNN) to generate song lyrics. Using the same data from my first project, I trained a recurrent neural net on the lyrics for each year (I stopped at 1999 to save time) before uploading the weights to [this github repo](https://github.com/bgtripp/Lyrics-Neural-Net).

A recurrent network processes its input one step at a time and carries a hidden state forward from each step to the next, which lets it learn from sequences. Because song lyrics are just characters arranged in complex sequences of words, this type of network is well suited to the task. The network reads the input text as a sequence of characters, as opposed to an image recognition neural net that reads a whole bitmap array at once. This also solves the issue of the input texts all being different lengths.

In training, each network attempts to predict the next character given the previous 30 characters. This means that for the first couple of epochs (rounds of training) its attempts at generating lyrics are just jumbles of random characters, but it's fascinating to watch its progression below. The "temperature" values referenced below control the creativity of the network: the higher the value, the less likely the network is to choose the single most probable next character, so it comes up with more interesting, if sometimes nonsensical, results. Enjoy!
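To make the idea of temperature concrete, here is a minimal sketch in plain Python of one common way to apply it: take the log of each predicted probability, divide by the temperature, and renormalise. The function names (`apply_temperature`, `sample_char`) and the toy probabilities are made up for illustration, and this is not necessarily how the library does it internally.

```python
import math
import random

def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a temperature value (hypothetical helper)."""
    # Divide the log-probabilities by the temperature; the tiny floor avoids log(0).
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    # Subtract the largest value before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_char(chars, probs, temperature, rng=random):
    """Pick one character at random from the temperature-adjusted distribution."""
    scaled = apply_temperature(probs, temperature)
    return rng.choices(chars, weights=scaled, k=1)[0]

# Toy prediction over three candidate next characters (made-up numbers).
chars = ["a", "e", "z"]
probs = [0.6, 0.3, 0.1]

cold = apply_temperature(probs, 0.2)  # low temperature: the top choice dominates
same = apply_temperature(probs, 1.0)  # temperature 1.0: distribution is unchanged
next_char = sample_char(chars, probs, temperature=0.5)
```

At a temperature of 0.2 almost all of the probability ends up on the most likely character, which is why low-temperature samples tend to look safer and more repetitive.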

# **Initial Prep**

In [None]:
#Copyright Benjamin Tripp 2020
import pandas as pd 
import numpy as np 
!pip install -q textgenrnn #Install the text generation rnn library
from google.colab import files
from textgenrnn import textgenrnn
from datetime import datetime
import os

#Data set of billboard lyrics
url = 'https://raw.githubusercontent.com/walkerkq/musiclyrics/master/billboard_lyrics_1964-2015.csv'

song_data = pd.read_csv(url, encoding = "latin-1") #Pandas dataframe
song_data.head() #Shows the first 5 lines of the dataframe

Using TensorFlow backend.


| Unnamed: 0 | Rank | Song | Artist | Year | Lyrics | Source |
|---|---|---|---|---|---|---|
| 0 | 1 | wooly bully | sam the sham and the pharaohs | 1965 | sam the sham miscellaneous wooly bully wooly b... | 3.0 |
| 1 | 2 | i cant help myself sugar pie honey bunch | four tops | 1965 | sugar pie honey bunch you know that i love yo... | 1.0 |
| 2 | 3 | i cant get no satisfaction | the rolling stones | 1965 | | 1.0 |
| 3 | 4 | you were on my mind | we five | 1965 | when i woke up this morning you were on my mi... | 1.0 |
| 4 | 5 | youve lost that lovin feelin | the righteous brothers | 1965 | you never close your eyes anymore when i kiss... | 1.0 |


# **Preparing Data**

In [None]:
def getLyrics(year): #Make function that gets lyrics given a year

  subset = song_data[song_data['Year'] == year]
  lyrics = subset['Lyrics'].dropna() #Skip songs with missing lyrics
  lyrics.to_csv(file_name, header=False, index=False, sep=' ', mode='w') #Saves lyrics as a text file with one song per line, overwriting any previous year

year = 1965
file_name = 'lyrics.txt'
model_name = '{}PopLyrics'.format(year)
getLyrics(year) #Write this year's lyrics to file_name before training

# **Configuring Recurrent Neural Network**

In [None]:
model_cfg = {
    'word_level': False,   # set to True to train a word-level model (requires more data and smaller max_length)
    'rnn_size': 128,   # number of LSTM cells of each layer (128/256 recommended)
    'rnn_layers': 3,   # number of LSTM layers (>=2 recommended)
    'rnn_bidirectional': False,   # consider text both forwards and backward, can give a training boost
    'max_length': 30,   # number of tokens to consider before predicting the next (20-40 for characters, 5-10 for words recommended)
    'max_words': 10000,   # maximum number of words to model; the rest will be ignored (word-level model only)
}

train_cfg = {
    'line_delimited': True,   # set to True if each text has its own line in the source file
    'num_epochs': 20,   # set higher to train the model for longer (default 20)
    'gen_epochs': 1,   # generates sample text from model after given number of epochs
    'train_size': 0.8,   # proportion of input data to train on: setting < 1.0 limits model from learning perfectly
    'dropout': 0.0,   # ignore a random proportion of source tokens each epoch, allowing model to generalize better
    'validation': False,   # if train_size < 1.0, test on holdout dataset; will make overall training slower
    'is_csv': False   # set to True if file is a CSV exported from Excel/BigQuery/pandas
}
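
To illustrate what `max_length` means for a character-level model, here is a rough sketch of how text can be cut into (context, next character) training pairs with a sliding window. The helper name `make_training_pairs` is hypothetical, and the library's actual preprocessing may differ (for example, in how it handles the start of each song).

```python
def make_training_pairs(text, max_length=30):
    """Slide a window over the text and pair each context with the character that follows it."""
    pairs = []
    for start in range(len(text) - max_length):
        context = text[start:start + max_length]  # the previous max_length characters
        target = text[start + max_length]         # the character the model should predict
        pairs.append((context, target))
    return pairs

# A short window keeps the example readable; the config above uses max_length=30.
pairs = make_training_pairs("sugar pie honey bunch", max_length=5)
for context, target in pairs[:3]:
    print(repr(context), "->", repr(target))
```

Each song contributes roughly one pair per character, which is how a modest number of songs can turn into a large number of training sequences.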


# **Creating and Training Neural Network**

In [None]:
textgen = textgenrnn(name=model_name) #New instance of text generating recurrent neural network

train_function = textgen.train_from_file if train_cfg['line_delimited'] else textgen.train_from_largetext_file

train_function(                 #If you're wondering what some of these settings do, see Configuring Recurrent Neural Network above
    file_path=file_name,
    new_model=True,
    num_epochs=train_cfg['num_epochs'],
    gen_epochs=train_cfg['gen_epochs'],
    batch_size=1024,
    train_size=train_cfg['train_size'],
    dropout=train_cfg['dropout'],
    validation=train_cfg['validation'],
    is_csv=train_cfg['is_csv'],
    rnn_layers=model_cfg['rnn_layers'],
    rnn_size=model_cfg['rnn_size'],
    rnn_bidirectional=model_cfg['rnn_bidirectional'],
    max_length=model_cfg['max_length'],
    dim_embeddings=100,
    word_level=model_cfg['word_level'])

99 texts collected.
Training new model w/ 3-layer, 128-cell LSTMs
Training on 151,426 character sequences.


Epoch 1/20
####################
Temperature: 0.2
####################
""      ea        o                                           it                 a          e                         e                             ri   i                          ia a                    o  o                   o   a     e         a            at        e        t       o      oo  

"""            i                 a   a  o  a        e  t                   o         o                              t                       o                            o           e     e     h         oo                    i   i     e        o     h    ee     o n o                 e      n  s   

"""   "                             o          t         e   o               oe                o  a    e          t 

# **Fine-tuning Models on New Data**

See [Fine-Tuning Colab](https://colab.research.google.com/drive/1NBB7SVIxOis6xObINngwWBKIXdlvuvFf)

# **Testing Pre-trained model**

See [Testing Colab](https://colab.research.google.com/drive/1QsxYmfo_zgbFdnBEg7xcMxaAhZYDhkOL)

