# Genius Lyrics Data Mining 

Goal: find a way to mine the necessary data efficiently and with minimal user input.

**Areas of interest**
1. Can we query music by genre?
2. Given a list of songs, is there an easy way to get the following for each one?
    - lyrics
    - artists
    - year released
    - where the artist is from
3. Can we get this information without downloading JSON files per artist?


**Current bugs**
1. Find a way to get artist and song IDs rather than names and titles.

In [1]:
# install if the next cell doesn't run:
# !pip install lyricsgenius

In [6]:
# importing libraries: 

import lyricsgenius
import pandas as pd
import json

I told my group to give me a list of songs to refer to, so I'll create a sample table to work with.

In [3]:
titles = {'Artist':['Queen','Taylor Swift','Cavetown','Ricky Montgomery','Cavetown'],
          'Song':["Don't Stop Me Now",'Blank Space','Juliet','Get Used to It','Bad Dream Baby']}

In [4]:
test_data = pd.DataFrame.from_dict(titles)

test_data

|   | Artist | Song |
|---|--------|------|
| 0 | Queen | Don't Stop Me Now |
| 1 | Taylor Swift | Blank Space |
| 2 | Cavetown | Juliet |
| 3 | Ricky Montgomery | Get Used to It |
| 4 | Cavetown | Bad Dream Baby |


In [5]:
# now let's pass some of this into the Genius API:

In [6]:
# importing libraries

import lyricsgenius

# visualizations
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

In [7]:
import os

# read the API token from an environment variable instead of hard-coding it in the notebook
# (GENIUS_ACCESS_TOKEN is a name chosen here; set it before running this cell)
token = os.environ['GENIUS_ACCESS_TOKEN']
genius = lyricsgenius.Genius(token,
                             skip_non_songs = True,  # try to skip results that aren't songs (e.g. track listings)
                             verbose = False,  # set to True to print search progress while debugging
                             remove_section_headers = False,  # False keeps headers like [Verse 1] and [Chorus] in the lyrics
                             retries = 3)  # number of retries on timeouts and server errors

**You can search by artist name:**

`genius.search_artist(artist_name)`

In [8]:
# let's try the artist() method, which looks an artist up by ID:

# genius.artist('00343')

In [9]:
test = genius.search_artist('Taylor Swift', max_songs = 0).id

In [10]:
test

1177

In [11]:
def get_artistID(artist):
    '''given the artist name, return the Genius artist ID (or None if no artist is found)'''
    result = genius.search_artist(artist, max_songs = 0)
    return result.id if result is not None else None

In [12]:
artists = test_data['Artist']

In [13]:
test_data.insert(loc = 1, column = 'Artist_ID', value = [get_artistID(a) for a in artists])

In [14]:
test_data

|   | Artist | Artist_ID | Song |
|---|--------|-----------|------|
| 0 | Queen | 563 | Don't Stop Me Now |
| 1 | Taylor Swift | 1177 | Blank Space |
| 2 | Cavetown | 966969 | Juliet |
| 3 | Ricky Montgomery | 406321 | Get Used to It |
| 4 | Cavetown | 966969 | Bad Dream Baby |


In [15]:
queen_dict = (genius.search_artist_songs(test_data['Artist_ID'][0], 
                            search_term = test_data["Song"][0]))

In [16]:
queen_dict.keys()
# queen_dict['songs']

dict_keys(['songs', 'next_page', 'total_entries'])
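The response also carries `next_page` and `total_entries`, which suggests the results are paginated. A minimal sketch for collecting every page, assuming `search_artist_songs` accepts a `page` keyword argument (check the lyricsgenius docs for your installed version); `collect_all_pages` and `max_pages` are hypothetical names:

```python
def collect_all_pages(fetch_page, max_pages=10):
    """Follow 'next_page' links and return the songs from every page.

    fetch_page(page) must return a dict shaped like the search_artist_songs
    response: {'songs': [...], 'next_page': int or None, ...}.
    max_pages is a safety cap so a bad response can't loop forever.
    """
    songs = []
    page = 1
    for _ in range(max_pages):
        response = fetch_page(page)
        songs.extend(response['songs'])
        page = response.get('next_page')
        if page is None:  # no more pages to fetch
            break
    return songs


# usage with the notebook's client (assumes the `page` keyword exists):
# all_matches = collect_all_pages(
#     lambda p: genius.search_artist_songs(563, search_term="Don't Stop Me Now", page=p))
```

Passing the fetch step in as a function keeps the loop independent of the client, so it can be checked without calling the API.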

In [17]:
# queen_dict['songs'][0]

In [18]:
# getting the year released: 
print(queen_dict['songs'][0]['release_date_components']) # this one is better 
print(queen_dict['songs'][0]['release_date_for_display'])

{'year': 1978, 'month': 11, 'day': 10}
November 10, 1978


In [19]:
def song_dictionary(artist_ID, song_title, most_popular_only = True):
    '''return the Genius dictionary of information about a song
    INPUTS: artist_ID: the Genius artist ID (an integer)
            song_title: the title of the song, not case sensitive
            most_popular_only: if True (the default), return only the most popular match
    caveat: the search returns every song by that artist with the same or a similar title
    (e.g. the 10-minute version, Taylor's Version vs. the original, new features, etc.),
    so by default only the most popular version is returned; set most_popular_only = False
    to get the full list. If nothing matches, returns None (or an empty list).'''
    dictionary = genius.search_artist_songs(artist_id = artist_ID,
                                            search_term = song_title,
                                            sort = 'popular')
    songs = dictionary['songs']
    if most_popular_only:
        return songs[0] if songs else None
    return songs

# tests:

test_dict = song_dictionary(test_data['Artist_ID'][0], song_title = test_data['Song'][0])

In [20]:
def get_release_date(song_dict, form = 'd'):
    '''given a song dictionary, return the date the song came out.
    form = 'd' (default) returns the dictionary version, e.g. {'year': 1978, 'month': 11, 'day': 10}
    form = 'r' returns the readable version, e.g. 'November 10, 1978'
    OUTPUT: either DICT or STR'''
    if form == 'd':
        return song_dict['release_date_components']  # this one is better for analysis
    if form == 'r':
        return song_dict['release_date_for_display']
    raise ValueError("Wrong form selected, please choose 'd' for DICTIONARY or 'r' for plain text")

get_release_date(test_dict), get_release_date(test_dict, form = 'r')

({'year': 1978, 'month': 11, 'day': 10}, 'November 10, 1978')

In [21]:
def get_year(song):
    '''given a song dictionary, return the year the song came out
    (None if Genius has no release date for it)
    OUTPUT: INT or None'''
    date = get_release_date(song)
    return date['year'] if date else None


get_year(test_dict)

1978

In [22]:
def get_song_ID(song):
    '''given a song dictionary, return its Genius song ID (INT)'''
    return song['id']


test_song_ID = get_song_ID(test_dict)
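With `song_dictionary`, `get_year`, and `get_song_ID` in place, the same lookups can be applied to every row of the sample table. A sketch, assuming a `find_song(artist_id, title)` callable such as `song_dictionary` that returns a song dictionary or None; `add_song_details` and the `Song_ID` / `Year` column names are hypothetical:

```python
import pandas as pd


def add_song_details(df, find_song):
    """Return a copy of df with 'Song_ID' and 'Year' columns filled in.

    find_song(artist_id, title) should return a Genius song dictionary
    (e.g. song_dictionary in this notebook) or None when nothing matches.
    """
    song_ids, years = [], []
    for artist_id, title in zip(df['Artist_ID'], df['Song']):
        song = find_song(artist_id, title)
        if song is None:
            song_ids.append(None)
            years.append(None)
            continue
        song_ids.append(song['id'])
        # release_date_components can be missing or null for some songs
        date = song.get('release_date_components') or {}
        years.append(date.get('year'))
    out = df.copy()
    out['Song_ID'] = song_ids
    out['Year'] = years
    return out


# usage: test_data = add_song_details(test_data, song_dictionary)
```

Taking the lookup as an argument means the row loop can be tested offline with a fake lookup, and swapped for a cached version later.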

In [23]:
# maybe let's see comments! 

def get_comments(song_ID, style = 'plain'):
    comms = genius.song_comments(song_ID, text_format = style)
    return comms
    
# this is the first comment (in this list), with no other information     
get_comments(test_song_ID)['comments'][0]['body']
# let's see if I can get more information:

{'plain': 'Obviously they call him Mr. Fahrenheit because he’s so hot, duh!'}

In [24]:
len(get_comments(test_song_ID, style = 'markdown')['comments'])

10
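The response holds one page of comments (10 here), each with its text under `body`, keyed by the requested text format. A small sketch that pulls out just the text; `comment_texts` is a hypothetical helper, and the `{'body': {'plain': ...}}` shape is taken from the output above:

```python
def comment_texts(comments_response, style='plain'):
    """Return the text of each comment in a song_comments() response.

    Assumes each comment looks like {'body': {style: text}, ...}, as in the
    output above; comments without that format are skipped.
    """
    texts = []
    for comment in comments_response.get('comments', []):
        body = comment.get('body') or {}
        if style in body:
            texts.append(body[style])
    return texts


# usage: comment_texts(get_comments(test_song_ID))
```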

In [25]:
# plain-text lyrics; use print() to display them readably

# (genius.search_song(title = test_data['Song'][0], 
#                     artist = test_data['Artist'][0],
#                    get_full_info = False) # can also pass song_id to get a specific song
#  .lyrics)

In [26]:
# queen_dict['songs']

In [27]:
# let's set up a cache so we don't keep re-querying the API for the same thing:

In [28]:
from functools import cache
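`functools.cache` (Python 3.9+) memoizes a function on its arguments, so each artist name only hits the API once; in `test_data`, Cavetown appears twice. A minimal sketch, where `make_cached_artist_lookup` is a hypothetical helper that wraps any client with a `search_artist(name, max_songs=...)` method, such as the `genius` object above:

```python
from functools import cache


def make_cached_artist_lookup(client):
    """Return a memoized function mapping an artist name to its Genius artist ID.

    client is anything with a search_artist(name, max_songs=...) method,
    e.g. the lyricsgenius.Genius object created earlier.
    """
    @cache
    def lookup(name):
        artist = client.search_artist(name, max_songs=0)
        # search_artist returns None when nothing is found
        return None if artist is None else artist.id

    return lookup


# usage:
# cached_artist_id = make_cached_artist_lookup(genius)
# test_data['Artist'].map(cached_artist_id)
```

Misses (None) are cached too, so a name that isn't on Genius won't be searched again either; build a fresh lookup if the data might have changed.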

In [29]:
# let's see which object (Artist or Song) has the most information:

test_data

|   | Artist | Artist_ID | Song |
|---|--------|-----------|------|
| 0 | Queen | 563 | Don't Stop Me Now |
| 1 | Taylor Swift | 1177 | Blank Space |
| 2 | Cavetown | 966969 | Juliet |
| 3 | Ricky Montgomery | 406321 | Get Used to It |
| 4 | Cavetown | 966969 | Bad Dream Baby |


In [30]:
sample_row = test_data.sample(1)

In [31]:
sample_row

|   | Artist | Artist_ID | Song |
|---|--------|-----------|------|
| 3 | Ricky Montgomery | 406321 | Get Used to It |


In [32]:
# let's see what we can do: 

# the catch with this one is that you can't ask for a specific song, so avoid it unless necessary:
sample_artist = genius.search_artist(sample_row['Artist'].iloc[0], max_songs = 0) # returns an Artist object (use .to_dict() for a dictionary)

In [33]:
# for this one you need both the artist name and the song title, otherwise you might get the wrong song
sample_song = genius.search_song(title = sample_row['Song'].iloc[0], artist = sample_row['Artist'].iloc[0])
# returns a Song object 

# this could break with the wrong artist, so let's see what it does down below:

In [34]:
sample_song.to_dict()

{'_type': 'song',
 'annotation_count': 10,
 'api_path': '/songs/2045031',
 'artist_names': 'Ricky Montgomery',
 'full_title': 'Get Used to It by\xa0Ricky\xa0Montgomery',
 'header_image_thumbnail_url': 'https://images.genius.com/cc976fa6827211e5fe62b1a0d637f21d.300x300x1.jpg',
 'header_image_url': 'https://images.genius.com/cc976fa6827211e5fe62b1a0d637f21d.816x816x1.jpg',
 'id': 2045031,
 'instrumental': False,
 'lyrics_owner_id': 2470755,
 'lyrics_state': 'complete',
 'lyrics_updated_at': 1686868931,
 'path': '/Ricky-montgomery-get-used-to-it-lyrics',
 'pyongs_count': 1,
 'relationships_index_url': 'https://genius.com/Ricky-montgomery-get-used-to-it-sample',
 'release_date_components': {'year': 2014, 'month': 11, 'day': 14},
 'release_date_for_display': 'November 14, 2014',
 'release_date_with_abbreviated_month_for_display': 'Nov. 14, 2014',
 'song_art_image_thumbnail_url': 'https://images.genius.com/cc976fa6827211e5fe62b1a0d637f21d.300x300x1.jpg',
 'song_art_image_url': 'https://image

In [35]:
# cool, we got the song ID and lyrics out of this, but only basic artist information (no description or hometown)

sample_song.to_dict().keys()

dict_keys(['_type', 'annotation_count', 'api_path', 'artist_names', 'full_title', 'header_image_thumbnail_url', 'header_image_url', 'id', 'instrumental', 'lyrics_owner_id', 'lyrics_state', 'lyrics_updated_at', 'path', 'pyongs_count', 'relationships_index_url', 'release_date_components', 'release_date_for_display', 'release_date_with_abbreviated_month_for_display', 'song_art_image_thumbnail_url', 'song_art_image_url', 'stats', 'title', 'title_with_featured', 'updated_by_human_at', 'url', 'featured_artists', 'primary_artist', 'primary_artists', 'artist', 'lyrics'])
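The `primary_artist` entry looks like a way around the "current bugs" item at the top: a single `search_song` call gives both the song ID and the artist ID. A sketch, assuming `primary_artist` is a dictionary with `id` and `name` keys (as in the Genius API's artist objects); `ids_from_song` is a hypothetical helper:

```python
def ids_from_song(song_dict):
    """Pull the song ID, primary artist ID, and artist name from Song.to_dict()."""
    artist = song_dict.get('primary_artist') or {}
    return {
        'Song_ID': song_dict.get('id'),
        'Artist_ID': artist.get('id'),
        'Artist': artist.get('name'),
    }


# usage: ids_from_song(sample_song.to_dict())
```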

In [36]:
sample_song = sample_song.to_dict()

In [37]:
sample_song['id']

2045031

In [38]:
print(sample_song['lyrics'])

27 ContributorsGet Used to It Lyrics[Verse 1]
Used to live down by the beach
And I used to be good on my feet
And my fingers used to dance on every key
Now they're just pieces of meat
Used to wear a suit and tie
And I used to grab a birdie every night
Drive down to the hotel parking lot
Make monkey love in the street

Used to go to university
Used to be the head of varsity
Used to live inside this box
With everyone noticing me
I used to leave the evening feeling right
I'll be with you each and every night
Chasing that horizon in our eyes

[Chorus]
You want a garden, but you got a balcony
And you're always lookin' for some company
You want a say, well, what you got to say?
Give in to me, give in to me and see

[Verse 2]
Used to live down by Hyperion
Used to be a Californian
Way back when before my Missouri
Left not but this envy in my heart
Heart and soul
Apropos, you're beautiful
You're beautiful
See Ricky Montgomery LiveGet tickets as low as $35You might also like[Chorus]
You want a g
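The scraped lyrics carry page residue: a `27 ContributorsGet Used to It Lyrics` header glued to `[Verse 1]`, and a ticket ad plus `You might also like` glued to `[Chorus]`. A regex cleanup sketch; the patterns are based only on this one output, plus an assumed trailing `<N>Embed` marker, so check them against more songs. `clean_lyrics` is a hypothetical name:

```python
import re


def clean_lyrics(raw):
    """Strip Genius page residue from scraped lyrics (patterns based on observed output)."""
    # leading "<N> Contributors<Title> Lyrics" header before the first section
    text = re.sub(r'^\d+\s*Contributors?.*?Lyrics', '', raw, count=1)
    # concert ad such as "See <Artist> LiveGet tickets as low as $35"
    text = re.sub(r'See .*? LiveGet tickets as low as \$\d+', '', text)
    # "You might also like" recommendation text
    text = text.replace('You might also like', '')
    # assumed trailing "<N>Embed" marker at the very end of the lyrics
    text = re.sub(r'\d*Embed$', '', text)
    return text.strip()


# usage: print(clean_lyrics(sample_song['lyrics']))
```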

In [39]:
# let's see if we can get artist information: 

sample_artist.to_dict()

{'alternate_names': ['Richard Montgomery', 'Richard Owen Holmes Montgomery'],
 'api_path': '/artists/406321',
 'description': {'plain': 'Richard “Ricky” Owen Holmes Montgomery (born April 3, 1993) is an American singer and songwriter born and raised in Los Angeles, California, but moved to St. Louis with his mother in 2005.\n\nBefore his solo work, He began playing in local bands around West St. Louis County at 14, and he released music with indie bands like Adversary In Arms, Carpathia, and Henry On The Run.\n\nAfter achieving success on Vine, Ricky released his debut extended play, Caught On The Moon, on November 14, 2014, that were featured again two years later, in his full debut studio album, Montgomery Ricky on April 1, 2016.\n\nRicky soon created a band called, The Honeysticks, which was formed in 2018 with artists, Caleb Hurst, Benjamin Russin, and Ryan Fyffe. The group released three studio songs along with covers from other artists posted on YouTube. Unfortunately, in early 2

In [40]:
sample_artist.to_dict().keys()

dict_keys(['alternate_names', 'api_path', 'description', 'facebook_name', 'followers_count', 'header_image_url', 'id', 'image_url', 'instagram_name', 'is_meme_verified', 'is_verified', 'name', 'translation_artist', 'twitter_name', 'url', 'current_user_metadata', 'iq', 'description_annotation', 'user', 'songs'])

In [41]:
sample_artist = sample_artist.to_dict()

In [42]:
description = sample_artist['description']['plain']

In [43]:
# let's try to find geographical locations: 
# maybe split into sentences and then check whether any sentence mentions a geographic location?

import re 

In [44]:
re.split(pattern = r'\.', string = description)[0:4]

['Richard “Ricky” Owen Holmes Montgomery (born April 3, 1993) is an American singer and songwriter born and raised in Los Angeles, California, but moved to St',
 ' Louis with his mother in 2005',
 '\n\nBefore his solo work, He began playing in local bands around West St',
 ' Louis County at 14, and he released music with indie bands like Adversary In Arms, Carpathia, and Henry On The Run']

In [7]:
# okay, so splitting on periods is NOT going to work :(( ('St. Louis' gets cut in half). Maybe I can cross-reference a table of places instead? Keep sentences that mention states? Cities too?

city_data = pd.read_csv('uscities.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'uscities.csv'
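Splitting on every period breaks `St. Louis` in two. One workaround is to protect a short list of abbreviations before splitting, then keep only the sentences that mention a known place. A sketch; the `ABBREVIATIONS` and `PLACES` lists are hypothetical stand-ins (a real place list would come from a file like `uscities.csv` once it is available), and `split_sentences` / `sentences_with_places` are illustrative names:

```python
import re

# hypothetical list of abbreviations that end in a period without ending a sentence
ABBREVIATIONS = ('St', 'Mt', 'Ft', 'Dr', 'Mr', 'Mrs', 'Jr')

# hypothetical stand-in for a real city/state table such as uscities.csv
PLACES = ('Los Angeles', 'California', 'St. Louis', 'Missouri')


def split_sentences(text):
    """Split text into sentences without breaking on the listed abbreviations."""
    # temporarily replace "St." etc. with a placeholder so the split ignores them
    pattern = r'\b(' + '|'.join(ABBREVIATIONS) + r')\.'
    protected = re.sub(pattern, r'\1<DOT>', text)
    # split after ., ! or ? when followed by whitespace (including blank lines)
    parts = re.split(r'(?<=[.!?])\s+', protected)
    return [p.replace('<DOT>', '.').strip() for p in parts if p.strip()]


def sentences_with_places(text, places=PLACES):
    """Keep only the sentences that mention at least one of the given places."""
    return [s for s in split_sentences(text) if any(place in s for place in places)]


# usage: sentences_with_places(description)
```

Plain substring matching is crude (it would also match "Missouri" inside a band name), but it is enough to narrow the description down to the sentences worth reading.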

In [None]:
wrong_artist = genius.search_song(title = 'HOT TO GO!', artist = 'Justin Bieber')

In [None]:
wrong_song_lyrics = wrong_artist.to_dict()['lyrics']

In [None]:
wrong_song_lyrics
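These cells test what `search_song` does when the artist doesn't match the title. If the search returns a song by someone else, one guard is to compare the requested artist against the returned song's `primary_artist` and `featured_artists`. A sketch; `normalize_name`, `artist_matches`, and `search_song_checked` are hypothetical helpers, and the dictionary shape follows the `to_dict()` keys printed earlier:

```python
def normalize_name(name):
    """Lower-case a name and collapse whitespace (including non-breaking spaces)."""
    return ' '.join(name.casefold().split())


def artist_matches(song_dict, expected_artist):
    """True if expected_artist is the song's primary artist or one of its featured artists."""
    names = [(song_dict.get('primary_artist') or {}).get('name', '')]
    names += [a.get('name', '') for a in song_dict.get('featured_artists') or []]
    target = normalize_name(expected_artist)
    return any(normalize_name(n) == target for n in names)


def search_song_checked(client, title, artist):
    """Search for a song, returning None instead of a song by a different artist."""
    song = client.search_song(title=title, artist=artist)
    if song is None or not artist_matches(song.to_dict(), artist):
        return None
    return song


# usage: search_song_checked(genius, 'HOT TO GO!', 'Justin Bieber')
```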
