# Genius Lyrics Data Mining 

Goal: find a way to mine the data we need efficiently and with minimal user input.

**Areas of interest**
1. Can we query music by genre?
2. Given a list of songs, is there an easy way to pull out these elements?
    - lyrics
    - artists
    - year released
    - where the artist is from
3. Can we get this information without downloading a JSON file per artist?


**Current bugs**
1. Find a way to get artist and song IDs instead of names and titles.

```python
# install if the next cell doesn't run:
# !pip install lyricsgenius
```

```python
# importing libraries:

import lyricsgenius
import pandas as pd
import json
```

I told my group to send me a list of songs to refer to, so I'll create a sample table to work with.

```python
titles = {'Artist': ['Queen', 'Taylor Swift', 'Cavetown', 'Ricky Montgomery', 'Cavetown'],
          'Song': ["Don't Stop Me Now", 'Blank Space', 'Juliet', 'Get Used to It', 'Bad Dream Baby']}
```

```python
test_data = pd.DataFrame.from_dict(titles)

test_data
```

|   | Artist | Song |
|---|--------|------|
| 0 | Queen | Don't Stop Me Now |
| 1 | Taylor Swift | Blank Space |
| 2 | Cavetown | Juliet |
| 3 | Ricky Montgomery | Get Used to It |
| 4 | Cavetown | Bad Dream Baby |


```python
# now let's pass some of this into the Genius API

# visualizations
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
```

```python
import os

# read the API token from an environment variable instead of hard-coding it
# (GENIUS_ACCESS_TOKEN is just the name chosen here; set it before running)
token = os.environ['GENIUS_ACCESS_TOKEN']
genius = lyricsgenius.Genius(token,
                             skip_non_songs=True,           # try to skip non-song pages such as track lists
                             verbose=False,                 # True prints status messages; turn on while debugging
                             remove_section_headers=False,  # False keeps [Chorus], [Bridge], etc. in the lyrics
                             retries=3)                     # how many times to retry after a timeout or server error
```

**You can search by artist name:**

`genius.search_artist(artist_name)`

```python
# let's use the artist method (it takes an artist ID)

# genius.artist('00343')
```

```python
test = genius.search_artist('Taylor Swift', max_songs=0).id
```

```python
test
```

```
1177
```

```python
def get_artistID(artist):
    '''given the artist name, return the Genius artist ID'''
    # max_songs=0: we only want the artist, not their songs
    return genius.search_artist(artist, max_songs=0).id
```

```python
artists = test_data['Artist']
```

```python
test_data.insert(loc=1, column='Artist_ID', value=[get_artistID(a) for a in artists])
```

```python
test_data
```

|   | Artist | Artist_ID | Song |
|---|--------|-----------|------|
| 0 | Queen | 563 | Don't Stop Me Now |
| 1 | Taylor Swift | 1177 | Blank Space |
| 2 | Cavetown | 966969 | Juliet |
| 3 | Ricky Montgomery | 406321 | Get Used to It |
| 4 | Cavetown | 966969 | Bad Dream Baby |
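Cavetown shows up twice, so the list comprehension above searches for the same artist twice. Since the goal is to mine data efficiently, here's a minimal sketch that asks once per distinct name and reuses the answer. `lookup_ids` is a hypothetical helper name, and `search_fn` stands in for `get_artistID` or any function mapping a name to an ID; returning None for an unknown artist is an assumption, since `get_artistID` as written would raise instead.

```python
from typing import Callable, Dict, Iterable, List, Optional


def lookup_ids(names: Iterable[str],
               search_fn: Callable[[str], Optional[int]]) -> List[Optional[int]]:
    """Return one ID per name, calling search_fn only once per distinct name."""
    cache: Dict[str, Optional[int]] = {}
    ids: List[Optional[int]] = []
    for name in names:
        if name not in cache:
            # first time we see this artist: do the (slow) search
            cache[name] = search_fn(name)
        # repeated artists reuse the cached ID
        ids.append(cache[name])
    return ids
```

In the notebook it would be used as `test_data.insert(loc=1, column='Artist_ID', value=lookup_ids(test_data['Artist'], get_artistID))`, which keeps one row per song while only searching each artist once.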


```python
queen_dict = genius.search_artist_songs(test_data['Artist_ID'][0],
                                        search_term=test_data['Song'][0])
```

```python
queen_dict.keys()
queen_dict['songs']
```

```
[{'_type': 'song',
  'annotation_count': 9,
  'api_path': '/songs/205311',
  'artist_names': 'Queen',
  'full_title': "Don't Stop Me Now by\xa0Queen",
  'header_image_thumbnail_url': 'https://images.genius.com/aface99ac22323aec35a2841f57af5c1.300x298x1.jpg',
  'header_image_url': 'https://images.genius.com/aface99ac22323aec35a2841f57af5c1.600x595x1.jpg',
  'id': 205311,
  'instrumental': False,
  'lyrics_owner_id': 269746,
  'lyrics_state': 'complete',
  'lyrics_updated_at': 1663862595,
  'path': '/Queen-dont-stop-me-now-lyrics',
  'pyongs_count': 112,
  'relationships_index_url': 'https://genius.com/Queen-dont-stop-me-now-sample',
  'release_date_components': {'year': 1978, 'month': 11, 'day': 10},
  'release_date_for_display': 'November 10, 1978',
  'release_date_with_abbreviated_month_for_display': 'Nov. 10, 1978',
  'song_art_image_thumbnail_url': 'https://images.genius.com/aface99ac22323aec35a2841f57af5c1.300x298x1.jpg',
  'song_art_image_url': 'https://images.genius.com/aface99ac
```

```python
queen_dict['songs'][0]
```

```
{'_type': 'song',
 'annotation_count': 9,
 'api_path': '/songs/205311',
 'artist_names': 'Queen',
 'full_title': "Don't Stop Me Now by\xa0Queen",
 'header_image_thumbnail_url': 'https://images.genius.com/aface99ac22323aec35a2841f57af5c1.300x298x1.jpg',
 'header_image_url': 'https://images.genius.com/aface99ac22323aec35a2841f57af5c1.600x595x1.jpg',
 'id': 205311,
 'instrumental': False,
 'lyrics_owner_id': 269746,
 'lyrics_state': 'complete',
 'lyrics_updated_at': 1663862595,
 'path': '/Queen-dont-stop-me-now-lyrics',
 'pyongs_count': 112,
 'relationships_index_url': 'https://genius.com/Queen-dont-stop-me-now-sample',
 'release_date_components': {'year': 1978, 'month': 11, 'day': 10},
 'release_date_for_display': 'November 10, 1978',
 'release_date_with_abbreviated_month_for_display': 'Nov. 10, 1978',
 'song_art_image_thumbnail_url': 'https://images.genius.com/aface99ac22323aec35a2841f57af5c1.300x298x1.jpg',
 'song_art_image_url': 'https://images.genius.com/aface99ac22323aec35a2841f57af
```
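Several of the areas of interest (song ID, artists, year released) are already sitting in this one dictionary, so a small helper can flatten them into a single record per song. This is a sketch: `summarize_song` and its output keys are names I made up, and I'm assuming `release_date_components` can be missing or None for some songs, so the year falls back to None instead of raising.

```python
from typing import Any, Dict, Optional


def summarize_song(song: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the fields we care about out of one Genius song dictionary."""
    components: Optional[Dict[str, Any]] = song.get('release_date_components')
    return {
        'song_id': song['id'],
        'artists': song.get('artist_names'),
        # full_title separates "by" and the artist with a non-breaking space (\xa0)
        'full_title': (song.get('full_title') or '').replace('\xa0', ' '),
        # assumed: some songs have no release date, so return None rather than fail
        'year': components.get('year') if components else None,
    }
```

For the Queen entry above, `summarize_song(queen_dict['songs'][0])` should give song ID 205311, artists `'Queen'`, the title with a regular space, and year 1978.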

```python
# getting the year released:
print(queen_dict['songs'][0]['release_date_components'])  # this one is better
print(queen_dict['songs'][0]['release_date_for_display'])
```

```
{'year': 1978, 'month': 11, 'day': 10}
November 10, 1978
```
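The components dictionary is the better option because it can be turned into a real date (sortable and comparable) without parsing text like "November 10, 1978". A minimal sketch using Python's `datetime`; `to_date` is a hypothetical helper, and filling a missing month or day with 1 is my own assumption about how to treat partial dates.

```python
import datetime
from typing import Any, Dict, Optional


def to_date(components: Optional[Dict[str, Any]]) -> Optional[datetime.date]:
    """Turn {'year': ..., 'month': ..., 'day': ...} into a datetime.date.

    A missing year gives None; a missing month or day is filled with 1 (assumption).
    """
    if not components or components.get('year') is None:
        return None
    return datetime.date(components['year'],
                         components.get('month') or 1,
                         components.get('day') or 1)
```

Keeping the year alone (as `get_year` below does) is enough for grouping by year; a full date only matters if we want finer ordering.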


```python
def song_dictionary(artist_ID, song_title, most_popular_only=True):
    '''Return the Genius dictionary of information about a song.
    INPUTS: artist_ID: the Genius artist ID (an integer)
            song_title: the title of the song (not case sensitive)
            most_popular_only: if True (the default), return only the most popular match;
                               if False, return the whole list of matches
    Caveat: the search returns every song by that artist with the same or a similar title
    (e.g. the 10 minute version, Taylor's Version vs. the original, new features, etc.),
    which is what most_popular_only is for.'''
    dictionary = genius.search_artist_songs(artist_id=artist_ID,
                                            search_term=song_title,
                                            sort='popularity')
    songs = dictionary['songs']
    if most_popular_only:
        return songs[0]  # raises IndexError if nothing matched
    return songs

# tests:

test_dict = song_dictionary(test_data['Artist_ID'][0], song_title=test_data['Song'][0])
```
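Taking `['songs'][0]` trusts the search ordering and raises an `IndexError` when nothing matches. A slightly safer sketch: prefer a case-insensitive exact title match, otherwise fall back to the first result, and return None for an empty list. `pick_song` is a hypothetical name, and it assumes each song dictionary has a `'title'` key (that key isn't visible in the truncated output above).

```python
from typing import Any, Dict, List, Optional


def pick_song(songs: List[Dict[str, Any]], title: str) -> Optional[Dict[str, Any]]:
    """Choose one song from a list of search results.

    Prefer an exact (case-insensitive) title match; otherwise return the
    first result; return None when the list is empty.
    """
    if not songs:
        return None
    wanted = title.strip().casefold()
    for song in songs:
        # assumed key: 'title' holds the plain song title without the artist
        if (song.get('title') or '').strip().casefold() == wanted:
            return song
    # no exact match: fall back to whatever the search ranked first
    return songs[0]
```

Inside `song_dictionary`, `pick_song(songs, song_title)` could replace `songs[0]`, so "Blank Space" wins over a live or alternate version when both come back.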

```python
def get_release_date(song_dict, form='d'):
    '''Given a song dictionary, return the date the song came out.
    form='d' (default) returns the components dictionary, which is easier to work with;
    form='r' returns the readable string.
    OUTPUT: either DICT or STR'''
    if form == 'd':
        return song_dict['release_date_components']
    if form == 'r':
        return song_dict['release_date_for_display']
    raise ValueError("Wrong form selected: choose 'd' for dictionary or 'r' for plain text")

get_release_date(test_dict), get_release_date(test_dict, form='r')
```

```
({'year': 1978, 'month': 11, 'day': 10}, 'November 10, 1978')
```

```python
def get_year(song):
    '''given SONG, return the year the song came out
    OUTPUT: INT'''
    year = get_release_date(song)['year']
    return year

get_year(test_dict)
```

```
1978
```

```python
def get_song_ID(song):
    '''given a song dictionary, return its Genius song ID'''
    return song['id']

test_song_ID = get_song_ID(test_dict)
```

```python
# maybe let's see comments!

def get_comments(song_ID, style='plain'):
    '''return the comments response for a song; style is the text format of each comment body'''
    return genius.song_comments(song_ID, text_format=style)

# this is the first comment (in this list), with no other information
get_comments(test_song_ID)['comments'][0]['body']
# let's see if I can get more information:
```

```
{'plain': 'Obviously they call him Mr. Fahrenheit because he’s so hot, duh!'}
```

```python
len(get_comments(test_song_ID, style='markdown')['comments'])
```

```
10
```
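Each comment's `body` comes back as a dictionary keyed by the requested text format (`{'plain': ...}` above), so a loop can collect the text of every comment at once. A minimal sketch; `comment_texts` is a hypothetical helper that only reads the `comments` and `body` keys seen in the outputs above. The 10 comments are probably a single page of results rather than all of them, though I haven't checked.

```python
from typing import Any, Dict, List


def comment_texts(response: Dict[str, Any], style: str = 'plain') -> List[str]:
    """Return the text of every comment in a song_comments() response."""
    texts: List[str] = []
    for comment in response.get('comments', []):
        # each body is a dict keyed by format, e.g. {'plain': '...'}
        body = comment.get('body') or {}
        if style in body:
            texts.append(body[style])
    return texts
```

For example, `comment_texts(get_comments(test_song_ID))` would return the plain-text comments as a list of strings, ready to drop into a DataFrame column.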

```python
# plain-text lyrics; wrap in print() to see the line breaks

(genius.search_song(title=test_data['Song'][0],
                    artist=test_data['Artist'][0],
                    get_full_info=False)  # can also pass song_id for a specific song
 .lyrics)
```

"143 ContributorsTranslationsРусскийDeutschEspañolDon’t Stop Me Now Lyrics[Intro]\nTonight I'm gonna have myself a real good time\nI feel alive\nAnd the world, I'll turn it inside out, yeah\nI'm floating around in ecstasy\nSo (Don't stop me now)\n(Don't stop me)\n'Cause I'm having a good time\nHaving a good time\n\n[Verse 1]\nI'm a shooting star, leaping through the sky like a tiger\nDefying the laws of gravity\nI'm a racing car, passing by like Lady Godiva\nI'm gonna go, go, go, there's no stopping me\n\n[Pre-Chorus]\nI'm burning through the sky, yeah\nTwo hundred degrees, that's why they call me Mister Fahrenheit\nI'm travelling at the speed of light\nI wanna make a supersonic man outta you\n\n[Chorus]\n(Don't stop me now)\nI'm having such a good time, I'm having a ball\n(Don't stop me now)\nIf you wanna have a good time, just give me a call\n(Don't stop me now)\n'Cause I'm having a good time\n(Don't stop me now)\nYes, I'm having a good time\nI don't wanna stop at all, yeah\nYou migh
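The lyrics come back with page residue glued to the front ("143 ContributorsTranslations...Don’t Stop Me Now Lyrics") right before the first section header. Below is a minimal regex sketch for stripping it; the header pattern is inferred from this single example, so treat it as an assumption. `clean_lyrics` is a hypothetical helper. Its optional section-header removal is only for lyrics that were already downloaded, since `remove_section_headers=True` on the `Genius` object does that at fetch time.

```python
import re

# assumed header shape: "<count> Contributors<anything on that line>Lyrics"
_HEADER = re.compile(r'^\d+\s*Contributors?[^\n]*?Lyrics')


def clean_lyrics(raw: str, drop_section_headers: bool = False) -> str:
    """Strip the contributor/translation header that precedes scraped lyrics."""
    # only touches the very start of the string, and only if the pattern matches
    text = _HEADER.sub('', raw, count=1)
    if drop_section_headers:
        # remove whole lines such as "[Intro]" or "[Verse 1]"
        text = re.sub(r'^\[[^\]\n]*\]\n?', '', text, flags=re.MULTILINE)
    return text.strip()
```

Usage would look like `print(clean_lyrics(genius.search_song(title=test_data['Song'][0], artist=test_data['Artist'][0]).lyrics))`.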

```python
# queen_dict['songs']
```