In [2]:
![logo](images/guitar_logo.png)


#### _**Using Genius.com API + Beautiful Soup to get song lyrics and write them to local text & pickle files.**_

_Citations:_

The functions for collecting song URLs, scraping lyrics, and writing them to files are inspired by this very helpful tutorial: [How to Scrape Song Lyrics: A Gentle Tutorial](https://medium.com/analytics-vidhya/how-to-scrape-song-lyrics-a-gentle-python-tutorial-5b1d4ab351d2) by Nick Pai.

Getting my Genius.com API key: https://docs.genius.com/#/getting-started-h1

In [15]:
GENIUS_API_TOKEN='5la2M_pYH7rZ653TL8ulhRnTwi6Gyy7RfhKtK5wp0tcG3xilwiWfuhTSHni5keuP'
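
Hard-coding the token in a notebook exposes it to anyone who can read the file. A minimal sketch of one alternative, assuming the token has been exported as an environment variable beforehand. The helper name `load_genius_token` and the default variable name are illustrative and not part of the original notebook.

```python
import os


def load_genius_token(env_var="GENIUS_API_TOKEN"):
    """Return the API token stored in the given environment variable.

    The helper name and the default variable name are assumptions
    made for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

In the notebook this could be used as `GENIUS_API_TOKEN = load_genius_token()`, which fails early with a readable message if the variable is missing.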

In [16]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import statsmodels.api as sm

# Make HTTP requests
import requests

# Scrape data from an HTML document
from bs4 import BeautifulSoup
import lxml
import html5lib

# I/O
import os

# Search and manipulate strings
import re

# pickle files for later use 
import pickle

## Necessary functions

#### 1. Get a list of Genius.com URLs for a specified number of songs for a given artist

In [17]:
# Search the Genius API for one page of results matching the artist name
def request_artist_info(artist_name, page):
    base_url = 'https://api.genius.com'
    headers = {'Authorization': 'Bearer ' + GENIUS_API_TOKEN}
    search_url = base_url + '/search'
    # Send the search term and paging options as query-string parameters
    params = {'q': artist_name, 'per_page': 10, 'page': page}
    response = requests.get(search_url, params=params, headers=headers)
    return response

# Get Genius.com song URLs from the search results
def request_song_url(artist_name, song_cap):
    page = 1
    songs = []

    while len(songs) < song_cap:
        response = request_artist_info(artist_name, page)
        hits = response.json()['response']['hits']
        # Stop when the API has no more results, to avoid looping forever
        if not hits:
            break
        # Collect up to song_cap URLs for songs whose primary artist matches
        for hit in hits:
            if len(songs) == song_cap:
                break
            if artist_name.lower() in hit['result']['primary_artist']['name'].lower():
                songs.append(hit['result']['url'])
        page += 1

    print(f'Found {len(songs)} songs by {artist_name}')
    return songs

In [18]:
# checking process quickly by requesting 2 Bob Dylan songs 

request_song_url('Bob Dylan', 2)

Found 2 songs by Bob Dylan


['https://genius.com/Bob-dylan-murder-most-foul-lyrics',
 'https://genius.com/Bob-dylan-blowin-in-the-wind-lyrics']
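
The artist filter in `request_song_url` is a case-insensitive substring test, so a query such as `'the band'` would also accept any primary artist whose name merely contains that phrase. The sketch below applies the same test to a hand-built list shaped like the `hits` the function reads. The sample entries, the `filter_hits` helper, and its `exact` flag are made up for illustration and do not come from the Genius API.

```python
def filter_hits(hits, artist_name, exact=False):
    """Return the URLs of hits whose primary artist matches artist_name."""
    wanted = artist_name.lower()
    urls = []
    for hit in hits:
        name = hit['result']['primary_artist']['name'].lower()
        # Substring match mirrors the notebook; exact match is stricter.
        matched = (name == wanted) if exact else (wanted in name)
        if matched:
            urls.append(hit['result']['url'])
    return urls


# Invented sample data, using only the keys the notebook code reads.
sample_hits = [
    {'result': {'primary_artist': {'name': 'The Band'},
                'url': 'https://example.com/the-band-song'}},
    {'result': {'primary_artist': {'name': 'The Band Tribute Project'},
                'url': 'https://example.com/other-song'}},
]

print(filter_hits(sample_hits, 'the band'))              # both URLs
print(filter_hits(sample_hits, 'the band', exact=True))  # first URL only
```

Whether exact matching is preferable depends on the artist: it would exclude look-alike names, but it would also exclude collaborations credited under a longer primary-artist name.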

#### 2. Fetch lyrics from the URLs

*Note: The function below is adapted from the above-referenced [Analytics Vidhya Medium article](https://medium.com/analytics-vidhya/how-to-scrape-song-lyrics-a-gentle-python-tutorial-5b1d4ab351d2); the original version no longer worked when I ran it more recently.*

Replaced  

`lyrics = html.find('div', class_='lyrics').get_text()`

with 

`lyrics = html.select_one('div[class^="lyrics"], div[class^="SongPage__Section"]').get_text(separator="\n")`
    
*Citation for this fix: https://stackoverflow.com/questions/67324013/beautiful-soup-sometimes-return-nonetype*

In [36]:
def scrape_song_lyrics(url):
    page = requests.get(url)
    html = BeautifulSoup(page.text, 'html.parser')
    lyrics = html.select_one(
        'div[class^="lyrics"], div[class^="SongPage__Section"]'
    ).get_text(separator="\n")
    # Remove text in brackets or parentheses, such as [Chorus] or [Verse 1]
    lyrics = re.sub(r'[\(\[].*?[\)\]]', '', lyrics)
    # Remove empty lines
    lyrics = os.linesep.join([s for s in lyrics.splitlines() if s])
    return lyrics
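
The regular expression in this function deletes any text enclosed in square brackets or parentheses, so section labels such as `[Chorus]` disappear, but so would a parenthesised lyric. The join also keeps lines that contain only whitespace, because a string holding a single space is truthy. One possible variant, sketched here with a hypothetical `clean_lyrics` helper and a made-up sample string, strips only square-bracket labels and drops whitespace-only lines:

```python
import re


def clean_lyrics(text):
    """Remove [bracketed] labels and blank or whitespace-only lines."""
    # Non-greedy match so each [label] is removed on its own.
    text = re.sub(r'\[.*?\]', '', text)
    # s.strip() is empty for lines made up solely of whitespace.
    lines = [s.strip() for s in text.splitlines() if s.strip()]
    return '\n'.join(lines)


# Invented sample text for illustration only.
sample = "[Verse 1]\nFirst line\n \n(echoed words)\n[Chorus]\nSecond line"
print(clean_lyrics(sample))
```

This is a sketch of the cleaning step alone; it does not fetch or parse a page, and it is not the version the rest of this notebook was run with.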

In [38]:
# DEMO -- making sure the new function works 
print(scrape_song_lyrics('https://genius.com/Lana-del-rey-young-and-beautiful-lyrics'))

I've seen the world, done it all
Had my cake now
Diamonds, brilliant,
 
and Bel Air now
Hot summer nights, mid-July
When you and I were forever wild
The crazy days, city lights
The way you'd play with me like a child
Will you still love me when I'm no longer young and beautiful?
Will you still love me when I got nothing but my aching soul?
I know you will, I know you will, I know that you will
Will you still love me when I'm no longer beautiful?
I've seen the world, lit it up as my stage now
Channeling angels in the new age now
Hot summer days
, rock and roll
The way you'd play for me at your show
And all the ways I got to know
Your pretty face and electric soul
Will you still love me when I'm no longer young and beautiful?
Will you still love me when I got nothing but my aching soul?
I know you will, I know you will, I know that you will
Will you still love me when I'm no longer beautiful?
Dear Lord, when I get to heaven
Please, let me bring my man
When he comes, tell me that You'll l

In [33]:
# checking function again by grabbing lyrics for one Bob Dylan song 

scrape_song_lyrics('https://genius.com/Bob-dylan-murder-most-foul-lyrics')

'\'Twas a dark day in Dallas, November \'63\nA day that will live on in infamy\nPresident\u2005Kennedy\u2005was a-ridin\' high\nGood\u2005day to be livin\' and a\u2005good day to die\nBeing led to the slaughter like a sacrificial lamb\nHe said, "Wait a minute, boys, you know who I am?"\n"Of course we do, we know who you are"\nThen they blew off his head while he was still in the car\nShot down like a dog in broad daylight\nWas a matter of timing and the timing was right\nYou got unpaid debts, we\'ve come to collect\nWe\'re gonna kill you with hatred, without any respect\nWe\'ll mock you and shock you and we\'ll grin in your face\nWe\'ve already got someone here to take your place\nThe day they blew out the brains of the king\nThousands were watching, no one saw a thing\nIt happened so quickly, so quick, by surprise\nRight there in front of everyone\'s eyes\nGreatest magic trick ever under the sun\nPerfectly executed, skillfully done\nWolfman, oh Wolfman, oh Wolfman, howl\nRub-a-dub-dub

In [34]:
# trying print view instead of returning lyrics as I did in the previous cell 

print (scrape_song_lyrics('https://genius.com/Bob-dylan-murder-most-foul-lyrics'))

'Twas a dark day in Dallas, November '63
A day that will live on in infamy
President Kennedy was a-ridin' high
Good day to be livin' and a good day to die
Being led to the slaughter like a sacrificial lamb
He said, "Wait a minute, boys, you know who I am?"
"Of course we do, we know who you are"
Then they blew off his head while he was still in the car
Shot down like a dog in broad daylight
Was a matter of timing and the timing was right
You got unpaid debts, we've come to collect
We're gonna kill you with hatred, without any respect
We'll mock you and shock you and we'll grin in your face
We've already got someone here to take your place
The day they blew out the brains of the king
Thousands were watching, no one saw a thing
It happened so quickly, so quick, by surprise
Right there in front of everyone's eyes
Greatest magic trick ever under the sun
Perfectly executed, skillfully done
Wolfman, oh Wolfman, oh Wolfman, howl
Rub-a-dub-dub, it's a murder most foul
Hush, little children, you

#### 3. Loop through all URLs and write lyrics to one file

In [21]:
def write_lyrics_to_file(artist_name, song_count):
    path = 'lyrics/' + artist_name.lower() + '.txt'
    urls = request_song_url(artist_name, song_count)
    # The context manager closes the file even if a request fails
    with open(path, 'wb') as f:
        for url in urls:
            lyrics = scrape_song_lyrics(url)
            f.write(lyrics.encode("utf8"))
    # Reopen the file to count the lines that were written
    with open(path, 'rb') as f:
        num_lines = sum(1 for line in f)
    print(f'Wrote {num_lines} lines to file from {len(urls)} songs.')

In [11]:
# murder = scrape_song_lyrics('https://genius.com/Bob-dylan-murder-most-foul-lyrics')
# f = open('murder.pkl', 'wb')
# pickle.dump(murder, f)
# f.close()

*Note: The [Analytics Vidhya Medium article](https://medium.com/analytics-vidhya/how-to-scrape-song-lyrics-a-gentle-python-tutorial-5b1d4ab351d2) only includes the function above, which writes the lyrics to a local .txt file. That file did not work for my purposes when I tried to load it for cleaning and analysis, so I wrote a modified function that also saves the lyrics to a local .pkl file. The pickled files did work, and the code to load and clean them is in the [02_data_cleaning](projects/bob_dylan/02_data_cleaning.ipynb) notebook.*

In [41]:
def pickle_lyrics_to_file(artist_name, song_count):
    urls = request_song_url(artist_name, song_count)
    txt_path = 'lyrics/' + artist_name.lower() + '.txt'
    pkl_path = 'lyrics/' + artist_name.lower() + '.pkl'

    # Scrape each page once and reuse the text for both output files
    all_lyrics = [scrape_song_lyrics(url) for url in urls]

    # Plain-text copy: songs are written back to back as UTF-8 bytes
    with open(txt_path, 'wb') as t:
        for lyrics in all_lyrics:
            t.write(lyrics.encode("utf8"))

    # Pickled copy: one pickle.dump call per song, all in the same file
    with open(pkl_path, 'wb') as f:
        for lyrics in all_lyrics:
            pickle.dump(lyrics, f)

    # Count newline-delimited lines in each file as a rough sanity check
    with open(txt_path, 'rb') as t:
        num_lines = sum(1 for line in t)
    with open(pkl_path, 'rb') as f:
        pickle_lines = sum(1 for line in f)

    print(f'Wrote {num_lines} lines and pickled {pickle_lines} lines to file from {len(urls)} songs.')
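
Because `pickle.dump` is called once per song on the same file handle, the resulting `.pkl` file holds a sequence of pickled strings rather than a single object. A minimal sketch of reading such a file back, assuming that layout, is to call `pickle.load` repeatedly until it raises `EOFError`. The helper name `load_all_pickled` is illustrative, and the loading code in the data-cleaning notebook may differ.

```python
import io
import pickle


def load_all_pickled(fileobj):
    """Return every object pickled back-to-back in a binary file object."""
    items = []
    while True:
        try:
            items.append(pickle.load(fileobj))
        except EOFError:
            # Raised when no further pickled object remains.
            break
    return items


# Self-contained demonstration with an in-memory buffer and dummy strings.
buffer = io.BytesIO()
for text in ['first song\nline two', 'second song']:
    pickle.dump(text, buffer)
buffer.seek(0)

songs = load_all_pickled(buffer)
print(len(songs))  # 2
```

With a file on disk, the same helper could be given the handle from `open(path, 'rb')`. Counting the loaded items gives the number of songs stored, which is a more direct check than counting newline bytes in a binary file.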

### Demo 

_confirming viability of process by scraping a few songs I don't plan to use in the analysis_

In [72]:
pickle_lyrics_to_file('led zeppelin', 10)

Found 10 songs by led zeppelin
Wrote 319 lines and pickled 319 lines to file from 10 songs.


In [74]:
pickle_lyrics_to_file('the beatles', 7)

Found 7 songs by the beatles
Wrote 222 lines and pickled 222 lines to file from 7 songs.


In [39]:
pickle_lyrics_to_file('johnny cash', 3)

Found 3 songs by johnny cash
Wrote 39 lines and pickled 39 lines to file from 3 songs.


In [42]:
pickle_lyrics_to_file('johnny cash', 3)

Found 3 songs by johnny cash
Wrote 89 lines and pickled 89 lines to file from 3 songs.


_Note: I moved Led Zeppelin, The Beatles, and Johnny Cash to the `test_cases` folder after making sure the `pickle_lyrics_to_file` function works._

## Scraping my artists

_Note: The 11 musicians were chosen because they are classic rock performers who are, at least in part, known for their lyricism. I also selected them based on my affinity for and knowledge of their music: each is a favorite musician of mine whose lyrics should make for an interesting comparison in my analysis, or so I believe now._

In [22]:
#musician names 

musicians = ['bob_dylan', 'neil_young', 'willie_nelson', 'the_band', 'john_prine', 'leonard_cohen', 'janis_joplin', 'linda_ronstadt', 'mark_knopfler', 'david_bowie', 'stevie_nicks']
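
The list stores names with underscores, while the scraping calls in the following cells pass names with spaces. A small sketch of driving the scrape from the list, assuming a callable with the same signature as `pickle_lyrics_to_file`. The `scrape_all` helper and the stand-in recorder function are hypothetical, used so the example runs without network access.

```python
def scrape_all(names, scrape_fn, song_count=100):
    """Call scrape_fn(artist, song_count) for each underscore-separated name."""
    for name in names:
        # 'bob_dylan' becomes 'bob dylan', the form used as the search term.
        artist = name.replace('_', ' ')
        scrape_fn(artist, song_count)


# Stand-in for pickle_lyrics_to_file that only records its arguments.
calls = []
scrape_all(['bob_dylan', 'neil_young'], lambda a, n: calls.append((a, n)))
print(calls)  # [('bob dylan', 100), ('neil young', 100)]
```

In the notebook this would correspond to `scrape_all(musicians, pickle_lyrics_to_file)`, replacing one cell per artist with a single loop at the cost of losing the per-cell outputs.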

In [23]:
len(musicians)

11

In [24]:
pickle_lyrics_to_file('bob dylan', 100)

Found 100 songs by bob dylan
Wrote 4949 lines and pickled 4952 lines to file from 100 songs.


In [25]:
pickle_lyrics_to_file('neil young', 100)

Found 100 songs by neil young
Wrote 2939 lines and pickled 2939 lines to file from 100 songs.


In [26]:
pickle_lyrics_to_file('willie nelson', 100)

Found 100 songs by willie nelson
Wrote 2604 lines and pickled 2604 lines to file from 100 songs.


In [27]:
pickle_lyrics_to_file('the band', 100)

Found 100 songs by the band
Wrote 3830 lines and pickled 3830 lines to file from 100 songs.


In [29]:
pickle_lyrics_to_file('john prine', 100)

Found 100 songs by john prine
Wrote 3221 lines and pickled 3221 lines to file from 100 songs.


In [30]:
pickle_lyrics_to_file('leonard cohen', 100)

Found 100 songs by leonard cohen
Wrote 4147 lines and pickled 4152 lines to file from 100 songs.


In [31]:
pickle_lyrics_to_file('janis joplin', 100)

Found 100 songs by janis joplin
Wrote 2765 lines and pickled 2765 lines to file from 100 songs.


In [32]:
pickle_lyrics_to_file('linda ronstadt', 100)

Found 100 songs by linda ronstadt
Wrote 2678 lines and pickled 2678 lines to file from 100 songs.


In [33]:
pickle_lyrics_to_file('mark knopfler', 100)

Found 100 songs by mark knopfler
Wrote 3205 lines and pickled 3205 lines to file from 100 songs.


In [34]:
pickle_lyrics_to_file('david bowie', 100)

Found 100 songs by david bowie
Wrote 3694 lines and pickled 3698 lines to file from 100 songs.


In [35]:
pickle_lyrics_to_file('stevie nicks', 100)

Found 100 songs by stevie nicks
Wrote 4416 lines and pickled 4417 lines to file from 100 songs.
