# Radiohead Song Lyric Analysis
### Data Gathering

Before we can analyze Radiohead's song lyrics, we need to obtain and organize them. I started by choosing which songs I wanted lyrics for. Radiohead has a lot of songs, but I only wanted the ones from their nine studio albums (I'm more familiar with these, so working with them should be easier). Using data from [Wikipedia](https://en.wikipedia.org/wiki/Radiohead_discography), I put them into a [`yaml` file](Radiohead-Discography.yaml), organized by album and song title.

Now that I have the names, I need the corresponding lyrics. I found the following code on [Quora](https://www.quora.com/Whats-a-good-api-to-use-to-get-song-lyrics), written by [Sagun Shrestha](https://www.quora.com/profile/Sagun-Shrestha-7), which gets lyrics from [azlyrics.com](azlyrics.com):

In [1]:
import re
import urllib.request
from bs4 import BeautifulSoup

def get_lyrics(artist, song_title):
    artist = artist.lower()
    song_title = song_title.lower()
    # remove all except alphanumeric characters from artist and song_title
    artist = re.sub('[^A-Za-z0-9]+', "", artist)
    song_title = re.sub('[^A-Za-z0-9]+', "", song_title)
    if artist.startswith("the"):  # remove starting 'the' from artist e.g. the who -> who
        artist = artist[3:]
    url = "http://azlyrics.com/lyrics/" + artist + "/" + song_title + ".html"

    try:
        content = urllib.request.urlopen(url).read()
    except Exception as e:
        print(e)
        return None
    soup = BeautifulSoup(content, 'html.parser')
    lyrics = str(soup)
    # lyrics lie between up_partition and down_partition
    up_partition = '<!-- Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. -->'
    down_partition = '<!-- MxM banner -->'
    lyrics = lyrics.split(up_partition)[1]
    lyrics = lyrics.split(down_partition)[0]
    lyrics = lyrics.replace('<br>', '').replace('</br>', '').replace('</div>', '').strip()
    return lyrics
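
The lookup depends entirely on how the artist and song title are turned into a URL, so it can help to see that step on its own. The helper below is a minimal sketch that repeats only the normalization lines from `get_lyrics`; the name `azlyrics_url` is my own and is not part of the original snippet.

```python
import re


def azlyrics_url(artist, song_title):
    """Build the URL that get_lyrics requests (hypothetical helper, no network access)."""
    # Lowercase, then drop everything that is not a letter or digit
    artist = re.sub('[^A-Za-z0-9]+', "", artist.lower())
    song_title = re.sub('[^A-Za-z0-9]+', "", song_title.lower())
    # Drop a leading 'the' from the artist, e.g. the who -> who
    if artist.startswith("the"):
        artist = artist[3:]
    return "http://azlyrics.com/lyrics/" + artist + "/" + song_title + ".html"


print(azlyrics_url("Radiohead", "Kid A"))
# http://azlyrics.com/lyrics/radiohead/kida.html
```

Because punctuation and spaces are discarded, the title only has to match the site's spelling letter for letter.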


To test, let's grab the lyrics for their song "Kid A" (I can never tell what they're saying there):

In [3]:
print(get_lyrics("Radiohead", "Kid A"))

I slipped away
I slipped on a little white lie

We got heads on sticks
You got ventriloquists
We got heads on sticks
You got ventriloquists

Standing in the shadows at the end of my bed <i>[x4]</i>

Rats and children follow me out of town
Rats and children follow me out of town
Come on kids...


That's nice. With that done, it was only a matter of looping through the `yaml` file and getting the lyrics for each song. After doing this, I put them into a [`csv` file](lyrics.csv) with columns for `artist`, `album`, `song`, and `lyrics`.
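
The loop can be sketched roughly as follows. This is an illustration rather than the exact code I ran: it assumes the `yaml` file has been loaded (for example with PyYAML's `yaml.safe_load`) into a dict mapping each album name to a list of song titles, and the names `collect_rows` and `write_rows` are made up for this example. The fetching function is passed in as `fetch`, so `get_lyrics` can be used for real runs and a stub for testing.

```python
import csv
import io

FIELDS = ["artist", "album", "song", "lyrics"]


def collect_rows(discography, fetch, artist="Radiohead"):
    """Return one dict per song with the columns used in the csv file.

    discography: assumed shape {album name: [song title, ...]}
    fetch: a function like get_lyrics(artist, song_title) that may return None
    """
    rows = []
    for album, songs in discography.items():
        for song in songs:
            lyrics = fetch(artist, song)
            # Store an empty string when the lookup failed
            rows.append({"artist": artist, "album": album,
                         "song": song, "lyrics": lyrics or ""})
    return rows


def write_rows(rows, out):
    """Write the rows, with a header line, to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Small demonstration with a stub in place of get_lyrics (no network needed)
demo = {"Kid A": ["Kid A", "Treefingers"]}
rows = collect_rows(demo, lambda artist, song: None if song == "Treefingers" else "la la")
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

For a real run, `fetch` would be `get_lyrics` and `out` a file opened with `open("lyrics.csv", "w", newline="")`. Pausing between requests is probably a good idea so the site is not hit too quickly.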

This process worked for most songs, but the following returned no lyrics:
- How Do You Do?
- High and Dry
- ***Treefingers***
- Packt Like Sardines in a Crushd Tin Box
- Pulk/Pull Revolving Doors
- Morning Bell/Amnesiac
- ***Hunting Bears***
- 2 + 2 = 5 (The Lukewarm.)
- Sit Down. Stand Up. (Snakes & Ladders.)
- Sail to the Moon. (Brush the Cobwebs out of th...
- Backdrifts. (Honeymoon is Over.)
- Go to Sleep. (Little Man being Erased.)
- Where I End and You Begin. (The Sky is Falling...
- We suck Young Blood. (Your Time is up.)
- The Gloaming. (Softly Open our Mouths in the C...
- There there. (The Boney King of Nowhere.)
- I Will. (No man's Land.)
- A Punchup at a Wedding. (No no no no no no no ...
- Myxomatosis. (Judge, Jury & Executioner.)
- Scatterbrain. (As Dead as Leaves.)
- A Wolf at the Door. (It Girl. Rag Doll.)
- ***Feral***

Some songs in the list have no lyrics at all. Those songs are shown in bold italics.

For the other songs, it's because [azlyrics.com](azlyrics.com) has them under a different name than I do:  
- How Do You Do? -> How Do You ~~Do~~?  
- High and Dry -> High **&** Dry  
- Packt Like Sardines in a Crushd Tin Box -> Packt Like Sardines in a Crush**e**d Tin Box  
- Pulk/Pull Revolving Doors -> **Pull / Pulk** Revolving Doors  
- Morning Bell/Amnesiac -> **Amnesiac / Morning Bell**  

All of the songs in the list with parentheses in the title are from the album Hail to the Thief, where each song was given two names. [azlyrics.com](azlyrics.com) only uses the first name for each song.
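
One way to handle both cases before calling `get_lyrics` is to translate each title first. The sketch below is an assumption about how this could be done, not the code I originally used: `TITLE_OVERRIDES` and `azlyrics_title` are hypothetical names, the overrides come straight from the list above, and the parenthesized second name is dropped only for Hail to the Thief.

```python
import re

# My title -> the title azlyrics.com uses (taken from the list above)
TITLE_OVERRIDES = {
    "How Do You Do?": "How Do You?",
    "High and Dry": "High & Dry",
    "Packt Like Sardines in a Crushd Tin Box": "Packt Like Sardines in a Crushed Tin Box",
    "Pulk/Pull Revolving Doors": "Pull / Pulk Revolving Doors",
    "Morning Bell/Amnesiac": "Amnesiac / Morning Bell",
}


def azlyrics_title(title, album):
    """Return the title to look up for a song (hypothetical helper)."""
    if title in TITLE_OVERRIDES:
        return TITLE_OVERRIDES[title]
    if album == "Hail to the Thief":
        # Keep only the first name: cut from the opening parenthesis onward
        return re.sub(r"\s*\(.*$", "", title)
    return title


print(azlyrics_title("High and Dry", "The Bends"))
# High & Dry
print(azlyrics_title("2 + 2 = 5 (The Lukewarm.)", "Hail to the Thief"))
# 2 + 2 = 5
```

Limiting the parenthesis rule to one album keeps it from altering titles on other albums that contain parentheses.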

### Data Cleanup

Sometimes the lyrics given by [azlyrics.com](azlyrics.com) were not in a useful form.