# AZLyrics Scraper

With this code you can scrape individual lyrics pages or an artist's whole page for some basic data. The results are saved in a CSV file. To retrieve the data for the artist or band you want, replace the link inside the quotation marks below with the URL of their AZLyrics artist page (make sure you use the full link).

In [19]:
index_url = 'https://www.azlyrics.com/l/larkinpoe.html'

## Setting up

First we import the Python libraries we need:

In [20]:
import sys
import csv
import requests
import time
import random
from bs4 import BeautifulSoup

Then we define functions to download a page from the web and to extract certain elements from it:

In [21]:
def load_page(url):
    # A timeout keeps the scraper from hanging indefinitely on a stalled connection.
    with requests.get(url, timeout=30) as f:
        page = f.text
    return page

def get_element_text(element):
    try:
        return element.text.strip()
    except AttributeError as e:                     
        print('Element not found, error: {}'.format(e), file=sys.stderr)
        return ''

def get_genre_element(html):
    try:
        genre_script = html.find('body').find_all('script')[2]
        genre = str(genre_script).split()[9][1:-2]
        return genre
    except (AttributeError, IndexError) as e:  # IndexError: fewer scripts or tokens than expected
        print('Element not found, error: {}'.format(e), file=sys.stderr)
        return ''

## Getting the individual lyrics
Next we define a function that collects the basic information for a song from its AZLyrics song page:

In [22]:
def get_song_info(url):
    song_page = BeautifulSoup(load_page(url), 'lxml')                  
    interesting_html = song_page.find(class_='container main-page')    
    if not interesting_html:
        print('No information available for song at {}'.format(url), file=sys.stderr)
        return {}                                                      
    album = get_element_text(interesting_html.find(class_='songinalbum_title').find('b'))[1:-1]
    album_released = get_element_text(interesting_html.find(class_='songinalbum_title'))[-5:-1]
    credits = get_element_text(interesting_html.find_all(class_='smt')[2])[11:]
    genre = get_genre_element(song_page)
    lyrics = get_element_text(interesting_html.find('div', {'class':None}))
    return {'album': album, 'album release': album_released,'credits': credits, 'genre': genre, 'lyrics': lyrics}                      
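
The fixed slice offsets in `get_song_info` only work if the page text has a particular shape. The sketch below replays them on sample strings. These strings are assumptions about what AZLyrics renders (an album line like `album: "Spring" (2010)` and a credits line starting with `Writer(s): `), not values taken from a live page.

```python
# Sample strings are assumed, not scraped: they mimic the text the slices expect.
album_bold_text = '"Spring"'                  # assumed text of the <b> tag
album_line_text = 'album: "Spring" (2010)'    # assumed text of the whole element
credits_text = 'Writer(s): Rebecca Anne Lovell, Megan Lovell'

album = album_bold_text[1:-1]            # drop the surrounding quotation marks
album_released = album_line_text[-5:-1]  # the four characters before the final ')'
credits = credits_text[11:]              # skip the 11-character 'Writer(s): ' prefix

print(album, album_released, credits, sep=' | ')
```

If the site changes either layout, these offsets silently return the wrong substring, so it is worth spot-checking a few rows of the output.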

## Getting all artist songs

Now that we can get the basic information for a single song, we need a function that goes through the table listing all of the artist's songs and extracts each song title and its link. The links are important because they let us loop over the songs and retrieve them one by one with the previous function.

In [23]:
def get_songs(url):
    index_page = BeautifulSoup(load_page(url), 'lxml')        
    items = index_page.find(id="listAlbum")                   
    if not items:                                             
        print('Something went wrong!', file=sys.stderr)
        sys.exit()
    data = []
    for row in items.find_all(class_= 'listalbum-item'):          
        song = get_element_text(row.find('a'))
        link = row.find('a').get('href')
        link = 'https://www.azlyrics.com/' + str(link)
        data.append({    
                         'song': song,
                         'link': link,
                        })
    return data
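
Building the link by string concatenation assumes every `href` has the same form. As a hedged alternative, the standard library's `urllib.parse.urljoin` resolves both root-relative and `../`-style paths against the artist page URL. The `href` values below are made-up examples of those two forms, not values copied from the site.

```python
from urllib.parse import urljoin

index_url = 'https://www.azlyrics.com/l/larkinpoe.html'

# Hypothetical href values illustrating two common relative forms.
hrefs = [
    '/lyrics/larkinpoe/burglary.html',
    '../lyrics/larkinpoe/burglary.html',
]

# urljoin resolves each href relative to the page it was found on.
links = [urljoin(index_url, href) for href in hrefs]
for link in links:
    print(link)
```

Both forms resolve to the same absolute URL, which plain concatenation would not guarantee.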

## Scraping

The following code scrapes AZLyrics for the data on all of the given artist's songs. This may take a while, depending on the number of songs the artist has released.

In [None]:
song_data = get_songs(index_url)                      
for row in song_data:
    print('Scraping info on {}.'.format(row['song'])) #can be useful for debugging
    url = row['link']
    song_info = get_song_info(url)                    
    for key, value in song_info.items():
        row[key] = value
    time.sleep(random.uniform(3,8)) # shorter pause: use this if you are downloading fewer than 100 songs
    #time.sleep(random.uniform(4,16)) # longer pause: use this instead if you are downloading more than 100 songs


Scraping info on Long Hard Fall.
Scraping info on We Intertwine.
Scraping info on Burglary.
Scraping info on To Myself.
Scraping info on Shadows Of Ourselves.
Scraping info on The Principle Of Silver Lining.
Scraping info on Ball And Chain.
Scraping info on Nothin' But Air.
Scraping info on Fairbanks, Alaska.
Scraping info on Praying For The Bell.
Scraping info on Sea Song.
Scraping info on Wrestling A Stranger.
Scraping info on Natalie.
Scraping info on Enough For You.
Scraping info on By The Pier.
Scraping info on In My Time Of Dying (Live).
Scraping info on Principle Of Silver Lining (Live).
Scraping info on Teardrop (Live).
Scraping info on I Belong To Love.
Scraping info on Leave.
Scraping info on I Can Almost.
Scraping info on Tired.
Scraping info on As Good As You.
Scraping info on Missing Home.
Scraping info on Wait For Me.
Scraping info on Widow's Walk.
Scraping info on Jailbreak.
Scraping info on Don't.
Scraping info on Stubborn Love.
Scraping info on Dandelion.
Scraping info
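
To get a feel for how long the scraping loop will run, a rough estimate is the number of songs times the average pause, since `random.uniform(a, b)` averages `(a + b) / 2`. The helper name `estimate_sleep_seconds` is hypothetical, and the figure ignores the time spent downloading and parsing each page.

```python
def estimate_sleep_seconds(n_songs, low, high):
    """Expected total time spent in time.sleep(random.uniform(low, high))."""
    return n_songs * (low + high) / 2

# 93 songs corresponds to the rows numbered 0-92 in this notebook's Larkin Poe run.
total = estimate_sleep_seconds(93, 3, 8)
print('About {:.1f} minutes of waiting'.format(total / 60))
```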

## Writing data into CSV

In this last part we write the data we have just scraped to a CSV file and then convert it to a table using the pandas module in Python. From this dataframe we can access the data easily and perform operations on it.

In [7]:
filename = index_url.rsplit('/', 1)[1][:-5] + '.csv'            # e.g. 'larkinpoe.csv'
with open(filename, 'w', encoding='utf-8', newline='') as f:   # newline='' is what the csv module expects
    fieldnames=['song', 'album', 'album release', 'genre','credits', 'lyrics']
    writer = csv.DictWriter(f,
                            delimiter=',',                
                            quotechar='"',                
                            quoting=csv.QUOTE_NONNUMERIC, 
                            fieldnames=fieldnames
                            )
    writer.writeheader()                                  
    for row in song_data:
        writer.writerow({k:v for k,v in row.items() if k in fieldnames})
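
The dictionary comprehension above drops the `link` key before each row is written. A minimal sketch of an alternative, assuming you would rather let the writer do the filtering: `csv.DictWriter` accepts `extrasaction='ignore'` to skip unknown keys, and its default `restval=''` fills columns a row lacks (for example when `get_song_info` returned `{}`). The rows and the `example.invalid` URLs are invented sample data, written to an in-memory buffer instead of a file.

```python
import csv
import io

fieldnames = ['song', 'album', 'album release']

# Invented sample rows: the second one mimics a song whose details were not found.
rows = [
    {'song': 'Burglary', 'link': 'https://example.invalid/burglary.html',
     'album': 'Spring', 'album release': '2010'},
    {'song': 'Black Echo', 'link': 'https://example.invalid/blackecho.html'},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer,
                        fieldnames=fieldnames,
                        quoting=csv.QUOTE_NONNUMERIC,
                        extrasaction='ignore')   # silently skip keys such as 'link'
writer.writeheader()
writer.writerows(rows)                           # missing keys become '' (restval default)

csv_text = buffer.getvalue()
print(csv_text)
```

Because every scraped value is a string, `QUOTE_NONNUMERIC` puts quotes around all fields here, including the release year and the empty cells.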

## Optional: Creating a Dataframe from the CSV file

This lets you see what is in the CSV file from within this Jupyter Notebook and perform operations on the data if desired.

In [8]:
import pandas as pd

df = pd.read_csv(filename) 
for index, value in df.dtypes.items(): 
    if value == 'object':
        df[index] = df[index].fillna('')
    else:
        df[index] = df[index].fillna(0)
df['album release'] = df['album release'].astype(int)
df['lyrics'] = df['lyrics'].astype('string')
df

| | song | album | album release | genre | credits | lyrics |
|---|---|---|---|---|---|---|
| 0 | Long Hard Fall | Spring | 2010 | pop | | And me, I cried when I told the truth I cried... |
| 1 | We Intertwine | Spring | 2010 | pop | Rebecca Anne Lovell, Megan Lovell | As the cold turns to frost and the day becomes... |
| 2 | Burglary | Spring | 2010 | pop | | I never meant to be your mark I never meant t... |
| 3 | To Myself | Spring | 2010 | pop | | What good could come from a world gone bad He... |
| 4 | Shadows Of Ourselves | Spring | 2010 | pop | | Oh, warm in my bed With your head on my chest... |
| ... | ... | ... | ... | ... | ... | ... |
| 89 | Ramblin' Man | Kindred Spirits | 2020 | pop | Dickey Betts | Lord, I was born a ramblin' man Tryna make a ... |
| 90 | Bell Bottom Blues | Kindred Spirits | 2020 | pop | Eric Clapton | Bell bottom blues, you made me cry I don't wa... |
| 91 | Crocodile Rock | Kindred Spirits | 2020 | pop | Elton John, Bernie Taupin | Well, I remember when rock was young Me and S... |
| 92 | Black Echo | | 0 | | | |
