# AZLyrics Scraper

With this code you can scrape individual lyrics pages, or an artist's whole song list, from AZLyrics for some basic data. We start by importing the relevant libraries:

In [None]:
import sys
import csv
import requests
import re
from bs4 import BeautifulSoup

Then we define two helper functions: one downloads a page from the web, the other extracts the text of an element found in that page:

In [None]:
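def load_page(url):
    """Download the page at `url` and return its HTML as a string."""
    with requests.get(url) as f:
        page = f.text
    return page

def get_element_text(element):
    """Return the stripped text of a BeautifulSoup element, or '' if the element is None."""
    try:
        return element.text.strip()
    except AttributeError as e:
        # find() returns None when nothing matches, and None has no .text
        print('Element not found, error: {}'.format(e), file=sys.stderr)
        return ''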

## Getting the individual lyrics
Next we make a function that gets the basic information for a song from its AZLyrics song page:

In [None]:
def get_song_info(url):
    song_page = BeautifulSoup(load_page(url), 'lxml')
    interesting_html = song_page.find(class_='container main-page')
    if not interesting_html:
        print('No information available for song at {}'.format(url), file=sys.stderr)
        return {}
    # The slices below assume the album line looks like: album: "Title" (YYYY)
    album = get_element_text(interesting_html.find(class_='songinalbum_title'))[8:-8]
    album_released = get_element_text(interesting_html.find(class_='songinalbum_title'))[-5:-1]
    # Skip the 11-character label in front of the credits
    credits = get_element_text(interesting_html.find('small'))[11:]
    # The lyrics are in the first <div> that has no class attribute
    lyrics = get_element_text(interesting_html.find('div', {'class': None}))
    return {'album': album, 'album release': album_released, 'credits': credits, 'lyrics': lyrics}
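
The fixed slices `[8:-8]` and `[-5:-1]` only work while the album line has exactly the expected shape. As a rough sketch of a more forgiving alternative, the already imported `re` module could parse that line instead. The helper name `parse_album_line` is hypothetical, and the format `album: "Title" (YYYY)` is an assumption inferred from the slice offsets, not something checked against the site:

```python
import re

# Assumed format of the album line, inferred from the slices: album: "Title" (YYYY)
ALBUM_LINE = re.compile(r'^album:\s*"(?P<title>.*)"\s*\((?P<year>\d{4})\)$')

def parse_album_line(text):
    """Hypothetical helper: split an album line into (title, year).

    Returns ('', '') when the text does not match the assumed format.
    """
    match = ALBUM_LINE.match(text.strip())
    if match is None:
        return '', ''
    return match.group('title'), match.group('year')

print(parse_album_line('album: "From Genesis To Revelation" (1969)'))
# prints ('From Genesis To Revelation', '1969')
```

With a helper like this, a song without album information gives empty strings instead of a few stray characters cut out of whatever text happened to be there.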

The previous functions can be tested by using the following code:

In [None]:
# You can replace this link with that of your favorite song
song_url = 'https://www.azlyrics.com/lyrics/genesis/wherethesourturnstosweet.html'
song_info = get_song_info(song_url)
for key, value in song_info.items():
    if key == 'lyrics':  # 'lyrics' can be replaced with any other key of the dictionary we just made
        print(value)

## Getting all artist songs

Next we make a function that collects the title and link of every song listed on an artist page:

In [6]:
def get_songs(url):
    index_page = BeautifulSoup(load_page(url), 'lxml')
    items = index_page.find(id="listAlbum")              # container that holds the song list
    if not items:
        print('No song list found at {}'.format(url), file=sys.stderr)
        sys.exit()
    data = []
    for row in items.find_all(class_='listalbum-item'):  # one entry per song
        anchor = row.find('a')
        song = anchor.text.strip()
        link = 'https://www.azlyrics.com/' + str(anchor.get('href'))
        data.append({'song': song, 'link': link})
    return data
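
Gluing the site address and the `href` together with `+` depends on the exact form of the links in the page. A small sketch of an alternative is `urljoin` from the standard library, which resolves a link against the page it was found on. The helper name `absolute_link` is hypothetical, and the three `href` forms below are made-up examples, not values taken from the site:

```python
from urllib.parse import urljoin

INDEX_URL = 'https://www.azlyrics.com/g/genesis.html'

def absolute_link(href, page_url=INDEX_URL):
    """Hypothetical helper: resolve an href relative to the page it was found on."""
    return urljoin(page_url, href)

# Three made-up forms the same link could take in the HTML
print(absolute_link('/lyrics/genesis/wherethesourturnstosweet.html'))
print(absolute_link('../lyrics/genesis/wherethesourturnstosweet.html'))
print(absolute_link('https://www.azlyrics.com/lyrics/genesis/wherethesourturnstosweet.html'))
# each line prints https://www.azlyrics.com/lyrics/genesis/wherethesourturnstosweet.html
```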

## Scraping

The following code scrapes AZLyrics for data on all of the given artist's songs. After a certain number of songs it will start giving errors, as the site will try to block you from scraping. I have thus far not found any easy code to fix this issue.

In [10]:
index_url = 'https://www.azlyrics.com/g/genesis.html' 
song_data = get_songs(index_url)                      
for row in song_data:
    print('Scraping info on {}.'.format(row['song'])) 
    url = row['link']
    song_info = get_song_info(url)                    
    for key, value in song_info.items():
        row[key] = value                              

Scraping info on Where The Sour Turns To Sweet.
Scraping info on In The Beginning.
Scraping info on Fireside Song.
Scraping info on The Serpent.
Scraping info on Am I Very Wrong?.
Scraping info on In The Wilderness.
Scraping info on The Conqueror.
Scraping info on In Hiding.
Scraping info on One Day.
Scraping info on Window.
Scraping info on In Limbo.
Scraping info on Silent Sun.
Scraping info on A Place To Call My Own.
Scraping info on A Winter's Tale.


Element not found, error: 'NoneType' object has no attribute 'text'


Scraping info on One-Eyed Hound.
Scraping info on That's Me.
Scraping info on Silent Sun.
Scraping info on Image Blown Out.
Scraping info on She's So Beautiful.


Element not found, error: 'NoneType' object has no attribute 'text'


Scraping info on Looking For Someone.
Scraping info on White Mountain.


Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'


Scraping info on Visions Of Angels.


Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'


Scraping info on Stagnation.


Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'


Scraping info on Dusk.


Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'


Scraping info on The Knife.


Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'
Element not found, error: 'NoneType' object has no attribute 'text'


Scraping info on The Musical Box.


KeyboardInterrupt: 
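
One thing that might be worth trying against the blocking is to slow the loop down with a random pause between requests. The sketch below is only an illustration of that idea: `scrape_politely` is a hypothetical helper, the delay values are arbitrary assumptions, and there is no guarantee that the site stops blocking at any particular pace.

```python
import random
import time

def scrape_politely(rows, fetch, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Call fetch(row['link']) for every row, pausing a random time between requests.

    Hypothetical helper; in this notebook `fetch` would be get_song_info.
    The default delays (in seconds) are arbitrary guesses.
    """
    for i, row in enumerate(rows):
        if i > 0:
            sleep(random.uniform(min_delay, max_delay))  # no pause before the first request
        row.update(fetch(row['link']))                   # merge the scraped fields into the row
    return rows

# Stand-in data and a fake fetch function, so the sketch runs without touching the site
demo_rows = [{'song': 'A', 'link': 'url-a'}, {'song': 'B', 'link': 'url-b'}]
fake_fetch = lambda url: {'lyrics': 'lyrics from ' + url}
print(scrape_politely(demo_rows, fake_fetch, min_delay=0, max_delay=0))
```

In the notebook the call would be `scrape_politely(song_data, get_song_info)`, which replaces the inner loop that copies each key into `row`.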

## Writing data into dataframe

In this last part we write the data we have just scraped to a CSV file and load it into a table (a dataframe) using the pandas module in Python. From this dataframe we can access the data easily and perform operations on it.

In [11]:
with open('songs.csv', 'w', encoding='utf-8', newline='') as f:   # newline='' is what the csv module expects
    fieldnames = ['song', 'album', 'album release', 'credits', 'lyrics']
    writer = csv.DictWriter(f,
                            delimiter=',',
                            quotechar='"',
                            quoting=csv.QUOTE_NONNUMERIC,   # quote every non-numeric field
                            fieldnames=fieldnames
                            )
    writer.writeheader()
    for row in song_data:
        # keep only the columns listed in fieldnames (this drops 'link')
        writer.writerow({k: v for k, v in row.items() if k in fieldnames})
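
The dictionary comprehension in the loop filters out keys that are not in `fieldnames`, such as `'link'`. As a small sketch of another way to get the same effect, `csv.DictWriter` accepts `extrasaction='ignore'`, which makes the writer skip unknown keys by itself. The row below is made-up demo data, and an in-memory `io.StringIO` stands in for the real file so the example does not write anything to disk:

```python
import csv
import io

fieldnames = ['song', 'album']
rows = [{'song': 'Window', 'album': 'From Genesis To Revelation', 'link': 'not written'}]

buffer = io.StringIO(newline='')   # stands in for the opened CSV file
writer = csv.DictWriter(buffer,
                        fieldnames=fieldnames,
                        quoting=csv.QUOTE_NONNUMERIC,
                        extrasaction='ignore')   # silently drop keys not in fieldnames
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Without `extrasaction='ignore'` the writer raises a `ValueError` for the extra `'link'` key, which is why the original loop filters each row first.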

In [12]:
import pandas as pd

dataset = pd.read_csv('songs.csv') 
dataset = dataset.dropna()
dataset['album release'] = dataset['album release'].astype(int)
dataset['lyrics'] = dataset['lyrics'].astype('string')
dataset



|   | song | album | album release | credits | lyrics |
|---|------|-------|---------------|---------|--------|
| 0 | Where The Sour Turns To Sweet | From Genesis To Revelation | 1969 | Genesis | We're waiting for you Come and join us now W... |
| 1 | In The Beginning | From Genesis To Revelation | 1969 | Genesis | Ocean of motion Squirming around and up and d... |
| 2 | Fireside Song | From Genesis To Revelation | 1969 | Genesis | As daybreak breaks the mist upon the earth It... |
| 3 | The Serpent | From Genesis To Revelation | 1969 | Genesis | Dark night, planets are set Creator prepares ... |
| 4 | Am I Very Wrong? | From Genesis To Revelation | 1969 | Genesis | Am I very wrong To hide behind the glare from... |
| 5 | In The Wilderness | From Genesis To Revelation | 1969 | Genesis | Leaving all the world to play they disappear ... |
| 6 | The Conqueror | From Genesis To Revelation | 1969 | Genesis | He climbs inside the looking glass And points... |
| 7 | In Hiding | From Genesis To Revelation | 1969 | Genesis | Pick me up, put me down Push me in, turn me r... |
| 8 | One Day | From Genesis To Revelation | 1969 | Genesis | Don't get me wrong I think I'm in love But t... |
| 9 | Window | From Genesis To Revelation | 1969 | Genesis | Slowly I stretch out my arms, freely Shadows ... |
