## Summary

The notebook uses LyricsGenius, a third-party Python client for the Genius API, to collect song lyrics from Genius.com. It begins by loading a .csv file that maps each artist to a genre and splitting the artists into one list per genre (country, rap, rock, and pop). A custom function, `get_lyrics`, then retrieves up to a fixed number of songs for each artist and appends each song's title, artist, and lyrics to a per-genre .csv file. Because requests made through LyricsGenius occasionally fail partway through a long run, the notebook also includes a cell that finds the last artist written to the output file and resumes scraping from the next artist in the list, so songs already collected do not have to be fetched again.
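Besides resuming from the output file, another common way to cope with intermittent failures is to retry a failed call a few times with exponential backoff before giving up. The sketch below is a minimal illustration under the assumption that failures are transient and surface as exceptions; `call_with_retries` is a hypothetical helper, not part of LyricsGenius.

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=1.0, retry_on=(Exception,), **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on the given exceptions.

    Hypothetical helper for illustration; it is not part of LyricsGenius.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # Out of attempts: let the caller (or a resume step) handle it
            # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
            time.sleep(base_delay * 2 ** attempt)
```

For example, `call_with_retries(genius.search_artist, name, max_songs=max_songs, retry_on=(requests.exceptions.RequestException,))` would retry a whole artist search. The `Genius` client in this notebook is already created with `retries=5`, so a wrapper like this adds a second layer on top of whatever retrying the client does internally.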

In [1]:
#Get access token for Genius.com

import json

# Read the token from a JSON credentials file; the with block closes the file afterwards
with open("path/to/credentials") as f:
    data = json.load(f)
token = data["access_token"]
#print(token)
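Wrapping the token lookup in a small function makes the file handling and the missing-key case explicit and easy to test. This is a minimal sketch: `load_token` is a hypothetical name, and it assumes the credentials file is a JSON object with an `access_token` field, as the cell above implies.

```python
import json


def load_token(path: str, key: str = "access_token") -> str:
    """Read the Genius access token from a JSON credentials file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:  # The with block closes the file even on error
        data = json.load(f)
    if key not in data:
        raise KeyError(f"{key!r} not found in {path}")
    return data[key]
```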

In [2]:
# Load and inspect the artist/genre CSV
import pandas as pd

artist_df = pd.read_csv("data/artist-main-genre.csv")
display(artist_df)

Unnamed: 0,artist,genre
0,Sean Paul,pop
1,Chris Brown,pop
2,Nickelback,rock
3,Ne-Yo,rap
4,The Fray,pop
...,...,...
568,"Tyler, The Creator",rap
569,Lil Tjay,rap
570,Anuel AA,rap
571,Chase Rice,country
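The `Unnamed: 0` column in the output above appears to be the row index that was written into the CSV when it was saved. Passing `index_col=0` to `pd.read_csv` turns that column back into the index instead of keeping it as data. A minimal sketch, using a small hypothetical in-memory stand-in (`sample_csv`) for `data/artist-main-genre.csv`:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file, saved with its index (hence the blank first header)
sample_csv = io.StringIO(
    ",artist,genre\n"
    "0,Sean Paul,pop\n"
    "1,Chris Brown,pop\n"
    "2,Nickelback,rock\n"
)

# index_col=0 uses the first column as the row index instead of an "Unnamed: 0" column
artist_df_sample = pd.read_csv(sample_csv, index_col=0)
print(artist_df_sample.columns.tolist())  # ['artist', 'genre']
```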


In [3]:
# Count artists per genre; display() is needed because only a cell's last expression is shown
display(artist_df.groupby("genre")["artist"].count())

# Split the artists into one list per genre
country_list = artist_df[artist_df["genre"]=="country"]["artist"].tolist()
rap_list = artist_df[artist_df["genre"]=="rap"]["artist"].tolist()
rock_list = artist_df[artist_df["genre"]=="rock"]["artist"].tolist()
pop_list = artist_df[artist_df["genre"]=="pop"]["artist"].tolist()

print(rap_list)

['Ne-Yo', 'T.I.', 'Yung Joc', 'Nelly Furtado', 'Gnarls Barkley', 'Chamillionaire', 'Nelly', 'Dem Franchize Boyz', 'Eminem', 'T-Pain', 'Ludacris', 'Jamie Foxx', 'Cascada', 'D4L', 'Kanye West', 'E-40', 'Juelz Santana', 'Busta Rhymes', 'Lil Jon', 'Three 6 Mafia', 'Bubba Sparxxx', 'Jibbs', 'Lil Wayne', 'Field Mob', 'Daddy Yankee', 'Wyclef Jean', 'Bow Wow', 'Young Dro', 'Paul Wall', 'Chingy', 'Krayzie Bone', 'Slim Thug', 'Gorillaz', 'Snoop Dogg', 'Mims', 'Shop Boyz', "Soulja Boy Tell'em", 'Jim Jones', 'JAY-Z', 'Diddy', 'Hurricane Chris', 'Fabolous', '50 Cent', 'Fat Joe', 'Plies', 'Pretty Ricky', 'Baby Boy Da Prince', 'Bone Thugs-N-Harmony', 'Rich Boy', 'DJ Khaled', 'Eve', 'Yung Berg', 'Kardinal Offishall', "Colby O'Donis", 'Webbie', 'Rick Ross', 'Baby Bash', 'Lupe Fiasco', 'David Banner', 'The Game', 'Drake', 'Kid Cudi', 'Jeremih', 'Young Money', 'Asher Roth', 'New Boyz', 'Dorrough', 'B.o.B', 'Gucci Mane', 'Cali Swag District', 'CeeLo Green', 'Wiz Khalifa', 'Bad Meets Evil', 'Dr. Dre', 'Wak
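The four filtered lists can also be built in one pass with `groupby`, which yields one sub-DataFrame per genre; group keys come out sorted, and rows keep their original order within each group. This sketch uses a small hypothetical `sample_df` in place of `artist_df`, and `artists_by_genre` is an illustrative name rather than something the notebook defines.

```python
import pandas as pd

# Hypothetical stand-in for artist_df
sample_df = pd.DataFrame({
    "artist": ["Sean Paul", "Nickelback", "Ne-Yo", "Chase Rice", "Chris Brown"],
    "genre": ["pop", "rock", "rap", "country", "pop"],
})

# Number of artists per genre
print(sample_df.groupby("genre")["artist"].count())

# One list of artists per genre, preserving row order within each genre
artists_by_genre = {
    genre: group["artist"].tolist() for genre, group in sample_df.groupby("genre")
}
print(artists_by_genre["pop"])  # ['Sean Paul', 'Chris Brown']
```

With the real data, `artists_by_genre["rap"]` would play the role of `rap_list`.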

In [4]:
test_list = rap_list[:3]
print(test_list)

['Ne-Yo', 'T.I.', 'Yung Joc']


In [11]:
import csv
import requests
import lyricsgenius as lg

# Define the fields for the CSV where the lyrics will be stored
fields = ["title", "artist", "lyrics"]

# Set up the Genius client with the token and settings that filter and format the lyrics
genius = lg.Genius(token, skip_non_songs=True,
                   excluded_terms=["(Remix)", "(Live)"],
                   remove_section_headers=True, verbose=False, retries=5)

# Retrieve up to max_songs songs for each artist and append them to a CSV file
def get_lyrics(artists: list, max_songs: int, output_path: str, write_headers: bool):
    c = 0  # Running song count across all artists (not reset per artist)

    # Open the CSV in append mode; the with block closes (and flushes) it even if an error escapes.
    # newline="" prevents blank rows on Windows, and utf-8 handles non-ASCII lyrics
    with open(output_path, "a", newline="", encoding="utf-8") as csvfile:
        # Create a CSV DictWriter object to handle the writing of data
        writer = csv.DictWriter(csvfile, fieldnames=fields, restval='')

        # If specified, write the header row (only when starting a new file)
        if write_headers:
            writer.writeheader()

        # Iterate over the list of artists to retrieve their songs
        for name in artists:
            try:
                # Search for the artist on Genius; search_artist returns None if no artist matches
                artist = genius.search_artist(name, max_songs=max_songs)
                if artist is None:
                    print(f"Artist not found: {name}")
                    continue
                for song in artist.songs:
                    c += 1  # Increment the song counter
                    try:
                        # Create a dictionary for the current song with the desired information
                        song_dict = {
                            "title": song.title,
                            "artist": song.artist,
                            # Replace newlines with colons and keep everything after the first
                            # "Lyrics:" marker; raises IndexError if the marker is missing
                            "lyrics": song.lyrics.replace("\n", ":").split("Lyrics:", 1)[1]
                        }
                        # Write the song's information to the CSV file
                        writer.writerow(song_dict)
                    except Exception as e:
                        # If processing this song fails, log it and continue with the next song
                        print(f"Exception at {name}: {c}, {str(e)}")
                # Report by the searched name: `song` is undefined if no songs were returned
                print(f"-----------------{name} SUCCESSFULLY ADDED-------------------- \n"
                      f"  SONG COUNT: {c}")
            except requests.exceptions.RequestException as error:
                # Handle HTTP errors and timeouts during the artist search, then move on
                print(f"Request error for {name}: {error}")
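The lyrics-cleaning expression in `get_lyrics` is one plausible source of the logged exceptions: when the scraped text contains no "Lyrics" header line, the split yields a single element and `[1]` raises `IndexError`. Pulling that step into a pure function makes it testable. This is a hypothetical helper, `clean_lyrics`, assuming (as the split implies) that the raw text starts with a header line ending in "Lyrics". It uses `str.partition`, which keeps everything after the first marker and returns `None` instead of raising when the marker is absent; an unlimited `split("Lyrics:")[1]` would instead stop at a second occurrence of the marker.

```python
def clean_lyrics(raw: str, marker: str = "Lyrics:"):
    """Flatten newlines to colons and drop everything up to the first marker.

    Returns None when the marker is missing. Hypothetical helper for illustration.
    """
    flattened = raw.replace("\n", ":")
    _head, sep, tail = flattened.partition(marker)
    if not sep:
        return None  # No header found: the indexing in get_lyrics would raise here
    return tail
```

Inside the song loop, a `None` result could then be logged and skipped without relying on a broad `except Exception`.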

    

In [12]:
#get rap artist data
# get_lyrics(rap_list, 200, "data/rap_lyrics.csv", write_headers=True)

In [16]:
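# Find the last artist written before the script failed, then resume from the next artist
rap_df = pd.read_csv("data/rap_lyrics 2.csv")
last_artist = rap_df["artist"].iloc[-1]
print(rap_df.iloc[[-1]].artist)

# Assumes the name Genius stored (song.artist) matches the spelling used in rap_list
modified_rap_list = rap_list[rap_list.index(last_artist) + 1:]
display(modified_rap_list)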

14480    Migos
Name: artist, dtype: object


['Kodak Black',
 '21 Savage',
 'Cardi B',
 'Childish Gambino',
 'Quavo',
 'French Montana',
 'XXXTENTACION',
 'KYLE',
 'Logic',
 'SZA',
 'Amine',
 'Ayo & Teo',
 'Playboi Carti',
 'Machine Gun Kelly',
 'A Boogie Wit da Hoodie',
 'Jon Bellion',
 'Swae Lee',
 'Juice WRLD',
 'Lil Pump',
 'NF',
 'Lil Baby',
 'YoungBoy Never Broke Again',
 'Rich The Kid',
 'Bad Bunny',
 'BlocBoy JB',
 'Offset',
 'Famous Dex',
 'Metro Boomin',
 'Lil Dicky',
 'Lil Skies',
 'A$AP Ferg',
 'Lil Nas X',
 'DaBaby',
 'Lizzo',
 'Lil Tecca',
 'YNW Melly',
 'Gunna',
 'Blueface',
 'City Girls',
 'Sheck Wes',
 'Calboy',
 'Megan Thee Stallion',
 'Mustard',
 'NLE Choppa',
 'Flipp Dinero',
 'Polo G',
 'Pinkfong',
 'Saweetie',
 'YK Osiris',
 'Tyler, The Creator',
 'Lil Tjay',
 'Anuel AA']
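The resume step can also be packaged as a reusable function that reads the output file directly. This standard-library sketch uses a hypothetical helper, `remaining_artists`, and assumes the CSV has an `artist` column, that artists were written in the same order as the input list, and that the last artist in the file was written completely (which holds for `get_lyrics` when a run stops while searching for the next artist, since each artist's songs are fetched before any are written).

```python
import csv


def remaining_artists(all_artists: list, csv_path: str) -> list:
    """Return the artists after the last one recorded in csv_path (hypothetical helper)."""
    last_artist = None
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            last_artist = row["artist"]
    if last_artist is None:
        return list(all_artists)  # Header only or empty file: nothing scraped yet
    if last_artist not in all_artists:
        # Genius may spell a name differently from the input list
        raise ValueError(f"Last artist {last_artist!r} is not in the input list")
    return all_artists[all_artists.index(last_artist) + 1:]
```

For this run, `remaining_artists(rap_list, "data/rap_lyrics 2.csv")` would be expected to return the same list as `modified_rap_list`.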

In [17]:
get_lyrics(modified_rap_list, 200, "data/rap_lyrics 2.csv", write_headers=False)

Exception at Kodak Black: 43
Exception at Kodak Black: 88
Exception at Kodak Black: 199
-----------------Kodak Black SUCCESSFULLY ADDED-------------------- 
  SONG COUNT: 200
Exception at 21 Savage: 313
Exception at 21 Savage: 331
Exception at 21 Savage: 333
Exception at 21 Savage: 339
Exception at 21 Savage: 340
Exception at 21 Savage: 341
Exception at 21 Savage: 344
Exception at 21 Savage: 346
Exception at 21 Savage: 347
Exception at 21 Savage: 350
Exception at 21 Savage: 351
Exception at 21 Savage: 353
Exception at 21 Savage: 357
Exception at 21 Savage: 358
Exception at 21 Savage: 361
Exception at 21 Savage: 362
Exception at 21 Savage: 363
Exception at 21 Savage: 365
Exception at 21 Savage: 367
Exception at 21 Savage: 368
-----------------21 Savage SUCCESSFULLY ADDED-------------------- 
  SONG COUNT: 369
Exception at Cardi B: 401
Exception at Cardi B: 402
Exception at Cardi B: 403
Exception at Cardi B: 406
Exception at Cardi B: 416
Exception at Cardi B: 418
Exception at Cardi B: 42

Exception at Jon Bellion: 2505
Exception at Jon Bellion: 2535
Exception at Jon Bellion: 2536
Exception at Jon Bellion: 2538
Exception at Jon Bellion: 2539
Exception at Jon Bellion: 2544
Exception at Jon Bellion: 2561
Exception at Jon Bellion: 2566
Exception at Jon Bellion: 2571
Exception at Jon Bellion: 2578
Exception at Jon Bellion: 2594
Exception at Jon Bellion: 2595
Exception at Jon Bellion: 2597
Exception at Jon Bellion: 2598
-----------------Jon Bellion SUCCESSFULLY ADDED-------------------- 
  SONG COUNT: 2608
Exception at Swae Lee: 2629
Exception at Swae Lee: 2632
Exception at Swae Lee: 2637
Exception at Swae Lee: 2641
Exception at Swae Lee: 2642
Exception at Swae Lee: 2645
Exception at Swae Lee: 2648
Exception at Swae Lee: 2649
Exception at Swae Lee: 2653
Exception at Swae Lee: 2654
Exception at Swae Lee: 2655
Exception at Swae Lee: 2659
Exception at Swae Lee: 2667
Exception at Swae Lee: 2675
Exception at Swae Lee: 2677
Exception at Swae Lee: 2680
Exception at Swae Lee: 2684
Ex

Exception at Lizzo: 4954
Exception at Lizzo: 4962
Exception at Lizzo: 4973
Exception at Lizzo: 4975
Exception at Lizzo: 4977
Exception at Lizzo: 4978
Exception at Lizzo: 4980
Exception at Lizzo: 4982
Exception at Lizzo: 4984
Exception at Lizzo: 4988
Exception at Lizzo: 4989
Exception at Lizzo: 4990
Exception at Lizzo: 4993
-----------------Lizzo SUCCESSFULLY ADDED-------------------- 
  SONG COUNT: 4994
Exception at Lil Tecca: 5067
Exception at Lil Tecca: 5080
Exception at Lil Tecca: 5085
Exception at Lil Tecca: 5089
Exception at Lil Tecca: 5090
Exception at Lil Tecca: 5091
Exception at Lil Tecca: 5094
Exception at Lil Tecca: 5097
Exception at Lil Tecca: 5099
Exception at Lil Tecca: 5100
Exception at Lil Tecca: 5103
Exception at Lil Tecca: 5104
Exception at Lil Tecca: 5106
Exception at Lil Tecca: 5114
Exception at Lil Tecca: 5115
Exception at Lil Tecca: 5117
Exception at Lil Tecca: 5118
Exception at Lil Tecca: 5120
Exception at Lil Tecca: 5121
Exception at Lil Tecca: 5122
Exception at 

In [None]:
#get pop artist data
get_lyrics(pop_list, 200, "data/pop_lyrics.csv", write_headers=True)

Exception at Sean Paul: 21
Exception at Sean Paul: 23
Exception at Sean Paul: 32
Exception at Sean Paul: 37
Exception at Sean Paul: 51
Exception at Sean Paul: 52
Exception at Sean Paul: 53
Exception at Sean Paul: 63
Exception at Sean Paul: 64
Exception at Sean Paul: 66
Exception at Sean Paul: 70
Exception at Sean Paul: 72
Exception at Sean Paul: 74
Exception at Sean Paul: 78
Exception at Sean Paul: 82
Exception at Sean Paul: 83
Exception at Sean Paul: 87
Exception at Sean Paul: 89
Exception at Sean Paul: 92
Exception at Sean Paul: 96
Exception at Sean Paul: 100
Exception at Sean Paul: 102
Exception at Sean Paul: 106
Exception at Sean Paul: 107
Exception at Sean Paul: 108
Exception at Sean Paul: 109
Exception at Sean Paul: 111
Exception at Sean Paul: 113
Exception at Sean Paul: 114
Exception at Sean Paul: 118
Exception at Sean Paul: 119
Exception at Sean Paul: 121
Exception at Sean Paul: 124
Exception at Sean Paul: 125
Exception at Sean Paul: 127
Exception at Sean Paul: 128
Exception at

In [None]:
#get country artist data
get_lyrics(country_list, 200, "data/country_lyrics.csv", write_headers=True)

In [None]:
#get rock artist data
get_lyrics(rock_list, 200, "data/rock_lyrics.csv", write_headers=True)