# Downloading Lyrical Discographies from Genius Lyrics

Taking inspiration from [Alexus Brown's 2021 project](https://github.com/Data-Science-for-Linguists-2021/Rapper_Topic_Modeling), I'm following the tutorial for the `lyricsgenius` package by [John W Miller](https://github.com/johnwmillr/LyricsGenius). This file has been separated out so that the Genius API requests don't have to be rerun each time.

In [1]:
from lyricsgenius import Genius
import pandas as pd

In [2]:
# Access Token has been redacted.  Genius API account required.
genius = Genius("XXXXXXXXXXXXXXXXXXXXX")

In [3]:
genius.remove_section_headers = True
genius.verbose = False

## Test Run with Crispin Glover's Discography

In [4]:
# request Crispin Glover's full discography
artist = genius.search_artist("Crispin Glover", sort="title")

In [5]:
# example lyrics: Crispin Glover's "Clowny Clown Clown"
print(artist.song("Clowny Clown Clown").lyrics)

Clowny Clown Clown LyricsI was walking on the ground
I didn't make a sound
Then, I turned around
And I saw a clown
Had a frown
Stood on a mound
Started barking like a hound
Huh her ha hur ur
Clowny clown clown
When I came to what I found
He showed me something that was brown
So we became great friends and
Late in life, he got sick...
I gave him some soup, but he got
Worse and asked for its purse
It got it, but it was empty
So it cried a plenty
I wondered what to do
I didn't know what to think
So I got a drink
And then I showed it
Something that was round
And it died, smiled
And fell on the ground
Clowny clown clown
Thinking back about those days with the clown
I get teary-eyed and really snide
I think that deep down
I hated that clown
But not as much as Mr. Farr
I'm going to go smoke a cigar
Clowny clown clown
I was walking on the ground
I didn't make a sound
And then I turned around
I saw a clown
Clown
Clowny Clown clown
Ha ha!
I hate you clown
Your ugly frown
Smiley lips
Think I'll c

In [6]:
# save the lyrics to a JSON file
artist.save_lyrics()

Lyrics_CrispinGlover.json already exists. Overwrite?
(y/n): y
Wrote Lyrics_CrispinGlover.json.


In [7]:
import json

In [8]:
# test reopening the JSON file
with open("Lyrics_CrispinGlover.json") as file:
    cgFileLoad = json.load(file)

In [33]:
## This would show the full 'songs' value for Crispin Glover
# cgFileLoad['songs']

In [19]:
# create a dataframe that contains Crispin Glover's songs and their full data
testdf = pd.DataFrame(cgFileLoad['songs'])

In [20]:
# reduce the dataframe to just the necessary identifying information and the lyrics
testdf = testdf[['artist', 'title', 'lyrics']]

In [21]:
# When I clean the data, I'll have to remove lyric-less rows and the leading "<title> Lyrics" prefix
testdf

| | artist | title | lyrics |
|---|---|---|---|
| 0 | Crispin Glover | Auto-Manipulator | Auto-Manipulator Lyrics\nWomen are sweet\nAnd ... |
| 1 | Crispin Glover | Clowny Clown Clown | Clowny Clown Clown LyricsI was walking on the ... |
| 2 | Crispin Glover | Never Say “Never” To Always | Never Say “Never” To Always LyricsAlways is al... |
| 3 | Crispin Glover | New Clean Song | New Clean Song Lyrics\nYesterday I had my birt... |
| 4 | Crispin Glover | Overture | |
| 5 | Crispin Glover | Selected Readings From Oak Mot Part I | Selected Readings From Oak Mot Part I LyricsOa... |
| 6 | Crispin Glover | Selected Readings From Oak Mot Part II | Selected Readings From Oak Mot Part II Lyrics"... |
| 7 | Crispin Glover | Selected Readings From Oak Mot Part III | Selected Readings From Oak Mot Part III Lyrics... |
| 8 | Crispin Glover | Selected Readings From Oak Mot Part IV | Selected Readings From Oak Mot Part IV Lyrics“... |
| 9 | Crispin Glover | Selected Readings From Rat Catching | Selected Readings From Rat Catching LyricsRat ... |
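
The cleanup mentioned in the cell above could look roughly like the sketch below. It works on plain song records (dicts with `title` and `lyrics` keys, which is how I'm assuming the entries of `cgFileLoad['songs']` are shaped) rather than on the dataframe, and `clean_songs` is a hypothetical helper name, not part of `lyricsgenius`. It also assumes the prefix is exactly the song title followed by `Lyrics`, which may not hold for every song.

```python
def clean_songs(songs):
    """Return cleaned copies of the song records that have usable lyrics.

    Each record is assumed to be a dict with at least 'title' and 'lyrics'
    keys, like the rows shown in the table above.
    """
    cleaned = []
    for song in songs:
        lyrics = (song.get("lyrics") or "").strip()
        if not lyrics:
            continue  # skip lyric-less entries such as "Overture"
        prefix = f"{song['title']} Lyrics"
        if lyrics.startswith(prefix):
            # drop the "<title> Lyrics" header and any newline right after it
            lyrics = lyrics[len(prefix):].lstrip()
        cleaned.append({**song, "lyrics": lyrics})
    return cleaned


# Small sample built from the rows shown above (lyrics shortened).
sample = [
    {"artist": "Crispin Glover", "title": "Overture", "lyrics": ""},
    {"artist": "Crispin Glover", "title": "Clowny Clown Clown",
     "lyrics": "Clowny Clown Clown LyricsI was walking on the ground"},
    {"artist": "Crispin Glover", "title": "Auto-Manipulator",
     "lyrics": "Auto-Manipulator Lyrics\nWomen are sweet"},
]
cleaned = clean_songs(sample)
print([song["title"] for song in cleaned])
print(cleaned[0]["lyrics"])
```

The cleaned records can be passed straight to `pd.DataFrame(...)`, so the later dataframe steps would not need to change.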


In [25]:
# initialize empty list
musicians = []

# load musicians from 0_wiki_musicians.txt (one name per line) into a list
with open("0_wiki_musicians.txt") as mfile:
    for line in mfile:
        musicians.append(line.strip('\n'))

In [35]:
# Show a bit of the musicians list
musicians[:15]

['Hasil Adkins',
 'Eden ahbez',
 'Ajdar',
 'Leona Anderson',
 'Brittany Anjou',
 'Nathaniel Ayers',
 'Syd Barrett',
 'Tryphosa Bates-Batcheller',
 'Leila Bela',
 'The Better Beatles',
 'Y. Bhekhirst',
 'Button King',
 'Captain Beefheart',
 'Cherry Sisters',
 'Corn Mo']

##  Download the Full Musicians List
The [49 Years of Lyrics project](https://towardsdatascience.com/49-years-of-lyrics-why-so-angry-1adf0a3fa2b4) used a try-except block, which I've adapted below.

In [27]:
# initialize a list for artists that aren't found on Genius
unsuccessful = []

# access Genius for all musicians on the list
for m in musicians:
    try:
        artist = genius.search_artist(m, sort="title")
        artist.save_lyrics()
    except Exception:  # any failure (no match, request error) marks the artist as unsuccessful
        unsuccessful.append(m)

Wrote Lyrics_HasilAdkins.json.
Wrote Lyrics_EdenAhbez.json.
Wrote Lyrics_Ajdar.json.
Wrote Lyrics_LeonaAnderson.json.
Wrote Lyrics_BrittanyAnjou.json.
Wrote Lyrics_NatAyer.json.
Wrote Lyrics_SydBarrett.json.
Wrote Lyrics_LeiaBLACKSWAN.json.
Wrote Lyrics_CaptainBeefheart.json.
Wrote Lyrics_CornMo.json.
Wrote Lyrics_LesCompagnonsDeLaChanson.json.
Wrote Lyrics_DavidCronenbergsWife.json.
Wrote Lyrics_Dr.Demento.json.
Wrote Lyrics_DIVINE.json.
Wrote Lyrics_ROE.json.
Wrote Lyrics_RokyErickson.json.
Wrote Lyrics_DamiãoExperiença.json.
Wrote Lyrics_ExtraditionOrder.json.
Wrote Lyrics_JadFair.json.
Wrote Lyrics_SteveFarnie.json.
Wrote Lyrics_WildManFischer.json.
Wrote Lyrics_JohnFrusciante.json.
Lyrics_CrispinGlover.json already exists. Overwrite?
(y/n): y
Wrote Lyrics_CrispinGlover.json.
Wrote Lyrics_MarkGormley.json.
Wrote Lyrics_RobertGraettinger.json.
Wrote Lyrics_PeterGrudzien.json.
Wrote Lyrics_LeslieHall.json.
Wrote Lyrics_NaomiHale.json.
Wrote Lyrics_DavidLiebeHart.json.
Wrote Lyrics_Pa
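
The loop above puts every failure into one list, so an artist who is missing from Genius looks the same as a request that simply failed. A minimal sketch of one way to keep the two apart is below. It assumes the search call returns `None` when nothing is found (my understanding of `search_artist`, not something verified here), and `download_all`, `FakeArtist`, and `fake_search` are hypothetical names used only so the sketch runs without an API token.

```python
def download_all(names, search):
    """Try every name and return (saved, not_found, errored).

    `search` is any callable that takes a name and returns an artist-like
    object with a save_lyrics() method, or None when nothing is found.
    """
    saved, not_found, errored = [], [], []
    for name in names:
        try:
            artist = search(name)
            if artist is None:
                not_found.append(name)  # no match at all
                continue
            artist.save_lyrics()
            saved.append(name)
        except Exception as exc:
            # e.g. a timeout: keep the reason so the name can be retried later
            errored.append((name, repr(exc)))
    return saved, not_found, errored


# Stand-ins for the real client (hypothetical, for demonstration only).
class FakeArtist:
    def save_lyrics(self):
        pass


def fake_search(name):
    if name == "Missing Artist":
        return None
    if name == "Flaky Artist":
        raise TimeoutError("simulated timeout")
    return FakeArtist()


saved, not_found, errored = download_all(
    ["Found Artist", "Missing Artist", "Flaky Artist"], fake_search
)
print(saved, not_found, errored)
```

With the real client, `search` could be `lambda m: genius.search_artist(m, sort="title")`. Names that land in `errored` are worth retrying, while names in `not_found` probably aren't.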

In the output above, the following files need their artist verified. Each entry gives the artist Genius returned, followed in parentheses by the artist I searched for, and is marked T if the match is verified and F if it is not:
- (F) Nat Ayer (Nathaniel Ayers)
- (F) Leia BLACKSWAN (Leila Bela) - this is K-Pop
- (F) LesCompagnonsDeLaChanson (Cromagnon)
- (F) DIVINE (Divine) - intended to be Pink Flamingos Divine
- (F) ROE (Marian Dora?)
- (F) Naomi Hale (Naomi Hall) - really wish I had these lyrics though
- (F) Lil Peep (Lil B) - This is popular Rap
- (F) Miguelito Angel Masso (Angela Masson)
- (F) Global Network Sophia Amato (Amaro Neto)
- (F) Sandra Phillips (Sondra Prill)
- (F) Smelly Ellie (Smelly)
- (T) Alexander Skip Spence (Skip Spence)
- (F) The Tillers (The Tinklers)
- (F) JPEGMAFIA (Wing)
- (F) John Reuben (Zapoppin')

In summary, every file listed above except Alexander Skip Spence belongs to the wrong artist. To handle this, I've put all the correct files in a folder in my private directory and left the extraneous files elsewhere.
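
Some of these mismatches could be flagged automatically by comparing the name I searched for with the name Genius returned. The sketch below is a rough heuristic, not the method used above: `names_match` is a hypothetical helper that accepts a pair only when one name's words are all contained in the other's. It would still pass pairs like Divine/DIVINE and Smelly/Smelly Ellie, which the list above marks as wrong, so a manual check remains necessary.

```python
import re


def name_tokens(name):
    """Split a name into a set of case-insensitive word tokens."""
    return set(re.findall(r"\w+", name.casefold()))


def names_match(requested, returned):
    """True when one name's tokens are all contained in the other's."""
    a, b = name_tokens(requested), name_tokens(returned)
    if not a or not b:
        return False
    return a <= b or b <= a


# (searched-for name, name Genius returned), taken from the list above
pairs = [
    ("Skip Spence", "Alexander Skip Spence"),
    ("Lil B", "Lil Peep"),
    ("Wing", "JPEGMAFIA"),
    ("Divine", "DIVINE"),
]
for requested, returned in pairs:
    print(requested, "->", returned, names_match(requested, returned))
```

In a real run, I'd expect the returned name to come from the artist object (I believe `artist.name`), checked before calling `save_lyrics()`.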

In [45]:
print(len(unsuccessful), "Artists could not be found on Genius, not including those that were found incorrectly.")

26 Artists could not be found on Genius, not including those that were found incorrectly.
