I found a JSON file of Mac Miller lyrics and wanted to do a little exercise to test out semantic language analysis using the spaCy NLP framework.

The goal of this project is to take user input and perform a semantic analysis on it. That input is then compared against each song found in the lyrics.json file downloaded from https://github.com/timbmg/mac-miller-lyrics-dataset/blob/main/data/lyrics.json. The comparison uses the semantic similarity score provided by spaCy's natural language processing methods.

I tried a variety of inputs and wrote up how the model performed in each case.

For the first input, I also provided a pandasgui display that allows for various data visualizations.

There is a lot to build on with this data and these workflows.

Let's take a look at my approach to this topic!

In [59]:
#Importing the necessary libraries
import pandas as pd
import json
import spacy
import en_core_web_md
from pandasgui import show

In [52]:
# Load the lyrics file; the top-level JSON value is a list with one dict per song
with open('lyrics.json', 'r', encoding='utf-8') as f:
    file = json.load(f)

#Verify the data
print(file[0].keys())

#Verify the data
file[0]['metadata'].keys()

#Verify the data
file[0]['url']

#Verify the data
file[0]['annotations']


dict_keys(['lyrics', 'metadata', 'url', 'annotations'])


[{'section': 'Intro: Telly & Mac Miller', 'start': 0, 'end': 14},
 {'section': 'Verse', 'start': 14, 'end': 48},
 {'section': 'Outro', 'start': 48, 'end': 59}]

In [61]:
#Input for semantic analysis
input_vibe = 'Nike blazer shoes'

#set an empty list to store the data in a similar format as a pandas dataframe
data_records = []

#initialize spacy model
nlp = spacy.load("en_core_web_md")

#setting doc1 as the input_vibe to compare to each of the song lyrics
doc1 = nlp(input_vibe)

#loop over the songs in lyrics.json; parse each song's lyrics, score its similarity to input_vibe, and keep rows scoring above 0.1
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] 
    doc2 = nlp(lyric) 
    sim = doc1.similarity(doc2)
    if sim > 0.1: 
        row_list = [song_title, album_title, lyric, sim]
        data_records.append(row_list)

#convert the cleaned json and similarity scores to a pandas dataframe    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

#sort the dataframe by similarity scores, most to least similar
#(sort_values returns a new dataframe, so the result has to be assigned back)
df = df.sort_values('similarity', ascending=False)

#display the dataframe with pandasgui
show(df)


[W008] Evaluating Doc.similarity based on empty vectors.

PandasGUI INFO — pandasgui.gui — Opening PandasGUI


<pandasgui.gui.PandasGui at 0x1b7a8737700>
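
The `[W008]` warning in the output above is what spaCy emits when `Doc.similarity` is called on a document that has no word vectors. The most likely cause here is songs whose lyrics are empty (some rows in the later result tables have an empty lyric column). A minimal sketch of filtering those songs out before scoring, assuming each record has the `lyrics` list and `metadata` dict seen earlier; the helper name `songs_with_lyrics` and the sample records are hypothetical:

```python
def songs_with_lyrics(songs):
    """Yield (title, album, lyric_text) for songs whose joined lyrics are non-blank."""
    for song in songs:
        lyric = ' '.join(song.get('lyrics', [])).strip()
        if not lyric:
            continue  # nothing to vectorize; skip instead of scoring an empty doc
        meta = song['metadata']
        yield meta['title'], meta['album_title'], lyric

# Hypothetical records in the same shape as the lyrics.json entries
sample = [
    {'lyrics': ['line one', 'line two'], 'metadata': {'title': 'A', 'album_title': 'X'}},
    {'lyrics': [], 'metadata': {'title': 'B', 'album_title': 'Y'}},
]

kept = list(songs_with_lyrics(sample))
print(kept)  # only song 'A' remains
```

The scoring loop could then iterate over `songs_with_lyrics(file)` instead of `file`.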

In [17]:
df[df['song_title']=='Nikes on My Feet']['lyric'].item()

'Woah Haha, yeah And the Nikes on my feet keep my cypher complete Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nikes And the Nikes on my feet keep my cypher complete Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nikes (Hey, hey) Nike-Nike-Nikes (Hey), Nike-Nike-Nikes (Haha) Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nikes (We just some motherfuckin\' kids) Nikes on my feet keep my cypher complete (Just some motherfuckin\' kids) Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nike-Nikes Ayy, lace \'em up, lace \'em up, lace \'em up, lace \'em Blue suede shoes stay crispy like bacon Nikes on my feet make my cypher complete, uh I stay shining like the lights on the street in the night Revis take me shoppin\' when I\'m up in New York Hit the shoe store, go and cop a few more You at the mall getting dinner at the food court I\'m in LA eatin\' twenty-two course Young boss, bitch, paper in my pockets I got a closet filled with shoeboxes Mom said my spending habit little bit obnoxious But a pilot stay fresh up in his co

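The sim score is the cosine similarity between the vector for the input phrase (vec1) and the vector for each song (vec2); by default spaCy builds each of these vectors by averaging the word vectors of the text. Scores can range from -1 to 1: values near 1 mean the two vectors point in nearly the same direction, and values near 0 mean little similarity. Songs are ranked strictly by this metric.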
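
To make the metric concrete, here is a minimal pure-Python sketch of a cosine similarity between two averaged document vectors, which is how spaCy's `Doc.similarity` behaves by default. The tiny 3-dimensional word vectors and the helper names `average` and `cosine` are made up for illustration; the real `en_core_web_md` vectors have 300 dimensions.

```python
import math

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy word vectors (not real spaCy vectors)
toy_vectors = {
    "nike":  [1.0, 0.2, 0.0],
    "shoes": [0.9, 0.1, 0.1],
    "feet":  [0.8, 0.3, 0.0],
    "sad":   [-0.7, 0.1, 0.6],
}

query_vec = average([toy_vectors["nike"], toy_vectors["shoes"]])
song_vec = average([toy_vectors["nike"], toy_vectors["feet"]])
other_vec = average([toy_vectors["sad"]])

print(cosine(query_vec, song_vec))   # close to 1: the vectors point the same way
print(cosine(query_vec, other_vec))  # negative: the vectors point in opposing directions
```

Because the score depends only on the direction of the averaged vectors, it says nothing about word order or about which individual words drove the match.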

This first search verifies that the model is working at least well enough to pass the smell test.
With the input "Nike blazer shoes", the most similar song (sim = 0.524) is Mac Miller's "Nikes on My Feet".

This matches my expected output.

Now let's explore some different kinds of inputs and see how our model responds.
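
Since every search below repeats the same loop, one option is to wrap it in a function and compute the song vectors only once. The following is a sketch of that idea, not the code used in this notebook: `build_index`, `rank_songs`, and the keyword-counting `toy_embed` are hypothetical names, and `toy_embed` merely stands in for something like `lambda text: nlp(text).vector` so the example runs without a model.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_index(songs, embed):
    """Embed every song once; songs is a list of (title, lyric_text) pairs."""
    return [(title, embed(lyric)) for title, lyric in songs]

def rank_songs(query, index, embed, top_k=5):
    """Return the top_k (title, similarity) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(title, cosine(query_vec, vec)) for title, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Stand-in embedder: counts a few hand-picked keywords (an assumption for the demo)
KEYWORDS = ["shoes", "nikes", "sleep", "bed"]

def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(k)) for k in KEYWORDS]

songs = [
    ("Song A", "nikes on my feet new shoes shoes"),
    ("Song B", "go to sleep in my bed"),
]
index = build_index(songs, toy_embed)               # done once, reused for every query
print(rank_songs("fresh shoes", index, toy_embed))  # Song A ranks first
print(rank_songs("comfy bed", index, toy_embed))    # Song B ranks first
```

With this shape, only the query has to be embedded per search, which is the part the `## has to be done AFTER input_vibe` comments in the cells below point at.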

In [16]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'Pittsburgh 412 Steel City Football steelers bridges steel rivers'

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)


Pittsburgh 412 Steel City Football steelers bridges steel rivers


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
225,Just My Imagination,The High Life,Aha Wake up in my king size bed Girls all arou...,0.299098
209,The High Life,The High Life,"I'm so high (So high, high) Find me somewhere ...",0.283345
268,Excelsior,Balloonerism*,"Yeah, ya Yeah, yeah, yeah, yeah Yeah, uh On Fo...",0.281823
67,Break the Law,GO:OD AM,"Get high, breakin' laws Get high, breakin' law...",0.276479
117,Bird Call,Watching Movies with the Sound Off,"Quack, quack Uh Uh-huh Uh I'm Chillin' for an ...",0.256443
...,...,...,...,...
330,Must Be,Unreleased Songs,"Yeah, yeah, yeah Mama, don't cry no more, I'm ...",-0.052150
110,Life,Live from Space,"Yeah *Gunshot, woman screaming* I didn't mean ...",-0.060720
259,My Biography,But My Mackin Ain’t Easy,This is my biography It's really something tha...,-0.065604
33,Nothing from Nothing,Spotify Singles,Nothin' from nothin' leaves nothin' You gotta ...,-0.076401


In [18]:
df[df['song_title']=='Just My Imagination']['lyric'].item()

"Aha Wake up in my king size bed Girls all around me, eyes all red (All red) The true life of a player kid It's nice to wake up to a naked bitch (Uh-huh) Got the finest clothes, diamonds, hoes Goin' to sleep on a pile of dough Lacoste, Phantoms, yards, mansions Piff in the cannons, house in the Hamptons (Livin' it up) Big parties every night This is the life of the paradise My main man Twista told 'em right I make 'em celebrities overnight (Overnight) Grammys, Oscars, starrin' in movies Got a pool house, four cars, and jacuzzis (What?) On my way up, gettin' my cake up (Uh-huh) Wake up! It was just my imagination Runnin' away with me It was just my imagination Runnin' away I got groupies who got groupies And I buy a new car every two weeks (Two weeks) My chain got diamonds in it And my whip got a diamond finish (Aha) Got the MTV j-j-jam of the week (Jam of the week) Sign autographs when I stand in the street (Uh-huh) Plus I been all over BET, Columbia Records and D.T.P. (D.T.P.) Both wa

Here we search the input: 'Pittsburgh 412 Steel City Football steelers bridges steel rivers'. This list has several well-known features associated with the city of Pittsburgh; let's see if we get a response from the model that matches.

We got the song "Just My Imagination"; the lyrics are displayed above. To me this match is hard to understand for what I interpreted as a straightforward input.

The model seems to agree, since the top match only scores 0.299, but I need to do more thinking about how the input and these lyrics are thematically related. Seems that tracing a result back through a black-box model is esoteric...

In [19]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'esoteric'

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

esoteric


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
285,Your Shoes Are Untied,Unreleased Songs,"Imaginary girlfriend, imaginary whores, imagin...",0.595058
86,Funeral,Faces,"""I know what happens to you, me and everyone e...",0.505072
124,Suplexes Inside of Complexes & Duplexes,Watching Movies with the Sound Off,This is madness (madness) This is an outrage! ...,0.490970
294,Untitled Snippet (2/26/13),Unreleased Songs,"Can you take me to euphoria, according to the ...",0.489207
90,San Francisco,Faces,"Yeah, welcome to the dark side of my bizarre m...",0.479829
...,...,...,...,...
259,My Biography,But My Mackin Ain’t Easy,This is my biography It's really something tha...,0.146049
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.117700
89,55,Faces,,0.000000
262,Tambourine Dream,Balloonerism*,,0.000000


In [33]:
df[df['song_title']=='Your Shoes Are Untied']['lyric'].item()

"Imaginary girlfriend, imaginary whores, imaginary virgins (Virgins) Imaginary Gods, imaginary evils, protect me with a cross (Cross) Imaginary senses Go and fight a war for imaginary vengeance (Vengeance) My old bitch imaginary pregnant Guess I'll be a dad with imaginary presents (Presents) Pockets full of imaginary money, ask me how it feel I imagine that it's stunning (Stunning) In my dreams, I imagine that I'm running Very fast, away from an imaginary something (Something) Imaginary enemies Live under my skin with imaginary tendencies (Tendencies) Meet imaginary Suzie Killed herself tryna reach imaginary beauty (Beauty) Imaginary, imaginary, imaginary, imaginary Imaginary, imaginary, imaginary, imaginary Imaginary problems I like to close my eyes and imagine I'm forgotten (Forgotten) Imagine I belong in this world with the rest of you Imaginary goblins (Goblins) Asleep in this imaginary coffin Finally get a break from my imaginary conscious (Conscious) Imaginary images Killed a man

"Esoteric" a favorite word of mine, and a word I closely associate with Mac Miller's music. Let's see what this input gives us for a song...

The song output by the model was "Your Shoes Are Untied", a song where Mac Miller explores several "imaginary" characters and the concepts of suicide, death, the life beyond, God, and the circles of life, all difficult to categorize, or one may say, esoteric.

This strong linguistic similarity is reflected in the similarity score of 0.595.

In [23]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'Lazy sunday afternoon chillin listen to music, maybe (dont tell my mom) I will smoke a lot of weed'

doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

Lazy sunday afternoon chillin listen to music, maybe (dont tell my mom) I will smoke a lot of weed


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
325,Numbness,Unreleased Songs,"But don't give, but don't give True love finds...",0.891148
220,Travelin’ Man ’09,The High Life,Uh The High Life You know I stay on the move M...,0.890499
374,Bump Demon,Unreleased Songs,I love the way that you touch I think about it...,0.889863
348,Hide and Seek,Unreleased Songs,"Yeah Well God damn baby Yeah, okay Down down s...",0.881539
304,Sunday,Unreleased Songs,Countin' up the minutes that I wasted I could ...,0.880980
...,...,...,...,...
214,Jerome Weinberg Speaks (Interlude),The High Life,"""Boo! Fuck this dude! You're a clown""",0.404047
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.126223
167,Hole in My Pocket,Blue Slide Park,,0.000000
262,Tambourine Dream,Balloonerism*,,0.000000


In [35]:
df[df['song_title']=='Numbness']['lyric'].item()

'But don\'t give, but don\'t give True love finds you in the end But don\'t give, but don\'t give True love will find you in the end I told you once that I made it (Made it) That everything would start changin\' (Changin\') Been wakin\' up when it\'s rainin\', but I ain\'t one for complainin\' And I ain\'t one to tell somebody what to do With their own life, own life (Life) People say what I should do with mine Well, that\'s the shit that I don\'t like I go nice with that numbness (Numbness) Me usin\' drugs with my crutches (Crutches) Got purp inside of these Dutches Double cup and that\'s what I fuck with (Fuck with) \'Cause everybody got substance (Yeah) And everybody addicted (So?) Everybody got a life to live But not everybody go live it (Huh) I seen some people conflicted They lose their minds to these riches (Riches) Testin\' all of their limits then burnin\' all of their bridges How beautiful are these women? (Women) But I be callin\' them bitches (Bitches) And I be fallin\' a v

I wanted to explore how the model handles something a little unorthodox like drug use, so I put the following as input: 'Lazy sunday afternoon chillin listen to music, maybe (dont tell my mom) I will smoke a lot of weed'.

This got an extremely high similarity score (sim = 0.891). Not sure about you, but I do not see much linguistic similarity with the top match, "Numbness". This kind of result makes me question the veracity of the whole model...
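
One possible explanation, offered as an assumption rather than something verified on this dataset: when document vectors are averages of word vectors, texts that share many common words ("I", "to", "my", "a") end up with similar averages regardless of topic. The toy sketch below uses made-up 3-dimensional vectors to show how adding shared filler words pushes the cosine similarity of two otherwise unrelated texts toward 1.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def average(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical toy vectors, chosen to be mutually orthogonal
FILLER = [1.0, 0.0, 0.0]    # stands in for common words both texts share
TOPIC_A = [0.0, 1.0, 0.0]   # content word that appears only in text A
TOPIC_B = [0.0, 0.0, 1.0]   # content word that appears only in text B

def similarity_with_filler(n_filler):
    """Similarity of two one-topic texts after adding n_filler shared words to each."""
    doc_a = average([TOPIC_A] + [FILLER] * n_filler)
    doc_b = average([TOPIC_B] + [FILLER] * n_filler)
    return cosine(doc_a, doc_b)

# The similarity rises with the number of shared filler words
for n in (0, 1, 3, 10):
    print(n, round(similarity_with_filler(n), 3))
```

If this is what is happening, filtering stop words (for example with spaCy's `token.is_stop`) before averaging would be one thing to try.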

Moving on!

In [36]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'Lazy sunday afternoon chillin listen to music'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

Lazy sunday afternoon chillin listen to music


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
261,Take Me to Paradise,But My Mackin Ain’t Easy,Take me to paradise Cause this games about as ...,0.682943
88,Ave Maria,Faces,"Yea, and I can't feel my legs, I'm a paraplegi...",0.654828
152,Sunlight,Macadelic,"Check it out Uh-ah-ah, top of the morning to y...",0.654586
347,"Hold On, Let Go",Unreleased Songs,She go skiin' in the summer Maxin' out her cre...,0.653772
318,Paid Dues (The Reunion),Unreleased Songs,"Uh, The Ill Spoken 2008 shit Hahaha, we had to...",0.650268
...,...,...,...,...
139,Love Me As I Have Loved You,Macadelic,"(Girl laughing) Vibrer, une corde, une inspira...",0.219561
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.036769
262,Tambourine Dream,Balloonerism*,,0.000000
89,55,Faces,,0.000000


In [38]:
df[df['song_title']=='Take Me to Paradise']['lyric'].item()

'Take me to paradise Cause this games about as fair as life We in a war that nobody gets to fight Spit my life Bring truth to the people I\'m just a kid that likes to do shit illegal I spent the night to rock But this writers block is kinda like the cops Holding me back, I keep my dro in a sack So while I\'m holding this track, I keep my mind straight MC\'s fold and collapse due to the crime rates So I dip and climb gates, you never see me where you find jake Kid who rhymes great taking it back for old times sake Times change and no one wasn\'t with you in the past But I was letting records spin while I was sitting in the class I take two hits and pass while I listen to this jazz Walking home alone to find some kids to whip they ass \'Cause fuck it See the streets is cold, they might be eating your soul Rolling with dough like a pizza roll I used to steal little shit out the convenient store They used to catch me ask me "Mac, what you need this for?" I said "It\'s just a tootsie roll, 

I wanted to run a similar search with the portion about illegal drug use removed, to test the impact on the output.

The input here was: 'Lazy sunday afternoon chillin listen to music'

The output song was: Take Me to Paradise

Based on the title alone, it semantically matches up with the input phrase very well. The overall positive mood of the song and the positive associations of a lazy Sunday afternoon check out through the model with a sim score of 0.683.
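
To put a number on how much dropping the drug-use phrase changed the results, one simple option is to compare the top titles of the two searches. A small sketch, using the five titles visible at the top of each result table above; the helper name `top_k_overlap` is hypothetical, and Jaccard overlap is my choice of measure rather than anything the model reports.

```python
def top_k_overlap(titles_a, titles_b):
    """Jaccard overlap: size of the intersection divided by size of the union."""
    set_a, set_b = set(titles_a), set(titles_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Top-5 titles copied from the two result tables in this notebook
with_weed = ["Numbness", "Travelin’ Man ’09", "Bump Demon", "Hide and Seek", "Sunday"]
without_weed = ["Take Me to Paradise", "Ave Maria", "Sunlight",
                "Hold On, Let Go", "Paid Dues (The Reunion)"]

print(top_k_overlap(with_weed, without_weed))  # 0.0: the two top-5 lists share no songs
```

An overlap of zero suggests the extra clause changed the averaged input vector enough to reshuffle the whole top of the ranking.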

In [39]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'Doing illegal things for money, clout, women, and power'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

Doing illegal things for money, clout, women, and power


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
268,Excelsior,Balloonerism*,"Yeah, ya Yeah, yeah, yeah, yeah Yeah, uh On Fo...",0.842585
383,Amnesia,Unreleased Songs,"Yeah, yeah Yeah, yeah Might not, feel your, fa...",0.815879
260,Smoke Signals,But My Mackin Ain’t Easy,"Aight, John Record, Easy Mac Ill Spoken collec...",0.799370
115,I’m Not Real,Watching Movies with the Sound Off,"No Yeah No Ugh, passport, filling it up with s...",0.794488
117,Bird Call,Watching Movies with the Sound Off,"Quack, quack Uh Uh-huh Uh I'm Chillin' for an ...",0.789334
...,...,...,...,...
214,Jerome Weinberg Speaks (Interlude),The High Life,"""Boo! Fuck this dude! You're a clown""",0.232562
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.147878
89,55,Faces,,0.000000
262,Tambourine Dream,Balloonerism*,,0.000000


In [42]:
print(df[df['song_title']=='Excelsior']['lyric'].item())
print(df[df['song_title']=='Amnesia']['lyric'].item())

Yeah, ya Yeah, yeah, yeah, yeah Yeah, uh On Fourth Street, the orphan children play on the jungle gym Little Timmy broke his arm again on the monkey bars Johnny's dad got a nicer car than all the other kids He becomes the alpha and picks on everybody else Max protects Claire from all the bullies Claire always wish she was as pretty as Julie The boys always chase Julie around the sandbox Claire just waits 'til she gets picked up by her grandpa All of this before the brainwash starts Before they get polluted, start thinkin' like adults Life is fantasy and somersaults then Before the world tear apart imagination Before there were rules, before there were limits Your only enemies were (Want some Brussels sprouts and spinach?) Me, I used to want to be a wizard, when did life get so serious? Whatever happened to apple juice and cartwheels? Whatever happened to apple juice and cartwheels? Abracadabra, abracadabra Abracadabra (Hahahaha) Abracadabra, vadacadous
Yeah, yeah Yeah, yeah Might not, 

input: 'Doing illegal things for money, clout, women, and power'
output: 'Excelsior', 'Amnesia'

Not sure how these match up, although I mention women in the input and several women's names are mentioned in the song. Semantically it is not what I expected, but I think it does check out on balance.

Sim score = 0.84

In [43]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'Doing illegal things'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

Doing illegal things


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
91,Colors and Shapes,Faces,"""Have I answered the question, 'Who am I'? Mm-...",0.731374
81,It Just Doesn’t Matter,Faces,They've got the best equipment that money can ...,0.729652
0,Kickin’ Incredibly Dope Shit (Intro),K.I.D.S.,"When you're young, not much matters When you f...",0.726794
106,Objects in the Mirror (Live),Live from Space,"Some people people, woah oh-oh woah-oh-oh Yeah...",0.723153
185,All That,"I Love Life, Thank You","Hey, hey Uh, and it's the best day ever Every ...",0.722642
...,...,...,...,...
31,Floating,Circles (Deluxe),Woah-woah-woah-woah-woah-woah-woah-woah-woah-w...,0.316055
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.032025
262,Tambourine Dream,Balloonerism*,,0.000000
89,55,Faces,,0.000000


In [44]:
print(df[df['song_title']=='Colors and Shapes']['lyric'].item())

"Have I answered the question, 'Who am I'? Mm-hmm. Well, I confront it all the time We're teaching people how to use their head. The uh, point is, in order to use your head, you have to go out of your mind. You have to go out of all of the, uh, the esthetics and all the ways in which you think." If it was colors and shapes, the imaginary 'Stead of all of this weight that we have to carry Would you be able to breathe? And if you could just find where that comfort resides No distraction or movement that fucks wit' your mind Would you let them see? While beneath the ocean, I met with the captain Who sank to the floor on his ship All of his passengers escaped to safety But he was not done with his trip He looked up and smiled, asked me, "How do you do?'' I told him, "I'm losin' my grip" He told me, ''Son, if you want to hold onto yourself Then let yourself slip'' Fall, ooh Fall, oh Fall Oh, it feels good to fall Ooh, ooh These puzzles are so hard to make into pictures Of something that the

input: 'Doing illegal things'
output: 'Colors and Shapes'

Not sure how these match up. He does mention several things in this song, so it could be a noun-heavy song.

Sim score = 0.731

In [45]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'sliding into the darkness of my mind'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

sliding into the darkness of my mind


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
86,Funeral,Faces,"""I know what happens to you, me and everyone e...",0.773462
335,Laundromat,Unreleased Songs,Ayo Jerm I need you to do me a favor real quic...,0.755599
294,Untitled Snippet (2/26/13),Unreleased Songs,"Can you take me to euphoria, according to the ...",0.755169
124,Suplexes Inside of Complexes & Duplexes,Watching Movies with the Sound Off,This is madness (madness) This is an outrage! ...,0.738771
213,5 O’Clock,The High Life,Somebody told me sleep was a cousin of death A...,0.733276
...,...,...,...,...
139,Love Me As I Have Loved You,Macadelic,"(Girl laughing) Vibrer, une corde, une inspira...",0.231538
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.118774
167,Hole in My Pocket,Blue Slide Park,,0.000000
262,Tambourine Dream,Balloonerism*,,0.000000


In [None]:
print(df[df['song_title']=='Funeral']['lyric'].item())

input: 'Sliding into the darkness of my mind'
output: 'Funeral'

Perfect matchup, not much to add here.

Sim score = 0.77

In [46]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'otter'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

otter


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
337,The Weather,Unreleased Songs,Hmm-mm Hmm-mm-mm Hmm-mm-mm What will exist in ...,0.297544
239,Snap Back,The Jukebox: Prelude to Class Clown,"Had some baggy gear, now my clothes tailored (...",0.294208
237,Swing Set,The Jukebox: Prelude to Class Clown,Crusing through the city with the windows down...,0.294004
88,Ave Maria,Faces,"Yea, and I can't feel my legs, I'm a paraplegi...",0.293945
196,Oy Vey,Best Day Ever,"One day, leggo! One day, I'ma be so rich that ...",0.289218
...,...,...,...,...
139,Love Me As I Have Loved You,Macadelic,"(Girl laughing) Vibrer, une corde, une inspira...",0.101434
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.026368
89,55,Faces,,0.000000
262,Tambourine Dream,Balloonerism*,,0.000000


In [47]:
print(df[df['song_title']=='The Weather']['lyric'].item())

Hmm-mm Hmm-mm-mm Hmm-mm-mm What will exist in forever? Forever nothin' without truth I can't tell you 'bout the weather I fell asleep watchin' the news Love will exist in forever Forever nothin' without you I can't tell you 'bout the weather I fell asleep watchin' the news Rain, always The state of mind, flooded brain Thoughts keep racin' by, jump and pray Away we fly, day by day Travel through space and time, save the day Baby, you wastin' time, a-okay Make up your crazy mind, fade away Don't you fade away into an area of grey (Ooh-ooh) We are, we are everything but plain (Ooh-ooh) We are, we are shinin' in the shade (Ooh-ooh) I am Poseidon on the waves What will exist in forever? Forever nothin' without truth I can't tell you 'bout the weather I fell asleep watchin' the news Love will exist in forever Forever nothin' without you I can't tell you 'bout the weather


Decided to try an animal.

input: 'otter'
output: 'The Weather'

Otter = an animal that exists in nature; "the weather" is a huge subcomponent of that nature thing. Is that where the semantic similarity comes from? I will probably never know, but it is interesting to think about. Seems to me my prior statement's lack of formality is reflected in the sim score.

Sim score = 0.297

In [48]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'confidence'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

confidence


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
299,The Network,Unreleased Songs,"Yeah, okay Uh, I'm the protector of the networ...",0.609383
185,All That,"I Love Life, Thank You","Hey, hey Uh, and it's the best day ever Every ...",0.604127
383,Amnesia,Unreleased Songs,"Yeah, yeah Yeah, yeah Might not, feel your, fa...",0.603556
91,Colors and Shapes,Faces,"""Have I answered the question, 'Who am I'? Mm-...",0.602002
269,Friendly Hallucinations,Balloonerism*,"Can you hear the whispers of an innocent, igno...",0.593221
...,...,...,...,...
214,Jerome Weinberg Speaks (Interlude),The High Life,"""Boo! Fuck this dude! You're a clown""",0.209757
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.070745
262,Tambourine Dream,Balloonerism*,,0.000000
167,Hole in My Pocket,Blue Slide Park,,0.000000


In [62]:
print(df[df['song_title']=='The Network']['lyric'].item())

Yeah, okay Uh, I'm the protector of the network Invent the architecture, make the world around you better The artificial intelligence to apply the pressure The paranoia always keep 'em in line If you speak the language, then you readin' my mind The abstract has no reason to rhyme Uh, no need to define, the rules are here to keep you alive Oh Lord, I think I'm seein' the signs So step aside, the evolution, death by execution Fresh air with less pollution, let the Earth be Earth They computin' the solutions, tryn' to find out how we do it It's the fuckin' revolution, let the church be church, yeah I'm fishin' from the moon, see how my dreams work Give you what you want, but bet I'm takin' what I need first I never said I was an expert, but You now connected to the network Uh, I'm the protector of the network Invent the architecture, so it's good for our successors Live inside of a spaghetti western I've been thinkin' I could hide in here forever Yeah, they stimulate us with simulated cre

input: 'Confidence'
output: 'The Network'

A big part of confidence is belief in the self. In this song Mac refers to himself a lot with the first-person "I", and he also calls himself a king.

Sim score = 0.609

In [49]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'existential'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

existential


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
285,Your Shoes Are Untied,Unreleased Songs,"Imaginary girlfriend, imaginary whores, imagin...",0.653077
185,All That,"I Love Life, Thank You","Hey, hey Uh, and it's the best day ever Every ...",0.632597
269,Friendly Hallucinations,Balloonerism*,"Can you hear the whispers of an innocent, igno...",0.632563
86,Funeral,Faces,"""I know what happens to you, me and everyone e...",0.629657
355,Family Lives,Unreleased Songs,The lazy guy is sleepin' on your couch again R...,0.621947
...,...,...,...,...
214,Jerome Weinberg Speaks (Interlude),The High Life,"""Boo! Fuck this dude! You're a clown""",0.193806
11,"Mad Flava, Heavy Flow (Interlude)",K.I.D.S.,Mad-f-f-f-flava- Ma- ma- ma- ma- mad flava Hea...,0.152730
167,Hole in My Pocket,Blue Slide Park,,0.000000
262,Tambourine Dream,Balloonerism*,,0.000000



input: 'existential'
output: 'Your Shoes Are Untied'

"Existential" a favorite word of mine, and a word I closely associate with Mac Miller's music. Let's see what this input gives us for a song...

The song output by the model was "Your Shoes Are Untied", a song where Mac Miller explores several "imaginary" characters and the concepts of suicide, death, the life beyond, God, and the circles of life, all difficult to categorize, or one may say, existential.

This strong linguistic similarity is reflected in the similarity score of 0.653.

In [50]:
nlp = spacy.load("en_core_web_md")

input_vibe = 'big comfy bed'
doc1 = nlp(input_vibe)

print(doc1)

data_records = []
nlp = spacy.load("en_core_web_md")
doc1 = nlp(input_vibe)
for f in file:    
    lyric = ' '.join(f['lyrics'])
    song_title = f['metadata']['title']
    album_title = f['metadata']['album_title'] ## can be done ahead of time
    doc2 = nlp(lyric) ## can be done ahead of time
    sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe
    row_list = [song_title, album_title, lyric, sim]
    data_records.append(row_list)
    
df = pd.DataFrame(data_records, columns =['song_title', 'album_title', 'lyric', 'similarity'])

df.sort_values('similarity', ascending=False)

big comfy bed


  sim = doc1.similarity(doc2) ## has to be done AFTER input_vibe


Unnamed: 0,song_title,album_title,lyric,similarity
346,Hos Go Crazy,Unreleased Songs,"Rich Gang, Mac what it do fool! Hoes go crazy,...",0.481162
211,Musical Chairs,The High Life,Ayo.. Yo what up Jerm? Beat's crazy 'bout to g...,0.473059
259,My Biography,But My Mackin Ain’t Easy,This is my biography It's really something tha...,0.464907
327,Nothing on Me,Unreleased Songs,I hear these kids and they trying to spit But ...,0.460660
196,Oy Vey,Best Day Ever,"One day, leggo! One day, I'ma be so rich that ...",0.459907
...,...,...,...,...
209,The High Life,The High Life,"I'm so high (So high, high) Find me somewhere ...",0.092914
139,Love Me As I Have Loved You,Macadelic,"(Girl laughing) Vibrer, une corde, une inspira...",0.071099
262,Tambourine Dream,Balloonerism*,,0.000000
167,Hole in My Pocket,Blue Slide Park,,0.000000


Not sure how 'big comfy bed' ---> 'Hos Go Crazy', but I guess that happened... science, always leaving more unanswered questions than when you started.

input: 'big comfy bed'
output: 'Hos Go Crazy'

???