In [6]:
import numpy as np
import pandas as pd
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
import sys

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

from ipywidgets import widgets, interactive
import plotly.figure_factory as ff
import plotly.graph_objs as go
from IPython.display import HTML
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

from bs4 import BeautifulSoup
import time
from selenium.common.exceptions import StaleElementReferenceException
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

# Sentiment Analysis of Spotify's Top 50 of the 2010s

Now comes another interesting part of this project: incorporating lyrical sentiment analysis into how we view a song. We have all experienced music amplifying the lyrics and giving them a powerful effect. For now, we will start with the most popular songs of the past decade. Could lyrical sentiment be part of what makes these songs so popular?

We will start with the CSV file produced by running the other notebook (SpotifyAudioFeatures).

In [7]:
%run -i 'client_id_script.py'

In [66]:
tens_df = pd.read_csv('top2010s.csv')

However, because we are reading from a CSV, columns that once held lists (such as Artists) now come back as strings, so we'll have to convert them back.

In [70]:
import ast

# pandas wrote each artist list to the CSV as its string form, e.g. "['Artist A', 'Artist B']";
# literal_eval parses it back into a real Python list (quotes inside names are kept intact)
tens_df["Artists"] = tens_df["Artists"].apply(ast.literal_eval)
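Why does it matter how these strings are parsed back into lists? A quick check with a hypothetical artist list (not taken from top2010s.csv) compares stripping brackets and quotes by hand with the standard library's `ast.literal_eval`. It assumes, as the original parsing does, that pandas stored each list cell as the list's string representation.

```python
import ast

# Hypothetical artist list; a list-valued cell is written to the CSV as str(list)
artists = ["Tyler, The Creator", "Guns N' Roses"]
raw = str(artists)

# Approach 1: strip brackets and single quotes by hand, then split on commas
manual = [x.strip() for x in raw.replace("[", "").replace("]", "").replace("'", "").split(",")]

# Approach 2: parse the string as a Python literal
parsed = ast.literal_eval(raw)

print(manual)  # ['Tyler', 'The Creator', '"Guns N Roses"']
print(parsed)  # ['Tyler, The Creator', "Guns N' Roses"]
```

The manual version splits the first name in two and mangles the second (Python wraps a string containing an apostrophe in double quotes, which the replace chain never removes), whereas `literal_eval` round-trips both names exactly.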

For our sentiment analysis, we will use a package called VADER (Valence Aware Dictionary and sEntiment Reasoner), which is specifically attuned to sentiment expressed in social media. It may not be the ideal tool for song lyrics, but it gives us a baseline that can be compared against other sentiment analysis tools in the future.

In [9]:
analyser = SentimentIntensityAnalyzer()

In [10]:
def sentiment_analyzer_scores(sentence):
    score = analyser.polarity_scores(sentence)
    print("{:-<40} {}".format(sentence, str(score)))

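With VADER, we can score sentences (or even whole paragraphs) and see what proportion of the text is considered "positive", "negative", or "neutral" in sentiment. It also returns a "compound" score, a normalized sum of the word valences that ranges from -1 (most negative) to +1 (most positive). We will test this out with lyrics from the song 'Let It Go':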

In [11]:
sentiment_analyzer_scores("Let it go, let it go \n Can't hold it back anymore \n Let it go, let it go \n Turn away and slam the door")

Let it go, let it go 
 Can't hold it back anymore 
 Let it go, let it go 
 Turn away and slam the door {'neg': 0.106, 'neu': 0.894, 'pos': 0.0, 'compound': -0.3818}


Looking at the entire passage, we see that it's slightly negative, mostly neutral, and not positive at all. However, we can also split it into its two halves and compare them:

In [12]:
sentiment_analyzer_scores("Let it go, let it go \n Turn away and slam the door")

Let it go, let it go 
 Turn away and slam the door {'neg': 0.191, 'neu': 0.809, 'pos': 0.0, 'compound': -0.3818}


In [82]:
sentiment_analyzer_scores("Let it go, let it go \n Can't hold it back anymore")

Let it go, let it go 
 Can't hold it back anymore {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


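That gives us a basic look at how we will use the package: the "slam the door" half carries all of the negative sentiment, while the "Can't hold it back anymore" half is scored as completely neutral.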
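The three outputs above can be reproduced by hand. Below is a simplified sketch of VADER's final scoring step, assuming (as the outputs suggest) that exactly one word in the passage carries a valence of -1.6 and every other word is neutral. The per-word lexicon lookup and VADER's adjustments for negation, intensifiers, capitalization, and punctuation are left out, and the function names are hypothetical; the normalization constant `alpha = 15` is the default in the vaderSentiment source.

```python
import math

def normalize(score, alpha=15.0):
    """Squash an unbounded valence sum into [-1, 1], as VADER's compound score does."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))

def vader_like_scores(valences):
    """neg/neu/pos/compound from per-word valences (simplified, no punctuation emphasis)."""
    # Sentiment-bearing words get +/-1 added so they weigh against neutral words,
    # each of which counts as 1
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(v - 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + abs(neg_sum) + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": round(abs(neg_sum) / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        "compound": round(normalize(sum(valences)), 4),
    }

# Assumed valences: one word at -1.6, every other word neutral
full = [0.0] * 22 + [-1.6]       # all four lines, 23 words
slam_half = [0.0] * 11 + [-1.6]  # "Let it go ... slam the door", 12 words
hold_half = [0.0] * 11           # "Let it go ... back anymore", 11 words

for name, vals in [("full", full), ("slam_half", slam_half), ("hold_half", hold_half)]:
    print(name, vader_like_scores(vals))
```

Because the compound score depends only on the summed valence, it is identical (-0.3818) for the full passage and for the "slam the door" half. The "neg" share, by contrast, is measured against the neutral words, so it nearly doubles when the same negative word sits among 11 neutral words instead of 22.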

## Web Scraping

Unfortunately, the lyrics won't get themselves. I could have manually Google-searched every song for its lyrics, but that would be a lot of work, and it's satisfying to automate the simple things in life with a script. Let's walk through my approach and how I eventually got all the lyrics.

### Testing... Google Search

My first thought was: "Oh, it would be nice if I could just Google-search the song and get the lyrics from the results page." I went ahead and tried that. To do this, you will need a few things:

* Selenium (Python package)
* BeautifulSoup (Python package)
* ChromeDriver (the WebDriver executable for Chrome)

Once you have installed the first two and know the path to the ChromeDriver executable, you can run the script below (be sure to replace the file path with your own).

In [14]:
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the ChromeDriver path through a Service object
browser = webdriver.Chrome(service=Service("../FEHeroes/chromedriver"))
search_string = "let+it+go+lyrics"

browser.get("https://www.google.com/search?q=" + search_string)
# Wait until the lyrics panel's "Show more" control is clickable, then expand the full lyrics
show_more = WebDriverWait(browser, 20).until(
    EC.element_to_be_clickable((By.XPATH, "//div[@class='hide-focus-ring pSO8Ic vk_arc']"))
)
show_more.click()
time.sleep(5)  # give the expanded lyrics time to render
matched_elements = browser.page_source

# quit() closes the window and also shuts down the ChromeDriver process
browser.quit()

Running the above gets us the page source, but we need to make it parsable, which is where BeautifulSoup comes in handy.

After running the cell below, you may find it difficult to locate the lyrics on the page. It takes some digging into the structure of the HTML, but once you find where they live (here, in span elements with jsname="YS01Ge"), you'll be able to extract them.

In [86]:
nsoup = BeautifulSoup(matched_elements, 'html.parser')
#nsoup.findAll('span', jsname = "YS01Ge")

In [84]:
#nsoup.findAll('div', role = "button")

In [18]:
nsoup.findAll('span', jsname = "YS01Ge")

[<span jsname="YS01Ge">The snow glows white on the mountain tonight</span>,
 <span jsname="YS01Ge">Not a footprint to be seen</span>,
 <span jsname="YS01Ge">A kingdom of isolation</span>,
 <span jsname="YS01Ge">And it looks like I'm the queen</span>,
 <span jsname="YS01Ge">The wind is howling like this swirling storm inside</span>,
 <span jsname="YS01Ge">Couldn't keep it in, heaven knows I've tried</span>,
 <span jsname="YS01Ge">Don't let them in, don't let them see</span>,
 <span jsname="YS01Ge">Be the good girl you always have to be</span>,
 <span jsname="YS01Ge">Conceal, don't feel, don't let them know</span>,
 <span jsname="YS01Ge">Well, now they know</span>,
 <span jsname="YS01Ge">Let it go, let it go</span>,
 <span jsname="YS01Ge">Can't hold it back anymore</span>,
 <span jsname="YS01Ge">Let it go, let it go</span>,
 <span jsname="YS01Ge">Turn away and slam the door</span>,
 <span jsname="YS01Ge">I don't care what they're going to say</span>,
 <span jsname="YS01Ge">Let the storm 

We have the lyrics, so it works, right? Not so fast... unfortunately, if you look carefully at the lyrics, you'll notice a section that goes:

<span jsname="YS01Ge">It's funny how some distance makes</span>,  
<span jsname="YS01Ge">It's funny how some distance makes everything seem small</span>,

If you look at the Google Search page, you'll notice that the user has to click a button to see the full lyrics. Before the click, the first (truncated) line is displayed; after it, the second (complete) line is. The page source, however, contains both. Handling this would take an extra step, because we would need to find a pattern for where this duplication occurs.
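For the single-line case, one candidate pattern is that the truncated line is a strict prefix of the line right after it. A minimal sketch of that heuristic (the function name is hypothetical, and it only handles a truncated line that directly precedes its complete version):

```python
def drop_truncated_duplicates(lines):
    """Remove any line that is a truncated copy of the line right after it."""
    cleaned = []
    for i, line in enumerate(lines):
        following = lines[i + 1] if i + 1 < len(lines) else ""
        # A truncated copy is a non-empty strict prefix of the next line;
        # identical repeats (e.g. a chorus line sung twice) are kept
        if line and following != line and following.startswith(line):
            continue
        cleaned.append(line)
    return cleaned

lines = [
    "Let it go, let it go",
    "It's funny how some distance makes",
    "It's funny how some distance makes everything seem small",
]
print(drop_truncated_duplicates(lines))
```

The heuristic can misfire when a genuine lyric line happens to be a prefix of the next one, so its output would still need a manual check.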

We can test it with another song:

In [19]:
from selenium.webdriver.chrome.service import Service

browser = webdriver.Chrome(service=Service("../FEHeroes/chromedriver"))
search_string = "we+are+young+featuring+janelle+monae+lyrics"

# This time, grab the page source without clicking "Show more"
browser.get("https://www.google.com/search?q=" + search_string)
elements = browser.page_source

browser.quit()

In [20]:
soup = BeautifulSoup(elements, 'html.parser')
soup.findAll('span', jsname = "YS01Ge")

[<span jsname="YS01Ge">Give me a second I</span>,
 <span jsname="YS01Ge">I need to get my story straight</span>,
 <span jsname="YS01Ge">My friends are in the bathroom getting higher than the Empire State</span>,
 <span jsname="YS01Ge">My lover she's waiting for me just across the bar</span>,
 <span jsname="YS01Ge">My seat's been taken by some sunglasses asking about a scar, and</span>,
 <span jsname="YS01Ge">I know I gave it to you months ago</span>,
 <span jsname="YS01Ge">I know you're trying to forget</span>,
 <span jsname="YS01Ge">But between the drinks and subtle things</span>,
 <span jsname="YS01Ge">The holes in my apologies, you know</span>,
 <span jsname="YS01Ge">I'm trying hard to take it back</span>,
 <span jsname="YS01Ge">So if by the time the bar closes</span>,
 <span jsname="YS01Ge">And you feel like falling down, I'll carry you home</span>,
 <span jsname="YS01Ge">Tonight, we are young</span>,
 <span jsname="YS01Ge">So let's set the world on fire</span>,
 <span jsname="YS01

This one is even tougher: instead of a single line, the duplication seems to involve a pair of lines. Thus, this approach won't work well for automation.
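Both searches above hard-code the query as a "+"-joined string. If the queries were generated from the song list instead, the standard library's `urllib.parse.urlencode` can handle the escaping (spaces become "+", apostrophes and other unsafe characters are percent-encoded). A sketch with a hypothetical helper name:

```python
from urllib.parse import urlencode

def google_lyrics_url(query):
    """Build a Google search URL for '<query> lyrics' (hypothetical helper)."""
    # urlencode quotes with quote_plus: spaces -> '+', other unsafe characters -> %XX
    return "https://www.google.com/search?" + urlencode({"q": query.lower() + " lyrics"})

print(google_lyrics_url("Let It Go"))
print(google_lyrics_url("We Are Young featuring Janelle Monae"))
print(google_lyrics_url("Can't hold it back anymore"))
```

The first two calls produce the same query strings that were typed by hand above; the third shows the apostrophe being encoded as `%27`.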

### Moving on to AZLyrics

So what next? We will try AZLyrics, a website that hosts easily accessible lyrics for popular songs. We will try this with another song in our list:

In [21]:
from selenium.webdriver.chrome.service import Service

browser = webdriver.Chrome(service=Service("../FEHeroes/chromedriver"))
singer = "shakira"
song = "wakawakathistimeforafrica"

# AZLyrics pages live at /lyrics/<artist>/<song>.html
browser.get("https://www.azlyrics.com/lyrics/" + singer + "/" + song + ".html")
el = browser.page_source

browser.quit()

In [22]:
so = BeautifulSoup(el, 'html.parser')

Running the commented-out line in the cell below returns every text node in the page source. From there, we can parse through it to get the lyrics we want.

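Luckily, the page structure is very similar for every song: the lyrics sit between the licensing notice ("Usage of azlyrics.com content by any third-party lyrics provider is prohibited...") and the " MxM banner " text. We will use that pattern below to see if we can get all of the lyrics.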

In [23]:
#so.findAll(text = True)

In [24]:
texts = so.find_all(string=True)  # every text node on the page
beg = texts.index(' Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. ')
en = texts.index(' MxM banner ')

In [88]:
# Keep only the text between the two markers, minus surrounding newlines
lyr = [x.strip("\n") for x in so.find_all(string=True)[beg + 1:en]]
sentiment_analyzer_scores(" ".join(lyr))

I'm not afraid (I'm not afraid) To take a stand (to take a stand) Everybody (everybody) Come take my hand (come take my hand) We'll walk this road together, through the storm Whatever weather, cold or warm Just letting you know that you're not alone Holler if you feel like you've been down the same road (same road)  Yeah, it's been a ride I guess I had to, go to that place, to get to this one Now some of you, might still be in that place If you're trying to get out, just follow me I'll get you there  You can try and read my lyrics off of this paper before I lay 'em But you won't take the sting out these words before I say 'em 'Cause ain't no way I'mma let you stop me from causing mayhem When I say I'mma do something I do it I don't give a damn what you think I'm doing this for me, so fuck the world Feed it beans, it's gassed up, if it thinks it's stopping me I'mma be what I set out to be, without a doubt undoubtedly And all those who look down on me I'm tearing down your balcony No ifs

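Note that the lyrics printed above are from Eminem's 'Not Afraid', not 'Waka Waka': this cell (In [88]) was run after the batch-scraping loop further down (In [76]), which reassigns `so`, `beg`, and `en`. The approach itself is unaffected, as the next example shows.

In [29]: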
from selenium.webdriver.chrome.service import Service

browser = webdriver.Chrome(service=Service("../FEHeroes/chromedriver"))
singer = "xxxtentacion"
song = "sad"

browser.get("https://www.azlyrics.com/lyrics/" + singer + "/" + song + ".html")
eles = browser.page_source

browser.quit()

In [89]:
s = BeautifulSoup(eles, 'html.parser')
#s.findAll("div")[14]

In [31]:
s.find('div', text = "Yeah")

In [32]:
texts = s.find_all(string=True)
begi = texts.index(' Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. ')
endo = texts.index(' MxM banner ')
lyrim = [x.strip("\n") for x in texts[begi + 1:endo]]
sentiment_analyzer_scores(" ".join(lyrim))

Yeah  Who am I? Someone that's afraid to let go, uh You decide if you're ever gonna let me know (yeah) Suicide if you ever try to let go, uh I'm sad, I know, yeah, I'm sad, I know, yeah Who am I? Someone that's afraid to let go, uh You decide if you're ever gonna let me know (yeah) Suicide if you ever try to let go, uh I'm sad, I know, yeah, I'm sad, I know, yeah  I gave her everything She took my heart and left me lonely I think broken heart's contagious I won't fix, I'd rather weep I'm lost then I'm found But it's torture being in love I love when you're around But I fucking hate when you leave  Who am I? Someone that's afraid to let go, uh You decide if you're ever gonna let me know (yeah) Suicide if you ever try to let go, uh I'm sad, I know, yeah I'm sad, I know, yeah Who am I? Someone that's afraid to let go, uh You decide if you're ever gonna let me know (yeah) Suicide if you ever try to let go, uh I'm sad, I know, yeah I'm sad, I know, yeah  Who am I? Someone that's afraid to l
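Both pages follow the same layout, so the slicing step can be factored into a small helper. Here is a sketch on a plain list of strings; the function name and the toy markers are hypothetical, and in the notebook the list would come from `findAll(text = True)` with the licensing notice and " MxM banner " as markers. Returning an empty list when a marker is missing avoids a `ValueError` on pages that don't match the expected layout.

```python
def slice_between(texts, start_marker, end_marker):
    """Return the text nodes strictly between two marker strings, stripped of newlines.

    Returns an empty list if either marker is missing (or the end marker only
    appears before the start marker) instead of raising ValueError.
    """
    try:
        start = texts.index(start_marker)
        end = texts.index(end_marker, start + 1)  # search only after the start marker
    except ValueError:
        return []
    return [t.strip("\n") for t in texts[start + 1:end]]

texts = ["header", " START ", "\nLine one\n", "Line two\n", " END ", "footer"]
print(slice_between(texts, " START ", " END "))      # ['Line one', 'Line two']
print(slice_between(texts, " START ", " MISSING "))  # []
```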

In [34]:
import re

def _az_slug(text):
    # AZLyrics URL slugs appear to use only lowercase letters and digits, so map "ñ" to "n"
    # and drop everything else (spaces, quotes, "!", ".", parentheses, ...).
    # Stylized names (e.g. containing "!" or "$") may still need special-casing.
    text = text.lower().replace("ñ", "n")
    return re.sub(r"[^a-z0-9]", "", text)

def azartistprep(artist):
    # For "A & B", use only the first artist's name
    artist = artist.split("&")[0]
    # Drop a leading "The" (e.g. "The Weeknd" -> "weeknd"), but not "The" elsewhere in a name
    if artist.startswith("The "):
        artist = artist[len("The "):]
    return _az_slug(artist)

def azlyricprep(song):
    if " - " in song:
        # Split on " - " so hyphens inside the title itself are left alone
        title, suffix = song.split(" - ", 1)
        # Remixes keep the remixer's name, e.g. "Song - DJ Remix" -> "songdjremix"
        if "Remix" in suffix:
            return _az_slug(title + suffix.split("Remix")[0] + "remix")
        return _az_slug(title)
    # Drop bracketed suffixes and "(feat. ...)" credits
    for marker in ("[", "(feat."):
        if marker in song:
            return _az_slug(song.split(marker, 1)[0])
    return _az_slug(song)

In [35]:
azlyricprep("Waka Waka (This Time for Africa) [The Official. (feat.)")

'wakawakathistimeforafrica'

The two cells below bring together everything we learned about AZLyrics: the first collects the page source for every song in the playlist, and the second extracts just the lyrics from each page.

In [73]:
from selenium.webdriver.chrome.service import Service

sources = []

browser = webdriver.Chrome(service=Service("../FEHeroes/chromedriver"))

for _, row in tens_df.iterrows():
    # Build the AZLyrics URL from the first listed artist and the cleaned-up song title
    singer = azartistprep(row["Artists"][0])
    song = azlyricprep(row["Name"])
    browser.get("https://www.azlyrics.com/lyrics/" + singer + "/" + song + ".html")
    sources.append(browser.page_source)
    time.sleep(10)  # pause between requests so we don't hammer the site

browser.quit()

In [76]:
START_MARKER = ' Usage of azlyrics.com content by any third-party lyrics provider is prohibited by our licensing agreement. Sorry about that. '
END_MARKER = ' MxM banner '
lyrics = []
for source in sources:
    so = BeautifulSoup(source, 'html.parser')
    texts = so.find_all(string=True)
    try:
        beg = texts.index(START_MARKER)
        en = texts.index(END_MARKER)
    except ValueError:  # markers missing (e.g. wrong URL guess): keep lyrics aligned with tens_df
        lyrics.append([])
        continue
    lyrics.append([x.strip("\n") for x in texts[beg + 1:en]])

## Sentiment Analysis

In [93]:
def sentiment_scores(lines):
    """Return VADER's neg/neu/pos/compound scores for a song.

    Accepts either a list of lyric lines (joined with spaces) or a single string.
    """
    # Joining a plain string would put a space between every character, so only join lists
    text = lines if isinstance(lines, str) else " ".join(lines)
    # Reuse the analyser created earlier instead of building a new one on every call
    return analyser.polarity_scores(text)

In [94]:
for x in lyrics:
    print(sentiment_scores(x))

{'neg': 0.052, 'neu': 0.799, 'pos': 0.149, 'compound': 0.9919}
{'neg': 0.144, 'neu': 0.723, 'pos': 0.133, 'compound': -0.2755}
{'neg': 0.099, 'neu': 0.825, 'pos': 0.076, 'compound': -0.2287}
{'neg': 0.088, 'neu': 0.737, 'pos': 0.175, 'compound': 0.9891}
{'neg': 0.05, 'neu': 0.764, 'pos': 0.186, 'compound': 0.9984}
{'neg': 0.093, 'neu': 0.762, 'pos': 0.145, 'compound': 0.9938}
{'neg': 0.066, 'neu': 0.654, 'pos': 0.28, 'compound': 0.9982}
{'neg': 0.017, 'neu': 0.744, 'pos': 0.239, 'compound': 0.9921}
{'neg': 0.157, 'neu': 0.633, 'pos': 0.21, 'compound': 0.9933}
{'neg': 0.289, 'neu': 0.621, 'pos': 0.09, 'compound': -0.9958}
{'neg': 0.223, 'neu': 0.69, 'pos': 0.087, 'compound': -0.9948}
{'neg': 0.215, 'neu': 0.735, 'pos': 0.05, 'compound': -0.9982}
{'neg': 0.077, 'neu': 0.822, 'pos': 0.101, 'compound': 0.8934}
{'neg': 0.021, 'neu': 0.772, 'pos': 0.207, 'compound': 0.9993}
{'neg': 0.041, 'neu': 0.812, 'pos': 0.147, 'compound': 0.9905}
{'neg': 0.066, 'neu': 0.688, 'pos': 0.246, 'compound': 0

In [78]:
sentiment_scores(lyrics[0][12])

{'neg': 0.272, 'neu': 0.728, 'pos': 0.0, 'compound': -0.5216}
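The compound scores above are easier to compare once bucketed. VADER's README suggests treating a compound score of at least 0.05 as positive, at most -0.05 as negative, and anything in between as neutral. A sketch of that rule applied to the first five songs' scores printed above (the helper name is hypothetical):

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a label using the +/-0.05 convention."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# First five compound scores from the per-song output above, in playlist order
compounds = [0.9919, -0.2755, -0.2287, 0.9891, 0.9984]
labels = [label_compound(c) for c in compounds]
print(labels)  # ['positive', 'negative', 'negative', 'positive', 'positive']
print({label: labels.count(label) for label in ("positive", "neutral", "negative")})
```

Because the compound score normalizes a sum that grows with the number of sentiment-bearing words, whole-song scores tend to saturate near +1 or -1 (as most of the values above do), so the pos/neg proportions may be more informative when comparing songs.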

## Future Ideas for Analysis

In the context of comparing lyrical sentiment to audio features, there are other avenues that may be explored in a future update:

* Compare lyrical sentiment across genres, alongside each genre's audio features, to see whether there is a relationship between the two.
* Take a deep dive into the audio features of specific genres/artists, and see whether a model can be trained to predict an artist's genre from audio features.
* Build a personal sentiment analyzer and compare it to VADER (training such a model will require a lot of computing power...).