# CorEx and Vader Sentiment Analysis

In this notebook I try CorEx topic modeling to see whether, and how, its topics differ from those found by NMF, LSA, and LDA.

I also run VADER sentiment analysis on the lyrics to see whether metal as a genre is truly as negative as I expect.

In [1]:
import pandas as pd
import numpy as np
import pickle
from corextopic import corextopic as ct
from corextopic import vis_topic as vt
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()
from sklearn.feature_extraction.text import CountVectorizer

In [2]:
with open('../data/metal_artists.pickle','rb') as rf:
    metal = pickle.load(rf)

In [3]:
df = pd.DataFrame(metal)
df.dropna(subset=['lyrics'],inplace=True)

In [4]:
with open('../data/stopwords.txt','r') as sf:
    stop_words = sf.read().split()

In [5]:
cv = CountVectorizer(stop_words=stop_words,min_df=0.01,binary=True)
cv_matrix = cv.fit_transform(df.lyrics)
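# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it (available since 1.0)
cv_df = pd.DataFrame(cv_matrix.toarray(),columns=cv.get_feature_names_out())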

In [6]:
topic_model = ct.Corex(n_hidden=4,words=cv_df.columns,seed=23)
topic_model.fit(cv_matrix,words=cv_df.columns,docs=df.lyrics)

<corextopic.corextopic.Corex at 0x7fbbb0e3da90>

In [7]:
topics = topic_model.get_topics()
for n,topic in enumerate(topics):
    topic_words,_ = zip(*topic)
    print('{}: '.format(n) + ','.join(topic_words))

0: death,blood,light,fire,earth,sky,darkness,fear,flesh,world
1: shit,fuck,bitch,ass,baby,money,little,girl,rock,good
2: life,time,feel,lost,eyes,heart,end,nothing,alone,pain
3: things,always,really,need,love,hard,much,even,better,try


In [8]:
predict_array = topic_model.predict(cv_matrix)
predict_array

array([[ True, False, False, False],
       [ True, False,  True,  True],
       [ True, False,  True, False],
       ...,
       [False, False,  True,  True],
       [ True,  True,  True,  True],
       [False,  True,  True, False]])

In [9]:
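# assign each song to a single topic; this is approximate, since a song can belong to several topics
# and argmax on a boolean row returns the first True topic (or topic 0 when no topic is True)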
predictions = np.argmax(predict_array,axis=1)
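The caveat in the comment above can be made concrete. A minimal sketch on a made-up boolean matrix (`toy_labels` is hypothetical, not taken from the model) shows how `np.argmax` behaves on rows with several `True` values or none:

```python
import numpy as np

# Hypothetical label matrix in the same layout as predict_array:
# one row per song, one boolean column per topic.
toy_labels = np.array([
    [True,  False, False, False],   # exactly one topic
    [False, False, True,  True],    # two topics: the first True wins
    [False, False, False, False],   # no topic: argmax still returns 0
])

assigned = np.argmax(toy_labels, axis=1)
has_topic = toy_labels.any(axis=1)  # False for rows where no topic is True

print(assigned)
print(has_topic)
```

A mask like `has_topic` could be used to flag songs with no topic separately instead of folding them into topic 0.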

In [10]:
predictions.shape

(50039,)

In [11]:
# save topic assignments to later merge with the original song dataframe
with open('../data/corex_topic_assignments.pickle','wb') as out:
    pickle.dump(predictions,out)
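Because rows without lyrics were dropped before vectorizing, the saved array lines up with the filtered dataframe by position, not with the original row labels. A minimal sketch of one way to merge it back, using toy data (`songs`, `with_lyrics`, and `toy_predictions` are hypothetical names):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original song table; row 1 has no lyrics.
songs = pd.DataFrame({
    'song_title': ['a', 'b', 'c'],
    'lyrics': ['fire and blood', None, 'lost and alone'],
})

# Same filtering step as the notebook: keep only rows with lyrics.
with_lyrics = songs.dropna(subset=['lyrics'])

# Hypothetical topic assignments, one per remaining row, in row order.
toy_predictions = np.array([0, 2])

# Attach the surviving index so the values align by label when joined back.
topic = pd.Series(toy_predictions, index=with_lyrics.index, name='topic')
merged = songs.join(topic)

print(merged)
```

Songs that were dropped for missing lyrics end up with a missing `topic` value rather than a shifted assignment.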

## Vader Sentiment Analysis

In [12]:
import vaderSentiment.vaderSentiment as vs

In [13]:
vader = vs.SentimentIntensityAnalyzer()

In [14]:
def vader_score(text):
    return vader.polarity_scores(text)['compound']

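The compound polarity score ranges from -1 to 1: -1 is the most negative, 1 is the most positive, and values near 0 are neutral. It is a normalized sum of the valence scores of the words in the text, not a simple count of negative versus positive words.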
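To give a feel for where the compound value comes from: as far as I understand the reference implementation, the summed word valences are squashed into the -1 to 1 range with a normalization of the form x / sqrt(x² + α), with α = 15 by default. The sketch below reimplements that formula on its own (`normalize_compound` is a hypothetical name, and the input sums are invented) rather than calling the library:

```python
import math

def normalize_compound(valence_sum, alpha=15.0):
    """Squash an unbounded valence sum into [-1, 1].

    Assumption: this mirrors the x / sqrt(x^2 + alpha) normalization
    used for VADER's compound score, with alpha = 15.
    """
    score = valence_sum / math.sqrt(valence_sum ** 2 + alpha)
    # Clamp to guard against floating-point overshoot.
    return max(-1.0, min(1.0, score))

# Invented valence sums: a short text versus a long, consistently negative one.
for total in (0.0, -2.0, -10.0, -100.0):
    print(total, round(normalize_compound(total), 4))
```

Under this assumption the sum grows with the number of sentiment-bearing words, so long texts such as full song lyrics tend to land near -1 or 1.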

In [15]:
df['vader'] = df.lyrics.apply(vader_score)

In [16]:
df.vader.describe()

count    50039.000000
mean        -0.148296
std          0.849597
min         -1.000000
25%         -0.973400
50%         -0.561500
75%          0.894650
max          0.999900
Name: vader, dtype: float64

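The mean compound score is slightly negative (-0.15) and the median is clearly negative (-0.56), so metal lyrics do skew negative, as expected. The scores are strongly polarized, though: the lower quartile sits at -0.97 while the upper quartile is 0.89, so at least a quarter of the songs score as strongly positive.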
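The summary statistics can hide how the songs split between negative and positive. A minimal sketch, assuming the commonly cited VADER cutoffs of ±0.05 on the compound score (`label_sentiment` and the sample scores are hypothetical), counts songs per sentiment class:

```python
from collections import Counter

def label_sentiment(compound, threshold=0.05):
    """Map a compound score to a coarse class.

    Assumption: +/-0.05 is the conventional cutoff for VADER's
    compound score; adjust it if a wider neutral band is wanted.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Invented scores standing in for df.vader.
sample_scores = [-0.9947, -0.5615, 0.0, 0.8947, 0.03]
counts = Counter(label_sentiment(s) for s in sample_scores)
print(counts)
```

On the real data the same function could be applied with `df.vader.apply(label_sentiment).value_counts()`.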

In [17]:
df.head()

| | artist_name | release_date | page_views | song_title | album_name | spotify_url | lyrics | vader |
|---|---|---|---|---|---|---|---|---|
| 0 | Kreator | 2016-12-16 | 13775 | Satan Is Real | Gods of Violence | | Martyrs\nYou cannot kill us all\nVengeance wil... | -0.9947 |
| 1 | Kreator | 1986-11-01 | 8745 | Pleasure to Kill | Pleasure to Kill | | Day turns to night as I rise from my grave\nBl... | -0.992 |
| 2 | Kreator | | 5037 | Enemy Of God | Enemy Of God | | Shocked Orwellian races, gather united in grie... | -0.9964 |
| 3 | Kreator | 2017-01-27 | 0 | Fallen Brother | Gods of Violence | | [Instrumental Intro]\n\nMuch too young you had... | -0.9933 |
| 4 | Kreator | 2017-01-27 | 0 | Totalitarian Terror | Gods of Violence | | Come experience hate like you never have befor... | -0.9942 |


In [19]:
# save vader scores per song for use in visualizations
df[['song_title','vader']].to_csv('../data/vader_scores.csv',index=False)