# Corpus Analysis on **Ice Nine Kills** Lyrics

> A corpus analysis of the lyrics of the singles from Ice Nine Kills' two horror-themed albums, *The Silver Scream* and *The Silver Scream 2: Welcome to Horrorwood*.

## Installing, Importing, and Processing

In [1]:
import spacy

In [2]:
!spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [3]:
import os
from spacy import displacy
import pandas as pd
pd.options.mode.chained_assignment = None 
import plotly.express as px

In [9]:
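lyrics = []
file_names = []

# Use one directory for both listing and opening, so the cell does not depend on the working directory
data_dir = '/Users/praga/Desktop/Collecting Data/CD_Assignment2/Data'

# sorted() makes the file order deterministic (os.listdir returns entries in arbitrary order)
for _file_name in sorted(os.listdir(data_dir)):
    if _file_name.endswith('.txt'):
        # The with-block closes each file after reading
        with open(os.path.join(data_dir, _file_name), 'r', encoding='utf-8') as f:
            lyrics.append(f.read())
        file_names.append(_file_name)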

In [10]:
d = {'Filename': file_names, 'Text': lyrics}

In [11]:
song_df = pd.DataFrame(d)

In [12]:
song_df.head()

| | Filename | Text |
|---|---|---|
| 0 | american_nightmare.txt | Getting ready for bed at a regular time\nIs on... |
| 1 | assault_n_batteries.txt | Breaking news alert\nA deadly shootout at a lo... |
| 2 | enjoy_your_slay.txt | Going down, sir? (Indeed)\nHere you are\nPlagu... |
| 3 | funeral_derangements.txt | Slave to the plot, let them rot\nOr bring them... |
| 4 | grave_mistake.txt | Here lies a lifeless bride and groom\nTill dea... |


In [14]:
song_df['Text'] = song_df['Text'].str.replace('\n', ' ', regex = True).str.strip()
song_df.head()

| | Filename | Text |
|---|---|---|
| 0 | american_nightmare.txt | Getting ready for bed at a regular time Is one... |
| 1 | assault_n_batteries.txt | Breaking news alert A deadly shootout at a loc... |
| 2 | enjoy_your_slay.txt | Going down, sir? (Indeed) Here you are Plagued... |
| 3 | funeral_derangements.txt | Slave to the plot, let them rot Or bring them ... |
| 4 | grave_mistake.txt | Here lies a lifeless bride and groom Till deat... |


In [19]:
metadata_df = pd.read_csv('metadata.csv')

In [20]:
metadata_df.head()

| | Filename | Title | Track Listing | Song Length | Release Date | Album | Horror Reference | Lyric Source |
|---|---|---|---|---|---|---|---|---|
| 0 | american_nightmare.txt | The American Nightmare | 1 | 4:11 | 20-06-2018 | The Silver Scream | A Nightmare on Elm Street | LyricFind |
| 1 | thank_god_its_friday.txt | Thank God It's Friday | 2 | 4:24 | 13-07-2018 | The Silver Scream | Friday the 13th | LyricFind |
| 2 | enjoy_your_slay.txt | Enjoy Your Slay (featuring Sam Kubrick of Shie... | 8 | 4:16 | 26-05-2017 | The Silver Scream | The Shining | musixmatch |
| 3 | grave_mistake.txt | A Grave Mistake | 6 | 3:04 | 14-09-2018 | The Silver Scream | The Crow | musixmatch |
| 4 | stabbing_in_the_dark.txt | Stabbing in the Dark | 3 | 4:36 | 19-10-2018 | The Silver Scream | Halloween | musixmatch |


In [32]:
# Match '.txt' literally: with regex=True the '.' would match any character
song_df['Filename'] = song_df['Filename'].str.replace('.txt', '', regex=False)
# Strip the extension in metadata_df too, so that 'Filename' can serve as the merge key
metadata_df['Filename'] = metadata_df['Filename'].str.replace('.txt', '', regex=False)

In [36]:
playlist_df = metadata_df.merge(song_df, on = 'Filename')

In [37]:
playlist_df.head()

| | Filename | Title | Track Listing | Song Length | Release Date | Album | Horror Reference | Lyric Source | Text |
|---|---|---|---|---|---|---|---|---|---|
| 0 | american_nightmare | The American Nightmare | 1 | 4:11 | 20-06-2018 | The Silver Scream | A Nightmare on Elm Street | LyricFind | Getting ready for bed at a regular time Is one... |
| 1 | thank_god_its_friday | Thank God It's Friday | 2 | 4:24 | 13-07-2018 | The Silver Scream | Friday the 13th | LyricFind | He drowned in all our sins He drowned in our m... |
| 2 | enjoy_your_slay | Enjoy Your Slay (featuring Sam Kubrick of Shie... | 8 | 4:16 | 26-05-2017 | The Silver Scream | The Shining | musixmatch | Going down, sir? (Indeed) Here you are Plagued... |
| 3 | grave_mistake | A Grave Mistake | 6 | 3:04 | 14-09-2018 | The Silver Scream | The Crow | musixmatch | Here lies a lifeless bride and groom Till deat... |
| 4 | stabbing_in_the_dark | Stabbing in the Dark | 3 | 4:36 | 19-10-2018 | The Silver Scream | Halloween | musixmatch | In calculated silence Captivated by the violen... |
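
`merge` performs an inner join by default, so any song whose `Filename` appears in only one of the two dataframes is dropped without warning. A minimal sketch of how this could be checked, using toy stand-ins (`songs`, `meta`) rather than the real `song_df` and `metadata_df`:

```python
import pandas as pd

# Hypothetical toy rows standing in for song_df and metadata_df.
songs = pd.DataFrame({'Filename': ['american_nightmare', 'assault_n_batteries'],
                      'Text': ['lyrics a', 'lyrics b']})
meta = pd.DataFrame({'Filename': ['american_nightmare', 'thank_god_its_friday'],
                     'Title': ['The American Nightmare', "Thank God It's Friday"]})

# An outer merge with indicator=True adds a '_merge' column that records
# whether each row came from the left frame, the right frame, or both.
check = meta.merge(songs, on='Filename', how='outer', indicator=True)

# Rows not marked 'both' are the ones an inner merge would drop.
unmatched = check.loc[check['_merge'] != 'both', ['Filename', '_merge']]
print(unmatched)
```

If `unmatched` is empty for the real data, every lyrics file has a metadata row and vice versa.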


## Text Enrichment with spaCy

In [38]:
nlp = spacy.load('en_core_web_sm')
print(nlp.pipe_names)

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']


In [40]:
def process_lyrics(text):
    return nlp(text)

In [41]:
playlist_df['Doc'] = playlist_df['Text'].apply(process_lyrics)
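
For a larger corpus, spaCy's `nlp.pipe` is a possible alternative to calling `nlp` once per row through `apply`, since it processes texts in batches. The sketch below assumes a blank English pipeline (tokenizer only) so that it runs without a model download; `demo_df` and its two lines are illustrative, and the notebook itself uses `en_core_web_sm`.

```python
import pandas as pd
import spacy

# Assumption: a blank pipeline keeps the sketch self-contained;
# swap in spacy.load('en_core_web_sm') to get tags, lemmas, and entities.
nlp = spacy.blank('en')

demo_df = pd.DataFrame({'Text': ['Here lies a lifeless bride and groom',
                                 'In calculated silence']})

# nlp.pipe yields one Doc per input text, in the same order as the input.
demo_df['Doc'] = list(nlp.pipe(demo_df['Text']))

# Number of tokens in each Doc.
print([len(doc) for doc in demo_df['Doc']])
```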

## Text Reduction

### Tokenization

In [42]:
def get_token(doc):
    return [token.text for token in doc]

In [43]:
playlist_df['Tokens'] = playlist_df['Doc'].apply(get_token)

In [44]:
tokens = playlist_df[['Text', 'Tokens']].copy()
tokens.head()

| | Text | Tokens |
|---|---|---|
| 0 | Getting ready for bed at a regular time Is one... | [Getting, ready, for, bed, at, a, regular, tim... |
| 1 | He drowned in all our sins He drowned in our m... | [He, drowned, in, all, our, sins, He, drowned,... |
| 2 | Going down, sir? (Indeed) Here you are Plagued... | [Going, down, ,, sir, ?, (, Indeed, ), Here, y... |
| 3 | Here lies a lifeless bride and groom Till deat... | [Here, lies, a, lifeless, bride, and, groom, T... |
| 4 | In calculated silence Captivated by the violen... | [In, calculated, silence, Captivated, by, the,... |


### Lemmatization

In [45]:
def get_lemma(doc):
    return [token.lemma_ for token in doc]

In [46]:
playlist_df['Lemmas'] = playlist_df['Doc'].apply(get_lemma)

## Text Annotation

### Parts of Speech (POS) Tagging

In [47]:
def get_pos(doc):
    # Pair each token's coarse-grained tag (pos_) with its fine-grained tag (tag_)
    return [(token.pos_, token.tag_) for token in doc]

In [48]:
playlist_df['POS'] = playlist_df['Doc'].apply(get_pos) 

In [49]:
def extract_proper_nouns(doc):
    return [token.text for token in doc if token.pos_ == 'PROPN']

In [50]:
playlist_df['Proper_Nouns'] = playlist_df['Doc'].apply(extract_proper_nouns) 

In [58]:
# Spot-check the proper nouns extracted for rows 0 and 4
list(playlist_df.loc[[0, 4], 'Proper_Nouns'])

[['David', 'Dreams', 'Craven', 'Sweet', 'Wicked', 'morgue', 'Five', 'Seven'],
 ['Fall',
  'Haddonfield',
  'Knife',
  'Day',
  'Knife',
  'Fall',
  'Scream',
  'Halloween',
  'Fall',
  'Orange',
  'Grove',
  'Ave',
  'Suspect',
  'Michael',
  'Myers',
  'Michael',
  'Fall']]

In [59]:
# Save the dataframe as a CSV file
playlist_df.to_csv('INK_silverscream_singles_with_spaCy_tags.csv') 
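
Two things are worth knowing when reloading this file. Writing the index produces the `Unnamed: 0` column on reload, and CSV stores every cell as text, so the `Doc` objects lose their annotations and list columns such as `Tokens` come back as strings. A minimal sketch of one way to handle both, using a toy `demo_df` with a placeholder in place of a real spaCy `Doc`:

```python
import ast
import io

import pandas as pd

# Toy stand-in for playlist_df; the 'Doc' value is a placeholder object,
# not a real spaCy Doc.
demo_df = pd.DataFrame({'Filename': ['grave_mistake'],
                        'Doc': [object()],
                        'Tokens': [['Here', 'lies']]})

buffer = io.StringIO()  # in-memory file, used here instead of a path on disk
# Drop the Doc column and skip the index so no 'Unnamed: 0' column appears.
demo_df.drop(columns=['Doc']).to_csv(buffer, index=False)

buffer.seek(0)
reloaded = pd.read_csv(buffer)

# List columns are read back as strings; ast.literal_eval parses them again.
reloaded['Tokens'] = reloaded['Tokens'].apply(ast.literal_eval)
print(reloaded.columns.tolist())
print(reloaded.loc[0, 'Tokens'])
```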