# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Lindsey Sullivan
### GitHub Repository: https://github.com/LindseySully/Module_04

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

### Prerequisites

In [95]:
## spaCy Pipeline & spaCyTextBlob
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe('spacytextblob')

import requests
import lyricsgenius



Import of JSON file storing credentials

In [78]:
import json

credentials = {}
try:
    with open('credentials.json') as file:
        credentials = json.load(file)
except FileNotFoundError:
    print("Error: credentials.json file not found.")
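If `credentials.json` is missing, the cell above prints a message but leaves `credentials` empty, so a later lookup of `credentials['api_key']` would fail with a `KeyError`. A minimal sketch of a loader that fails early with a clearer message follows. The function name `load_api_key` is hypothetical, and the sketch assumes the file holds a JSON object with an `api_key` entry.

```python
import json


def load_api_key(path="credentials.json"):
    """Return the 'api_key' value from a JSON credentials file.

    Hypothetical helper: raises a descriptive error instead of
    continuing with an empty dictionary.
    """
    try:
        with open(path) as file:
            credentials = json.load(file)
    except FileNotFoundError as err:
        raise RuntimeError(f"Credentials file not found: {path}") from err

    # Assumption: the key is stored under the name 'api_key'.
    if "api_key" not in credentials:
        raise RuntimeError(f"No 'api_key' entry in {path}")
    return credentials["api_key"]
```

Stopping here keeps the failure next to its cause, rather than surfacing later when the Genius client is created.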


### Question 01

1. The following code accesses the **lyricsgenius** public API, searches for the lyrics of a song, and stores them in a dictionary object.  Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [79]:
apiKey = credentials['api_key']
genius = lyricsgenius.Genius(apiKey)

#search artist songs - max of 3
artist = genius.search_artist("Coldplay", max_songs=3)


Searching for songs by Coldplay...

Song 1: "Viva la Vida"
Song 2: "The Scientist"
Song 3: "Yellow"

Reached user-specified song limit (3).
Done. Found 3 songs.


In [94]:
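#search for single song by the artist
song = artist.song("The Scientist")
#note: this flag is set after the lyrics were already fetched, so it appears to have no effect here
#(the section headers such as [Verse 1] are still present in the output below)
genius.remove_section_headers = True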


#store song lyrics in variable
lyrics = song.lyrics
#store artist in variable
artist_name = artist.name
print(artist_name)
print("----------")

#clean lyrics
start_of_lyrics = lyrics.replace('120 ContributorsTranslationsFrançaisTürkçeEspañolPortuguês',' ')
clean_chorus = start_of_lyrics.replace('See Coldplay LiveGet tickets as low as $22You might also like',' ')
final_lyrics = clean_chorus.replace('219Embed',' ')

print(final_lyrics)

Coldplay
----------
 The Scientist Lyrics[Verse 1]
Come up to meet you, tell you I'm sorry
You don't know how lovely you are
I had to find you, tell you I need you
And tell you I set you apart
Tell me your secrets and ask me your questions
No, let's go back to the start
Runnin' in circles, comin' up tails
Heads on a science apart

[Chorus]
Nobody said it was easy
It's such a shame for us to part
Nobody said it was easy
No one ever said it would be this hard
Oh, take me back to the start
[Verse 2]
I was just guessin' at numbers and figures
Pullin' the puzzles apart
Questions of science, science and progress
Do not speak as loud as my heart
And tell me you love me, come back and haunt me
Oh, and I rush to the start
Runnin' in circles, chasin' our tails
Comin' back as we are
 [Chorus]
Nobody said it was easy
Oh, it's such a shame for us to part
Nobody said it was easy
No one ever said it would be so hard
I'm goin' back to the start

[Outro]
Oh-ooh ooh-ooh-ooh-ooh
Ah-ooh ooh-ooh-ooh-ooh
Oh
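The `replace` calls above match exact strings such as `120 Contributors...` and `219Embed`, so they stop matching as soon as the counts change or a different song is fetched. A pattern-based sketch is shown below. The function name `clean_lyrics` is hypothetical, and the patterns are assumptions inferred only from the extra text visible in this notebook; they are not part of the lyricsgenius library and may need adjusting if the page layout changes.

```python
import re


def clean_lyrics(raw, drop_section_headers=True):
    """Strip page residue from scraped lyrics (patterns are assumptions)."""
    text = raw
    # Leading "<n> Contributors...<Title> Lyrics" banner on the first line.
    text = re.sub(r"^\d+\s*Contributors[^\n]*?Lyrics", "", text)
    # Ticket advertisement, e.g. "See Coldplay LiveGet tickets as low as $22".
    text = re.sub(r"See [^\n]*? LiveGet tickets as low as \$\d+", "", text)
    # Recommendation banner that follows the advertisement.
    text = text.replace("You might also like", "")
    # Trailing "<n>Embed" marker at the very end of the lyrics.
    text = re.sub(r"\d*Embed\s*$", "", text)
    if drop_section_headers:
        # Bracketed headers such as [Verse 1] or [Chorus].
        text = re.sub(r"\[[^\]\n]+\]", "", text)
    # Collapse runs of blank lines left behind by the removals.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


sample = (
    "120 ContributorsTranslationsFrançaisTürkçeEspañolPortuguês"
    "The Scientist Lyrics[Verse 1]\n"
    "Come up to meet you, tell you I'm sorry\n"
    "Oh, take me back to the start\n"
    "See Coldplay LiveGet tickets as low as $22You might also like[Verse 2]\n"
    "I was just guessin' at numbers and figures\n"
    "Oh-ooh ooh-ooh-ooh-ooh219Embed"
)
print(clean_lyrics(sample))
```

Because the same function can be applied to every song, the cleanup no longer depends on one song's contributor or embed counts.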

In [93]:


song_dictionary = {
    "artist": artist_name,
    "title": "The scientist",
    "lyrics" : final_lyrics
}

print(json.dumps(song_dictionary, indent=2))

#writing JSON data to a file
with open("thescientist.json","w") as outfile:
    json.dump(song_dictionary,outfile,indent=2)

{
  "artist": "Spiritbox",
  "title": "The scientist",
  "lyrics": " The Scientist Lyrics[Verse 1]\nCome up to meet you, tell you I'm sorry\nYou don't know how lovely you are\nI had to find you, tell you I need you\nAnd tell you I set you apart\nTell me your secrets and ask me your questions\nNo, let's go back to the start\nRunnin' in circles, comin' up tails\nHeads on a science apart\n\n[Chorus]\nNobody said it was easy\nIt's such a shame for us to part\nNobody said it was easy\nNo one ever said it would be this hard\nOh, take me back to the start\n[Verse 2]\nI was just guessin' at numbers and figures\nPullin' the puzzles apart\nQuestions of science, science and progress\nDo not speak as loud as my heart\nAnd tell me you love me, come back and haunt me\nOh, and I rush to the start\nRunnin' in circles, chasin' our tails\nComin' back as we are\n [Chorus]\nNobody said it was easy\nOh, it's such a shame for us to part\nNobody said it was easy\nNo one ever said it would be so hard\nI'm goi

### Question 02

2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [82]:
#read contents of song_dictionary & print lyrics

with open('thescientist.json') as thescientist:
    thescientist_dict = json.load(thescientist)
    print(thescientist_dict['lyrics'])

 The Scientist Lyrics[Verse 1]
Come up to meet you, tell you I'm sorry
You don't know how lovely you are
I had to find you, tell you I need you
And tell you I set you apart
Tell me your secrets and ask me your questions
No, let's go back to the start
Runnin' in circles, comin' up tails
Heads on a science apart

[Chorus]
Nobody said it was easy
It's such a shame for us to part
Nobody said it was easy
No one ever said it would be this hard
Oh, take me back to the start
[Verse 2]
I was just guessin' at numbers and figures
Pullin' the puzzles apart
Questions of science, science and progress
Do not speak as loud as my heart
And tell me you love me, come back and haunt me
Oh, and I rush to the start
Runnin' in circles, chasin' our tails
Comin' back as we are
 [Chorus]
Nobody said it was easy
Oh, it's such a shame for us to part
Nobody said it was easy
No one ever said it would be so hard
I'm goin' back to the start

[Outro]
Oh-ooh ooh-ooh-ooh-ooh
Ah-ooh ooh-ooh-ooh-ooh
Oh-ooh ooh-ooh-ooh-ooh

In [83]:
#sentiment analysis of lyrics
doc = nlp(thescientist_dict['lyrics'])
print("Polarity:",doc._.blob.polarity)

#Based on the polarity, I believe this song does not lean strongly either way; a score of about 0.10 is close to neutral.
#This seems accurate to me, as the song is about how consuming love can be, in both its positive and negative aspects.

Polarity: 0.10294117647058822
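To make the reading of a score on the `[-1.0, 1.0]` scale explicit, a small labelling helper can be sketched. The function name `describe_polarity` and the cut-off of `0.15` are arbitrary assumptions chosen for illustration; neither comes from spaCyTextBlob.

```python
def describe_polarity(score, neutral_band=0.15):
    """Label a polarity score as 'positive', 'negative', or 'neutral'.

    neutral_band is an arbitrary, hypothetical threshold: scores whose
    absolute value does not exceed it are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


print(describe_polarity(0.10294117647058822))
```

With this particular threshold, the score reported above falls in the neutral band; a narrower band would label it mildly positive.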


### Question 03

3. Write a function that takes an artist, song, and filename, accesses the **lyricsgenius** API to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [88]:

#function: fetch a song's lyrics from Genius and save them as JSON
def artist_song_lyrics(artist_name: str, song_title: str, filename: str):
    #strip section headers such as [Chorus] from lyrics fetched after this point
    genius.remove_section_headers = True
    artist = genius.search_artist(artist_name, max_songs=3, get_full_info=False)
    song = artist.song(song_title)
    lyrics = song.lyrics

    song_dictionary = {
        "artist": artist_name,
        "title": song_title,
        "lyrics": lyrics
    }
    print(json.dumps(song_dictionary, indent=2))

    #the .json extension is appended here, so pass the filename without it
    with open(filename + '.json', "w") as outfile:
        json.dump(song_dictionary, outfile, indent=2)

#user input
artist_name = input("Enter Artist's Name:")
song_title = input("Enter the song title:")
filename = input("Enter your preferred file name:")

artist_song_lyrics(artist_name,song_title,filename)

{
  "artist": "Spiritbox",
  "title": "Rotoscope",
  "lyrics": "9 ContributorsRotoscope Lyrics\nShadows bloom in the skyline\nSurface tension keeps dust in my eyes\nI can't take back the skeletons that haunt me frame by frame\nI can rapture the imprints sent to bore into my brain\nAnd I know that I feel the end is imminent\n\nHow long have I felt this way? Sign of the times\nShadows sway to light up my life\nTrace the answers, tears have never made me\nChange from violent delights\nShadows sway, light up my life\nTrace the answers, tears have never made me change\n\nShallow, this is what I created\nSplayed-out skeletons in the cracks in the pavement\nAnd now can you feel the injection back behind those cloudy eyes?\nIn between every cataract, a projection of my life\n\nHow long have I felt this way? Sign of the times\nShadows sway to light up my life\nTrace the answers, tears have never made me\nChange from violent delights\nShadows sway, light up my life\nTrace the answers, tears have

### Question 04

4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [109]:
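def LyricPolarity(filename: str):
    #load the saved song dictionary, then run the lyrics through the spaCy pipeline
    with open(filename) as jsonlyrics:
        jsonlyrics_dict = json.load(jsonlyrics)
    doc = nlp(jsonlyrics_dict['lyrics'])
    #note: the score is printed here rather than returned
    print("Polarity:", doc._.blob.polarity, 'of', jsonlyrics_dict['title'], 'by', jsonlyrics_dict['artist'])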

#file 1:
LyricPolarity("Alkaline-Sleep-Token.json")

#file 2:
LyricPolarity("Rotoscope-Spiritbox.json")

#file 3:
LyricPolarity("The-Summoning-Sleep-Token.json")

#file 4
LyricPolarity("VivalaVida-Coldplay.json")

Polarity: 0.19999999999999998 of Alkaline by Sleep Token
Polarity: -0.022549019607843133 of Rotoscope by Spiritbox
Polarity: 0.020864661654135346 of The Summoning by Sleep Token
Polarity: 0.11346358320042532 of Viva la Vida by Coldplay
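The prompt asks for a function that returns the polarity score, while the function above prints it. A variant that returns the value could be sketched as follows. The name `lyric_polarity` and the `polarity_of` parameter are hypothetical: the scoring step is passed in as a callable so the sketch runs on its own, and with the pipeline loaded in this notebook it could be `lambda text: nlp(text)._.blob.polarity`.

```python
import json


def lyric_polarity(filename, polarity_of):
    """Load a lyrics JSON file and return (title, polarity).

    polarity_of is a hypothetical callable that maps text to a score;
    it stands in for the spaCyTextBlob pipeline used in this notebook.
    """
    with open(filename) as jsonlyrics:
        song = json.load(jsonlyrics)
    return song["title"], polarity_of(song["lyrics"])


# Possible usage with the notebook's pipeline (assumes nlp is loaded):
#     score_text = lambda text: nlp(text)._.blob.polarity
#     for name in ["Alkaline-Sleep-Token.json", "Rotoscope-Spiritbox.json"]:
#         title, score = lyric_polarity(name, score_text)
#         print(title, score)
```

Returning the score lets the caller decide how to print, compare, or store it.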


These scores do seem to match my understanding of the lyrics. The one issue I would anticipate comes from my inability to fully clean the extra data from the lyrics, which included the contributor count, an ad to see the artist in person, and the embed code at the bottom of the lyrics.

In [None]:
import os
os.system('jupyter nbconvert --to html requests-json-nlp.ipynb')

0