# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: John Hickman

### GitHub Repo: https://github.com/Gretsch1963/44620-Mod-4

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

# Question 1

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores the result in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.
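As a minimal sketch of the storage step, assuming the API result is already held in a plain dictionary, the block below round-trips the same data through both formats. The dictionary contents and file names are hypothetical placeholders, not the assignment's data.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the dictionary built from an API response.
song_desc = {"artist": "Example Artist", "title": "Example Song", "lyrics": "la la la"}

with tempfile.TemporaryDirectory() as tmp:
    json_path = Path(tmp) / "example_song.json"
    pickle_path = Path(tmp) / "example_song.pkl"

    # JSON: a text format, readable in any editor and from other languages.
    json_path.write_text(json.dumps(song_desc), encoding="utf-8")
    from_json = json.loads(json_path.read_text(encoding="utf-8"))

    # Pickle: a Python-specific binary format; only load pickles you trust.
    pickle_path.write_bytes(pickle.dumps(song_desc))
    from_pickle = pickle.loads(pickle_path.read_bytes())

print(from_json == song_desc, from_pickle == song_desc)
```

JSON is usually the more convenient choice here because the saved lyrics can be inspected by hand before later questions read them back in.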

In [4]:
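import json
import os

import lyricsgenius
import requests

# Use the Genius API through lyricsgenius (instead of lyrics.ovh, as described above).
# Assumption: the API token is read from the GENIUS_ACCESS_TOKEN environment
# variable rather than hard-coded in the notebook.
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])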

# Search for songs by artist name:

artist = genius.search_artist("Bruce Springsteen", max_songs=3, sort="title", include_features=True)
print(artist.songs)

# Search for a single song by the artist:

song = artist.song("Born in the USA")

print(song.lyrics)
lyrics = song.lyrics

# Add the song to the artist object:
artist.add_song(song)

# Save the artist's songs to a JSON file:
artist.save_lyrics()

song_desc = {
    'artist': 'Bruce Springsteen',
    'title': 'Born in the USA',
    'lyrics': lyrics 
}

with open('born_in_the_USA.json', 'w') as f:
    json.dump(song_desc, f)

Searching for songs by Bruce Springsteen...

Song 1: "30 Days Out"
Song 2: "4th of July, Asbury Park (Live 1980)"
Song 3: "4th of July, Asbury Park (Sandy)"

Reached user-specified song limit (3).
Done. Found 3 songs.
[Song(id, artist, ...), Song(id, artist, ...), Song(id, artist, ...)]
Searching for "Born in the USA" by Bruce Springsteen...
Done.
101 ContributorsBorn in the U.S.A. Lyrics[Verse 1]
Born down in a dead man's town
The first kick I took was when I hit the ground
You end up like a dog that's been beat too much
'Til you spend half your life just covering up, now
[Chorus]
Born in the U.S.A. 
I was born in the U.S.A. 
I was born in the U.S.A. 
Born in the U.S.A. now 

[Verse 2]
Got in a little hometown jam
So they put a rifle in my hand
Sent me off to a foreign land
To go and kill the yellow man
[Chorus]
Born in the U.S.A. 
I was born in the U.S.A. 
I was born in the U.S.A. 
I was born in the U.S.A. 

[Verse 3]
Come back home to the refinery
Hiring man says, “Son, if it was up

# Question 2

2. Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.
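Lyrics scraped from Genius, as in the printed output for this question, can carry page residue such as a contributor-count header, a ticket promotion, and bracketed section tags, all of which end up in the text passed to the sentiment analyzer. The helper below is a hypothetical, heuristic sketch of a cleaning step; its name and its patterns are assumptions based only on the strings visible in that output and are not part of lyricsgenius or spaCyTextBlob.

```python
import re


def clean_genius_lyrics(raw):
    """Strip common page residue from scraped lyrics (heuristic, not exhaustive)."""
    # Drop a leading "<N> Contributors<Title> Lyrics" header if present.
    text = re.sub(r"^\d+\s+Contributors.*?Lyrics", "", raw, count=1)
    # Drop the ticket promotion that sometimes appears mid-lyrics.
    text = re.sub(r"See .*? LiveGet tickets as low as \$\d+", "", text)
    text = text.replace("You might also like", "")
    # Replace bracketed section tags such as [Verse 1] or [Chorus] with a line break.
    text = re.sub(r"\[[^\]]*\]", "\n", text)
    # Trim each line and drop the empty ones.
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


sample = (
    "101 ContributorsBorn in the U.S.A. Lyrics[Verse 1]\n"
    "Born down in a dead man's town\n"
    "[Chorus]\n"
    "Born in the U.S.A. \n"
    "See Bruce Springsteen LiveGet tickets as low as $34You might also like[Verse 4]\n"
    "I had a brother at Khe Sanh"
)
cleaned = clean_genius_lyrics(sample)
print(cleaned)
```

Whether cleaning changes the polarity score noticeably depends on the song; the point is only that the analyzer should see lyrics rather than page text.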

In [8]:
import requests
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

with open('born_in_the_USA.json', 'r') as f:
    song_desc = json.load(f)

lyrics = song_desc['lyrics']

print(lyrics)

# Load spaCy and the spaCyTextBlob extension

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

# Perform sentiment analysis on the lyrics

doc = nlp(lyrics)
polarity_score = doc._.polarity
print("The polarity score is:", polarity_score)

# The polarity score ranges over [-1.0, 1.0], from most negative to most positive.
# A polarity score of about 0.008 suggests that the sentiment is very slightly positive.

101 ContributorsBorn in the U.S.A. Lyrics[Verse 1]
Born down in a dead man's town
The first kick I took was when I hit the ground
You end up like a dog that's been beat too much
'Til you spend half your life just covering up, now
[Chorus]
Born in the U.S.A. 
I was born in the U.S.A. 
I was born in the U.S.A. 
Born in the U.S.A. now 

[Verse 2]
Got in a little hometown jam
So they put a rifle in my hand
Sent me off to a foreign land
To go and kill the yellow man
[Chorus]
Born in the U.S.A. 
I was born in the U.S.A. 
I was born in the U.S.A. 
I was born in the U.S.A. 

[Verse 3]
Come back home to the refinery
Hiring man says, “Son, if it was up to me”
Went down to see my V.A. man
He said, “Son, don't you understand”
See Bruce Springsteen LiveGet tickets as low as $34You might also like[Verse 4]
I had a brother at Khe Sanh
Fighting off them Viet Cong
They're still there, he's all gone
He had a woman he loved in Saigon
I got a picture of him in her arms now
[Verse 5]
Down in the shadow of 

# Question 3

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.
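For comparison with the Genius-based solution used in this notebook, here is a minimal sketch of the same function against lyrics.ovh using only the standard library. The endpoint shape (`/v1/<artist>/<title>`) and the `{"lyrics": ...}` response shape are assumptions taken from the API documentation linked in Question 1, and the function names are hypothetical. The `fetch` parameter exists so the example can run with a stub instead of a real network call.

```python
import json
import os
import tempfile
import urllib.parse
import urllib.request


def build_lyrics_url(artist, song):
    """Build the request URL (assumed shape: /v1/<artist>/<title>)."""
    base = "https://api.lyrics.ovh/v1"
    return f"{base}/{urllib.parse.quote(artist, safe='')}/{urllib.parse.quote(song, safe='')}"


def fetch_json(url):
    """Download a URL and decode its JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def save_lyrics(artist, song, filename, fetch=fetch_json):
    """Fetch the lyrics for one song and write them to `filename` as JSON."""
    result = fetch(build_lyrics_url(artist, song))
    # Assumed response shape: {"lyrics": "..."} on success.
    record = {"artist": artist, "title": song, "lyrics": result.get("lyrics", "")}
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(record, f)
    return record


# Usage with a stub fetcher, so the example runs without network access.
def fake_fetch(url):
    return {"lyrics": "la la la"}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_lyrics.json")
    record = save_lyrics("Example Artist", "Example Song", path, fetch=fake_fetch)
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)

print(reloaded == record)
```

Storing the artist and title alongside the lyrics keeps each file self-describing, which helps when printing polarity scores by song name later.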

In [9]:
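import json
import os

import lyricsgenius

# Define a function that takes an artist, a song title, and a filename prefix,
# fetches the lyrics from Genius, and writes them to a JSON file.

def get_lyrics(artist, song, filename):
    # Assumption: the API token is read from the GENIUS_ACCESS_TOKEN environment variable.
    genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])
    found_artist = genius.search_artist(artist, max_songs=1, get_full_info=False)
    found_song = found_artist.song(song)
    # Use a separate name for the file handle so it does not shadow the filename argument.
    with open(filename + ' song_lyrics.json', 'w') as f:
        json.dump(found_song.lyrics, f)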

# Fetch and save the lyrics for four artist/song pairs

get_lyrics("Social Distortion", "Bad Luck","Bad_Luck")   
get_lyrics("The Interupters", "Raised By Wolves","Wolves")  
get_lyrics("Sarah Brightman", "Time To Say Goodye","Sarah")  
get_lyrics("New Order", "Substance","Substance")  

Searching for songs by Social Distortion...

Song 1: "Story of My Life"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for "Bad Luck" by Social Distortion...
Done.
Searching for songs by The Interupters...

Changing artist name to 'The Interrupters'
Song 1: "She’s Kerosene"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for "Raised By Wolves" by The Interrupters...
Done.
Searching for songs by Sarah Brightman...

Song 1: "Time To Say Goodbye"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for "Time To Say Goodye" by Sarah Brightman...
Done.
Searching for songs by New Order...

Song 1: "Blue Monday"

Reached user-specified song limit (1).
Done. Found 1 songs.
Searching for "Substance" by New Order...
Done.


# Question 4

4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [10]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
import json

# Load the pipeline once and reuse it for every file

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

def sentiment_analysis(filename):
    """Load lyrics from a JSON file and return their polarity score."""
    with open(filename, 'r') as f:
        lyrics = json.load(f)
    doc = nlp(lyrics)
    return doc._.blob.polarity

# Print the polarity score for each lyrics file created in Question 3

for filename in ['Bad_Luck song_lyrics.json',
                 'Wolves song_lyrics.json',
                 'Sarah song_lyrics.json',
                 'Substance song_lyrics.json']:
    print(f'Polarity scores of {filename}: {sentiment_analysis(filename)}')

Polarity scores of Bad_Luck song_lyrics.json: -0.49929078014184364
Polarity scores of Wolves song_lyrics.json: -0.08827160493827159
Polarity scores of Sarah song_lyrics.json: 0.296
Polarity scores of Substance song_lyrics.json: -0.13774463383838384


The polarity scores are largely as expected. "Bad Luck" is a song about a man who is down on his luck, and "Raised By Wolves" deals with childhood abandonment, even though the songwriter overcomes it, so the negative scores for both are not surprising. Sarah Brightman's song is translated from Italian and does not make much sense in English, which may explain why its sentiment trends neutral to positive. "Substance" is also a negative song, and its score is negative as well.
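One reason a reported score can diverge from a listener's reading is that TextBlob's default analyzer is lexicon-based: broadly, it combines scores attached to individual words it recognizes and does not model the story a song tells. The block below is a deliberately simplified toy, not TextBlob's actual algorithm; the lexicon and its values are made up for illustration.

```python
# Hypothetical toy lexicon; real lexicons are far larger and these values are invented.
TOY_LEXICON = {"dead": -0.5, "love": 0.5, "bad": -0.75, "new": 0.25}


def toy_polarity(text):
    """Average the lexicon scores of the words found; 0.0 if none are found."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


# A line with one recognized negative word.
print(toy_polarity("Born down in a dead man's town"))
# A line whose emotional words are missing from the toy lexicon scores as neutral.
print(toy_polarity("He had a woman he loved in Saigon"))
```

In this toy, a line with no recognized words contributes nothing, so a bleak lyric told in plain language can still land near zero, which is consistent with the near-neutral score reported for "Born in the U.S.A." in Question 2.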