# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Andy Asher [github link](https://github.com/andyakiva/json-sentiment)

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public api, searches for the lyrics of a song, and stores it in a dictionary object.  Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

# Question 1

In [14]:
# Import the libraries, modules, and packages required for this project (installed in the virtual env)
import json
import pickle
import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
import xmltodict  # parses the API's XML response into a dict that can be dumped as JSON

#show import success
print('All prereqs installed.')
!pip list

# Create a JSON file with the song lyrics and info about the song.
# This assignment originally used a now-defunct API, so we are using a different API (ChartLyrics).
# This API, when called like this, does not allow any of the following stop words when searching for either artist or title: "about, after, all, also, an, and, another, any, are, as, at, be, because, been, before, being, between, both, but, by, came, can, come, could, did, do, does, each, else, for, from, get, got, had, has, have, he, her, here, him, himself, his, how, if, in, into, is, it, its, just, like, make, many, me, might, more, most, much, must, my, never, no, now, of, on, only, or, other, our, out, over, re, said, same, see, should, since, so, some, still, such, take, than, that, the, their, them, then, there, these, they, this, those, through, to, too, under, up, use, very, want, was, way, we, well, were, what, when, where, which, while, who, will, with, would, you, your"
# This API, when called this way, also does not allow special characters in the search.
artist = 'Interpol'
song = 'Evil'
result = requests.get('http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect?artist='+artist+'&song='+song).text

# Convert the XML response to a dict and write it to a JSON file
dictver = xmltodict.parse(result)
with open(song+artist+'.json','w') as outfile:
    json.dump(dictver, outfile)


All prereqs installed.
Package                   Version
------------------------- ----------
annotated-types           0.6.0
anyio                     4.0.0
argon2-cffi               23.1.0
argon2-cffi-bindings      21.2.0
arrow                     1.3.0
asttokens                 2.4.0
async-lru                 2.0.4
attrs                     23.1.0
Babel                     2.13.0
backcall                  0.2.0
beautifulsoup4            4.12.2
bleach                    6.1.0
blis                      0.7.11
boto3                     1.33.4
botocore                  1.33.4
catalogue                 2.0.10
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        3.3.2
click                     8.1.7
cloudpathlib              0.16.0
colorama                  0.4.6
comm                      0.1.4
confection                0.1.4
contourpy                 1.2.0
cycler                    0.12.1
cymem                     2.0.8
debugpy                   1
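
The prompt for this question allows either a JSON file or a pickle file. As a minimal sketch of the difference, the snippet below round-trips a small dictionary through both formats. The dictionary contents, the helper names `roundtrip_json` and `roundtrip_pickle`, and the temporary file names are made up for illustration; they are not the real API response or part of the assignment code.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the parsed API response (not real data)
sample = {"GetLyricResult": {"LyricArtist": "Example Artist",
                             "LyricSong": "Example Song",
                             "Lyric": "la la la"}}

def roundtrip_json(data, path):
    """Write data as JSON text, then read it back."""
    with open(path, "w") as outfile:
        json.dump(data, outfile)
    with open(path) as infile:
        return json.load(infile)

def roundtrip_pickle(data, path):
    """Write data as a binary pickle, then read it back."""
    with open(path, "wb") as outfile:
        pickle.dump(data, outfile)
    with open(path, "rb") as infile:
        return pickle.load(infile)

# Use a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    from_json = roundtrip_json(sample, Path(tmp) / "sample.json")
    from_pickle = roundtrip_pickle(sample, Path(tmp) / "sample.pkl")

print(from_json == sample, from_pickle == sample)
```

JSON is plain text and readable outside Python, which suits a nested dictionary of strings like this one. Pickle is Python-specific and should only be loaded from sources you trust.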



2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

# Question 2

In [63]:
# Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!)

# Open the file in a with-block so it is closed once the JSON has been loaded
with open('EvilInterpol.json') as file:
    newfile = json.load(file)
content = newfile.get('GetLyricResult')
lyrics = content['Lyric']
print("The lyrics for ",song," by ",artist, " are:\n")
print(lyrics)

#use spaCyTextBlob to perform sentiment analysis on the lyrics.  
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

score = nlp(lyrics)._.polarity

#Print the polarity score of the sentiment analysis.  
print("\nThe polarity score for ",song," by ",artist, " is ",score)

#Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

#The polarity for the song is .087, which is very mildly positive but more accurately described as neutral. I think the sentiment analysis may have struggled with the cryptic nature of the lyrics, as I believe the song to be more strongly positive.


The lyrics for  Evil  by  Interpol  are:

Rosemary, Heaven restores you in life
You're coming with me
Through the aging, the fearing, the strife

It's the smiling on the package
It's the faces in the sand
It's the thought that moves you upwards
Embracing me with two hands

Right will take you places
Yeah maybe to the beach
When your friends they do come crying
Tell them now your pleasure's set upon slow release

Hey wait
Great smile
Sensitive to fate, not denial

But hey, who's on trial?

It took a lifespan with no cellmate
The long way back
Sandy, why can't we look the other way?

We speaks about travel
Yeah we think about the land
We smart like all peoples
Feeling real tan

I could take you places
But you need a new man?
Wipe the pollen from the faces
Make revision to a dream while you wait in the van

Hey wait
Great smile
Sensitive to fate, not denial

But hey, who's on trial?

It took a lifespan with no cellmate
The long way back
Sandy, why can't we look the other way?
You're weigh
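
The answer in the code cell above reads a polarity of about .087 as neutral rather than positive. One way to make that reading explicit is to map scores to rough labels, as in the sketch below. The function name `describe_polarity` and the cutoff of 0.1 are assumptions chosen for illustration; spaCyTextBlob itself only reports the number and does not define such labels.

```python
def describe_polarity(score, neutral_band=0.1):
    """Map a polarity score in [-1.0, 1.0] to a rough label.

    neutral_band is an arbitrary cutoff picked for this sketch:
    scores closer to zero than this are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity must be between -1.0 and 1.0")
    if abs(score) < neutral_band:
        return "neutral"
    return "positive" if score > 0 else "negative"

# The score reported for "Evil" in the comment above
print(describe_polarity(0.087))
```

A wider or narrower band would change which songs count as neutral, so the label is only as meaningful as the cutoff behind it.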

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

# Question 3

In [77]:
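def storelyrics(artist, song, filename):
    """Fetch a song's lyrics from the ChartLyrics API and write them to <filename>.json."""
    result = requests.get('http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect?artist='+artist+'&song='+song).text
    # The API returns XML, so parse it into a dict before dumping it as JSON
    dictver = xmltodict.parse(result)
    with open(filename+'.json','w') as outfile:
        json.dump(dictver, outfile)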

storelyrics("Marilyn Manson", 'The Beautiful People', 'MMTBP')
storelyrics("Five Finger Death Punch","Never Enough", "FFDPNeverEnough")
storelyrics("Megadeth","Almost Honest","MDAlomstHonest")
storelyrics("Ozzy Osbourne","Crazy Train","OzzyOsbourneCrazyTrain")
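
Several of the artists and titles above contain spaces, and the function builds the URL by plain string concatenation. A minimal sketch of building the same query string with explicit encoding, using only the standard library, is shown below. The helper name `build_lyrics_url` is hypothetical, and whether the API accepts every encoded character has not been verified here.

```python
from urllib.parse import urlencode

# Endpoint taken from the request in the code cell above
BASE_URL = "http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect"

def build_lyrics_url(artist, song):
    """Return the search URL with artist and song encoded for a query string."""
    # urlencode replaces spaces with '+' and percent-encodes reserved characters
    return BASE_URL + "?" + urlencode({"artist": artist, "song": song})

print(build_lyrics_url("Ozzy Osbourne", "Crazy Train"))
```

Passing the same dictionary to `requests.get` through its `params` argument encodes the values in a similar way, so the URL never has to be assembled by hand.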



4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

# Question 4

In [84]:
#Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  
def analyze(filename):
    """Load a lyrics JSON file, print the polarity score with the song name, and return the score."""
    with open(filename) as file:
        newfile = json.load(file)
    content = newfile.get('GetLyricResult')
    lyrics = content['Lyric']
    artist = content['LyricArtist']
    song = content['LyricSong']
    # Reuses the nlp pipeline (with spacytextblob added) from question 2
    score = nlp(lyrics)._.polarity
    print("The polarity score for ",song, " by ", artist, " is ", score,".")
    return score

#Use this function to print the polarity scores (with the name of the song) of the files created in question 3 (four files here).
analyze('FFDPNeverEnough.json')
analyze('MDAlomstHonest.json')
analyze('MMTBP.json')
analyze('OzzyOsbourneCrazyTrain.json')
    
#Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be? 

#No, not really. None of these should have a positive sentiment. The score for Crazy Train is pretty accurate, but the others are substantially off. All three of the other songs are negative in tone. Almost Honest is the closest to neutral of the three, but the other two are downright angry. I think the sentiment analysis is thrown off by sarcasm in the way Manson describes a veneer of beauty covering violent fascists. I am unsure what is throwing off the analysis of the FFDP song, though, as it is pretty straightforward. My only guess is that the analysis scores individual words instead of assessing whole phrases.

The polarity score for  Never Enough  by  Five Finger Death Punch  is  0.004720279720279719 .
The polarity score for  Almost Honest  by  Megadeth  is  0.32129120879120876 .
The polarity score for  The Beautiful People  by  Marilyn Manson  is  0.2821158008658008 .
The polarity score for  Crazy Train  by  Ozzy Osbourne  is  -0.32107438016528916 .
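
The guess in the answer above, that words are scored individually rather than as whole phrases, can be illustrated with a toy scorer. The sketch below is not TextBlob's actual implementation; the lexicon, its scores, and the example sentences are all invented to show how averaging per-word scores can miss sarcasm.

```python
# Hypothetical mini-lexicon; the words and scores are invented for this sketch
TOY_LEXICON = {"beautiful": 0.85, "great": 0.8, "crazy": -0.6, "hate": -0.8}

def toy_polarity(text):
    """Average the scores of known words, ignoring all context."""
    # Lowercase each word and strip surrounding punctuation before lookup
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    # Text with no known words is treated as neutral
    return sum(scores) / len(scores) if scores else 0.0

sincere = "What a beautiful day"
sarcastic = "Oh great, the beautiful people hate everyone"
print(toy_polarity(sincere), toy_polarity(sarcastic))
```

In this toy setup the sarcastic sentence still scores above zero, because two positive words outweigh one negative word regardless of how they are meant. A lyric that describes something ugly in flattering vocabulary could be pulled positive in the same way.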


In [86]:
!jupyter nbconvert --to html requests-json-nlp.ipynb

[NbConvertApp] Converting notebook requests-json-nlp.ipynb to html
[NbConvertApp] Writing 299024 bytes to requests-json-nlp.html
