# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Alison Hatfield

### GitHub Repo: https://github.com/ajhatfield/json-sentiment 

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [6]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package            Version
------------------ -----------
annotated-types    0.6.0
appnope            0.1.4
asttokens          2.4.1
blis               0.7.11
catalogue          2.0.10
certifi            2024.2.2
charset-normalizer 3.3.2
click              8.1.7
cloudpathlib       0.16.0
comm               0.2.2
confection         0.1.4
cymem              2.0.8
debugpy            1.8.1
decorator          5.1.1
en-core-web-lg     3.7.1
en-core-web-sm     3.7.1
executing          2.0.1
filelock           3.13.3
fsspec             2024.3.1
huggingface-hub    0.22.2
idna               3.6
ipykernel          6.29.4
ipython            8.22.2
jedi               0.19.1
Jinja2             3.1.3
joblib             1.3.2
jupyter_client     8.6.1
jupyter_core       5.7.2
langcodes          3.3.0
MarkupSafe         2.1.5
matplotlib-inline  0.1.6
mpmath             1.3.0
murmurhash         1.0.10
nest-asyncio       1.6.0
networkx           3.2.1
nltk               3.8.1
numpy 

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores the result in a dictionary object.  Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [7]:
import requests
import json

song = "Hey Stephen (Taylor's Version)"
artist = 'Taylor Swift'

# API URL
url = f"https://api.lyrics.ovh/v1/{artist}/{song}"

# Send the GET request and parse the JSON response into a dictionary
result = requests.get(url).json()

# Write the JSON data to a file
with open('lyrics_open.json', 'w') as file:
    json.dump(result, file)

# Tell the user where the song lyrics were written
print(f'The lyrics of {song} have been written to lyrics_open.json')



The lyrics of Hey Stephen (Taylor's Version) have been written to lyrics_open.json
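
The question allows either a JSON file or a pickle file. As a rough sketch of how the two compare, the example below round-trips a small stand-in dictionary through both formats. The `sample` data and the temporary file names are made up for illustration and are not part of the assignment.

```python
import json
import os
import pickle
import tempfile

# Stand-in for the API result (hypothetical data, not real lyrics)
sample = {'lyrics': 'line one\nline two'}

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, 'sample.json')
    pickle_path = os.path.join(tmp, 'sample.pkl')

    # JSON is text, so the file is opened in text mode
    with open(json_path, 'w') as file:
        json.dump(sample, file)
    with open(json_path, 'r') as file:
        from_json = json.load(file)

    # pickle is binary, so the file is opened with 'wb' and 'rb'
    with open(pickle_path, 'wb') as file:
        pickle.dump(sample, file)
    with open(pickle_path, 'rb') as file:
        from_pickle = pickle.load(file)

# Both formats should give back an equal dictionary
print(from_json == sample, from_pickle == sample)
```

JSON has the advantage of being readable in any text editor, while pickle files should only be loaded when they come from a trusted source.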


2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [8]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# Read the contents of the JSON file
file_name = 'lyrics_open.json'
with open(file_name, 'r') as file:
    lyrics = json.load(file)

# Extract the lyrics from the JSON data
lyrics_text = lyrics['lyrics']

# Print the lyrics
print('Song Lyrics:')
print(lyrics_text)
print()

# Sentiment analysis using spaCyTextBlob, as the question asks
# (the pipeline component wraps TextBlob, so the polarity scale is the same)
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
doc = nlp(lyrics_text)
score = doc._.polarity

# Print the score
print(f'The polarity score of the sentiment analysis is {score}')


# Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is,
# do you think the lyrics have a more positive or negative connotation?

if score > 0:
    print('The lyrics have a positive connotation')
elif score < 0:
    print('The lyrics have a negative connotation')
else:
    print('The lyrics are neutral')



Song Lyrics:
Paroles de la chanson Hey Stephen par Taylor Swift
Hey Stephen, I know looks can be deceiving
But I know I saw a light in you
And as we walked we were talking
I didn't say half the things I wanted to Of all the girls tossing rocks at your window
I'll be the one waiting there even when it's cold
Hey Stephen, boy, you might have me believing
I don't always have to be alone 'Cause I can't help it if you look like an angel
Can't help it if I wanna kiss you in the rain so
Come feel this magic I've been feeling since I met you
Can't help it if there's no one else
Mmm, I can't help myself Hey Stephen, I've been holding back this feeling

So I got some things to say to you
I've seen it all, so I thought
But I never seen nobody shine the way you do The way you walk, way you talk, way you say my name
It's beautiful, wonderful, don't you ever change
Hey Stephen, why are people always leaving?
I think you and I should stay the same 'Cause I can't help it if you look like an angel
Can'
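
The printed lyrics begin with a header line ("Paroles de la chanson ... par ...") that comes back from the API rather than from the song itself. A minimal sketch for dropping it before printing or scoring is shown below. It assumes the header is always the first line and always starts with that phrase, and the helper name `strip_header` is hypothetical.

```python
def strip_header(lyrics_text, prefix='Paroles de la chanson'):
    """Drop the first line if it looks like the API's header line."""
    first_line, _, rest = lyrics_text.partition('\n')
    if first_line.strip().startswith(prefix):
        return rest.lstrip()
    # No header found, so return the text unchanged
    return lyrics_text

example = ('Paroles de la chanson Hey Stephen par Taylor Swift\n'
           'Hey Stephen, I know looks can be deceiving')
print(strip_header(example))
```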

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [9]:
import json
import requests

def get_lyrics(artist, song, filename):
    # API URL
    url = f"https://api.lyrics.ovh/v1/{artist}/{song}"

    # GET request to the API
    response = requests.get(url)

    # Check whether the request succeeded
    if response.status_code == 200:
        # Extract the lyrics from the JSON response
        lyrics = response.json()
        lyrics_text = lyrics.get('lyrics')

        # Save artist, song, and lyrics together as a JSON list
        with open(filename, 'w') as file:
            json.dump((artist, song, lyrics_text), file)
        print(f"{song}'s lyrics by {artist} have been written to {filename}")
    else:
        print(f"Failed to get lyrics for {song} by {artist}")

#4 examples
get_lyrics('Harry Styles', 'Kiwi', 'Kiwi_song_lyrics.json')
get_lyrics('The Lumineers', 'Sleep On The Floor', "Sleep_On_The_Floor_lyrics.json")
get_lyrics('Blink 182', 'First Date', 'First_Date_lyrics.json')
get_lyrics('Paramore', 'Still into You', 'Still_into_You_lyrics.json')
        

Kiwi's lyrics by Harry Styles have been written to Kiwi_song_lyrics.json
Sleep On The Floor's lyrics by The Lumineers have been written to Sleep_On_The_Floor_lyrics.json
First Date's lyrics by Blink 182 have been written to First_Date_lyrics.json
Still into You's lyrics by Paramore have been written to Still_into_You_lyrics.json
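
The four test songs have simple names, but an artist or title containing a character such as `/` or `?` would change the meaning of the URL if it were inserted as-is. One possible way to guard against that, sketched here with a hypothetical helper `build_lyrics_url`, is to percent-encode each path segment with `urllib.parse.quote`. Whether the API resolves an encoded slash correctly is an assumption that has not been checked here.

```python
from urllib.parse import quote

def build_lyrics_url(artist, song):
    """Build the lyrics.ovh URL with each path segment percent-encoded."""
    # safe='' also encodes '/', which would otherwise split the path
    artist_part = quote(artist, safe='')
    song_part = quote(song, safe='')
    return f"https://api.lyrics.ovh/v1/{artist_part}/{song_part}"

print(build_lyrics_url('Blink 182', 'First Date'))
print(build_lyrics_url('AC/DC', 'Back in Black'))
```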


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [13]:
# Initialize spaCy and add the spaCyTextBlob pipeline component
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

# Create a function that performs sentiment analysis on a file and returns the polarity score
def get_polarity_score(filename):
    # Note: this reads the raw file text, so the JSON brackets, artist name,
    # and song title are scored along with the lyrics
    with open(filename, 'r') as file:
        lyrics = file.read()
    doc = nlp(lyrics)
    polarity_score = doc._.polarity
    return polarity_score

#list of my song files
song_files = ['Kiwi_song_lyrics.json', 'First_Date_lyrics.json','Sleep_On_The_Floor_lyrics.json','Still_into_You_lyrics.json']

#Analyze score
for file_name in song_files:
    score = get_polarity_score(file_name)
    print(f'The polarity score for {file_name} is: {score}')

The polarity score for Kiwi_song_lyrics.json is: -0.13911845730027547
The polarity score for First_Date_lyrics.json is: 0.017333333333333333
The polarity score for Sleep_On_The_Floor_lyrics.json is: 0.11485690235690234
The polarity score for Still_into_You_lyrics.json is: -0.04375000000000001
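
Several of these scores sit very close to zero, where a strict positive/negative split can overstate the difference. As a rough sketch, the helper below treats a small band around zero as neutral. The name `label_polarity` and the cutoff of 0.05 are assumptions chosen for illustration, not values defined by spaCyTextBlob, and the scores are rounded from the output above.

```python
def label_polarity(score, neutral_band=0.05):
    """Label a polarity score, treating values near zero as neutral."""
    if score > neutral_band:
        return 'positive'
    if score < -neutral_band:
        return 'negative'
    return 'neutral'

# Scores rounded from the output above
scores = {
    'Kiwi': -0.139,
    'First Date': 0.017,
    'Sleep On The Floor': 0.115,
    'Still into You': -0.044,
}

for name, score in scores.items():
    print(f'{name}: {label_polarity(score)}')
```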


I am surprised that Kiwi and Still into You both have negative scores. I have always thought of these as upbeat, happier songs, but after looking at the lyrics in more depth I understand why the program would score them that way. First Date also has a score close to 0, and I expected it to be higher: the song is all about going on a first date with someone, which should bring excitement. I am not surprised by the score for Sleep On The Floor, however.
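
One factor that may also influence these scores is how the files are read. The files from question 3 store a JSON list of artist, song, and lyrics, so reading them as plain text passes the brackets, the names, and the escaped line breaks to the sentiment analysis. A minimal sketch of parsing the file first and keeping only the lyrics is shown below. The helper name `load_lyrics_text` and the sample data are hypothetical.

```python
import json
import os
import tempfile

def load_lyrics_text(filename):
    """Return only the lyrics from a file saved as [artist, song, lyrics]."""
    with open(filename, 'r') as file:
        artist, song, lyrics_text = json.load(file)
    # The lyrics entry can be None if the API returned no lyrics
    return lyrics_text or ''

# Small demonstration with made-up data (not real lyrics)
sample = ('Some Artist', 'Some Song', 'first line\nsecond line')
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample_lyrics.json')
    with open(path, 'w') as file:
        json.dump(sample, file)
    with open(path, 'r') as file:
        raw_text = file.read()
    parsed_text = load_lyrics_text(path)

print(raw_text)     # JSON text: brackets, quotes, names, and a literal backslash-n
print(parsed_text)  # only the lyrics, with a real line break
```

Passing `load_lyrics_text(file_name)` to `nlp(...)` instead of the raw file contents would likely shift the polarity scores reported above. The analysis has not been rerun here, so the size of any change is unknown.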