# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Brittany Dowdle
### **[Clickable link to GitHub Repo](https://github.com/Bdowdle4/bd-json-sentiment)**

Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure your code cells have all been run (and have output beneath them) and that you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.
***

In [7]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package            Version
------------------ ------------
annotated-types    0.7.0
asttokens          2.4.1
blis               1.0.1
catalogue          2.0.10
certifi            2024.8.30
charset-normalizer 3.4.0
click              8.1.7
cloudpathlib       0.20.0
colorama           0.4.6
comm               0.2.2
confection         0.1.5
cymem              2.0.8
debugpy            1.8.8
decorator          5.1.1
en_core_web_sm     3.8.0
executing          2.1.0
idna               3.10
ipykernel          6.29.5
ipython            8.29.0
jedi               0.19.2
Jinja2             3.1.4
joblib             1.4.2
jupyter_client     8.6.3
jupyter_core       5.7.2
langcodes          3.4.1
language_data      1.2.0
marisa-trie        1.2.1
markdown-it-py     3.0.0
MarkupSafe         3.0.2
matplotlib-inline  0.1.7
mdurl              0.1.2
murmurhash         1.0.10
nest-asyncio       1.6.0
nltk               3.9.1
numpy              2.0.2
packaging          24.2
parso     

***
### **Question 1**
The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores them in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to access the API repeatedly.

In [8]:
import requests
import json

# Access API and fetch lyrics
result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)

# Write the resulting JSON to a file
output_file = 'lyrics.json'
with open(output_file, 'w', encoding='utf-8') as file:
    json.dump(result, file, ensure_ascii=False, indent=4)

print(f"Lyrics data has been written to {output_file}")

Lyrics data has been written to lyrics.json
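
The question allows either a JSON file or a pickle file. The cell above uses JSON; the sketch below shows roughly what the pickle route could look like. The names `sample_result`, `save_pickle`, and `load_pickle` are hypothetical, and `sample_result` is only a stand-in for the dictionary returned by the API.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the dictionary returned by the API.
sample_result = {"lyrics": "I'm your only friend\nI'm not your only friend"}


def save_pickle(data, path):
    """Serialize a Python object to a pickle file."""
    with open(path, "wb") as file:  # pickle requires binary mode
        pickle.dump(data, file)


def load_pickle(path):
    """Read a Python object back from a pickle file."""
    with open(path, "rb") as file:
        return pickle.load(file)


# Round-trip through a temporary directory so no project files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = os.path.join(tmp_dir, "lyrics.pkl")
    save_pickle(sample_result, pickle_path)
    restored = load_pickle(pickle_path)

print(restored == sample_result)
```

JSON stays human-readable and can be opened in any editor, which is why it is a reasonable default here. Pickle preserves arbitrary Python objects, but it should only be loaded from files you trust.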


### **Question 2**
Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

In [9]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from spacy.tokens import Doc

# Load the spaCy model
nlp = spacy.load('en_core_web_sm')

# Add sentiment analysis pipeline
nlp.add_pipe("spacytextblob", last=True)

# Manually register the polarity extension
if not Doc.has_extension('polarity'):
    Doc.set_extension('polarity', getter=lambda doc: doc._.blob.sentiment.polarity)

# Read in the contents of the file
with open(output_file, 'r', encoding='utf-8') as file:
    data = json.load(file)

# Extract and print the lyrics
lyrics = data.get('lyrics', '')
print("Lyrics:\n", lyrics)

# Analyze the lyrics' sentiment with the pipeline configured above
doc = nlp(lyrics)
polarity = doc._.polarity
print("Polarity score:", polarity)

# Analyze if lyrics have a positive or negative connotation
if polarity > 0:
    print("This song is positive.")
elif polarity < 0:
    print("This song is negative.")
else:
    print("This song is neutral.")

Lyrics:
 I'm your only friend 
I'm not your only friend 
But I'm a little glowing friend 
But really I'm not actually your friend 
But I am 


Blue canary in the outlet by the light switch 

Who watches over you 

Make a little birdhouse in your soul 

Not to put too fine a point on it 

Say I'm the only bee in your bonnet 

Make a little birdhouse in your soul 



I have a secret to tell 

From my electrical well 

It's a simple message and I'm leaving out the whistles and bells 

So the room must listen to me 

Filibuster vigilantly 

My name is blue canary one note* spelled l-i-t-e 

My story's infinite 

Like the Longines Symphonette it doesn't rest 



Blue canary in the outlet by the light switch 

Who watches over you 

Make a little birdhouse in your soul 

Not to put too fine a point on it 

Say I'm the only bee in your bonnet 

Make a little birdhouse in your soul 



I'm your only friend 

I'm not your only friend 

But I'm a little glowing friend 

But really I'm not actual

### **Question 3**
Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [10]:
import requests
import json

def favorite_song_lyrics(artist, song, filename):
    # Format the API URL
    url = f"https://api.lyrics.ovh/v1/{artist}/{song}"
    
    try:
        # Send a GET request to the API (the timeout keeps the call from hanging indefinitely)
        response = requests.get(url, timeout=10)
        
        # Check if the response was successful
        if response.status_code == 200:
            lyrics = response.json().get('lyrics', 'Lyrics not found')
            
            # Prepare the data to be written to a JSON file
            lyrics_data = {
                "artist": artist,
                "song": song,
                "lyrics": lyrics
            }
            
            # Write the data to the specified .json file
            with open(filename, 'w', encoding='utf-8') as file:
                json.dump(lyrics_data, file, ensure_ascii=False, indent=4)
            print(f"Lyrics for '{song}' by {artist} written to {filename}.")
        else:
            print(f"Failed to get lyrics for {song} by {artist}. Status code: {response.status_code}")
    except Exception as e:
        print(f"An error occurred: {e}")

# Testing the function with four songs
favorite_song_lyrics("Gnarls Barkley", "Crazy", "crazy_lyrics.json")
favorite_song_lyrics("Jay-Z", "99 Problems", "problems_lyrics.json")
favorite_song_lyrics("Beyonce", "Crazy in Love", "love_lyrics.json")
favorite_song_lyrics("OutKast", "Hey Ya!", "hey_lyrics.json")


Lyrics for 'Crazy' by Gnarls Barkley written to crazy_lyrics.json.
Lyrics for '99 Problems' by Jay-Z written to problems_lyrics.json.
Lyrics for 'Crazy in Love' by Beyonce written to love_lyrics.json.
Lyrics for 'Hey Ya!' by OutKast written to hey_lyrics.json.
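
The function above places the artist and song directly into the URL path. That worked for the four songs tested, but names containing characters such as `/` or `?` could change the shape of the URL. The sketch below shows one way to percent-encode both parts first. `build_lyrics_url` and `BASE_URL` are hypothetical names, the AC/DC call is only an illustration, and how the API responds to an encoded slash has not been verified here.

```python
from urllib.parse import quote

BASE_URL = "https://api.lyrics.ovh/v1"


def build_lyrics_url(artist, song):
    """Return the request URL with artist and song percent-encoded."""
    # safe="" also encodes "/" so a slash in a name cannot add a path segment.
    encoded_artist = quote(artist, safe="")
    encoded_song = quote(song, safe="")
    return f"{BASE_URL}/{encoded_artist}/{encoded_song}"


print(build_lyrics_url("OutKast", "Hey Ya!"))
print(build_lyrics_url("AC/DC", "Back in Black"))
```

The returned string could be passed to `requests.get` in place of the f-string built inside `favorite_song_lyrics`.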


### **Question 4**
Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why do you think that might be? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [11]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from spacy.tokens import Doc

# Initialize spaCy and the spacytextblob extension
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob", last=True)

# Manually register the polarity extension
if not Doc.has_extension('polarity'):
    Doc.set_extension('polarity', getter=lambda doc: doc._.blob.sentiment.polarity)


def analyze_sentiment_from_file(filename):
    try:
        # Load the JSON file with song lyrics
        with open(filename, 'r', encoding='utf-8') as file:
            song_data = json.load(file)
        
        # Extract the lyrics from the file
        lyrics = song_data.get("lyrics", "")
        
        # Perform sentiment analysis using spacytextblob
        doc = nlp(lyrics)
        polarity_score = doc._.polarity
        
        return polarity_score
    except Exception as e:
        print(f"An error occurred while processing {filename}: {e}")
        return None

# Test using the files created in question 3
files = [
    ("crazy_lyrics.json", "Gnarls Barkley - Crazy"),
    ("problems_lyrics.json", "Jay-Z - 99 Problems"),
    ("love_lyrics.json", "Beyonce - Crazy in Love"),
    ("hey_lyrics.json", "OutKast - Hey Ya!")
]

# Analyze sentiment and print the polarity score for each file
for filename, song_name in files:
    polarity = analyze_sentiment_from_file(filename)
    if polarity is not None:
        print(f"Sentiment score for: {song_name}: {polarity}")

Sentiment score for: Gnarls Barkley - Crazy: -0.21798245614035086
Sentiment score for: Jay-Z - 99 Problems: -0.19448632762586252
Sentiment score for: Beyonce - Crazy in Love: 0.02438902007083823
Sentiment score for: OutKast - Hey Ya!: 0.07173202614379086
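
Two of the scores above sit very close to zero, so a strict `> 0` / `< 0` test may overstate how positive they are. As a rough sketch, the scores can be labeled with a small neutral band around zero. The band width of 0.05 and the name `label_polarity` are assumptions made for illustration, not values defined by spaCyTextBlob.

```python
# Polarity scores copied from the output above.
scores = {
    "Gnarls Barkley - Crazy": -0.21798245614035086,
    "Jay-Z - 99 Problems": -0.19448632762586252,
    "Beyonce - Crazy in Love": 0.02438902007083823,
    "OutKast - Hey Ya!": 0.07173202614379086,
}


def label_polarity(polarity, neutral_band=0.05):
    """Label a polarity score, treating values near zero as neutral.

    neutral_band is an assumed threshold, chosen only for illustration.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Print the songs from most negative to most positive.
for song_name, polarity in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{song_name}: {polarity:.3f} ({label_polarity(polarity)})")
```

With this band, "Crazy in Love" would count as neutral rather than positive, which may be a fairer reading of a score that small.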


In [12]:
# Comment on the sentiment analysis results
# Does the reported polarity match your understanding of the song's lyrics?
# Answer:
# The polarity scores may not always align perfectly with our understanding of the songs' emotions.
# This could be because sentiment analysis tools primarily focus on the words' sentiment
# rather than the overall context or themes. For example, a song might use negative words
# in a metaphorical or artistic way that the sentiment analyzer doesn't interpret correctly.
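
To illustrate the point about word-level scoring, here is a toy sketch of a lexicon-averaging scorer. It is not TextBlob's implementation, and `TOY_LEXICON` with its values is entirely hypothetical. It only shows how averaging per-word scores can miss what a phrase means as a whole.

```python
import re

# Hypothetical toy lexicon; NOT the lexicon TextBlob uses.
TOY_LEXICON = {"crazy": -0.6, "love": 0.5, "happy": 0.8, "alone": -0.3}


def toy_polarity(text):
    """Average the scores of lexicon words found in text (0.0 if none match)."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


# An affectionate phrase is scored purely by the word "crazy".
print(toy_polarity("I am crazy about you"))
# A positive word and a negative word nearly cancel out.
print(toy_polarity("crazy in love"))
```

In this toy model, "I am crazy about you" comes out negative even though the phrase is affectionate, which is the same kind of mismatch described in the comment above.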

***
## Summary
#### *All interpretations were found by googling the song*

+ **Crazy (-0.218)** is about the idea that people should be themselves, even if it means being considered crazy. I would say that a negative score is accurate: even though the song's meaning is positive, the lyrics definitely make it sound like a bad situation.
+ **99 Problems (-0.194)** is about a traffic stop encounter with the police and the contested issues of search and seizure law. I would say that a slightly negative score is appropriate.
+ **Crazy in Love (0.024)** is about how, when you are falling in love, you do things that are out of character and you do not really care because you are just open. I am surprised that it isn't more positive, but I think a positive score is accurate.
+ **Hey Ya! (0.072)** is about the state of relationships in the 2000s, and how people can end up unhappy in relationships they stay in out of fear of being alone or the world’s expectations of what a relationship should be. I can safely say that I have never thought of this as a sad song. It is upbeat and has some positive connotations when listening. I would say this score falls into the misinterpreted category since it is positive. (But the tool wasn't the only one who was fooled!)

In [13]:
!jupyter nbconvert --to html requests-json-nlp.ipynb

[NbConvertApp] Converting notebook requests-json-nlp.ipynb to html
[NbConvertApp] Writing 305936 bytes to requests-json-nlp.html
