# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Nick Elias

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores the result in a dictionary object.  Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

## Question 1

In [1]:
```python
import json

import requests

# Fetch the lyrics from the lyrics.ovh API (requests percent-encodes the spaces in the URL)
response = requests.get(
    "https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul",
    timeout=10,
)

# Parse the API response
if response.status_code == 200:
    result = response.json()  # Convert the JSON response to a dictionary

    # Save the result to a JSON file
    with open("lyrics_data.json", "w", encoding="utf-8") as json_file:
        json.dump(result, json_file, indent=4)  # Pretty-print the JSON

    print("Lyrics data saved to 'lyrics_data.json'.")
else:
    print(f"Error: {response.status_code} - {response.text}")
```

Lyrics data saved to 'lyrics_data.json'.
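The question allows either a JSON file or a pickle file. As a sketch, the same dictionary could be saved with Python's standard `pickle` module instead; the sample dictionary and the file name `lyrics_data.pkl` below are hypothetical placeholders, not the API's actual output.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the dictionary returned by lyrics.ovh
result = {"lyrics": "placeholder lyrics text"}

# Write to a fresh temporary directory so the example does not overwrite real files
path = os.path.join(tempfile.mkdtemp(), "lyrics_data.pkl")

# Pickle files must be opened in binary mode ("wb" to write, "rb" to read)
with open(path, "wb") as pickle_file:
    pickle.dump(result, pickle_file)

with open(path, "rb") as pickle_file:
    loaded = pickle.load(pickle_file)

print(loaded["lyrics"])  # prints: placeholder lyrics text
```

JSON is human-readable and can be read from other languages, while pickle is Python-specific and should only be loaded from files you trust.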


2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

## Question 2

In [2]:
```python
import json

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the 'spacytextblob' pipeline factory

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Add the SpacyTextBlob component to the end of the pipeline
nlp.add_pipe("spacytextblob", last=True)

# Read the lyrics data from the saved JSON file
with open("lyrics_data.json", "r", encoding="utf-8") as json_file:
    lyrics_data = json.load(json_file)

# Extract only the lyrics string from the dictionary
lyrics = lyrics_data["lyrics"]

# Print the lyrics
print("Lyrics of the song:")
print(lyrics)

# Perform sentiment analysis using spaCyTextBlob
doc = nlp(lyrics)

# Get the polarity score from the TextBlob object attached to the doc
polarity_score = doc._.blob.polarity

# Print the polarity score
print(f"Polarity Score: {polarity_score}")

# Interpretation of the polarity score:
# Polarity ranges over [-1.0, 1.0]. Values closer to 1.0 indicate positive sentiment,
# values closer to -1.0 indicate negative sentiment, and values near 0 are roughly neutral.
# The branch below states whether these lyrics lean positive or negative based on the printed score.
if polarity_score > 0:
    print("The lyrics have a more positive connotation.")
elif polarity_score < 0:
    print("The lyrics have a more negative connotation.")
else:
    print("The lyrics are neutral.")
```

Lyrics of the song:
I'm your only friend 
I'm not your only friend 
But I'm a little glowing friend 
But really I'm not actually your friend 
But I am 


Blue canary in the outlet by the light switch 

Who watches over you 

Make a little birdhouse in your soul 

Not to put too fine a point on it 

Say I'm the only bee in your bonnet 

Make a little birdhouse in your soul 



I have a secret to tell 

From my electrical well 

It's a simple message and I'm leaving out the whistles and bells 

So the room must listen to me 

Filibuster vigilantly 

My name is blue canary one note* spelled l-i-t-e 

My story's infinite 

Like the Longines Symphonette it doesn't rest 



Blue canary in the outlet by the light switch 

Who watches over you 

Make a little birdhouse in your soul 

Not to put too fine a point on it 

Say I'm the only bee in your bonnet 

Make a little birdhouse in your soul 



I'm your only friend 

I'm not your only friend 

But I'm a little glowing friend 

But really I'm
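The check in the cell above treats any score above 0 as positive, so a score such as 0.01 would be reported as positive. A sketch of a slightly more careful interpretation adds a small neutral band around 0; the function name `label_polarity` and the 0.05 threshold are illustrative assumptions, not values defined by spaCyTextBlob.

```python
def label_polarity(score, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    neutral_band is an illustrative threshold (an assumption), not a spaCyTextBlob setting.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity must be in [-1.0, 1.0], got {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


for score in (0.58, 0.01, -0.47):
    print(score, label_polarity(score))
# prints:
# 0.58 positive
# 0.01 neutral
# -0.47 negative
```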

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

## Question 3

In [3]:
```python
import json
from urllib.parse import quote

import requests


def get_song_lyrics(artist, song, filename):
    """Fetch lyrics for `song` by `artist` from lyrics.ovh and save the JSON to `filename`."""
    # Percent-encode each path segment so characters such as "?" in a title
    # are not mistaken for the start of a query string
    url = f"https://api.lyrics.ovh/v1/{quote(artist, safe='')}/{quote(song, safe='')}"

    try:
        # Make the GET request to fetch the lyrics
        response = requests.get(url, timeout=10)

        # Check whether the request succeeded
        if response.status_code == 200:
            # Parse the response JSON
            lyrics_data = response.json()

            # Save the lyrics data to the specified filename
            with open(filename, "w", encoding="utf-8") as json_file:
                json.dump(lyrics_data, json_file)
            print(f"Lyrics for {song} by {artist} saved to {filename}")
        else:
            print(f"Error: Unable to fetch lyrics for {song} by {artist}. HTTP status code: {response.status_code}")

    except (requests.RequestException, ValueError, OSError) as e:
        # Network/timeout errors, invalid JSON, or problems writing the file
        print(f"Error occurred: {e}")


# Test the function with five songs
get_song_lyrics("Radiohead", "Creep", "creep_lyrics.json")
get_song_lyrics("Pharrell Williams", "Happy", "happy_lyrics.json")
get_song_lyrics("Dr. Dog", "Where'd All the Time Go?", "whered_all_the_time_go_lyrics.json")
get_song_lyrics("Taylor Swift", "Shake It Off", "shake_it_off_lyrics.json")
get_song_lyrics("The Beatles", "Hey Jude", "hey_jude_lyrics.json")
```

```text
Lyrics for Creep by Radiohead saved to creep_lyrics.json
Lyrics for Happy by Pharrell Williams saved to happy_lyrics.json
Lyrics for Where'd All the Time Go? by Dr. Dog saved to whered_all_the_time_go_lyrics.json
Lyrics for Shake It Off by Taylor Swift saved to shake_it_off_lyrics.json
Lyrics for Hey Jude by The Beatles saved to hey_jude_lyrics.json
```
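The file names above were typed by hand. As a sketch, a small helper could derive them from the song title, and the standard-library `urllib.parse.quote` can percent-encode each URL path segment so that a title containing "?" is not treated as the start of a query string. The helper names `lyrics_filename` and `build_lyrics_url` are hypothetical; the URL pattern is the one used in the cell above.

```python
import re
from urllib.parse import quote


def lyrics_filename(song):
    """Turn a song title into a file name such as 'shake_it_off_lyrics.json'."""
    slug = song.lower().replace("'", "")  # drop apostrophes ("where'd" -> "whered")
    slug = re.sub(r"[^a-z0-9]+", "_", slug).strip("_")  # collapse other characters to "_"
    return f"{slug}_lyrics.json"


def build_lyrics_url(artist, song):
    """Build a lyrics.ovh URL with each path segment percent-encoded."""
    return f"https://api.lyrics.ovh/v1/{quote(artist, safe='')}/{quote(song, safe='')}"


print(lyrics_filename("Where'd All the Time Go?"))
# prints: whered_all_the_time_go_lyrics.json
print(build_lyrics_url("Dr. Dog", "Where'd All the Time Go?"))
# prints: https://api.lyrics.ovh/v1/Dr.%20Dog/Where%27d%20All%20the%20Time%20Go%3F
```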


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

## Question 4

In [4]:
```python
import json

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the 'spacytextblob' pipeline factory

# Load the spaCy model and add the SpacyTextBlob component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob", last=True)


def analyze_lyrics_sentiment(filename):
    """Load lyrics from a JSON file and return their TextBlob polarity score."""
    # Load the lyrics from the JSON file
    with open(filename, "r", encoding="utf-8") as json_file:
        lyrics_data = json.load(json_file)

    # Extract the lyrics (empty string if the key is missing)
    lyrics = lyrics_data.get("lyrics", "")

    # Perform sentiment analysis using spaCyTextBlob
    doc = nlp(lyrics)

    # Return the polarity score from the TextBlob object attached to the doc
    return doc._.blob.polarity


# Files created in question 3
song_files = [
    "creep_lyrics.json",
    "happy_lyrics.json",
    "whered_all_the_time_go_lyrics.json",
    "shake_it_off_lyrics.json",
    "hey_jude_lyrics.json",
]

# Analyze the sentiment of each song and print the result
for song_file in song_files:
    polarity = analyze_lyrics_sentiment(song_file)
    print(f"Polarity score for {song_file}: {polarity}")

# Commentary:
# Polarity ranges from -1.0 (most negative) to 1.0 (most positive).
# Whether these scores match the songs' actual tone is discussed in the reflection below.
```


```text
Polarity score for creep_lyrics.json: 0.5792857142857142
Polarity score for happy_lyrics.json: 0.49032258064516154
Polarity score for whered_all_the_time_go_lyrics.json: -0.0845982142857143
Polarity score for shake_it_off_lyrics.json: -0.47116745688174266
Polarity score for hey_jude_lyrics.json: 0.13194444444444445
```
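The question asks for each score alongside the song's name, while the loop above prints file names. As a sketch, a dictionary mapping each file to its title and artist (taken from the calls in question 3) can label and rank the results. The scores here are copied from the output above so the example runs without the API or spaCy; the names `SONG_TITLES`, `scores`, and `ranked` are hypothetical.

```python
# Map each file from question 3 to a readable "Title by Artist" label
SONG_TITLES = {
    "creep_lyrics.json": "Creep by Radiohead",
    "happy_lyrics.json": "Happy by Pharrell Williams",
    "whered_all_the_time_go_lyrics.json": "Where'd All the Time Go? by Dr. Dog",
    "shake_it_off_lyrics.json": "Shake It Off by Taylor Swift",
    "hey_jude_lyrics.json": "Hey Jude by The Beatles",
}

# Polarity scores copied from the output above
scores = {
    "creep_lyrics.json": 0.5792857142857142,
    "happy_lyrics.json": 0.49032258064516154,
    "whered_all_the_time_go_lyrics.json": -0.0845982142857143,
    "shake_it_off_lyrics.json": -0.47116745688174266,
    "hey_jude_lyrics.json": 0.13194444444444445,
}

# Rank the songs from most positive to most negative polarity
ranked = sorted(scores, key=scores.get, reverse=True)
for filename in ranked:
    print(f"{SONG_TITLES[filename]}: {scores[filename]:.3f}")
```

In the notebook itself, `scores` could instead be built live with `{f: analyze_lyrics_sentiment(f) for f in SONG_TITLES}`.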


### Reflection on polarity scores
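The polarity scores do not match my expectations. "Creep" by Radiohead is a very melancholic song whose chorus says "I'm a creep, I'm a weirdo...", yet it scored higher than any other song, even higher than "Happy", which is surprising. After looking at the lyrics, I noticed that the phrase "I wish I was special" repeats throughout, so the repeated word "special" may be why it scored so high: TextBlob's default analyzer averages the polarity of the words it recognizes in its sentiment lexicon, so a positive word like "special" raises the score even when the surrounding phrase expresses longing or self-doubt. Also, "Shake It Off" by Taylor Swift scored far lower than the rest, at -0.47, even though the song is upbeat; negative words such as "hate" likely pull the score down even though the song is dismissing the haters. Sarcasm and irony in the lyrics may also have confused the spaCy pipeline, since a word-level lexicon cannot detect them, so it would be interesting to research this further.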
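TextBlob's default analyzer scores text by averaging the polarity of the lexicon words it recognizes (with adjustments for modifiers such as "very" and "not"), which may explain why a repeated positive word can lift the score of a sad song like "Creep". The toy sketch below shows only that averaging step; the words and scores in `TOY_LEXICON` are made-up values for illustration, not TextBlob's actual lexicon, and `toy_polarity` is a hypothetical helper.

```python
import re

# Made-up word scores for illustration only (NOT TextBlob's real lexicon values)
TOY_LEXICON = {"special": 0.4, "creep": -0.3, "weirdo": -0.3}


def toy_polarity(text):
    """Average the scores of lexicon words found in text; 0.0 if none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


chorus = "I'm a creep, I'm a weirdo"
longing = "I wish I was special. I wish I was special. I wish I was special."

print(round(toy_polarity(chorus), 2))  # prints -0.3
print(round(toy_polarity(chorus + " " + longing), 2))  # prints 0.12
```

Because every occurrence counts equally in the average, three mentions of one positive word outweigh two negative words, even though the phrase around "special" is wistful rather than happy.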