# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Derek Graves

### GitHub Repo: https://github.com/dgraves4/json-sentiment

Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure your code cells have all been run (and have output beneath them) and that you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [116]:
import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list

All prereqs installed.
Package            Version
------------------ -----------
annotated-types    0.6.0
asttokens          2.4.1
blis               0.7.11
catalogue          2.0.10
certifi            2024.2.2
charset-normalizer 3.3.2
click              8.1.7
cloudpathlib       0.16.0
colorama           0.4.6
comm               0.2.2
confection         0.1.4
cymem              2.0.8
debugpy            1.8.1
decorator          5.1.1
en-core-web-sm     3.7.1
executing          2.0.1
idna               3.6
ipykernel          6.29.3
ipython            8.22.2
jedi               0.19.1
Jinja2             3.1.3
joblib             1.3.2
jupyter_client     8.6.1
jupyter_core       5.7.2
langcodes          3.3.0
MarkupSafe         2.1.5
matplotlib-inline  0.1.6
murmurhash         1.0.10
nest-asyncio       1.6.0
nltk               3.8.1
numpy              1.26.4
packaging          24.0
parso              0.8.3
pip                24.0
platformdirs       4.2.0
preshed            3.0.9
prompt-toolk

#### Question 1 

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores the result in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [117]:
import requests
import json

result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)


#### Question 1 Response:

In [118]:

# Fetching the lyrics and storing them in a dictionary object
result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)

# Writing the JSON data to a file
with open('birdhouse_lyrics.json', 'w') as file:
    json.dump(result, file)
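
Since the question allows either a JSON file or a pickle file, the round trip for both can be sketched as below. This is only an illustration: the `sample` dictionary, the `save_and_reload` helper, and the file names are placeholders standing in for the API response, not the data actually fetched above.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Placeholder standing in for the dictionary returned by the API (hypothetical data).
sample = {"lyrics": "line one\nline two"}


def save_and_reload(data, directory):
    """Write `data` as JSON and as pickle, then read both copies back."""
    json_path = Path(directory) / "sample_lyrics.json"
    pickle_path = Path(directory) / "sample_lyrics.pkl"

    # JSON is a text format, so the file is opened in text mode.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    with open(json_path, "r", encoding="utf-8") as f:
        from_json = json.load(f)

    # Pickle is a binary format, so the file is opened in binary mode.
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f)
    with open(pickle_path, "rb") as f:
        from_pickle = pickle.load(f)

    return from_json, from_pickle


with tempfile.TemporaryDirectory() as tmp:
    from_json, from_pickle = save_and_reload(sample, tmp)

print(from_json == sample, from_pickle == sample)
```

JSON has the advantage of being readable in any text editor, while pickle is Python-specific; for a plain dictionary of strings either one preserves the data.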

#### Question 2

Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is [-1.0, 1.0], which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

#### Question 2 Response: 

In [119]:
# Load the JSON file containing the lyrics
with open('birdhouse_lyrics.json', 'r') as file:
    data = json.load(file)
    lyrics = data['lyrics']

# Print the lyrics
print("Lyrics:")
print(lyrics)

# Perform sentiment analysis using spaCyTextBlob
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
doc = nlp(lyrics)
polarity_score = doc._.polarity

# Print the polarity score
print("Polarity Score:", polarity_score)

# Based on a polarity score of approximately 0.04, we can infer that the sentiment of the lyrics is slightly positive, but closer to neutral overall. 


Lyrics:
Paroles de la chanson Birdhouse In Your Soul par They Might Be Giants
I'm Your Only Friend

I'm Not Your Only Friend

But I'm A Little Glowing Friend

But Really I'm Not Actually Your Friend

But I Am

Blue Canary In The Outlet By The Light Switch

Who Watches Over You

Make A Little Birdhouse In Your Soul

Not To Put Too Fine A Point On It

Say I'm The Only Bee In Your Bonnet

Make A Little Birdhouse In Your Soul

I Have A Secret To Tell

From My Electrical Well

It's A Simple Message And I'm Leaving Out The Whistles And Bells

So The Room Must Listen To Me


Filibuster Vigilantly

My Name Is Blue Canary One Note Spelled Lite

My Story's Infinite

Like The Longines Symphonette It Doesn't Rest

Blue Canary In The Outlet By The Light Switch

Who Watches Over You

Make A Little Birdhouse In Your Soul

Not To Put Too Fine A Point On It

Say I'm The Only Bee In Your Bonnet

Make A Little Birdhouse In Your Soul

I'm Your Only Friend

I'm Not Your Only Friend

But I'm A Little Glowin
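
The reading of a score near 0.04 as "slightly positive, but closer to neutral" can be made explicit by bucketing the polarity value. The sketch below is one possible way to do that; the `describe_polarity` helper and the `neutral_band` cutoff of 0.1 are assumptions chosen for illustration and are not part of spaCyTextBlob.

```python
def describe_polarity(score, neutral_band=0.1):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    `neutral_band` is an assumed cutoff chosen for illustration only.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity must be between -1.0 and 1.0")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "near neutral"


print(describe_polarity(0.04))
```

With this cutoff, a score of 0.04 falls in the "near neutral" bucket; a narrower band would label it "positive" instead, so the label depends on the threshold chosen.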

#### Question 3

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

#### Question 3 Response:

In [120]:
def get_lyrics_and_save(artist, song, filename):
    # Constructing the URL for the lyrics API
    url = f'https://api.lyrics.ovh/v1/{artist}/{song}'
    
    # Making a GET request to the API
    response = requests.get(url)
    
    # Checking if the request was successful (status code 200)
    if response.status_code == 200:
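        # Extracting the lyrics from the response
        data = response.json()
        lyrics = data.get('lyrics')
        if lyrics is None:
            # A 200 response without a 'lyrics' key would otherwise crash file.write()
            print(f"No lyrics returned for '{song}' by {artist}.")
            return

        # Writing the lyrics to the specified file (UTF-8 so non-ASCII characters survive)
        with open(filename, 'w', encoding='utf-8') as file:
            file.write(lyrics)
        print(f"Lyrics for '{song}' by {artist} saved to '{filename}'.")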
    else:
        # Handling different types of errors
        if response.status_code == 404:
            print(f"Failed to fetch lyrics for '{song}' by {artist}. Song not found.")
        else:
            print(f"Failed to fetch lyrics for '{song}' by {artist}. Error: {response.status_code}")

# Testing the function with four new songs
get_lyrics_and_save('Johnny Cash.', 'Hurt', 'hurt_lyrics.txt')
get_lyrics_and_save('Survivor', 'Eye of the Tiger', 'eye_of_the_tiger_lyrics.txt')
get_lyrics_and_save('Gloria Gaynor', 'I Will Survive', 'i_will_survive_lyrics.txt')
get_lyrics_and_save('The Beatles', 'Here Comes the Sun', 'here_comes_the_sun_lyrics.txt')


Lyrics for 'Hurt' by Johnny Cash. saved to 'hurt_lyrics.txt'.
Lyrics for 'Eye of the Tiger' by Survivor saved to 'eye_of_the_tiger_lyrics.txt'.
Lyrics for 'I Will Survive' by Gloria Gaynor saved to 'i_will_survive_lyrics.txt'.
Lyrics for 'Here Comes the Sun' by The Beatles saved to 'here_comes_the_sun_lyrics.txt'.
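
The function above places the artist and song names directly into the URL path. `requests` generally percent-encodes spaces on its own, but names containing characters such as `/` or `?` may need explicit encoding. The following is a minimal sketch of building the URL with `urllib.parse.quote`; the `build_lyrics_url` helper is hypothetical, and whether the API accepts every encoded name is an assumption that has not been checked here.

```python
from urllib.parse import quote

BASE_URL = "https://api.lyrics.ovh/v1"


def build_lyrics_url(artist, song):
    """Return the lyrics.ovh URL with artist and song percent-encoded."""
    # safe="" also escapes "/" so a name like "AC/DC" stays one path segment.
    artist_part = quote(artist.strip(), safe="")
    song_part = quote(song.strip(), safe="")
    return f"{BASE_URL}/{artist_part}/{song_part}"


print(build_lyrics_url("The Beatles", "Here Comes the Sun"))
```

The resulting string could be passed to `requests.get` in place of the f-string used above, ideally together with a `timeout` argument so a slow response does not hang the notebook.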


#### Question 4

4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

#### Question 4 Response - Part 1:

In [121]:
def analyze_sentiment(filename, song_title):
    # Load the file containing song lyrics
    with open(filename, 'r') as file:
        lyrics = file.read()
    
    # Perform sentiment analysis using spaCyTextBlob
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
    doc = nlp(lyrics)
    polarity_score = doc._.polarity
    
    # Print the polarity score with the full song title
    print(f"Polarity score for '{song_title}': {polarity_score}")
    
    return polarity_score

# Testing the function with the three files created in question 3
file_titles = [
    ('hurt_lyrics.txt', 'Hurt by Johnny Cash'),
    ('eye_of_the_tiger_lyrics.txt', 'Eye of the Tiger by Survivor'),
    ('i_will_survive_lyrics.txt', 'I Will Survive by Gloria Gaynor')
]

for file_name, title in file_titles:
    polarity_score = analyze_sentiment(file_name, title)



Polarity score for 'Hurt by Johnny Cash': 0.06662257495590829
Polarity score for 'Eye of the Tiger by Survivor': 0.07175925925925926
Polarity score for 'I Will Survive by Gloria Gaynor': 0.0350657367324034


#### Question 4 Response - Part 2: 

Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be? 

The polarity scores for 'Eye of the Tiger' (about 0.07) and 'I Will Survive' (about 0.04) are positive, which matches the direction of these songs' messages of empowerment and resilience. However, both scores sit very close to neutral, and I would have anticipated values much closer to 1.0. One likely reason is that TextBlob's default analyzer is lexicon-based: it averages the polarity of the individual words it recognizes, so lyrics that mix words of struggle with words of triumph tend to cancel out toward zero.

The polarity score for 'Hurt' (about 0.07) is unexpected considering its somber and melancholic lyrics: it comes out slightly positive and nearly identical to the score for 'Eye of the Tiger'. This discrepancy could stem from the song's nuanced emotional content, since the algorithm scores individual words and phrases rather than the overall meaning of the song. It highlights some of the limitations of sentiment analysis algorithms in dealing with emotional themes in songs and lyrics.

Overall, while polarity scores offer valuable insights, they should be interpreted with caution, especially for songs with complex emotional themes. As a check on these results, one could run the same lyrics through other sentiment analysis algorithms, or compare against a benchmark dataset with known polarity labels, to see how this particular analyzer stacks up.
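
To illustrate why averaging word-level scores can pull emotionally charged text toward zero, here is a toy sketch. It is not TextBlob's actual implementation: the `TOY_LEXICON` values are made up and the `toy_polarity` helper is hypothetical, written only to show positive and negative words canceling out.

```python
# Toy lexicon with made-up polarity values (NOT TextBlob's real lexicon).
TOY_LEXICON = {"pain": -0.75, "lonely": -0.5, "friend": 0.5, "sweet": 0.75}


def toy_polarity(text):
    """Average the polarity of the words found in TOY_LEXICON."""
    words = [w.strip(".,!?'\"").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    # Text with no known words is treated as neutral.
    return sum(scores) / len(scores) if scores else 0.0


mixed = toy_polarity("Pain and lonely nights, but a sweet friend")
upbeat = toy_polarity("A sweet friend")
print(mixed, upbeat)
```

The mixed sentence averages to zero even though every scored word is strongly charged, which is similar in spirit to a somber song receiving a near-neutral score.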