# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Beth Harvey

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [None]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list


All prereqs installed.
Package            Version
------------------ --------
appnope            0.1.3
asttokens          2.2.1
backcall           0.2.0
beautifulsoup4     4.12.2
blis               0.7.9
catalogue          2.0.8
certifi            2023.5.7
charset-normalizer 3.2.0
click              8.1.5
comm               0.1.3
confection         0.1.0
cymem              2.0.7
debugpy            1.6.7
decorator          5.1.1
en-core-web-sm     3.6.0
executing          1.2.0
idna               3.4
importlib-metadata 6.8.0
ipykernel          6.24.0
ipython            8.14.0
jedi               0.18.2
Jinja2             3.1.2
joblib             1.3.1
jupyter_client     8.3.0
jupyter_core       5.3.1
langcodes          3.3.0
lyricsgenius       3.0.1
MarkupSafe         2.1.3
matplotlib-inline  0.1.6
murmurhash         1.0.9
nest-asyncio       1.5.6
nltk               3.8.1
numpy              1.25.1
packaging          23.1
parso              0.8.3
pathy              0.10.2
pexpect         
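`!pip list` prints every installed package, which makes it hard to spot the ones this assignment needs (and the output above is cut off). Below is a minimal sketch that reports only the relevant distributions, using the standard library's `importlib.metadata`. The package list, including `en-core-web-lg` for the model loaded later in the notebook, is an assumption based on this notebook's imports, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from this notebook's imports and spaCy model
REQUIRED_PACKAGES = ["requests", "spacy", "spacytextblob", "lyricsgenius", "en-core-web-lg"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one aligned line per package
for name, ver in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name:<16} {ver or 'NOT INSTALLED'}")
```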

### Question 1. 
The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores them in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions, so we do not need to frequently access the API.

In [4]:
import requests
import json

import lyricsgenius

# The lyrics.ovh request is left commented out; this solution uses the Genius API via lyricsgenius instead.
#result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)

# Read the Genius API token (strip the trailing newline)
with open('genius_token.txt', 'r') as token_file:
    token = token_file.readline().strip()

# Create lyricsgenius object
genius = lyricsgenius.Genius(token)

# Find artist
artist = genius.search_artist("They Might Be Giants", max_songs=3, sort="title")

# Get song
song = artist.song('Birdhouse in your soul')

# Save lyrics
lyrics = song.lyrics

# Create dictionary of song info
song_dict = {
    'artist': 'They Might Be Giants',
    'title': 'Birdhouse in your soul',
    'song_lyrics': lyrics
}

# Write the dictionary to a JSON file
with open('birdhouse_lyrics.json', 'w') as new_file:
    json.dump(song_dict, new_file)

Searching for songs by They Might Be Giants...

Song 1: "200 Sbemails (for Homestar Runner)"
Song 2: "2082"
Song 3: "25 O’Clock"

Reached user-specified song limit (3).
Done. Found 3 songs.
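The question asks for lyrics.ovh, but the solution above calls Genius through lyricsgenius and leaves the lyrics.ovh request commented out. For comparison, here is a minimal sketch of the lyrics.ovh request. It assumes the endpoint shape `https://api.lyrics.ovh/v1/<artist>/<title>` from the commented-out line and a JSON response with a `lyrics` key, and it uses the standard library's `urllib` instead of `requests` so it has no third-party dependency. `build_lyrics_url` and `fetch_song_record` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint shape taken from the commented-out lyrics.ovh request
LYRICS_OVH_BASE = "https://api.lyrics.ovh/v1"


def build_lyrics_url(artist, title):
    """Percent-encode artist and title (including '/') into a lyrics.ovh URL."""
    return f"{LYRICS_OVH_BASE}/{quote(artist, safe='')}/{quote(title, safe='')}"


def fetch_song_record(artist, title, timeout=10.0):
    """Fetch lyrics from lyrics.ovh and return them in the same layout as song_dict.

    Assumes a successful response is JSON with a 'lyrics' key; urlopen raises
    urllib.error.HTTPError for error statuses such as 404.
    """
    with urlopen(build_lyrics_url(artist, title), timeout=timeout) as response:
        payload = json.load(response)
    return {"artist": artist, "title": title, "song_lyrics": payload["lyrics"]}


# Usage (makes a network request, so it is left commented out here):
# record = fetch_song_record("They Might Be Giants", "Birdhouse in your soul")
# with open("birdhouse_lyrics.json", "w") as new_file:
#     json.dump(record, new_file)
```

Encoding each path segment with `safe=''` keeps a `/` inside a name (as in "AC/DC") from being read as a path separator.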


### Question 2. 
Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0, 1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

In [4]:
# Read file
with open('birdhouse_lyrics.json', 'r') as file:
    dictionary = json.load(file)

# Print lyrics
print(dictionary['song_lyrics'])

52 ContributorsBirdhouse in Your Soul Lyrics[Bridge]
I'm your only friend
I'm not your only friend
But I'm a little glowing friend
But really I'm not actually your friend
But I am

[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little birdhouse in your soul

[Verse 1]
I have a secret to tell
From my electrical well
It's a simple message and I'm
Leaving out the whistles and bells
So the room must listen to me
Filibuster vigilantly
My name is blue canary
One note spelled l-i-t-e
My story's infinite
Like the Longines Symphonette
It doesn't rest
See They Might Be Giants LiveGet tickets as low as $61You might also like[Chorus]
Blue canary in the outlet by the light switch
Who watches over you
Make a little birdhouse in your soul
Not to put too fine a point on it
Say I'm the only bee in your bonnet
Make a little birdhouse in your soul

[Bridge]
I'm yo
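
The printed lyrics include text that the Genius page adds around the song: a contributor-count header (`52 ContributorsBirdhouse in Your Soul Lyrics`), section tags such as `[Chorus]`, and promotional text (`See They Might Be Giants LiveGet tickets as low as $61You might also like`). All of it is passed to the sentiment analysis along with the lyrics. Here is a minimal cleanup sketch, assuming the patterns below match the ones in this output; Genius may format other pages differently, and `clean_genius_lyrics` is a hypothetical helper, not part of lyricsgenius (whose `Genius` client does offer a `remove_section_headers` option for the bracketed tags).

```python
import re


def clean_genius_lyrics(raw):
    """Strip page text that Genius adds around the lyrics (patterns assumed from the output above)."""
    # Header such as "52 ContributorsBirdhouse in Your Soul Lyrics"
    text = re.sub(r"^\d+\s*Contributors.*?Lyrics", "", raw, count=1)
    # Ticket promo such as "See They Might Be Giants LiveGet tickets as low as $61"
    text = re.sub(r"See [^\n]*?LiveGet tickets as low as \$\d+", "", text)
    # "You might also like" promo, which runs straight into the next section tag
    text = text.replace("You might also like", "\n")
    # Section tags such as [Chorus] or [Verse 1]
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Collapse the empty lines left behind into single line breaks
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```

To try it, pass `clean_genius_lyrics(dictionary['song_lyrics'])` to the pipeline instead of the raw lyrics; the resulting score may differ from the one reported below.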

In [53]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# Load pipeline package
nlp = spacy.load('en_core_web_lg')

# Add Spacy Text Blob to pipeline
nlp.add_pipe('spacytextblob')

# Save lyrics as variable
text = dictionary['song_lyrics']

# Apply sentiment analysis to text
doc = nlp(text)

print('Polarity Score: ', doc._.blob.polarity)

# The polarity score is barely over zero, indicating that the lyrics have a very slight positive connotation. However, it's close
# enough to zero that I would consider the lyrics neutral overall. As I read through the lyrics myself, this makes sense to me
# because it's difficult to tell what the song is about without more context. There isn't anything blatantly positive or negative
# in the lyrics at first glance.

Polarity Score:  0.02575757575757576

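
The answer above treats a score this close to zero as neutral rather than positive. That judgment can be made explicit with a small threshold function. This is a sketch: the `neutral_band` cutoff of 0.05 is an arbitrary assumption, not a spaCyTextBlob or TextBlob setting, and `label_polarity` is a hypothetical helper.

```python
def label_polarity(score, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    neutral_band is an assumed cutoff: scores within +/- neutral_band count as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity must be in [-1.0, 1.0], got {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


print(label_polarity(0.02575757575757576))
```

With this cutoff, the Question 4 scores for "Way Less Sad" (-0.026) and "Thunderstruck" (-0.003) would also be labeled neutral; a different band would change that.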

### Question 3. 
Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics of any four songs of your choice and storing them in different files.

In [25]:
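# Define function (uses the `genius` client created in Question 1 instead of lyrics.ovh)
def write_song_from_api_to_json(artist, song, filename):
    # Look up song details
    song_details = genius.search_song(song, artist)
    # search_song returns None when no match is found
    if song_details is None:
        raise ValueError(f'No lyrics found for "{song}" by {artist}')
    # Save song lyrics
    song_lyrics = song_details.lyrics
    # Write lyrics to file
    with open(filename, 'w') as file:
        json.dump(song_lyrics, file)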

In [26]:
# Test function 1
write_song_from_api_to_json('Colony House', 'Canonballers', 'canonballers_lyrics.json')

Searching for "Canonballers" by Colony House...
Done.


In [27]:
# Test function 2
write_song_from_api_to_json('AJR', 'Way Less Sad', 'way_less_sad_lyrics.json')

Searching for "Way Less Sad" by AJR...
Done.


In [62]:
# Test function 3
write_song_from_api_to_json('Owl City', 'Thunderstruck', 'thunderstruck_lyrics.json')

Searching for "Thunderstruck" by Owl City...
Done.


In [24]:
# Test function 4
write_song_from_api_to_json('The Struts', 'In Love With A Camera', 'in_love_with_a_camera_lyrics.json')

Searching for "In Love With A Camera" by The Struts...
Done.
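
`write_song_from_api_to_json` writes only the lyrics string, while Question 1 wrote a dictionary with `artist`, `title`, and `song_lyrics` keys, so the two kinds of files have different layouts. Here is a sketch of a variant that keeps the Question 1 layout and takes the lyrics lookup as a parameter, so it can work with Genius, lyrics.ovh, or a stub in tests. `write_song_to_json` and its `fetch_lyrics` parameter are hypothetical names.

```python
import json


def write_song_to_json(artist, song, filename, fetch_lyrics):
    """Fetch lyrics with fetch_lyrics(artist, song) and save them in the Question 1 layout.

    fetch_lyrics is any callable that returns the lyrics string, or None when nothing is found.
    """
    lyrics = fetch_lyrics(artist, song)
    if not lyrics:
        raise ValueError(f'No lyrics found for "{song}" by {artist}')
    record = {"artist": artist, "title": song, "song_lyrics": lyrics}
    with open(filename, "w", encoding="utf-8") as file:
        json.dump(record, file)
    return record


# Example with the Genius client from Question 1 (needs network access and a token):
# def genius_lookup(artist, song):
#     result = genius.search_song(song, artist)
#     return result.lyrics if result else None
# write_song_to_json('AJR', 'Way Less Sad', 'way_less_sad_song.json', genius_lookup)
```

Passing the lookup in as a function keeps the file-writing logic separate from any one lyrics service.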


### Question 4. 
Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why or why not? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [55]:
# Load the spaCy pipeline and add spaCyTextBlob once, rather than
# reloading en_core_web_lg every time the function is called
sentiment_nlp = spacy.load('en_core_web_lg')
sentiment_nlp.add_pipe('spacytextblob')

# Define function
def lyrics_sentiment_analysis(filename, songname):
    # Read lyric file
    with open(filename, 'r') as file:
        song_lyrics = json.load(file)
    # Apply sentiment analysis to lyrics
    sentiment_analysis = sentiment_nlp(song_lyrics)
    # Get polarity score
    polarity_score = sentiment_analysis._.blob.polarity
    # Return the song name alongside its score for printing
    return songname, polarity_score
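
`lyrics_sentiment_analysis` passes whatever `json.load` returns straight to spaCy. That works for the Question 3 files, which contain a bare string, but the Question 1 file contains a dictionary, and the pipeline needs the lyrics text itself. Here is a sketch of a loader that accepts either layout; `load_lyrics` is a hypothetical helper that the function above could call in place of `json.load`.

```python
import json


def load_lyrics(filename):
    """Return the lyrics text from a JSON file written in either layout."""
    with open(filename, encoding="utf-8") as file:
        data = json.load(file)
    # Question 1 layout: a dict with a 'song_lyrics' key
    if isinstance(data, dict):
        return data["song_lyrics"]
    # Question 3 layout: the lyrics string on its own
    if isinstance(data, str):
        return data
    raise TypeError(f"Unexpected JSON content in {filename}: {type(data).__name__}")
```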

In [56]:
# Test function 1
score_1 = lyrics_sentiment_analysis('canonballers_lyrics.json', 'Canonballers')
print('Title: ', score_1[0], '\n', 'Polarity Score: ', score_1[1])

Title:  Canonballers 
 Polarity Score:  -0.3080469794755509


I'm surprised that this song has a negative polarity score. To me, it sounds like an upbeat summer or road-trip song. I assume the score is negative because a repeated line in the chorus, "can't borrow time," sounds regretful or sad on its own.

In [57]:
# Test function 2
score_2 = lyrics_sentiment_analysis('way_less_sad_lyrics.json', 'Way Less Sad')
print('Title: ', score_2[0], '\n', 'Polarity Score: ', score_2[1])

Title:  Way Less Sad 
 Polarity Score:  -0.026028869778869767


While I consider this song fun, I'm not surprised that the score is negative. The overall feel of the song is upbeat, but the lyrics still sound a little negative. As the title implies, "Way Less Sad" is about starting to feel better after feeling sad or depressed for an extended period of time. Part of the chorus is the line "I'm not happy yet, but I'm way less sad." It's a hopeful but realistic message: someone's mood doesn't turn around immediately, but progress is still good. Also, the score is close to 0, which indicates that the lyrics are only slightly negative.
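
TextBlob's default analyzer, which spaCyTextBlob wraps, scores text by looking up words in a lexicon and averaging them, with adjustments for negation and intensifiers, rather than reading the song as a whole. That may explain why the scores and the songs' overall feel disagree. The deliberately simplified toy version below shows the effect on the chorus line quoted above. The word scores and the rule that a negation flips only the next word are invented for illustration; this is not TextBlob's lexicon or its actual algorithm, so its numbers will not match the notebook's scores.

```python
import re

# Invented word scores for illustration only; NOT TextBlob's lexicon
TOY_LEXICON = {"happy": 0.8, "sad": -0.5, "good": 0.7, "shattered": -0.6}
# Words treated as negations in this toy model
NEGATIONS = {"not", "never", "can't", "don't"}


def toy_polarity(text):
    """Average the toy scores of known words; a negation flips only the next word."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    negate = False
    for word in words:
        if word in NEGATIONS:
            negate = True
            continue
        if word in TOY_LEXICON:
            score = TOY_LEXICON[word]
            scores.append(-score if negate else score)
        # The negation only applies to the word right after it
        negate = False
    # Text with no known words scores 0.0 (neutral)
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("I'm not happy yet, but I'm way less sad"))
```

In this toy, "not" turns "happy" into a negative word, and "sad" counts as negative regardless of "way less", so the hopeful line averages to a negative score.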

In [63]:
# Test function 3
score_3 = lyrics_sentiment_analysis('thunderstruck_lyrics.json', 'Thunderstruck')
print('Title: ', score_3[0], '\n', 'Polarity Score: ', score_3[1])

Title:  Thunderstruck 
 Polarity Score:  -0.0030864197530864226


The score for this song is very close to 0, which surprises me, since it seems like either a love song or at least a song about feeling very strongly about someone. My only guess is that the repeated phrase "my dreams are shattered" may have lowered the score. I would have expected a noticeably higher score.

In [59]:
# Test function 4
score_4 = lyrics_sentiment_analysis('in_love_with_a_camera_lyrics.json', 'In Love With A Camera')
print('Title: ', score_4[0], '\n', 'Polarity Score: ', score_4[1])

Title:  In Love With A Camera 
 Polarity Score:  0.4342857142857143


The score for this song makes sense to me. It's an upbeat song about someone loving the camera and taking pictures of themselves. It isn't portrayed in a negative or self-obsessed way; it's more like someone describing a trait of a person they care about. It's a fun song (in my opinion) with positive lyrics, and I think this score reflects that accurately.

In [64]:
!jupyter nbconvert --to html requests-json-nlp.ipynb

[NbConvertApp] Converting notebook requests-json-nlp.ipynb to html
[NbConvertApp] Writing 615249 bytes to requests-json-nlp.html
