# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Kristen Finley

Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

In [67]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

result = json.loads(requests.get('https://api.lyrics.ovh/v1/They Might Be Giants/Birdhouse in your soul').text)

print('All prereqs installed.')
!pip list

All prereqs installed.
Package                   Version
------------------------- ------------
annotated-types           0.6.0
anyio                     4.2.0
appnope                   0.1.2
argon2-cffi               21.3.0
argon2-cffi-bindings      21.2.0
asttokens                 2.0.5
async-lru                 2.0.4
attrs                     23.1.0
Babel                     2.11.0
beautifulsoup4            4.12.2
bleach                    4.1.0
blis                      0.7.9
Brotli                    1.0.9
catalogue                 2.0.10
certifi                   2024.2.2
cffi                      1.16.0
charset-normalizer        2.0.4
click                     8.1.7
cloudpathlib              0.16.0
colorama                  0.4.6
comm                      0.2.1
confection                0.1.4
contourpy                 1.2.0
cycler                    0.11.0
cymem                     2.0.6
debugpy                   1.6.7
decorator                 5.1.1
defusedxml                0.

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public api, searches for the lyrics of a song, and stores it in a dictionary object.  Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [68]:
#Example using pickle file

def get_lyrics(artist, title):
    url = f"https://api.lyrics.ovh/v1/{artist}/{title}"
    response = requests.get(url)
    data = response.json()
    return data

def main():
    artist = input("Enter the artist's name: ")
    title = input("Enter the song's title: ")
    
    # Get lyrics
    lyrics_data = get_lyrics(artist, title)
    
    # Write to pickle file
    with open("lyrics_data.pickle", "wb") as f:
        pickle.dump(lyrics_data, f)
    
    print("Lyrics data has been written to 'lyrics_data.pickle'.")

if __name__ == "__main__":
    main()


Lyrics data has been written to 'lyrics_data.pickle'.


In [69]:
#Example using JSON file

def get_lyrics(artist, title):
    url = f"https://api.lyrics.ovh/v1/{artist}/{title}"
    response = requests.get(url)
    data = response.json()
    return data

def main():
    artist = input("Enter the artist's name: ")
    title = input("Enter the song's title: ")
    
    # Get lyrics
    lyrics_data = get_lyrics(artist, title)
    
    # Write to JSON file
    with open("lyrics_data.json", "w") as f:
        json.dump(lyrics_data, f)
    
    print("Lyrics data has been written to 'lyrics_data.json'.")

if __name__ == "__main__":
    main()



Lyrics data has been written to 'lyrics_data.json'.
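Artist names and song titles often contain spaces or characters such as `/` and `?`, which are safer to percent-encode before they go into the URL path. The following is a minimal sketch of that step, not part of the original assignment; the helper name `build_lyrics_url` and the constant `API_ROOT` are hypothetical.

```python
from urllib.parse import quote

# Base path of the lyrics.ovh endpoint used in the cells above.
API_ROOT = "https://api.lyrics.ovh/v1"


def build_lyrics_url(artist, title):
    """Percent-encode each path segment so spaces, '/', and '?' are safe."""
    # safe="" also encodes '/', so a slash inside a title (hypothetical
    # example: "AC/DC") cannot be mistaken for a path separator.
    return f"{API_ROOT}/{quote(artist, safe='')}/{quote(title, safe='')}"


print(build_lyrics_url("They Might Be Giants", "Birdhouse in your soul"))
```

Passing the resulting URL to `requests.get(url, timeout=10)` and calling `response.raise_for_status()` before `response.json()` would also surface HTTP errors instead of silently saving an error payload to the file.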


1. Continued:

Important: Use a Different API

The original instructions no longer work. APIs are notorious for changing out from under us, and the provided link no longer returns lyrics.

There are other ways to get song-lyric or poem data in JSON format. Any lyrics or poem data in JSON format will do. To proceed:

- Try this instead: https://gist.github.com/neloe/e38e4d3283418e096aac80fa48bb66bd. Instead of author/poem, use artist/song to fetch your content.
- Or try the `lyricsgenius` package on PyPI. Its project page gives good instructions. The JSON documents it creates contain everything returned by the song search, so manually find the lyrics and delete the rest. You must create an access token for it to work.
- Or try this: provide values for the variables `artist` and `song`, then inspect the result (you'll need to convert XML to JSON):

```python
result = requests.get('http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect?artist='+artist+'&song='+song).text
```

Then:
- Review these examples to learn how easy it is to access public APIs from Python.
- Check out https://github.com/public-apis/public-apis to explore other free public APIs.

Web requests are key skills in data mining. Don't miss out!
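For the ChartLyrics option above, the response is XML, so it has to be converted before it can be saved as JSON. Here is a minimal sketch using only the standard library. The element names in `SAMPLE_XML` are made up for illustration and may not match what the service actually returns, so inspect a real response first; `xml_to_dict` is a hypothetical helper.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample; the real response may use different element names.
SAMPLE_XML = """<GetLyricResult xmlns="http://example.com/">
  <LyricSong>Example Song</LyricSong>
  <LyricArtist>Example Artist</LyricArtist>
  <Lyric>First line of the lyrics</Lyric>
</GetLyricResult>"""


def xml_to_dict(xml_text):
    """Flatten the direct children of the root element into a dict."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}Tag";
    # keep only the part after the closing brace.
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}


data = xml_to_dict(SAMPLE_XML)
print(json.dumps(data, indent=2))
```

This only handles a flat document (one level of child elements). Nested XML would need a recursive version.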

In [70]:

AUTHOR='Edgar Allan Poe'
POEM = 'A Dream Within A Dream'

#only certain poets and titles are available
#to see the available poets, go to (in a web browser)
# https://poetrydb.org/author
#To see which poems that author has available, go to 
# https://poetrydb.org/author/AUTHOR NAME
# e.g.: https://poetrydb.org/author/Edgar Allan Poe
#The spaces will get handled by your web browser

# A cool pythonism (introduced in Python 3.6): f-strings
# https://docs.python.org/3/tutorial/inputoutput.html#tut-f-strings
URL = f'https://poetrydb.org/author,title/{AUTHOR};{POEM}'
result = json.loads(requests.get(URL).text)

# write the result to a JSON file
with open('poem.json', 'w') as file:
    json.dump(result, file)

# Print the poem to get results on the screen
poem = '\n'.join(result[0]['lines'])
print(poem)

Take this kiss upon the brow!
And, in parting from you now,
Thus much let me avow--
You are not wrong, who deem
That my days have been a dream:
Yet if hope has flown away
In a night, or in a day,
In a vision or in none,
Is it therefore the less _gone_?
_All_ that we see or seem
Is but a dream within a dream.

I stand amid the roar
Of a surf-tormented shore,
And I hold within my hand
Grains of the golden sand--
How few! yet how they creep
Through my fingers to the deep
While I weep--while I weep!
O God! can I not grasp
Them with a tighter clasp?
O God! can I not save
_One_ from the pitiless wave?
Is _all_ that we see or seem
But a dream within a dream?
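The cell above indexes `result[0]['lines']` directly, which assumes the API returned a non-empty list of poems. A small guard, sketched below, makes a failed lookup easier to diagnose. The helper name `poem_text` is hypothetical, and the shape of an unsuccessful response (anything other than a non-empty list) is an assumption worth checking against a real failed request.

```python
def poem_text(result):
    """Join the lines of the first poem, or fail with a readable message."""
    # Assumption: a successful lookup is a non-empty list of poem dicts,
    # each holding its text under the "lines" key (as in the cell above).
    if not isinstance(result, list) or not result:
        raise ValueError(f"No poem found in response: {result!r}")
    return "\n".join(result[0]["lines"])


# Stand-in data shaped like the saved poem.json content.
sample = [{"title": "A Dream Within A Dream",
           "lines": ["Take this kiss upon the brow!",
                     "And, in parting from you now,"]}]
print(poem_text(sample))
```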


2. Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

In [71]:
# pickle file example
# Load spaCy model with TextBlob
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

def load_lyrics_data():
    with open("lyrics_data.pickle", "rb") as f:
        lyrics_data = pickle.load(f)
    return lyrics_data

def main():
    # Load lyrics data
    lyrics_data = load_lyrics_data()
    
    # Extract lyrics
    lyrics = lyrics_data.get("lyrics", "")
    
    # Print the lyrics of the song
    print("Lyrics of the song:")
    print(lyrics)
    
    # Perform sentiment analysis
    doc = nlp(lyrics)
    polarity_score = doc._.polarity
    
    # Print the polarity score
    print("\nPolarity score:", polarity_score)
    
    # Determine sentiment
    if polarity_score > 0:
        print("The lyrics have a more positive connotation.")
    elif polarity_score < 0:
        print("The lyrics have a more negative connotation.")
    else:
        print("The lyrics are neutral.")

if __name__ == "__main__":
    main()


Lyrics of the song:
Paroles de la chanson All Too Well par Taylor Swift
I walked through the door with you, the air was cold
But something about it felt like home somehow
And I left my scarf there at your sister's house
And you still got it in your drawer, even now

Oh, your sweet disposition and my wide-eyed gaze
We're singing in the car getting lost upstate
Autumn leaves falling down like pieces into place
And I can picture it after all these days

And I know it's long gone
And that magic's not here no more
And I might be okay
But I'm not fine at all


'Cause there we are again, on that little town street
You almost ran the red 'cause you were looking over me
Wind in my hair, I was there, I remember it all too well

Photo album on the counter, your cheeks were turning red
You used to be a little kid with glasses in a twin-size bed
Your mother's telling stories about you on the tee ball team
You tell me about your past, thinking your future was me

And I know it's long gone
And there'

In [72]:
# JSON file example
# Load spaCy model with TextBlob
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

def load_lyrics_data():
    with open("lyrics_data.json", "r") as f:
        lyrics_data = json.load(f)
    return lyrics_data

def main():
    # Load lyrics data
    lyrics_data = load_lyrics_data()
    
    # Extract lyrics
    lyrics = lyrics_data.get("lyrics", "")
    
    # Print the lyrics of the song
    print("Lyrics of the song:")
    print(lyrics)
    
    # Perform sentiment analysis
    doc = nlp(lyrics)
    polarity_score = doc._.polarity
    
    # Print the polarity score
    print("\nPolarity score:", polarity_score)
    
    # Determine sentiment
    if polarity_score > 0:
        print("The lyrics have a more positive connotation.")
    elif polarity_score < 0:
        print("The lyrics have a more negative connotation.")
    else:
        print("The lyrics are neutral.")

if __name__ == "__main__":
    main()


Lyrics of the song:
Paroles de la chanson willow par Taylor Swift
I'm like the water when your ship rolled in that night
Rough on the surface, but you cut through like a knife
And if it was an open-shut case
I never would have known from the look on your face
Lost in your current like a priceless wine

The more that you say, the less I know
Wherever you stray, I follow
I'm begging for you to take my hand
Wreck my plans, that's my man

Life was a willow and it bent right to your wind
Head on the pillow, I can feel you sneakin' in

'Cause if you are a mythical thing
Like you were a trophy or a champion ring
But there was one prize I'd cheat to win

The more that you say, the less I know
Wherever you stray, I follow
I'm begging for you to take my hand
Wreck my plans, that's my man
You know that my train could take you home
Anywhere else is hollow
I'm begging for you to take my hand
Wreck my plans, that's my man

Life was a willow and it bent right to your wind
They count me out time and t
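Both outputs above begin with a "Paroles de la chanson ... par ..." line that comes from the lyrics source rather than from the song itself. It may be worth dropping that line before running sentiment analysis so it does not contribute to the score. A minimal sketch follows; the helper name `strip_source_header` is hypothetical, and it assumes the header is always the first line and always starts with the same prefix.

```python
def strip_source_header(lyrics, prefix="Paroles de la chanson"):
    """Remove a leading source-attribution line, if one is present."""
    lines = lyrics.splitlines()
    # Only drop the first line when it starts with the expected prefix.
    if lines and lines[0].startswith(prefix):
        lines = lines[1:]
    return "\n".join(lines).strip()


raw = ("Paroles de la chanson willow par Taylor Swift\n"
       "I'm like the water when your ship rolled in that night")
print(strip_source_header(raw))
```

The cleaned string could then be passed to `nlp(...)` in place of the raw `lyrics` value.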

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [73]:
def get_lyrics_write_file(artist, song, filename):
    url = f"https://api.lyrics.ovh/v1/{artist}/{song}"
    response = requests.get(url)
    data = response.json()
    
    if 'lyrics' in data:
        with open(filename, 'w') as f:
            f.write(data['lyrics'])
        print(f"Lyrics for {song} by {artist} have been saved to {filename}")
    else:
        print(f"Failed to retrieve lyrics for {song} by {artist}")

# Test the function with the specified songs
get_lyrics_write_file("The Beatles", "let it be", "letitbe_lyrics.txt")
get_lyrics_write_file("Kacey Musgraves", "Biscuits", "biscuits_lyrics.txt")
get_lyrics_write_file("Harry Styles", "Sign of the times", "signofthetimes_lyrics.txt")
get_lyrics_write_file("Taylor Swift", "cruel summer", "cruelsummer_lyrics.txt")
get_lyrics_write_file("Alanis Morissette", "You Oughta Know", "yououghtaknow_lyrics.txt")

Lyrics for let it be by The Beatles have been saved to letitbe_lyrics.txt
Lyrics for Biscuits by Kacey Musgraves have been saved to biscuits_lyrics.txt
Lyrics for Sign of the times by Harry Styles have been saved to signofthetimes_lyrics.txt
Lyrics for cruel summer by Taylor Swift have been saved to cruelsummer_lyrics.txt
Lyrics for You Oughta Know by Alanis Morissette have been saved to yououghtaknow_lyrics.txt


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why do you think that might be? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [74]:
# Load spaCy model with TextBlob
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

def perform_sentiment_analysis(filename):
    with open(filename, 'r') as f:
        lyrics = f.read()
    
    # Perform sentiment analysis
    doc = nlp(lyrics)
    polarity_score = doc._.polarity
    
    return polarity_score

# Test the function with the five files created in question 3
filenames = ["letitbe_lyrics.txt", "biscuits_lyrics.txt", "signofthetimes_lyrics.txt", "cruelsummer_lyrics.txt", "yououghtaknow_lyrics.txt"]
for filename in filenames:
    song_name = filename.split("_")[0].title()  # Extract song name from filename
    polarity_score = perform_sentiment_analysis(filename)
    print(f"Polarity score for '{song_name}': {polarity_score}")


Polarity score for 'Letitbe': 0.09714285714285714
Polarity score for 'Biscuits': 0.4301169590643273
Polarity score for 'Signofthetimes': 0.15202020202020197
Polarity score for 'Cruelsummer': -0.12630282415996702
Polarity score for 'Yououghtaknow': 0.09361861861861862
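The names derived from the filenames (such as 'Letitbe') are hard to read, and several scores sit very close to zero. The sketch below pairs each file with a readable title and labels scores inside a small band around zero as neutral. The scores are copied (rounded) from the output above; the `TITLES` mapping, the `label_polarity` helper, and the 0.1 band width are all hypothetical choices, not something spaCyTextBlob defines.

```python
# Readable titles keyed by the filenames used in question 3.
TITLES = {
    "letitbe_lyrics.txt": "Let It Be",
    "biscuits_lyrics.txt": "Biscuits",
    "signofthetimes_lyrics.txt": "Sign of the Times",
    "cruelsummer_lyrics.txt": "Cruel Summer",
    "yououghtaknow_lyrics.txt": "You Oughta Know",
}

# Polarity scores reported in the output above, rounded to four places.
SCORES = {
    "letitbe_lyrics.txt": 0.0971,
    "biscuits_lyrics.txt": 0.4301,
    "signofthetimes_lyrics.txt": 0.1520,
    "cruelsummer_lyrics.txt": -0.1263,
    "yououghtaknow_lyrics.txt": 0.0936,
}


def label_polarity(score, neutral_band=0.1):
    """Label a score, treating values within +/- neutral_band as neutral."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


for filename, score in SCORES.items():
    print(f"{TITLES[filename]}: {score:+.4f} ({label_polarity(score)})")
```

With this band, "Let It Be" and "You Oughta Know" come out as neutral rather than positive, which is a reminder that the sign of a small score should not be over-read.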


Does the reported polarity match my understanding of each song's lyrics?

- Polarity score for 'Letitbe': 0.09714285714285714
    - Agree 
    - While this song has a somewhat bittersweet sentiment to me, it is slightly more positive than negative.
- Polarity score for 'Biscuits': 0.4301169590643273
    - Agree
    - This song essentially says that people can all get along if everyone just minds their own business. It has a bright, cheery melody. I would consider this a positive song.
- Polarity score for 'Signofthetimes': 0.15202020202020197
    - Disagree
    - I've seen this song interpreted as being about gun violence, getting over a bad breakup, World War III, a mother dying after childbirth, and suicide. I haven't seen any interpretation that has a positive sentiment, and I do not get a positive vibe from this song either.
- Polarity score for 'Cruelsummer': -0.12630282415996702
    - Disagree
    - Taylor said: "This song is one that I wrote about the feeling of a summer romance, and how often times a summer romance can be layered with all these feelings of, like, pining away and sometimes even secrecy. It deals with the idea of being in a relationship where there’s some element of desperation and pain in it, where you’re yearning for something that you don’t quite have yet, it’s just right there, and you just, like, can’t reach it. So, this has some of my favorite lyrics on it." I wouldn't necessarily call this a positive sentiment. I would say it is more neutral to negative.
- Polarity score for 'Yououghtaknow': 0.09361861861861862
    - Strongly disagree
    - I added this song because it epitomizes the meaning of a negative song in my mind. I was curious if the algorithm would agree. It did not. It is apparent that it did not take sarcasm into consideration when assessing the sentiment.

The reported polarity score provides a quantitative measure of the overall sentiment of the lyrics. However, spaCyTextBlob relies on TextBlob, whose default analyzer scores text largely from the polarity of individual words and phrases. That approach does not always capture the full context, sarcasm, nuance, or subjective interpretation of the emotions conveyed in lyrics.

Furthermore, the interpretation of song lyrics can vary greatly among individuals due to personal experiences, cultural backgrounds, and other factors. Therefore, while the reported polarity score can provide some insight into the overall sentiment of the lyrics, it may not always perfectly align with everyone's understanding or perception of the song.
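To illustrate why a word-level scorer can miss sarcasm, here is a deliberately tiny toy example. It is not TextBlob's algorithm or lexicon: the word list, the weights, and the `toy_polarity` helper are all made up for this sketch, and the real analyzer is more sophisticated.

```python
# Made-up word weights in [-1.0, 1.0]; NOT TextBlob's actual lexicon.
TOY_LEXICON = {
    "love": 0.5,
    "great": 0.8,
    "happy": 0.8,
    "hate": -0.8,
    "cruel": -1.0,
}


def toy_polarity(text):
    """Average the weights of known words; 0.0 if none are recognized."""
    words = [w.strip(".,!?'\"").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


# A sarcastic line scores as positive because only "great" and "love"
# are recognized; the bitter meaning is invisible to the word list.
print(toy_polarity("Oh great, I just love being lied to"))
```

A scorer like this sees only the individual words, so a bitter or sarcastic lyric built from pleasant vocabulary can still receive a positive score.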

