# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Kim Leach https://github.com/Kleach112/620-mod4

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.
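
Either format works for the saved files. The sketch below shows both round trips using only the standard library. The helper names `save_lyrics` and `load_lyrics`, and picking the format from the file extension, are assumptions for illustration rather than part of the assignment. JSON produces a human-readable text file that other tools can open. Pickle is a Python-only binary format, and loading a pickle can execute code, so only unpickle files you created or trust.

```python
import json
import pickle
from pathlib import Path


def save_lyrics(record: dict, filename: str) -> None:
    """Write a lyrics record as JSON or pickle, chosen by file extension (hypothetical helper)."""
    path = Path(filename)
    if path.suffix == ".json":
        # JSON is plain text; ensure_ascii=False keeps characters such as "ó" readable in the file
        path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
    elif path.suffix in (".pkl", ".pickle"):
        # pickle is a binary, Python-only format
        path.write_bytes(pickle.dumps(record))
    else:
        raise ValueError(f"Unsupported extension: {path.suffix!r}")


def load_lyrics(filename: str) -> dict:
    """Read a record written by save_lyrics (only unpickle files you trust)."""
    path = Path(filename)
    if path.suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    if path.suffix in (".pkl", ".pickle"):
        return pickle.loads(path.read_bytes())
    raise ValueError(f"Unsupported extension: {path.suffix!r}")
```

For example, `save_lyrics({"title": "...", "lyrics": lyrics}, "lyrics.json")` followed by `load_lyrics("lyrics.json")` returns an equal dictionary.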

In [3]:
# Create and activate a Python virtual environment. 
# Before starting the project, try all these imports FIRST
# Address any errors you get running this code cell 
# by installing the necessary packages into your active Python environment.
# Try to resolve issues using your materials and the web.
# If that doesn't work, ask for help in the discussion forums.
# You can't complete the exercises until you import these - start early! 
# We also import json and pickle (included in the Python Standard Library).

import json
import pickle

import requests
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

print('All prereqs installed.')
!pip list



All prereqs installed.
Package                Version
---------------------- ---------
anyio                  3.5.0
argon2-cffi            21.3.0
argon2-cffi-bindings   21.2.0
astroid                2.11.7
asttokens              2.0.5
attrs                  21.4.0
Babel                  2.9.1
backcall               0.2.0
beautifulsoup4         4.11.1
bleach                 4.1.0
blis                   0.7.9
Bottleneck             1.3.5
brotlipy               0.7.0
catalogue              2.0.8
certifi                2022.6.15
cffi                   1.15.1
charset-normalizer     2.0.4
click                  8.1.4
colorama               0.4.6
conda                  4.13.0
conda-package-handling 1.8.1
confection             0.1.0
cryptography           37.0.1
cycler                 0.11.0
cymem                  2.0.7
debugpy                1.5.1
decorator              5.1.1
defusedxml             0.7.1
dill                   0.3.5.1
dodgy                  0.2.1
entrypoints            0.4

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores them in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

## Question 1

In [33]:
import os

import requests
import json
from bs4 import BeautifulSoup

url = "https://genius-song-lyrics1.p.rapidapi.com/song/lyrics/"
querystring = {"id": "4625737"}

# Read the RapidAPI key from an environment variable (named RAPIDAPI_KEY here)
# instead of hard-coding it in a notebook that is pushed to GitHub
headers = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "genius-song-lyrics1.p.rapidapi.com"
}

response = requests.get(url, headers=headers, params=querystring, timeout=10)
data = response.json()

# Check if the lyrics are present in the API response
if "lyrics" in data and "lyrics" in data["lyrics"]:
    html_lyrics = data["lyrics"]["lyrics"]["body"]["html"]

    # Remove HTML tags from the lyrics using BeautifulSoup
    soup = BeautifulSoup(html_lyrics, "html.parser")
    lyrics = soup.get_text()

    # Print the lyrics of the song
    print("Lyrics of the Song:")
    print(lyrics)

    # Writing the lyrics data to a JSON file
    with open("lyrics.json", "w") as json_file:
        json.dump({"lyrics": lyrics}, json_file)
else:
    print("Lyrics Not Found")


Lyrics of the Song:
[Verse 1]
You are somebody that I don't know
But you're takin' shots at me like it's Patrón
And I'm just like, damn, it's 7 AM
Say it in the street, that's a knock-out
But you say it in a Tweet, that's a cop-out
And I'm just like, "Hey, are you okay?"

[Pre-Chorus]
And I ain't tryna mess with your self-expression
But I've learned a lesson that stressin' and obsessin' 'bout somebody else is no fun
And snakes and stones never broke my bones

[Chorus]
So oh-oh, oh-oh, oh-oh, oh-oh, oh-oh
You need to calm down, you're being too loud
And I'm just like oh-oh, oh-oh, oh-oh, oh-oh, oh-oh (Oh)
You need to just stop, like can you just not step on my gown?
You need to calm down

[Verse 2]
You are somebody that we don't know
But you're comin' at my friends like a missile
Why are you mad when you could be GLAAD? (You could be GLAAD)
Sunshine on the street at the parade
But you would rather be in the dark ages
Makin' that sign must've taken all night

[Pre-Chorus]
You just need t
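
The Question 1 cell checks for `data["lyrics"]["lyrics"]` but then indexes `["body"]["html"]` directly, so a response missing either of those keys would raise a `KeyError` instead of printing "Lyrics Not Found". A minimal sketch of a more defensive lookup, assuming the nested layout used in that cell; `extract_lyrics_html`, `rapidapi_headers`, and the `RAPIDAPI_KEY` environment variable name are hypothetical. Reading the key from the environment also keeps it out of the committed notebook.

```python
import os
from typing import Optional


def extract_lyrics_html(data: dict) -> Optional[str]:
    """Follow the lyrics -> lyrics -> body -> html path; return None if any level is missing."""
    node = data
    for key in ("lyrics", "lyrics", "body", "html"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node if isinstance(node, str) else None


def rapidapi_headers(host: str = "genius-song-lyrics1.p.rapidapi.com") -> dict:
    """Build request headers, reading the key from the (hypothetical) RAPIDAPI_KEY variable."""
    key = os.environ.get("RAPIDAPI_KEY")
    if not key:
        raise RuntimeError("Set the RAPIDAPI_KEY environment variable before calling the API.")
    return {"X-RapidAPI-Key": key, "X-RapidAPI-Host": host}
```

In the cell above, `html_lyrics = extract_lyrics_html(data)` followed by `if html_lyrics is None: print("Lyrics Not Found")` would replace the partial nested check.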

2. Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

## Question 2

In [36]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipeline component

# Read the contents of the JSON file
with open("lyrics.json", "r") as json_file:
    data = json.load(json_file)

# Extract the lyrics from the JSON data
lyrics = data["lyrics"]

# Print the lyrics of the song
print("Lyrics of the Song:")
print(lyrics)

# Load the spaCy model and add spaCyTextBlob to its pipeline
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

# Perform sentiment analysis with spaCyTextBlob
# (spacytextblob 4.x exposes the score as doc._.blob.polarity)
doc = nlp(lyrics)
polarity_score = doc._.blob.polarity

# Print the polarity score of the sentiment analysis
print("Polarity Score:", polarity_score)

# Given the -1.0 to 1.0 scale, this song scores as fairly neutral, leaning toward the positive side.
# That's interesting given the lyrics. My own interpretation of the lyrics would probably be the same.

Lyrics of the Song:
[Verse 1]
You are somebody that I don't know
But you're takin' shots at me like it's Patrón
And I'm just like, damn, it's 7 AM
Say it in the street, that's a knock-out
But you say it in a Tweet, that's a cop-out
And I'm just like, "Hey, are you okay?"

[Pre-Chorus]
And I ain't tryna mess with your self-expression
But I've learned a lesson that stressin' and obsessin' 'bout somebody else is no fun
And snakes and stones never broke my bones

[Chorus]
So oh-oh, oh-oh, oh-oh, oh-oh, oh-oh
You need to calm down, you're being too loud
And I'm just like oh-oh, oh-oh, oh-oh, oh-oh, oh-oh (Oh)
You need to just stop, like can you just not step on my gown?
You need to calm down

[Verse 2]
You are somebody that we don't know
But you're comin' at my friends like a missile
Why are you mad when you could be GLAAD? (You could be GLAAD)
Sunshine on the street at the parade
But you would rather be in the dark ages
Makin' that sign must've taken all night

[Pre-Chorus]
You just need t
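
The Genius text includes section labels such as `[Verse 1]`, `[Pre-Chorus]`, and `[Chorus]`, which are annotations rather than sung lyrics. They may have little or no effect on a lexicon-based polarity score, but stripping them keeps the analyzed text limited to the lyrics themselves. A minimal sketch with the standard `re` module; the helper name `strip_section_labels` is hypothetical, and the pattern assumes each label sits alone on its own line.

```python
import re

# A line consisting only of a bracketed label, e.g. "[Verse 1]" or "[Spoken Intro]"
SECTION_LABEL = re.compile(r"^[ \t]*\[[^\]\n]*\][ \t]*$", re.MULTILINE)


def strip_section_labels(lyrics: str) -> str:
    """Remove label-only lines, then collapse the extra blank lines left behind."""
    without_labels = SECTION_LABEL.sub("", lyrics)
    return re.sub(r"\n{3,}", "\n\n", without_labels).strip()
```

Inline brackets and parentheses, such as "(You could be GLAAD)", are left untouched because the pattern only matches lines that contain nothing but a label.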

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics for any four songs of your choice and storing them in different files.

## Question 3

In [2]:
import os

import requests
import json
from bs4 import BeautifulSoup

# Read the RapidAPI key from an environment variable (named RAPIDAPI_KEY here)
# instead of hard-coding it in a notebook that is pushed to GitHub
HEADERS = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "genius-song-lyrics1.p.rapidapi.com"
}

def get_lyrics_by_id(song_id, song_title):
    url = "https://genius-song-lyrics1.p.rapidapi.com/song/lyrics/"
    querystring = {"id": song_id}

    response = requests.get(url, headers=HEADERS, params=querystring, timeout=10)
    data = response.json()

    # Check if the lyrics are present in the API response
    if "lyrics" in data and "lyrics" in data["lyrics"]:
        html_lyrics = data["lyrics"]["lyrics"]["body"]["html"]

        # Remove HTML tags from the lyrics using BeautifulSoup
        soup = BeautifulSoup(html_lyrics, "html.parser")
        lyrics = soup.get_text()

        # Print the song title and lyrics of the song
        print(f"Song Title: {song_title}")
        print("Lyrics of the Song:")
        print(lyrics)

        # Find the next unused file name: lyrics_1.json, lyrics_2.json, ...
        file_number = 1
        while os.path.exists(f"lyrics_{file_number}.json"):
            file_number += 1
        file_name = f"lyrics_{file_number}.json"

        with open(file_name, "w") as json_file:
            json.dump({"title": song_title, "lyrics": lyrics}, json_file)
        print(f"Lyrics saved to {file_name}.")
    else:
        print("Lyrics Not Found")

url = "https://genius-song-lyrics1.p.rapidapi.com/search/"

# Get user input for the song title
song_title = input("Enter the song title: ")

querystring = {
    "q": song_title,
    "per_page": "5",
    "page": "1"
}

response = requests.get(url, headers=HEADERS, params=querystring, timeout=10)

# Check if the request was successful
if response.status_code == 200:
    data = response.json()
    hits = data.get("hits", [])

    if hits:
        print("Search Results:")
        for idx, hit in enumerate(hits, start=1):
            song = hit.get("result", {}).get("title")
            artist = hit.get("result", {}).get("primary_artist", {}).get("name")
            print(f"{idx}. Song: {song}, Artist: {artist}")

        # Prompt the user to select a result
        selection = input("Enter the number of the result you want to use: ")

        # Convert the input to an integer and check if it's a valid selection
        try:
            selection = int(selection)
            if 1 <= selection <= len(hits):
                selected_result = hits[selection - 1]
                song_id = selected_result.get("result", {}).get("id")
                print(f"Selected Song ID: {song_id}")

                # Call the function to get and save the lyrics with the appropriate file name
                get_lyrics_by_id(song_id, song_title)
            else:
                print("Invalid selection. Please choose a valid number.")
        except ValueError:
            print("Invalid input. Please enter a number.")
    else:
        print("No results found for the given song title.")
else:
    print("Failed to retrieve data. Please try again later.")



Enter the song title:  Harper Valley PTA


Search Results:
1. Song: Harper Valley P.T.A., Artist: Jeannie C. Riley
2. Song: Harper Valley P.T.A., Artist: Dolly Parton
3. Song: Harper Valley PTA, Artist: Billy Ray Cyrus
4. Song: Harper Valley P.T.A., Artist: Loretta Lynn
5. Song: Harper Valley P.T.A., Artist: Lynn Anderson


Enter the number of the result you want to use:  1


Selected Song ID: 149340
Song Title: Harper Valley PTA
Lyrics of the Song:
[Verse 1]
I wanna tell you all a story
About a Harper Valley widowed wife
Who had a teenage daughter
Who attended Harper Valley Junior High
Well, her daughter came home one afternoon
And didn't even stop to play
And she said, "Mom, I got a note here
From the Harper Valley PTA"

[Verse 2]
Well the note said, "Mrs. Johnson
You're wearin' your dresses way too high
It's reported you've been drinking
And a-running round with men and goin' wild
And we don't believe you oughta be
A-bringin' up your little girl this way"
And it was signed by the Secretary
Harper Valley PTA

[Verse 3]
Well, it happened that the PTA
Was gonna meet that very afternoon
And they were sure surprised when Mrs. Johnson
Wore her miniskirt into the room
And as she walked up to the blackboard
I can still recall the words she had to say
She said, "I'd like to address this meeting
Of the Harper Valley PTA"

[Verse 4]
"Well, there's Bobby Taylor sitt

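The Question 3 prompt asks for a function that takes an artist, a song, and a filename and uses the lyrics.ovh API, whereas the cell above searches the Genius RapidAPI and numbers the output files automatically. A minimal sketch closer to the prompt's signature, using the standard library's `urllib` instead of `requests`. It assumes the lyrics.ovh endpoint has the form `https://api.lyrics.ovh/v1/<artist>/<title>` and returns a JSON object with a `lyrics` key, per the API documentation linked in Question 1; the function names are hypothetical, and the public service is not always available.

```python
import json
import urllib.parse
import urllib.request

LYRICS_OVH_BASE = "https://api.lyrics.ovh/v1"  # assumed endpoint layout: /v1/<artist>/<title>


def build_lyrics_url(artist: str, song: str) -> str:
    """Percent-encode artist and title as path segments (spaces become %20, "/" becomes %2F)."""
    artist_part = urllib.parse.quote(artist, safe="")
    song_part = urllib.parse.quote(song, safe="")
    return f"{LYRICS_OVH_BASE}/{artist_part}/{song_part}"


def save_song_lyrics(artist: str, song: str, lyrics: str, filename: str) -> None:
    """Write one song's lyrics to a JSON file."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump({"artist": artist, "title": song, "lyrics": lyrics}, f, ensure_ascii=False)


def get_song_lyrics(artist: str, song: str, filename: str, timeout: float = 10.0) -> bool:
    """Fetch lyrics from lyrics.ovh and save them to filename; return False if nothing was saved."""
    try:
        with urllib.request.urlopen(build_lyrics_url(artist, song), timeout=timeout) as resp:
            data = json.load(resp)
    except (OSError, ValueError) as err:
        # URLError/HTTPError (e.g. a 404) and timeouts are OSError subclasses;
        # a malformed body raises json.JSONDecodeError, a ValueError subclass
        print(f"Could not retrieve lyrics for {song!r} by {artist}: {err}")
        return False
    lyrics = data.get("lyrics")
    if not lyrics:
        print(f"No lyrics found for {song!r} by {artist}.")
        return False
    save_song_lyrics(artist, song, lyrics, filename)
    return True
```

Testing it on four songs could then be a short loop over `(artist, song, filename)` tuples, calling `get_song_lyrics` for each one.
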
4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why do you think it does or does not? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

## Question 4
### Lyrics_1

In [None]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipeline component

# Load the spaCy model and add spaCyTextBlob once, instead of reloading the model on every call
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

def perform_sentiment_analysis(filename):
    try:
        # Read the contents of the JSON file
        with open(filename, "r") as json_file:
            data = json.load(json_file)

        # Extract the song title and lyrics from the JSON data
        song_title = data.get("title")
        lyrics = data.get("lyrics")

        # Print the song title and lyrics of the song
        print(f"Song Title: {song_title}")
        print("Lyrics of the Song:")
        print(lyrics)

        # Perform sentiment analysis with spaCyTextBlob
        # (spacytextblob 4.x exposes the score as doc._.blob.polarity)
        doc = nlp(lyrics)
        polarity_score = doc._.blob.polarity

        # Print the polarity score of the sentiment analysis
        print("Polarity Score:", polarity_score)

        return song_title, polarity_score

    except FileNotFoundError:
        print("File not found. Please enter a valid filename.")
        return None, None
    except Exception as e:
        print("An error occurred:", str(e))
        return None, None

# Get user input for the filename
file_name = input("Enter the filename: ")

# Call the function to perform sentiment analysis using the provided filename
song_title, polarity_score = perform_sentiment_analysis(file_name)

if song_title is not None and polarity_score is not None:
    print(f"Sentiment Analysis Result for '{song_title}':")
    print("Given that the scale is -1.0 to 1.0, this particular song seems:")
    if polarity_score < -0.5:
        print("Strongly negative.")
    elif polarity_score < 0:
        print("Negative.")
    elif polarity_score == 0:
        print("Neutral.")
    elif polarity_score <= 0.5:
        print("Positive.")
    else:
        print("Strongly positive.")


Enter the filename:  lyrics_1.json


Song Title: Lady May
Lyrics of the Song:
[Verse 1]
I'm a stone's throw from the mill
And I'm a good walk to the river
When my workin' day is over
We'll go swim our cares away
Put your toes down in the water
And a smile across your face
And tell me that you love me
Lovely Lady May

[Verse 2]
Now I ain't the sharpest chisel
That your hands have ever held
But darlin' I could love you well
'Til the roll is called on high
I've seen my share of trouble
And I've held my weight in shame
But I'm baptized in your name
Lovely Lady May

[Verse 3]
Lord the wind can leave you shiverin'
As it waltzes o'er the leaves
It's been rushin' through my timber
'Til your love brought on the spring
Now the mountains all are blushin'
And they don't know what to say
'Cept a good long line of praises
For my lovely Lady May

[Verse 4]
Now I ain't the toughest hickory
That your ax has ever fell
But I'm a hickory just as well
I'm a hickory all the same
I came crashin' through the forest
As you cut my roots away
And I

I think the sentiment analysis matches my interpretation of the song. It scores as positive, which fits a love song, even though some of the imagery isn't necessarily happy.
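
Each Question 4 cell repeats the same `if`/`elif` chain that turns a polarity score into a label. A minimal sketch factoring it into one function with the same thresholds; the name `polarity_label` and the `neutral_band` parameter are hypothetical. As written, the chain only reports "Neutral." when the score is exactly 0.0, so a score such as 0.02 is labeled "Positive."; a small tolerance band is one way to treat near-zero scores as neutral, and the default of 0.0 reproduces the cells' behavior.

```python
def polarity_label(score: float, neutral_band: float = 0.0) -> str:
    """Map a polarity in [-1.0, 1.0] to the labels printed by the Question 4 cells."""
    if score < -0.5:
        return "Strongly negative."
    if score < -neutral_band:
        return "Negative."
    if score <= neutral_band:
        return "Neutral."
    if score <= 0.5:
        return "Positive."
    return "Strongly positive."
```

With this helper, each cell's final block reduces to `print(polarity_label(polarity_score))`.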

### Lyrics_2

In [3]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipeline component

# Load the spaCy model and add spaCyTextBlob once, instead of reloading the model on every call
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

def perform_sentiment_analysis(filename):
    try:
        # Read the contents of the JSON file
        with open(filename, "r") as json_file:
            data = json.load(json_file)

        # Extract the song title and lyrics from the JSON data
        song_title = data.get("title")
        lyrics = data.get("lyrics")

        # Print the song title and lyrics of the song
        print(f"Song Title: {song_title}")
        print("Lyrics of the Song:")
        print(lyrics)

        # Perform sentiment analysis with spaCyTextBlob
        # (spacytextblob 4.x exposes the score as doc._.blob.polarity)
        doc = nlp(lyrics)
        polarity_score = doc._.blob.polarity

        # Print the polarity score of the sentiment analysis
        print("Polarity Score:", polarity_score)

        return song_title, polarity_score

    except FileNotFoundError:
        print("File not found. Please enter a valid filename.")
        return None, None
    except Exception as e:
        print("An error occurred:", str(e))
        return None, None

# Get user input for the filename
file_name = input("Enter the filename: ")

# Call the function to perform sentiment analysis using the provided filename
song_title, polarity_score = perform_sentiment_analysis(file_name)

if song_title is not None and polarity_score is not None:
    print(f"Sentiment Analysis Result for '{song_title}':")
    print("Given that the scale is -1.0 to 1.0, this particular song seems:")
    if polarity_score < -0.5:
        print("Strongly negative.")
    elif polarity_score < 0:
        print("Negative.")
    elif polarity_score == 0:
        print("Neutral.")
    elif polarity_score <= 0.5:
        print("Positive.")
    else:
        print("Strongly positive.")


Enter the filename:  lyrics_2.json


Song Title: Turtles All The Way Down
Lyrics of the Song:
[Spoken Intro]

I've seen Jesus play with flames
In a lake of fire that I was standing in
Met the devil in Seattle
And spent 9 months inside the lions den
Met Buddha yet another time
And he showed me a glowing light within
But I swear that God is there
Every time I glare in the eyes of my best friend

Says my son, "It's all been done
And someday you're gonna wake up old and gray
So go and try to have some fun
Showing warmth to everyone
You meet and greet and cheat along the way"

There's a gateway in our minds
That leads somewhere out there, far beyond this plane
Where reptile aliens made of light
Cut you open and pull out all your pain
Tell me how you make illegal
Something that we all make in our brain
Some say you might go crazy
But then again it might make you go sane

Every time I take a look
Inside that old and fabled book
I'm blinded and reminded of
The pain caused by some old man in the sky
Marijuana, LSD
Psilocybin, and 

The sentiment analysis is positive, but I'm not sure I totally agree. I think this song is a solid mix of positive and negative, though I can see why it might skew positive. The polarity score isn't that high, so I guess it roughly matches my interpretation.
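
A single polarity score is an average over the whole song, so a mix of positive and negative passages can net out to a mildly positive number. A minimal sketch that scores each blank-line-separated stanza separately, so the passages pulling the score up or down become visible. The helper name `stanza_polarities` is hypothetical, splitting on blank lines assumes the stanza layout shown in the printed lyrics, and the scoring function is passed in, for example `lambda text: nlp(text)._.blob.polarity` with the spaCyTextBlob pipeline loaded in the cells above.

```python
from typing import Callable, List, Tuple


def stanza_polarities(lyrics: str, score: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Return (first line of stanza, score) for each blank-line-separated stanza."""
    results = []
    for stanza in lyrics.split("\n\n"):
        stanza = stanza.strip()
        if not stanza:
            continue  # skip empty chunks left by extra blank lines
        first_line = stanza.splitlines()[0]
        results.append((first_line, score(stanza)))
    return results
```

Sorting the returned list by score shows the most negative and most positive stanzas at either end.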

### Lyrics_3

In [4]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipeline component

# Load the spaCy model and add spaCyTextBlob once, instead of reloading the model on every call
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

def perform_sentiment_analysis(filename):
    try:
        # Read the contents of the JSON file
        with open(filename, "r") as json_file:
            data = json.load(json_file)

        # Extract the song title and lyrics from the JSON data
        song_title = data.get("title")
        lyrics = data.get("lyrics")

        # Print the song title and lyrics of the song
        print(f"Song Title: {song_title}")
        print("Lyrics of the Song:")
        print(lyrics)

        # Perform sentiment analysis with spaCyTextBlob
        # (spacytextblob 4.x exposes the score as doc._.blob.polarity)
        doc = nlp(lyrics)
        polarity_score = doc._.blob.polarity

        # Print the polarity score of the sentiment analysis
        print("Polarity Score:", polarity_score)

        return song_title, polarity_score

    except FileNotFoundError:
        print("File not found. Please enter a valid filename.")
        return None, None
    except Exception as e:
        print("An error occurred:", str(e))
        return None, None

# Get user input for the filename
file_name = input("Enter the filename: ")

# Call the function to perform sentiment analysis using the provided filename
song_title, polarity_score = perform_sentiment_analysis(file_name)

if song_title is not None and polarity_score is not None:
    print(f"Sentiment Analysis Result for '{song_title}':")
    print("Given that the scale is -1.0 to 1.0, this particular song seems:")
    if polarity_score < -0.5:
        print("Strongly negative.")
    elif polarity_score < 0:
        print("Negative.")
    elif polarity_score == 0:
        print("Neutral.")
    elif polarity_score <= 0.5:
        print("Positive.")
    else:
        print("Strongly positive.")

Enter the filename:  Lyrics_3.json


Song Title: Ode to Billy Joe
Lyrics of the Song:
It was the third of June
Another sleepy, dusty Delta day
I was out chopping cotton
And my brother was baling hay
And at dinner time we stopped
And walked back to the house to eat

And Mama hollered out the back door
Y'all remember to wipe your feet
And then she said
I got some news this morning from Choctaw Ridge
Today, Billy Joe MacAllister
Jumped off the Tallahatchie Bridge

And Papa said to Mama
As he passed around the black eyed peas
Well, Billy Joe never had a lick of sense
Pass the biscuits, please
There's five more acres in the lower forty I've got to plow

And Mama said it was shame about Billy Joe, anyhow
Seems like nothing ever comes
To no good up on Choctaw Ridge
And now Billy Joe MacAllister's jumped off the Tallahatchie Bridge

And Brother said he recollected
When he and Tom and Billie Joe
Put a frog down my back
At the Carroll County picture show

And wasn't I talking to him
After church last Sunday night?
I'll have another

I don't agree with the polarity score at all. I think it's a really negative song with some pretty dark imagery if you think about it. I'm curious whether the analysis picks up the nuance that "jumped off the bridge" implies suicide. I could see a tiny bit of positive interpretation, as the polarity score suggests, if you don't understand that Billy Joe is dead, which the song never explicitly states.
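
TextBlob's default analyzer, which spaCyTextBlob wraps, is lexicon-based: roughly, it looks up individual words (mostly adjectives) in a fixed sentiment dictionary and averages the scores of the words it finds, with some adjustment for negation and intensifiers. A phrase such as "jumped off the Tallahatchie Bridge" describes an event without using any opinion words, so its implied meaning contributes nothing, while ordinary words elsewhere in the lyrics can still carry scores. The toy scorer below illustrates only that averaging idea; its lexicon words and values are invented for illustration and are not TextBlob's.

```python
import re

# Invented lexicon for illustration only; these are NOT TextBlob's words or values.
TOY_LEXICON = {"good": 0.7, "sleepy": -0.1, "shame": -0.6, "nice": 0.6}


def toy_polarity(text: str) -> float:
    """Average the lexicon scores of the words found in text; 0.0 if none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


# The event itself contains no lexicon word, so it scores as neutral
print(toy_polarity("Billy Joe MacAllister jumped off the Tallahatchie Bridge"))  # 0.0
```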

### Lyrics_4

In [5]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipeline component

# Load the spaCy model and add spaCyTextBlob once, instead of reloading the model on every call
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("spacytextblob")

def perform_sentiment_analysis(filename):
    try:
        # Read the contents of the JSON file
        with open(filename, "r") as json_file:
            data = json.load(json_file)

        # Extract the song title and lyrics from the JSON data
        song_title = data.get("title")
        lyrics = data.get("lyrics")

        # Print the song title and lyrics of the song
        print(f"Song Title: {song_title}")
        print("Lyrics of the Song:")
        print(lyrics)

        # Perform sentiment analysis with spaCyTextBlob
        # (spacytextblob 4.x exposes the score as doc._.blob.polarity)
        doc = nlp(lyrics)
        polarity_score = doc._.blob.polarity

        # Print the polarity score of the sentiment analysis
        print("Polarity Score:", polarity_score)

        return song_title, polarity_score

    except FileNotFoundError:
        print("File not found. Please enter a valid filename.")
        return None, None
    except Exception as e:
        print("An error occurred:", str(e))
        return None, None

# Get user input for the filename
file_name = input("Enter the filename: ")

# Call the function to perform sentiment analysis using the provided filename
song_title, polarity_score = perform_sentiment_analysis(file_name)

if song_title is not None and polarity_score is not None:
    print(f"Sentiment Analysis Result for '{song_title}':")
    print("Given that the scale is -1.0 to 1.0, this particular song seems:")
    if polarity_score < -0.5:
        print("Strongly negative.")
    elif polarity_score < 0:
        print("Negative.")
    elif polarity_score == 0:
        print("Neutral.")
    elif polarity_score <= 0.5:
        print("Positive.")
    else:
        print("Strongly positive.")


Enter the filename:  Lyrics_4.json


Song Title: Harper Valley PTA
Lyrics of the Song:
[Verse 1]
I wanna tell you all a story
About a Harper Valley widowed wife
Who had a teenage daughter
Who attended Harper Valley Junior High
Well, her daughter came home one afternoon
And didn't even stop to play
And she said, "Mom, I got a note here
From the Harper Valley PTA"

[Verse 2]
Well the note said, "Mrs. Johnson
You're wearin' your dresses way too high
It's reported you've been drinking
And a-running round with men and goin' wild
And we don't believe you oughta be
A-bringin' up your little girl this way"
And it was signed by the Secretary
Harper Valley PTA

[Verse 3]
Well, it happened that the PTA
Was gonna meet that very afternoon
And they were sure surprised when Mrs. Johnson
Wore her miniskirt into the room
And as she walked up to the blackboard
I can still recall the words she had to say
She said, "I'd like to address this meeting
Of the Harper Valley PTA"

[Verse 4]
"Well, there's Bobby Taylor sittin' there
And seven times

I've always thought of this song as positive, just based on the singer being the narrator and the daughter of the woman who's the subject of the song. In my opinion, she's proud of her mother for standing up to the PTA members who were putting her down. The sentiment analysis didn't interpret it very positively.
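
The Question 4 prompt asks for each file's polarity printed with the song name. Instead of typing a filename into four separate cells, the files can be processed in one loop. A minimal sketch that accepts any analysis function returning `(title, polarity)`, such as `perform_sentiment_analysis` above; the helper name `report_polarities` is hypothetical. The lowercase names in the usage line assume the `lyrics_N.json` pattern written in Question 3; on case-sensitive file systems, `Lyrics_3.json` and `lyrics_3.json` are different files.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Result = Tuple[Optional[str], Optional[float]]


def report_polarities(filenames: Iterable[str], analyze: Callable[[str], Result]) -> List[Tuple[str, float]]:
    """Run analyze on each file, print "title: score", and return the successful results."""
    results = []
    for filename in filenames:
        title, score = analyze(filename)
        if title is None or score is None:
            # Skip files the analysis could not handle (perform_sentiment_analysis returns (None, None) on errors)
            continue
        print(f"{title}: {score:.3f}")
        results.append((title, score))
    return results


# Usage in the notebook:
# report_polarities([f"lyrics_{n}.json" for n in range(1, 5)], perform_sentiment_analysis)
```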
