# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Kyle Hudson (https://github.com/cnk77)

Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure your code cells have all been run (and have output beneath them), and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores the result in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

## Question 1 Solution

In [12]:
# Due to persistent 504 Gateway Timeout errors from the lyrics.ovh API, I used the Genius API via RapidAPI as a reliable alternative.
# The structure and logic of the solution remain consistent with the original requirements.

import json
import os

import requests
from bs4 import BeautifulSoup

# RapidAPI key required for this API. Read it from the environment instead of
# committing it to the repository (RAPIDAPI_KEY is an assumed variable name).
api_key = os.environ["RAPIDAPI_KEY"]

# Define the song to search
song_query = "Birdhouse in Your Soul"

# Step 1: Search for the song
search_url = "https://genius-song-lyrics1.p.rapidapi.com/search/"
search_headers = {
    "X-RapidAPI-Key": api_key,
    "X-RapidAPI-Host": "genius-song-lyrics1.p.rapidapi.com"
}
search_params = {"q": song_query}

search_response = requests.get(search_url, headers=search_headers, params=search_params, timeout=30)
search_response.raise_for_status()  # Stop early on HTTP errors such as 504
search_data = search_response.json()

# Step 2: Extract song ID
hits = search_data.get("hits", [])
if not hits:
    print("No results found.")
else:
    song_id = hits[0]["result"]["id"]

    # Step 3: Get lyrics
    lyrics_url = "https://genius-song-lyrics1.p.rapidapi.com/song/lyrics/"
    lyrics_params = {"id": song_id}
    lyrics_response = requests.get(lyrics_url, headers=search_headers, params=lyrics_params, timeout=30)
    lyrics_response.raise_for_status()  # Stop early on HTTP errors such as 504
    lyrics_data = lyrics_response.json()

    # Step 4: Extract and clean lyrics
    html_lyrics = lyrics_data.get("lyrics", {}).get("lyrics", {}).get("body", {}).get("html", "")
    soup = BeautifulSoup(html_lyrics, "html.parser")
    clean_lyrics = soup.get_text(separator="\n")

    # Step 5: Store in dictionary and save to file
    lyrics_dict = {
        "song": song_query,
        "lyrics": clean_lyrics
    }

    with open("lyrics.json", "w", encoding="utf-8") as f:
        json.dump(lyrics_dict, f, ensure_ascii=False, indent=4)

    print("Lyrics saved to lyrics.json")


Lyrics saved to lyrics.json
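
The point of saving the lyrics is to avoid calling the API again in later questions. A minimal sketch of that idea follows. It assumes a hypothetical helper named `load_or_fetch` and a caller-supplied `fetch` function (neither is part of the assignment or of any library), and it only calls `fetch` when the file does not exist yet.

```python
import json
from pathlib import Path


def load_or_fetch(filename, fetch):
    """Return the dictionary cached in `filename`, calling `fetch()` only on a cache miss.

    `fetch` is a hypothetical zero-argument callable that returns a
    JSON-serializable dictionary (for example, the lyrics dictionary above).
    """
    path = Path(filename)
    if path.exists():
        # Cache hit: read the saved dictionary instead of calling the API again.
        with path.open("r", encoding="utf-8") as f:
            return json.load(f)

    # Cache miss: fetch once, then persist the result for later cells.
    data = fetch()
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=4)
    return data
```

Passing the API call in as a function keeps the caching logic independent of which lyrics service is used.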


2. Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the range of the polarity score is `[-1.0,1.0]`, which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

## Question 2 Solution

In [13]:
import json
import spacy

# Importing SpacyTextBlob registers the "spacytextblob" pipeline component with spaCy
from spacytextblob.spacytextblob import SpacyTextBlob

# Load the lyrics from the JSON file
with open("lyrics.json", "r", encoding="utf-8") as f:
    lyrics_data = json.load(f)

lyrics_text = lyrics_data.get("lyrics", "")

# Print only the lyrics
print("\n" + "="*40)
print(f"Lyrics for: {lyrics_data.get('song', 'Unknown Song')}")
print("="*40)
print(lyrics_text)

# Load spaCy and add the TextBlob component by name
nlp = spacy.load("en_core_web_sm")
if "spacytextblob" not in nlp.pipe_names:
    nlp.add_pipe("spacytextblob", last=True)  # Add the component by its registered name

# Analyze sentiment
doc = nlp(lyrics_text)
polarity = doc._.polarity

print("\nPolarity Score:", polarity)

# Interpret the result
# Based on the polarity score, which ranges from -1.0 (negative) to 1.0 (positive),
# we can determine the overall sentiment of the lyrics.
if polarity > 0:
    print("# The lyrics have a more positive connotation.")
elif polarity < 0:
    print("# The lyrics have a more negative connotation.")
else:
    print("# The lyrics are neutral in tone.")








Lyrics for: Birdhouse in Your Soul
[Bridge]


I'm your only friend

I'm not your only friend

But I'm a little glowing friend

But really I'm not actually your friend

But I am



[Chorus]


Blue canary in the outlet by the light switch


Who watches over you


Make a little birdhouse in your soul


Not to put too fine a point on it


Say I'm the only bee in your bonnet


Make a little birdhouse in your soul



[Verse 1]


I have a secret to tell

From my electrical well


It's a simple message and I'm

Leaving out the whistles and bells


So the room must listen to me

Filibuster vigilantly


My name is blue canary

One note spelled l-i-t-e


My story's infinite

Like the Longines Symphonette

It doesn't rest



[Chorus]


Blue canary in the outlet by the light switch


Who watches over you


Make a little birdhouse in your soul


Not to put too fine a point on it


Say I'm the only bee in your bonnet

Make a little birdhouse in your soul



[Bridge]


I'm your only friend

I'm not y

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

## Question 3 Solution

In [14]:
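import json
import os

import requests
from bs4 import BeautifulSoup

# RapidAPI key required for this API. Read it from the environment instead of
# committing it to the repository (RAPIDAPI_KEY is an assumed variable name).
api_key = os.environ["RAPIDAPI_KEY"]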

def fetch_and_save_lyrics(artist, song, filename):
    query = f"{artist} {song}"
    search_url = "https://genius-song-lyrics1.p.rapidapi.com/search/"
    headers = {
        "X-RapidAPI-Key": api_key,
        "X-RapidAPI-Host": "genius-song-lyrics1.p.rapidapi.com"
    }
    search_params = {"q": query}
    
    search_response = requests.get(search_url, headers=headers, params=search_params, timeout=30)
    search_response.raise_for_status()  # Stop early on HTTP errors such as 504
    search_data = search_response.json()
    
    hits = search_data.get("hits", [])
    if not hits:
        print(f"No results found for {query}")
        return
    
    song_id = hits[0]["result"]["id"]
    
    lyrics_url = "https://genius-song-lyrics1.p.rapidapi.com/song/lyrics/"
    lyrics_params = {"id": song_id}
    lyrics_response = requests.get(lyrics_url, headers=headers, params=lyrics_params, timeout=30)
    lyrics_response.raise_for_status()  # Stop early on HTTP errors such as 504
    lyrics_data = lyrics_response.json()
    
    html_lyrics = lyrics_data.get("lyrics", {}).get("lyrics", {}).get("body", {}).get("html", "")
    soup = BeautifulSoup(html_lyrics, "html.parser")
    clean_lyrics = soup.get_text(separator="\n")
    
    lyrics_dict = {
        "artist": artist,
        "song": song,
        "lyrics": clean_lyrics
    }
    
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(lyrics_dict, f, ensure_ascii=False, indent=4)
    
    print(f"Lyrics for '{song}' by {artist} saved to {filename}")

# Test the function with the four specified songs
fetch_and_save_lyrics("Wilco", "Theologians", "theologians_lyrics.json")
fetch_and_save_lyrics("R.E.M.", "Stand", "stand_lyrics.json")
fetch_and_save_lyrics("George Michael", "Careless Whisper", "careless_whisper_lyrics.json")
fetch_and_save_lyrics("Taylor Swift", "Welcome to New York", "welcome_to_new_york_lyrics.json")



Lyrics for 'Theologians' by Wilco saved to theologians_lyrics.json
Lyrics for 'Stand' by R.E.M. saved to stand_lyrics.json
Lyrics for 'Careless Whisper' by George Michael saved to careless_whisper_lyrics.json
Lyrics for 'Welcome to New York' by Taylor Swift saved to welcome_to_new_york_lyrics.json
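
The `get_text(separator="\n")` call leaves runs of blank lines in the saved lyrics, as the printed output for Question 2 shows. The sketch below shows one way to tidy the text before saving it. The helper name `tidy_lyrics` is hypothetical, it uses only the standard library, and whether this cleanup changes the polarity score has not been tested here.

```python
import re


def tidy_lyrics(text):
    """Strip each line and collapse runs of blank lines into a single blank line."""
    # Remove leading and trailing whitespace on every line.
    lines = [line.strip() for line in text.splitlines()]
    tidied = "\n".join(lines)
    # Three or more consecutive newlines mean two or more blank lines in a row.
    tidied = re.sub(r"\n{3,}", "\n\n", tidied)
    # Drop blank lines at the very start and end of the lyrics.
    return tidied.strip()
```

In `fetch_and_save_lyrics`, this could be applied to `clean_lyrics` just before building `lyrics_dict`.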


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the four files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why or why not? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

## Question 4 Solution

In [15]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# Load spaCy model and add TextBlob component
nlp = spacy.load("en_core_web_sm")
if "spacytextblob" not in nlp.pipe_names:
    nlp.add_pipe("spacytextblob", last=True)

# Function to analyze sentiment from a lyrics file
def analyze_lyrics_sentiment(filename):
    with open(filename, "r", encoding="utf-8") as f:
        data = json.load(f)
    lyrics = data.get("lyrics", "")
    song = data.get("song", "Unknown Song")
    doc = nlp(lyrics)
    polarity = doc._.polarity
    return song, polarity

# List of files to analyze
files = [
    "theologians_lyrics.json",
    "stand_lyrics.json",
    "careless_whisper_lyrics.json",
    "welcome_to_new_york_lyrics.json"
]

# Analyze and print polarity scores
for file in files:
    song_name, score = analyze_lyrics_sentiment(file)
    print(f"Polarity Score for '{song_name}': {score}")


Polarity Score for 'Theologians': -0.03430555555555556
Polarity Score for 'Stand': 0.09039772727272725
Polarity Score for 'Careless Whisper': 0.09062500000000002
Polarity Score for 'Welcome to New York': 0.42042780748663106
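
Several of these scores sit close to zero, so a strict positive/negative split on the sign alone can overstate the result. The sketch below labels a score using a small neutral band around zero. The function name `label_polarity` and the default band of 0.05 are illustrative assumptions, not part of spaCyTextBlob.

```python
def label_polarity(score, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to 'positive', 'negative', or 'neutral'.

    `neutral_band` is an arbitrary, assumed threshold: scores whose absolute
    value is at most this number are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("polarity score must be within [-1.0, 1.0]")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"
```

With this assumed band, the score for 'Theologians' (about -0.034) would be labeled neutral rather than negative, while the other three scores would still be labeled positive.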


### Sentiment Analysis Reflection

**Theologians by Wilco**

This is a song I really like, and I was surprised to see a negative sentiment score. I see it as a song about understanding your place in the spiritual world and defining yourself on your own terms. I find that to be positive rather than negative.

**Stand by R.E.M.**

This result makes sense to me. I think it is a mainly positive song, but the lyrics are too cryptic to fully understand.

**Careless Whisper by George Michael**

I'm surprised that this registers positive. I've always viewed it as a song about lost love and regret.

**Welcome to New York by Taylor Swift**

This song is about hope in a new setting, so I agree with the strongly positive result.