# Web Mining and Applied NLP (CSIS 44-620)

## P4: Employ Requests, JSON, NLP & Engage

Author: Data-Git-Hub <br>
GitHub Project Repository Link: https://github.com/Data-Git-Hub/Pyplot <br>
6 July 2025 <br>

### Introduction
In this project, I explore how to interact with web APIs, process JSON responses, and perform sentiment analysis using natural language processing (NLP) techniques in Python. Using `requests`, `json`, and the `spaCy` library with the `spacytextblob` extension, I demonstrate how to fetch text data, such as song lyrics or poems, from online sources, analyze its sentiment, and store the results in structured JSON files. This project highlights essential data analytics skills, including API interaction, text processing, and writing reusable functions. All work is conducted within a Jupyter Notebook and version-controlled using GitHub, with final outputs exported to HTML for accessibility and reproducibility. <br>

### Imports
Python libraries are collections of pre-written code that provide specific functionalities, making programming more efficient and reducing the need to write code from scratch. These libraries cover a wide range of applications, including data analysis, machine learning, web development, and automation. Some libraries, such as `os`, `sys`, `math`, `json`, and `datetime`, come built-in with Python as part of its standard library, providing essential functions for file handling, system operations, mathematical computations, and data serialization. Other popular third-party libraries, like `pandas`, `numpy`, `matplotlib`, `seaborn`, and `scikit-learn`, must be installed separately and are widely used in data science and machine learning. The extensive availability of libraries in Python's ecosystem makes it a versatile and powerful programming language for various domains. <br>

`ipykernel` allows Jupyter Notebooks to run Python code by providing the kernel interface used to execute cells and handle communication between the front-end and the Python interpreter. <br>
https://ipykernel.readthedocs.io/en/latest/ <br>

The `json` module provides functions to parse JSON strings and convert Python objects to JSON format, enabling easy exchange of data with web APIs. <br>
https://docs.python.org/3/library/json.html <br>
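
As a minimal sketch of the round trip described above, the snippet below serializes a small dictionary to a JSON string and parses it back. The `record` dictionary and its keys are illustrative assumptions, not data produced by this project.

```python
import json

# Hypothetical record; the keys and values are for illustration only.
record = {"artist": "Puscifer", "title": "Conditions of My Parole", "plays": 3}

# Python object -> JSON text (ensure_ascii=False keeps non-ASCII characters readable)
as_text = json.dumps(record, indent=2, ensure_ascii=False)

# JSON text -> Python object
restored = json.loads(as_text)

print(type(as_text).__name__)  # str
print(restored == record)      # True
```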

The `requests` library simplifies making HTTP requests in Python, allowing you to send GET, POST, and other types of requests to interact with APIs or web services. <br>
https://docs.python-requests.org/en/latest/ <br>

`spaCy` is an advanced NLP library for Python that provides tools for tokenization, part-of-speech tagging, named entity recognition, and more, using pre-trained pipelines. <br>
https://spacy.io/ <br>

`spacytextblob` is a plugin for spaCy that adds sentiment analysis capabilities by integrating TextBlob's polarity and subjectivity scores into spaCy’s pipeline. <br>
https://spacy.io/universe/project/spacy-textblob <br>

In [12]:
import requests
import lyricsgenius
import json


### Task
Perform the tasks described in the Markdown cells below. When you have completed the assignment, make sure all of your code cells have been run (and have output beneath them) and that you have committed and pushed ALL of your changes to your assignment repository. <br>

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob). <br>

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a Python file (`.py`), then import and run the appropriate code to answer the question. <br>

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well. <br>

#### Section 1. Accessing and Storing Song Lyrics Using a Public API
The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public API, searches for the lyrics of a song, and stores them in a dictionary object. Write the resulting JSON to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API. In this notebook, the cell below retrieves the lyrics through the Genius API with the `lyricsgenius` package instead.

In [15]:
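# Read the Genius access token from the environment instead of hard-coding it
# (the variable name GENIUS_ACCESS_TOKEN is an assumption; set it before running)
import os
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])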

# Search for the song
song = genius.search_song("Conditions of My Parole", "Puscifer")

# Check if song was found
if song:
    print("Lyrics preview:\n")
    print(song.lyrics[:500])  # Preview the first 500 characters

    # Save lyrics to a JSON file
    lyrics_data = {"lyrics": song.lyrics}
    with open("puscifer_conditions_of_my_parole_lyrics.json", "w", encoding="utf-8") as f:
        json.dump(lyrics_data, f, indent=4, ensure_ascii=False)

    print("\nLyrics saved to 'puscifer_conditions_of_my_parole_lyrics.json'")
else:
    print("Lyrics not found.")


Searching for "Conditions of My Parole" by Puscifer...
Done.
Lyrics preview:

20 ContributorsConditions of My Parole Lyrics[Verse 1]
Sweet baby Jesus on fire
I'ma need a damn lawyer and a miracle
To pull my ass out of this
Devil kept pokin' the bull
So I shipped her ass to Mozambique
'Cause I was over it

[Verse 2]
Shoulda dumped my gat into the Verde
But what if she's a zombie or a Dracula?
I better hang on to this
Lordy with my hand upon the Bible
Swear I shot the damn devil, not a bitch
But the po-po don't give a shit

[Verse 3]
Lordy, won't you show a little mercy?
I'

Lyrics saved to 'puscifer_conditions_of_my_parole_lyrics.json'
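
Because the saved file exists to avoid repeated API calls, one way to express that idea is a small caching helper. The sketch below is a hypothetical helper (`load_or_fetch` is not part of the assignment or of any library): it returns the JSON already on disk when the file exists and only calls the supplied `fetch` function otherwise.

```python
import json
import os
import tempfile
from pathlib import Path


def load_or_fetch(path, fetch):
    """Return cached JSON from `path`; call `fetch()` only if the file is missing."""
    cache = Path(path)
    if cache.exists():
        with cache.open(encoding="utf-8") as f:
            return json.load(f)
    data = fetch()  # e.g. a function that wraps the API call
    with cache.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=4, ensure_ascii=False)
    return data


# Demonstration with a stand-in fetcher, so no network access is needed.
fetch_calls = []

def fake_fetch():
    fetch_calls.append(1)
    return {"lyrics": "la la la"}

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "song.json")
    load_or_fetch(target, fake_fetch)  # file missing: fetches and writes
    load_or_fetch(target, fake_fetch)  # file present: reads from disk

print(len(fetch_calls))  # 1
```

In the notebook, `fetch` could be a function that performs the `genius.search_song(...)` lookup and returns `{"lyrics": song.lyrics}`.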


#### Section 2. Sentiment Analysis of the Saved Lyrics
Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the polarity score ranges over `[-1.0, 1.0]`, which corresponds to how negative or positive the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.
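
A minimal sketch of the sentiment step follows. It assumes spacytextblob 4.x, where the score is exposed as `doc._.blob.polarity` (other releases may expose it under a different attribute), and the `en_core_web_sm` model. The helper names `build_nlp`, `polarity_of_text`, and `describe_polarity` are hypothetical, and treating exactly 0.0 as "neutral" is a choice made here, not something the library defines.

```python
def build_nlp(model="en_core_web_sm"):
    """Load a spaCy pipeline and append the spacytextblob component."""
    # Imported lazily so the other helpers can be used without spaCy installed.
    import spacy
    from spacytextblob.spacytextblob import SpacyTextBlob  # noqa: F401 (registers the component)

    nlp = spacy.load(model)
    nlp.add_pipe("spacytextblob")
    return nlp


def polarity_of_text(text, nlp):
    """Return the polarity of `text`, a float in [-1.0, 1.0]."""
    doc = nlp(text)
    return doc._.blob.polarity  # assumes spacytextblob 4.x


def describe_polarity(score):
    """Map a polarity score to a coarse label; zero is treated as neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Typical use in the notebook (requires spaCy, the model, and spacytextblob):
#     import json
#     with open("puscifer_conditions_of_my_parole_lyrics.json", encoding="utf-8") as f:
#         lyrics = json.load(f)["lyrics"]
#     print(lyrics)
#     score = polarity_of_text(lyrics, build_nlp())
#     print(score, describe_polarity(score))
```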

#### Section 3. Fetching and Saving Lyrics with a Reusable Function
Write a function that takes an artist, song, and filename, accesses the lyrics.ovh API to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.
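
One possible shape for such a function is sketched below. It assumes the lyrics.ovh endpoint pattern `https://api.lyrics.ovh/v1/<artist>/<title>` and a response body of the form `{"lyrics": "..."}`; check both against the API documentation before relying on them. The name `save_lyrics` and the optional `get` parameter are hypothetical; `get` exists only so the function can be exercised without network access.

```python
import json
from urllib.parse import quote

API_ROOT = "https://api.lyrics.ovh/v1"  # assumed endpoint root


def save_lyrics(artist, song, filename, get=None):
    """Fetch lyrics for (artist, song) and write the JSON response to `filename`."""
    if get is None:
        import requests  # imported lazily; any callable with the same interface works
        get = requests.get

    # Percent-encode each path segment so spaces and slashes do not break the URL.
    url = f"{API_ROOT}/{quote(artist, safe='')}/{quote(song, safe='')}"
    response = get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error body
    data = response.json()

    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4, ensure_ascii=False)
    return data
```

Calling it once per song, for example `save_lyrics("Artist Name", "Song Title", "artist_song_title.json")` with placeholder values replaced by real ones, produces one file per song.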

#### Section 4. Polarity Scores from Lyrics Files
Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the four files you created in Section 3. Does the reported polarity match your understanding of the song's lyrics? Why or why not? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.
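
The file-based version can be sketched as follows. It assumes each file holds a JSON object with a `"lyrics"` key and that `nlp` is a spaCy pipeline with the spacytextblob component already added (spacytextblob 4.x attribute layout assumed). The names `polarity_from_file` and `polarity_report` are hypothetical.

```python
import json


def polarity_from_file(filename, nlp):
    """Load {"lyrics": ...} from `filename` and return the polarity of the lyrics."""
    with open(filename, encoding="utf-8") as f:
        lyrics = json.load(f)["lyrics"]
    return nlp(lyrics)._.blob.polarity  # assumes spacytextblob 4.x


def polarity_report(files_by_title, nlp):
    """Return {song title: polarity} for a mapping of song title -> filename."""
    return {
        title: polarity_from_file(filename, nlp)
        for title, filename in files_by_title.items()
    }


# Typical use, once `nlp` has been built and the lyrics files exist:
#     for title, score in polarity_report(files_by_title, nlp).items():
#         print(f"{title}: {score:.3f}")
```

Keeping the pipeline as a parameter means the model is loaded once and reused for every file rather than reloaded inside the function.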