# Web Mining and Applied NLP (44-620)
## Requests, JSON, and NLP
### Student Name: Jacob Sellinger

Perform the tasks described in the Markdown cells below. When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have installed spaCy, its English pipeline, and spaCyTextBlob.

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (.py), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

### Instructions

- At the end, convert this notebook to HTML with `jupyter nbconvert --to html notebook_name.ipynb`.

- Install spaCy, spaCyTextBlob, and requests with pip (all three are listed in the requirements file).

- Activate the virtual environment with `.venv\Scripts\Activate.ps1`.

- Check that the environment and the interpreter match. I run into this mismatch far too often.

In [None]:
import sys
import os

print("--- Python Environment Diagnostics ---")
print("Python Executable:", sys.executable)
print("Python Version:", sys.version)
print("PYTHONPATH (Environment Variable):", os.environ.get('PYTHONPATH', 'Not set'))
print("sys.path (Interpreter's Search Path):")
for p in sys.path:
    print(f"  - {p}")
print("--- End Diagnostics ---\n")

### Question 1

1. The following code accesses the lyrics.ovh public api, searches for the lyrics of a song, and stores it in a dictionary object. Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [9]:
import requests
import json

AUTHOR='Edgar Allan Poe'
POEM = 'A Dream Within A Dream'

#only certain poets and titles are available
#to see the available poets, go to (in a web browser)
# https://poetrydb.org/author
#To see which poems that author has available, go to 
# https://poetrydb.org/author/AUTHOR NAME
# e.g.: https://poetrydb.org/author/Edgar Allan Poe
#The spaces will get handled by your web browser

# https://docs.python.org/3/tutorial/inputoutput.html#tut-f-strings
URL = f'https://poetrydb.org/author,title/{AUTHOR};{POEM}'

#Error Handling
response = requests.get(URL)
print(response)
#Raises an exception for status codes such as 404
response.raise_for_status()

#Poem Extraction (reuse the response instead of requesting the URL a second time)
result = response.json()
poem = '\n'.join(result[0]['lines'])

print(result)
print(poem)

#Dump everything into a JSON file
file_name = f"{AUTHOR.replace(' ', '_')}_{POEM.replace(' ', '_')}.json"

try:
    with open(file_name, 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
    print(f"\nSuccessfully wrote the JSON data to '{file_name}'")
except IOError as e:
    print(f"Error writing to file '{file_name}': {e}")


<Response [200]>
[{'title': 'A Dream Within A Dream', 'author': 'Edgar Allan Poe', 'lines': ['Take this kiss upon the brow!', 'And, in parting from you now,', 'Thus much let me avow--', 'You are not wrong, who deem', 'That my days have been a dream:', 'Yet if hope has flown away', 'In a night, or in a day,', 'In a vision or in none,', 'Is it therefore the less _gone_?', '_All_ that we see or seem', 'Is but a dream within a dream.', '', 'I stand amid the roar', 'Of a surf-tormented shore,', 'And I hold within my hand', 'Grains of the golden sand--', 'How few! yet how they creep', 'Through my fingers to the deep', 'While I weep--while I weep!', 'O God! can I not grasp', 'Them with a tighter clasp?', 'O God! can I not save', '_One_ from the pitiless wave?', 'Is _all_ that we see or seem', 'But a dream within a dream?'], 'linecount': '24'}]
Take this kiss upon the brow!
And, in parting from you now,
Thus much let me avow--
You are not wrong, who deem
That my days have been a dream:
Yet if 
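
The comments in the cell above note that a web browser handles the spaces in the URL. A minimal sketch of doing that encoding explicitly with the standard library is shown here. `build_poem_url` and `BASE_URL` are hypothetical names used only for illustration, and the URL pattern is the one from the cell above.

```python
from urllib.parse import quote

# Same endpoint pattern as the cell above
BASE_URL = 'https://poetrydb.org/author,title'


def build_poem_url(author, title):
    """Percent-encode the author and title before placing them in the URL.

    Hypothetical helper for illustration. safe='' also encodes '/', so a
    slash inside a title cannot be mistaken for a path separator.
    """
    return f"{BASE_URL}/{quote(author, safe='')};{quote(title, safe='')}"


print(build_poem_url('Edgar Allan Poe', 'A Dream Within A Dream'))
# prints https://poetrydb.org/author,title/Edgar%20Allan%20Poe;A%20Dream%20Within%20A%20Dream
```

The `;` separator is left unencoded on purpose, since it divides the two search terms in the pattern used above.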

### Question 2

2. Read in the contents of your file. Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics. Print the polarity score of the sentiment analysis. Given that the polarity score ranges over [-1.0, 1.0], corresponding to how negative or positive the text in question is, do you think the lyrics have a more positive or negative connotation? Answer this question in a comment in your code cell.

In [10]:
#We'll take the JSON file and print the lyrics only

# Open the JSON file in read mode and load its content
with open(file_name, 'r', encoding='utf-8') as f:
    data = json.load(f)

# Access the list of lines from the loaded data
# Based on the PoetryDB structure, it's data[0]['lines']
lyrics_lines = data[0]['lines']

# Join the list of lines into a single string, separated by newlines
poem_text = '\n'.join(lyrics_lines)

# Print the extracted poem
print(poem_text)


Take this kiss upon the brow!
And, in parting from you now,
Thus much let me avow--
You are not wrong, who deem
That my days have been a dream:
Yet if hope has flown away
In a night, or in a day,
In a vision or in none,
Is it therefore the less _gone_?
_All_ that we see or seem
Is but a dream within a dream.

I stand amid the roar
Of a surf-tormented shore,
And I hold within my hand
Grains of the golden sand--
How few! yet how they creep
Through my fingers to the deep
While I weep--while I weep!
O God! can I not grasp
Them with a tighter clasp?
O God! can I not save
_One_ from the pitiless wave?
Is _all_ that we see or seem
But a dream within a dream?
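
The cell above prints the poem but stops before the sentiment step the question asks for. The following is a rough sketch of that step, not a finished answer. It assumes an `nlp` pipeline built as in the Question 4 cell, and it assumes the installed spaCyTextBlob version exposes the score as `doc._.polarity`, the attribute that cell uses. `poem_polarity` and `polarity_label` are hypothetical names.

```python
def poem_polarity(text, nlp):
    """Return the polarity of `text` using a spaCy pipeline passed in as `nlp`.

    Assumption: the pipeline has the 'spacytextblob' component added and
    exposes the score as doc._.polarity.
    """
    doc = nlp(text)
    return doc._.polarity


def polarity_label(score):
    """Map a polarity score in [-1.0, 1.0] to a rough description."""
    if score > 0:
        return 'more positive'
    if score < 0:
        return 'more negative'
    return 'neutral'


# Usage, assuming spaCy, its English model, and spaCyTextBlob are installed:
# import spacy
# from spacytextblob.spacytextblob import SpacyTextBlob
# nlp = spacy.load('en_core_web_sm')
# nlp.add_pipe('spacytextblob')
# score = poem_polarity(poem_text, nlp)
# print(score, polarity_label(score))
```

Passing the pipeline in as an argument keeps the model load in one place, so the same `nlp` object can be reused for every poem.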


### Question 3

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename. Test this function by getting the lyrics to any four songs of your choice and storing them in different files.
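
The prompt asks for a function that receives the file name as an argument. One way to sketch the write step under that requirement is given here. It assumes the API response has already been parsed into `result` with the shape shown in Question 1, and `store_poem` is a hypothetical name rather than anything provided by requests or PoetryDB.

```python
import json


def store_poem(result, file_name):
    """Write a parsed PoetryDB response to `file_name` and return the poem text.

    Assumption: `result` is a list whose first item is a dict with a
    'lines' list, as in the Question 1 output.
    """
    if not isinstance(result, list) or not result or 'lines' not in result[0]:
        raise ValueError("response does not contain a poem with 'lines'")
    # The caller chooses the file name; nothing is derived from author or title
    with open(file_name, 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4, ensure_ascii=False)
    return '\n'.join(result[0]['lines'])
```

Raising on a malformed response, instead of printing and returning `None`, lets the caller decide how to report the failure.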

In [18]:
# We will simply re-use the code we had before and wrap it in a function

def retrieve_poem_lyrics(AUTHOR, POEM, PRINT_TEXT=True):
    # Only certain poets and titles are available.
    # To see the available poets, go to (in a web browser)
    # https://poetrydb.org/author
    # To see which poems that author has available, go to
    # https://poetrydb.org/author/AUTHOR NAME
    # e.g.: https://poetrydb.org/author/Edgar Allan Poe
    # The spaces will get handled by your web browser
    
    URL = f'https://poetrydb.org/author,title/{AUTHOR};{POEM}'
    #Error Handling
    response = requests.get(URL)

    if response.status_code != 200:
        print(response)
        print("There has been a response error")
        #Stop here: nothing was downloaded, so there is no file to write or read back
        return
    else:
        #Poem Extraction (reuse the response instead of calling the API again)
        result = response.json()
        poem = '\n'.join(result[0]['lines'])

        #Dump everything into a JSON file
        file_name = f"{AUTHOR.replace(' ', '_')}_{POEM.replace(' ', '_')}.json"

        try:
            with open(file_name, 'w', encoding='utf-8') as f:
                json.dump(result, f, indent=4, ensure_ascii=False)
            print(f"\nSuccessfully wrote the JSON data to '{file_name}'")
        except IOError as e:
            print(f"Error writing to file '{file_name}': {e}")
    if PRINT_TEXT:
        #We'll take the JSON file and print the lyrics only

        # Open the JSON file in read mode and load its content
        with open(file_name, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Access the list of lines from the loaded data
        # Based on the PoetryDB structure, it's data[0]['lines']
        lyrics_lines = data[0]['lines']

        # Join the list of lines into a single string, separated by newlines
        poem_text = '\n'.join(lyrics_lines)

        # Print the extracted poem
        print(poem_text)

def retrieve_poem_lyricsv2(author, poem_title, write_to_file=True, print_poem_text=True):
    """
    Retrieves poem lyrics from PoetryDB, optionally writes the full JSON
    response to a file, and optionally prints the poem text.

    Args:
        author (str): The author of the poem (e.g., 'Edgar Allan Poe').
        poem_title (str): The title of the poem (e.g., 'A Dream Within A Dream').
        write_to_file (bool, optional): If True, the raw JSON data will be
                                        written to a .json file. Defaults to True.
        print_poem_text (bool, optional): If True, the extracted poem text will
                                          be printed to the console. Defaults to True.

    Returns:
        str: The full text of the poem if successful, None otherwise.
    """

    URL = f'https://poetrydb.org/author,title/{author};{poem_title}'

    # Error Handling for the API request
    try:
        response = requests.get(URL)
        # Use .raise_for_status() for robust error checking
        # It handles 4xx (client errors) and 5xx (server errors)
        response.raise_for_status()

        # Poem Extraction - Use response.json() and avoid a second API call
        result = response.json()

        # Basic validation of the API response structure
        if not result or not isinstance(result, list) or not result[0] or 'lines' not in result[0]:
            print(f"Error: Could not find 'lines' in the API response for '{poem_title}' by '{author}'.")
            print(f"API response: {result}")
            return None # Indicate failure

        poem_text = '\n'.join(result[0]['lines'])

        # Optional file writing (controlled by write_to_file argument)
        if write_to_file:
            # Create a filesystem-friendly filename
            file_name = f"{author.replace(' ', '_')}_{poem_title.replace(' ', '_')}.json"
            try:
                with open(file_name, 'w', encoding='utf-8') as f:
                    json.dump(result, f, indent=4, ensure_ascii=False)
                print(f"\nSuccessfully wrote the JSON data to '{file_name}'")
            except IOError as e:
                print(f"Error writing to file '{file_name}': {e}")
                # You might choose to return None or propagate the error here
                pass # Continue if file write fails but poem text is available

        # Optional poem text printing (controlled by print_poem_text argument)
        if print_poem_text:
            print("\n--- Extracted Poem Text ---")
            print(poem_text)

        return poem_text # Return the poem text for external use

    except requests.exceptions.HTTPError as err:
        print(f"HTTP Error for {author} - {poem_title}: {err}")
        print(f"Response content: {response.text}")
        return None # Indicate failure
    except json.JSONDecodeError as err:
        # Checked before RequestException: in recent versions of requests, the
        # error raised by response.json() is also a RequestException
        print(f"JSON Decode Error for {author} - {poem_title}: {err}")
        print(f"Raw response text: {response.text}")
        return None # Indicate failure
    except requests.exceptions.RequestException as err:
        print(f"Network or request error for {author} - {poem_title}: {err}")
        return None # Indicate failure
    except Exception as err: # Catch any other unexpected errors
        print(f"An unexpected error occurred for {author} - {poem_title}: {err}")
        return None


retrieve_poem_lyricsv2('Edmund Spenser', 'Sonnet 54')
retrieve_poem_lyricsv2('Emma Lazarus', 'Success')
retrieve_poem_lyricsv2('Jane Taylor', 'The Apple-Tree')
retrieve_poem_lyricsv2('Emily Dickinson', 'Nature can do no more')


Successfully wrote the JSON data to 'Edmund_Spenser_Sonnet_54.json'

--- Extracted Poem Text ---
Of this worlds theatre in which we stay,
My love like the spectator ydly sits
Beholding me that all the pageants play,
Disguysing diversly my troubled wits.
Sometimes I joy when glad occasion fits,
And mask in myrth lyke to a comedy:
Soone after when my joy to sorrow flits,
I waile and make my woes a tragedy.
Yet she, beholding me with constant eye,
Delights not in my merth nor rues my smart:
But when I laugh she mocks, and when I cry
She laughs and hardens evermore her heart.
What then can move her? if nor merth nor mone,
She is no woman, but a senceless stone.

Successfully wrote the JSON data to 'Emma_Lazarus_Success.json'

--- Extracted Poem Text ---
Oft have I brooded on defeat and pain,
The pathos of the stupid, stumbling throng.
These I ignore to-day and only long
To pour my soul forth in one trumpet strain,
One clear, grief-shattering, triumphant song,
For all the victories of man'

"Nature can do no more\nShe has fulfilled her Dyes\nWhatever Flower fail to come\nOf other Summer days\nHer crescent reimburse\nIf other Summers be\nNature's imposing negative\nNulls opportunity --"

### Question 4

4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score. Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3. Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be? Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [37]:
import json
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

# Load a SpaCy English model and add the SpacyTextBlob pipeline component
# You might need to run: python -m spacy download en_core_web_sm
# And: pip install spacytextblob; python -m textblob.download_corpora
try:
    nlp = spacy.load('en_core_web_sm')
    nlp.add_pipe('spacytextblob')
except OSError:
    print("SpaCy 'en_core_web_sm' model not found. Please run: python -m spacy download en_core_web_sm")
    print("Also ensure 'spacytextblob' is installed: pip install spacytextblob")
    print("And TextBlob corpora downloaded: python -m textblob.download_corpora")
    # Re-raise instead of calling exit(), which in a notebook shuts down the kernel
    raise


def analyze_poem_sentiment(author, poem_title):
    """
    Performs sentiment analysis on a poem's text from a local JSON file
    using SpaCy and returns the sentiment score and poem title.

    Args:
        author (str): The author of the poem. Used to construct the filename.
        poem_title (str): The title of the poem. Used to construct the filename.

    Returns:
        tuple: A tuple containing (poem_title, sentiment_score) if successful,
               or (None, None) if the file cannot be read or processed.
    """
    file_name = f"{author.replace(' ', '_')}_{poem_title.replace(' ', '_')}.json"

    try:
        with open(file_name, 'r', encoding='utf-8') as f:
            data = json.load(f)

        # Extract the poem text, as in the retrieve_poem_lyricsv2 function
        if not data or not isinstance(data, list) or not data[0] or 'lines' not in data[0]:
            print(f"Error: Could not find 'lines' in the JSON data for '{poem_title}' by '{author}'.")
            return None, None

        poem_text = '\n'.join(data[0]['lines'])

        # Perform sentiment analysis using SpaCy and SpacyTextBlob
        doc = nlp(poem_text)
        sentiment_score = doc._.polarity # Polarity ranges from -1.0 (negative) to 1.0 (positive)

        return poem_title, sentiment_score

    except FileNotFoundError:
        print(f"Error: JSON file '{file_name}' not found. Please ensure it was created.")
        return None, None
    except json.JSONDecodeError as e:
        print(f"Error decoding JSON from '{file_name}': {e}")
        return None, None
    except KeyError as e:
        print(f"Error: Missing expected key in JSON data ({e}) from '{file_name}'.")
        return None, None
    except Exception as e:
        print(f"An unexpected error occurred during sentiment analysis for '{file_name}': {e}")
        return None, None



ModuleNotFoundError: No module named 'spacy'
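
The question asks for a function that takes a file name, while the cell above builds the name from the author and title and is never called. A hedged sketch of a file-name-based version follows. It assumes each file holds a PoetryDB response as written in Question 3, that `nlp` is a pipeline with the spacytextblob component added, and that the score is exposed as `doc._.polarity` as in the cell above. `polarity_from_file` is a hypothetical name.

```python
import json


def polarity_from_file(file_name, nlp):
    """Load a saved poem and return (title, polarity).

    Assumptions: the file contains a list whose first item has 'title' and
    'lines' keys, and `nlp` exposes the score as doc._.polarity.
    """
    with open(file_name, 'r', encoding='utf-8') as f:
        data = json.load(f)
    poem = data[0]
    doc = nlp('\n'.join(poem['lines']))
    return poem['title'], doc._.polarity


# Usage, once spaCy and spaCyTextBlob import cleanly in the active environment:
# for name in ['Edmund_Spenser_Sonnet_54.json', 'Emma_Lazarus_Success.json']:
#     title, score = polarity_from_file(name, nlp)
#     print(f'{title}: {score:.3f}')
```

The `ModuleNotFoundError` above means spaCy is not installed in the interpreter the notebook is using, so the usage lines are left commented out until the environment check at the top of the notebook passes.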