# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Matt Goeckel
https://github.com/GeckoG/json-sentiment

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

## Question 1

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public api, searches for the lyrics of a song, and stores it in a dictionary object.  Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [64]:
import requests
import json

AUTHOR='Ralph Waldo Emerson'
POEM = 'Mithridates'

#only certain poets and titles are available
#to see the available poets, go to (in a web browser)
# https://poetrydb.org/author
#To see which poems that author has available, go to 
# https://poetrydb.org/author/AUTHOR NAME
# e.g.: https://poetrydb.org/author/Edgar Allan Poe
#The spaces will get handled by your web browser

# A cool pythonism (introduced in Python 3.6): f-strings
# https://docs.python.org/3/tutorial/inputoutput.html#tut-f-strings


URL = f'https://poetrydb.org/author,title/{AUTHOR};{POEM}'
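response = requests.get(URL, timeout=10)
response.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
result = response.json()     # same parsed data as json.loads(response.text)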
poem = '\n'.join(result[0]['lines'])
print(poem)

with open('poem.txt', 'w') as file:
    file.write(poem)
    #print("success")

I cannot spare water or wine,
Tobacco-leaf, or poppy, or rose;
From the earth-poles to the Line,
All between that works or grows,
Every thing is kin of mine.

Give me agates for my meat,
Give me cantharids to eat,
From air and ocean bring me foods,
From all zones and altitudes.

From all natures, sharp and slimy,
Salt and basalt, wild and tame,
Tree, and lichen, ape, sea-lion,
Bird and reptile be my game.

Ivy for my fillet band,
Blinding dogwood in my hand,
Hemlock for my sherbet cull me,
And the prussic juice to lull me,
Swing me in the upas boughs,
Vampire-fanned, when I carouse.

Too long shut in strait and few,
Thinly dieted on dew,
I will use the world, and sift it,
To a thousand humors shift it,
As you spin a cherry.
O doleful ghosts, and goblins merry,
O all you virtues, methods, mights;
Means, appliances, delights;
Reputed wrongs, and braggart rights;
Smug routine, and things allowed;
Minorities, things under cloud!
Hither! take me, use me, fill me,
Vein and artery, though ye 
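
For reference, a minimal sketch of the JSON route the question mentions: writing the parsed result to a JSON file and reading it back later without calling the API again. The `record` below is a hypothetical stand-in for one PoetryDB entry (only the `lines` key is confirmed by the cell above; the other keys are assumptions), and the file name `poem_example.json` and the helper names are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical stand-in for one PoetryDB record. Only 'lines' is used by the
# notebook code; the other keys and values are assumptions for illustration.
record = {
    "title": "Mithridates",
    "author": "Ralph Waldo Emerson",
    "lines": ["I cannot spare water or wine,", "Tobacco-leaf, or poppy, or rose;"],
}


def save_json(data, path):
    """Write any JSON-serializable object to path as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path):
    """Read a JSON file back into Python objects."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# The API result is a list of records, so the sketch saves a one-item list.
path = os.path.join(tempfile.gettempdir(), "poem_example.json")
save_json([record], path)

restored = load_json(path)
print("\n".join(restored[0]["lines"]))
```

Saving the whole structure rather than only the joined text keeps the author and title alongside the lines, which may be handy in later questions.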

## Question 2

2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [65]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')
doc = nlp(poem)
polarity = doc._.blob.polarity                               # Polarity: -0.13411458333333334
subjectivity = doc._.blob.subjectivity                       # Subjectivity: 0.5083333333333333
assessments = doc._.blob.sentiment_assessments.assessments   
ngrams = doc._.blob.ngrams()
print(polarity)

## A polarity of -0.13 is close to neutral with a slight negative lean. The example text from our
## reading scored a similar -0.125; it mixed positive and negative sentences that largely balanced
## out. Since this poem scores nearly the same, I would also call it almost neutral, with a slight
## bit of negativity.

-0.13411458333333334


## Question 3

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [66]:
def getPoem(author, title, filename):
    URL = f'https://poetrydb.org/author,title/{author};{title}'
    result = json.loads(requests.get(URL).text)
    poem = '\n'.join(result[0]['lines'])

    with open(f'{filename}.txt', 'w') as file:
        file.write(poem)
        print("success")

getPoem('Amy Levy', 'The Dream', 'thedream')
getPoem('Thomas Moore', 'Though the Last Glimpse of Erin With Sorrow I See', 'mooresad')
getPoem('Sir Walter Scott', 'Harp of the North, Farewell!', 'magicpoem')


success
success
success


## Question 4

4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [68]:
def sentimentAnalyze(filename):
    # Read the whole file first so it is closed before the slower NLP step
    with open(f'{filename}.txt', 'r') as file:
        text = file.read()
    # Reuse the `nlp` pipeline built in Question 2 rather than reloading the model on every call
    doc = nlp(text)
    return doc._.blob.polarity

print('The Dream: ' + str(sentimentAnalyze('thedream')))
print('Though the Last Glimpse of Erin With Sorrow I See: ' + str(sentimentAnalyze('mooresad')))
print('Harp of the North, Farewell!: ' + str(sentimentAnalyze('magicpoem')))

The Dream: 0.07712121212121213
Though the Last Glimpse of Erin With Sorrow I See: -0.08636363636363635
Harp of the North, Farewell!: -0.07067901234567903


I chose these three poems expecting them to read as happy, sad, and neutral, respectively. I expected a clearly positive polarity for "The Dream" and a clearly negative one for "Though the Last...". That was not the case: all three scores fall within 0.1 of zero, which is close to neutral. The ordering did match my expectations ("The Dream" was the most positive, "Though the Last..." the most negative, and "Harp of the North, Farewell!" the closest to zero), but the magnitudes were much smaller than I expected.
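
As a quick check of that comparison, the three reported scores can be ranked by their distance from zero. This is only a sketch using the values printed above; the 0.1 cutoff for "fairly neutral" is my own assumption, not a threshold defined by spaCyTextBlob.

```python
# Polarity scores copied from the output above
scores = {
    "The Dream": 0.07712121212121213,
    "Though the Last Glimpse of Erin With Sorrow I See": -0.08636363636363635,
    "Harp of the North, Farewell!": -0.07067901234567903,
}

NEUTRAL_BAND = 0.1  # assumed cutoff for "fairly neutral", chosen for illustration

# Sort titles from closest to zero to farthest from zero
by_distance = sorted(scores, key=lambda title: abs(scores[title]))

for title in by_distance:
    score = scores[title]
    label = "fairly neutral" if abs(score) < NEUTRAL_BAND else "clearly signed"
    print(f"{title}: {score:+.3f} ({label})")
```

All three land inside the assumed band, with "Harp of the North, Farewell!" closest to zero.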

I think the reason may be the archaic vocabulary these poets use (older poetic diction, rather than Old English in the linguistic sense). spaCyTextBlob delegates sentiment scoring to TextBlob, whose default analyzer works from a lexicon of scored words rather than a model trained on text, so words missing from that lexicon contribute nothing to the score. If many of the emotionally loaded words in these poems are missing, the analyzer picks up only part of the sentiment. That would explain why the ordering is correct but the magnitudes are muted.
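
A toy illustration of that idea: if an analyzer only scores the words it recognizes, a sentence built from unfamiliar words ends up at zero no matter how sad it reads. The lexicon, its scores, and the `toy_polarity` function below are invented for this example; this is not TextBlob's actual algorithm or word list.

```python
import re

# Hypothetical mini-lexicon; the words and scores are invented for illustration.
TOY_LEXICON = {"happy": 0.8, "merry": 0.6, "sad": -0.5, "miserable": -1.0}


def toy_polarity(text, lexicon=TOY_LEXICON):
    """Average the scores of words found in the lexicon; 0.0 if none are found."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[w] for w in words if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


modern = "I am sad and miserable"
archaic = "I am doleful and woebegone"

print(toy_polarity(modern))   # -0.75
print(toy_polarity(archaic))  # 0.0
```

Both sentences express the same feeling, but only the one using words in the lexicon registers as negative.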