# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Amber Speer

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

## Question 1

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public api, searches for the lyrics of a song, and stores it in a dictionary object.  Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

In [5]:
import requests
import json


AUTHOR='William Wordsworth'
POEM = 'I wandered lonely as a cloud'

#only certain poets and titles are available
#to see the available poets, go to (in a web browser)
# https://poetrydb.org/author
#To see which poems that author has available, go to 
# https://poetrydb.org/author/AUTHOR NAME
# e.g.: https://poetrydb.org/author/Edgar Allan Poe
#The spaces will get handled by your web browser

# A cool pythonism (introduced in Python 3): f strings
# https://docs.python.org/3/tutorial/inputoutput.html#tut-f-strings
URL = f'https://poetrydb.org/author,title/{AUTHOR};{POEM}'
result = json.loads(requests.get(URL).text)

print(result)



[{'title': 'I Wandered Lonely As A Cloud', 'author': 'William Wordsworth', 'lines': ['I wandered lonely as a cloud', "That floats on high o'er vales and hills,", 'When all at once I saw a crowd,', 'A host, of golden daffodils;', 'Beside the lake, beneath the trees,', 'Fluttering and dancing in the breeze.', '', 'Continuous as the stars that shine', 'And twinkle on the milky way,', 'They stretched in never-ending line', 'Along the margin of a bay:', 'Ten thousand saw I at a glance,', 'Tossing their heads in sprightly dance.', '', ' The waves beside them danced, but they', 'Out-did the sparkling leaves in glee;', 'A poet could not be but gay,', 'In such a jocund company!', 'I gazed—and gazed—but little thought', 'What wealth the show to me had brought:', '', 'For oft, when on my couch I lie', 'In vacant or in pensive mood,', 'They flash upon that inward eye', 'Which is the bliss of solitude;', 'And then my heart with pleasure fills,', 'And dances with the daffodils.'], 'linecount': '24'}]
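Question 1 also asks for the response to be written to a file so later questions can read it back instead of calling the API again. The cell above stops at printing, so here is a minimal sketch of that step. The helper names `save_result` and `load_result` and the filename `poem.json` are assumptions made for illustration; they do not appear in the original notebook.

```python
import json


def save_result(result, filename):
    """Write the parsed API response (a list of dicts) to a JSON file."""
    with open(filename, "w") as json_file:
        json.dump(result, json_file)


def load_result(filename):
    """Read a previously saved API response back into Python objects."""
    with open(filename, "r") as json_file:
        return json.load(json_file)


# Hypothetical usage, assuming `result` holds the response printed above:
# save_result(result, "poem.json")
# result = load_result("poem.json")
```

JSON is used rather than pickle here because the response is already plain lists, dicts, and strings, and the saved file stays readable in a text editor.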

## Question 2

2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [64]:

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

nlp = spacy.load("en_core_web_md")
nlp.add_pipe('spacytextblob')
poem = '\n'.join(result[0]['lines']) 
print(poem)

doc = nlp(poem)

# Polarity ranges over [-1.0, 1.0]; values below zero indicate negative sentiment.
print('\n' + 'Polarity: ')
print(doc._.blob.polarity)

# Given that the polarity score of this poem is 0.10435185185185186, the sentiment analysis suggests that the poem is only slightly positive, almost neutral.
# However, I can't help but wonder whether the vocabulary this analysis uses lines up with the language of the poem, because I would expect it to be VERY positive.
# The poem references lots of positive actions like fluttering and dancing.  It also talks about the author's heart being filled with pleasure and says he "could not be but gay".
# These terms, though obviously positive when read in context, would not necessarily be scored as such by a more modern vocabulary.


I wandered lonely as a cloud
That floats on high o'er vales and hills,
When all at once I saw a crowd,
A host, of golden daffodils;
Beside the lake, beneath the trees,
Fluttering and dancing in the breeze.

Continuous as the stars that shine
And twinkle on the milky way,
They stretched in never-ending line
Along the margin of a bay:
Ten thousand saw I at a glance,
Tossing their heads in sprightly dance.

 The waves beside them danced, but they
Out-did the sparkling leaves in glee;
A poet could not be but gay,
In such a jocund company!
I gazed—and gazed—but little thought
What wealth the show to me had brought:

For oft, when on my couch I lie
In vacant or in pensive mood,
They flash upon that inward eye
Which is the bliss of solitude;
And then my heart with pleasure fills,
And dances with the daffodils.

Polarity: 
0.10435185185185186
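The comment in the cell above suspects that the analyzer's vocabulary does not match the poem's. As far as I understand it, TextBlob's default analyzer is lexicon-based: it scores the words it recognizes and combines those scores, so words missing from the lexicon contribute nothing. The sketch below is a deliberately simplified illustration of that idea, not TextBlob's actual algorithm. `TOY_LEXICON` and its scores are invented for this example and are not taken from TextBlob.

```python
import re

# Hypothetical word scores for illustration only; NOT TextBlob's real lexicon.
TOY_LEXICON = {"lonely": -0.5, "golden": 0.3, "pleasure": 0.6}


def toy_polarity(text, lexicon=TOY_LEXICON):
    """Average the scores of the words found in the lexicon (0.0 if none match)."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# A line whose positive words are absent from the toy lexicon scores as neutral.
print(toy_polarity("A poet could not be but gay"))
```

Under this simplification, a line full of archaic praise can come out neutral simply because none of its words are in the lexicon, which is consistent with the concern raised in the comment.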


## Question 3

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

The API that I am using does not have every poem and author.  These are the ones I used:

- Lines Written In Early Spring, by William Wordsworth
- Blow, Blow, Thou Winter Wind, by William Shakespeare
- Not at Home to Callers, by Emily Dickinson
- The words the happy say, by Emily Dickinson


For the filename I made an acronym of the title.
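The acronyms were typed by hand in the notebook. As a small sketch of how the same naming rule could be automated, the hypothetical helper `title_acronym` below (not part of the original notebook) takes the first letter of each word in the title.

```python
import re


def title_acronym(title):
    """Return the upper-case first letters of each word in the title."""
    # A word starts with a letter and may contain apostrophes; punctuation is skipped.
    words = re.findall(r"[A-Za-z][A-Za-z']*", title)
    return "".join(word[0].upper() for word in words)


print(title_acronym("The words the happy say"))
```

For the titles listed above this reproduces the filenames that appear in the later cells, apart from letter case.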

In [68]:
import requests
import json

author1 = input("Enter author name:")
poem1 = input("Enter poem name:")
filename1 = input("Enter filename:")

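def poemfile(author, poem, filename):
    """Fetch a poem from PoetryDB and write the JSON response to filename + '.json'."""
    URL = f'https://poetrydb.org/author,title/{author};{poem}'
    result = json.loads(requests.get(URL).text)
    print(result)

    # Save the full response so later cells can read it without calling the API.
    with open(filename + '.json', 'w') as json_file:
        json.dump(result, json_file)


poemfile(author1, poem1, filename1)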

Enter author name:Emily Dickinson
Enter poem name:The words the happy say
Enter filename:TWTHS
[{'title': 'The words the happy say', 'author': 'Emily Dickinson', 'lines': ['The words the happy say', 'Are paltry melody', 'But those the silent feel', 'Are beautiful --'], 'linecount': '4'}]
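The cell above was run once per poem with typed input. A non-interactive variant that saves all four poems in one pass might look like the sketch below. It swaps `requests` for the standard library's `urllib` so that it has no third-party dependency, and that swap means spaces must be percent-encoded by hand. The helper names, the choice to leave commas unescaped, and the filename `BBTWW` (derived from the acronym rule, not shown in the notebook) are all assumptions.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def build_url(author, title):
    """Build the PoetryDB author+title URL, percent-encoding spaces."""
    # Commas are left unescaped; whether PoetryDB requires that is an assumption.
    return ("https://poetrydb.org/author,title/"
            f"{quote(author, safe=',')};{quote(title, safe=',')}")


def fetch_poem(author, title):
    """Download one poem as parsed JSON (performs a real network request)."""
    with urlopen(build_url(author, title)) as response:
        return json.load(response)


def save_poems(poems, fetch=fetch_poem):
    """Fetch each (author, title, filename) entry and write it to filename + '.json'."""
    for author, title, filename in poems:
        with open(filename + ".json", "w") as json_file:
            json.dump(fetch(author, title), json_file)


POEMS = [
    ("William Wordsworth", "Lines Written In Early Spring", "LWIES"),
    ("William Shakespeare", "Blow, Blow, Thou Winter Wind", "BBTWW"),
    ("Emily Dickinson", "Not at Home to Callers", "NAHTC"),
    ("Emily Dickinson", "The words the happy say", "TWTHS"),
]

# save_poems(POEMS)  # uncomment to download and save all four poems
```

Passing the fetch function as a parameter keeps the file-writing logic testable without network access.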


## Question 4

4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [124]:
import json

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

filename1 = input("Enter first filename:")
filename2 = input("Enter second filename:")
filename3 = input("Enter third filename:")

nlp = spacy.load("en_core_web_md")
nlp.add_pipe('spacytextblob')


def poemsent(fn):
    """Load the poem saved in fn + '.json', print it, and return its polarity score."""
    with open(fn + '.json', 'r') as my_data_file:
        result2 = json.load(my_data_file)

    title = result2[0]['title']
    print('Title: ' + title)

    poem2 = '\n'.join(result2[0]['lines'])
    doc = nlp(poem2)
    print('\n' + poem2)
    return doc._.blob.polarity


# Print the returned polarity score for each of the saved poems.
for fn in (filename1, filename2, filename3):
    polarity = poemsent(fn)
    print('\n' + 'Polarity: ')
    print(polarity)
    print('\n' + '\n')
    


Enter first filename:twths
Enter second filename:nahtc
Enter third filename:lwies
Title: The words the happy say

The words the happy say
Are paltry melody
But those the silent feel
Are beautiful --

Polarity: 
0.5499999999999999



Title: Not at Home to Callers

Not at Home to Callers
Says the Naked Tree --
Bonnet due in April --
Wishing you Good Day --

Polarity: 
0.19166666666666665



Title: Lines Written In Early Spring

I heard a thousand blended notes,
While in a grove I sate reclined,
In that sweet mood when pleasant thoughts
Bring sad thoughts to the mind.

To her fair works did Nature link
The human soul that through me ran;
And much it grieved my heart to think
What man has made of man.

Through primrose tufts, in that green bower,
The periwinkle trailed its wreaths;
And 'tis my faith that every flower
Enjoys the air it breathes.

The birds around me hopped and played,
Their thoughts I cannot measure:--
But the least motion which they made
It seemed a thrill of pleasure.

Th

The Words the Happy Say - Polarity: 0.55

This score seems about right.  The poem has a lot of positive words, but there are not many words to go on.  There is a deeper meaning that the analysis is not picking up on, but the poem is positive overall anyway.

Not at Home to Callers - Polarity: 0.192

This score seems a little low.  The poem itself is pretty neutral until the end, when it is wishing you a good day.  That last part feels like it should make the score much more positive.

Lines Written in Early Spring - Polarity: 0.108

This score seems a little high.  Although there are many positive words in the poem, the whole point of the poem is lamenting what man has done to man.  Perhaps the more archaic language is not well suited to the vocabulary of this analysis, or perhaps the analysis is not great at pulling the main point out of the poem.

In general, I would say that this analysis is not well suited for poetry or any very figurative/symbolic text.
     
 

The active project can be found at: https://github.com/aspeer05/baserepo/blob/main/python-ds.ipynb