# Web Mining and Applied NLP (44-620)

## Requests, JSON, and NLP

### Student Name: Lee Jones
https://github.com/IamLimaEchoEcho?tab=repositories

Perform the tasks described in the Markdown cells below.  When you have completed the assignment make sure your code cells have all been run (and have output beneath them) and ensure you have committed and pushed ALL of your changes to your assignment repository.

Make sure you have [installed spaCy and its pipeline](https://spacy.io/usage#quickstart) and [spaCyTextBlob](https://spacy.io/universe/project/spacy-textblob).

Every question that requires you to write code will have a code cell underneath it; you may either write your entire solution in that cell or write it in a python file (`.py`), then import and run the appropriate code to answer the question.

This assignment requires that you write additional files (either JSON or pickle files); make sure to submit those files in your repository as well.

1. The following code accesses the [lyrics.ovh](https://lyricsovh.docs.apiary.io/#reference/0/lyrics-of-a-song/search) public api, searches for the lyrics of a song, and stores it in a dictionary object.  Write the resulting json to a file (either a JSON file or a pickle file; you choose). You will read in the contents of this file for future questions so we do not need to frequently access the API.

I tried to use lyrics.ovh, and I also created a profile and tried the API at lyricsgenius. Every request I made failed with SSL certificate errors. I suspect this is because I am working from my work-issued laptop, or because my corporate internet security blocks the connection. As a workaround, I went the Kaggle route and used a lyrics CSV file (`PostMalone.csv`) instead.

In [1]:
import requests
import json

In [2]:
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob

In [32]:
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe('spacytextblob')

<spacytextblob.spacytextblob.SpacyTextBlob at 0x140865fbd60>

In [4]:
# Since I went the Kaggle CSV route, I need to convert my file to JSON.
# This cell reads the CSV and writes its rows out as a JSON array.
import csv

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []

    # Read the CSV file; newline='' lets the csv module handle line endings itself
    with open(csvFilePath, encoding='utf-8', newline='') as csvf:
        # DictReader yields one dict per row, keyed by the header row
        csvReader = csv.DictReader(csvf)

        # Collect every row dict into a list
        for row in csvReader:
            jsonArray.append(row)

    # Write the list of row dicts to the output file as indented JSON
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        json.dump(jsonArray, jsonf, indent=4)

csvFilePath = r'PostMalone.csv'
jsonFilePath = r'PostMalone.json'
csv_to_json(csvFilePath, jsonFilePath)
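
To illustrate what `csv.DictReader` produces, here is a small self-contained sketch using an in-memory CSV. The sample row is made up for illustration (it only mimics the column layout that shows up in `PostMalone.json`, not the real file contents). It shows two things worth knowing before reading the JSON back in: an unnamed index column ends up under the key `""`, and every value is read as a string.

```python
import csv
import io
import json

# Hypothetical sample mimicking the CSV layout: the first column has no header.
sample_csv = ",Artist,Title,Year\n6,Post Malone,Better Now,2018\n"

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# The unnamed column is keyed by the empty string, and numbers stay strings.
print(json.dumps(rows, indent=4))
```

Because the values are strings, a field such as `Year` would need an explicit `int(...)` conversion before any numeric comparison.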

2. Read in the contents of your file.  Print the lyrics of the song (not the entire dictionary!) and use spaCyTextBlob to perform sentiment analysis on the lyrics.  Print the polarity score of the sentiment analysis.  Given that the range of the polarity score is `[-1.0,1.0]` which corresponds to how positive or negative the text in question is, do you think the lyrics have a more positive or negative connotation?  Answer this question in a comment in your code cell.

In [11]:
print(jsonFilePath)
#"": "6",
#"Artist": "Post Malone",
#"Title": "Better Now",
#"Album": "beerbongs & bentleys",
#"Year": "2018",
#"Date": "2018-04-27",
#"Lyric":

PostMalone.json


In [38]:
with open(jsonFilePath, encoding='utf-8') as f:
    data = json.load(f)

print(data[6]["Title"])

text = data[6]["Lyric"]
doc = nlp(text)

print(doc._.blob.polarity)


Better Now
0.12966269841269837


In [None]:
# With a polarity score of about 0.13 on a scale of [-1.0, 1.0], the lyrics lean slightly positive overall.
# The score is much closer to neutral (0.0) than to strongly positive (1.0).
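
One way to make the "slightly positive versus neutral" judgment explicit is a small helper that maps a score to a label. This is only a sketch: the function name `describe_polarity` and the `neutral_band` cutoff of 0.05 are assumptions chosen for illustration, not part of spaCyTextBlob, and the label depends entirely on where that cutoff is placed.

```python
def describe_polarity(polarity, neutral_band=0.05):
    """Map a polarity score in [-1.0, 1.0] to a rough label.

    neutral_band is an arbitrary cutoff chosen for illustration only.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be between -1.0 and 1.0")
    if abs(polarity) <= neutral_band:
        return "neutral"
    return "positive" if polarity > 0 else "negative"


# Score reported above for "Better Now"
print(describe_polarity(0.12966269841269837))
```

With a wider band, for example `neutral_band=0.15`, the same score would be labeled neutral, which matches the reading that it sits closer to 0.0 than to 1.0.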

3. Write a function that takes an artist, song, and filename, accesses the lyrics.ovh api to get the song lyrics, and writes the results to the specified filename.  Test this function by getting the lyrics to any four songs of your choice and storing them in different files.

In [84]:
# If I understand this task correctly, the user supplies an artist, song, and filename,
# and the song lyrics are written to a JSON output file.
# Because I could not reach the API, this function looks the song up in my local JSON file instead.

def lyrics_to_json(artist, song, filename):
    # Load every row from the JSON file created in question 1
    with open(jsonFilePath, 'r', encoding='utf-8') as file:
        data = json.load(file)

    # Look for the row whose artist and title both match exactly
    idict = None
    for row in data:
        if row["Artist"] == artist and row["Title"] == song:
            idict = {
                "Artist": row["Artist"],
                "Title": row["Title"],
                "Lyric": row["Lyric"]
            }
            print(row["Artist"], row["Title"])

    if idict is None:
        # Report a missing song instead of failing silently
        print("No match found for", artist, song)
        return

    # Write the matching record to <filename>.json
    with open(filename + ".json", "w") as outfile:
        json.dump(idict, outfile)
            


In [88]:
artist = input("Enter Artist: ")
song = input("Enter Song: ")
filename = input("Enter Filename: ")

lyrics_to_json(artist,song,filename)



Post Malone Paranoid
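
For comparison, here is a minimal sketch of what an API-based version of this function might look like when the connection works. It rests on several assumptions: the endpoint pattern `https://api.lyrics.ovh/v1/<artist>/<title>` is taken from the linked lyrics.ovh documentation, the response is assumed to be a JSON object, and the names `build_lyrics_url`, `fetch_json`, `save_lyrics`, and the `fetch` parameter are hypothetical. It uses the standard library's `urllib` rather than `requests` so it has no extra dependencies, and it would not get around the SSL certificate errors described earlier.

```python
import json
import urllib.parse
import urllib.request


def build_lyrics_url(artist, song):
    """Build the lyrics.ovh request URL, percent-encoding both path parts."""
    base = "https://api.lyrics.ovh/v1"
    artist_part = urllib.parse.quote(artist, safe="")
    song_part = urllib.parse.quote(song, safe="")
    return f"{base}/{artist_part}/{song_part}"


def fetch_json(url):
    """Download and decode a JSON response (needs network access)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def save_lyrics(artist, song, filename, fetch=fetch_json):
    """Fetch the result for one song and write it to filename as JSON.

    fetch is injectable (a hypothetical convenience) so the function
    can be exercised without a network connection.
    """
    result = fetch(build_lyrics_url(artist, song))
    with open(filename, "w", encoding="utf-8") as outfile:
        json.dump(result, outfile, indent=4)
    return result


# Example call (commented out because it requires a working connection):
# save_lyrics("Post Malone", "Better Now", "better_now_api.json")
```

Passing the fetcher in as a parameter also makes it easy to swap in a stub that returns canned data when the API is unreachable.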


4. Write a function that takes the name of a file that contains song lyrics, loads the file, performs sentiment analysis, and returns the polarity score.  Use this function to print the polarity scores (with the name of the song) of the three files you created in question 3.  Does the reported polarity match your understanding of the song's lyrics? Why or why not do you think that might be?  Answer the questions in either a comment in the code cell or a markdown cell under the code cell.

In [112]:
#my filenames: Better_Now.json, Blame_It_On_Me.json, Jonestown_(Interlude).json, Paranoid.json
def sentiment_json(file4):

    with open(file4) as f:
        data4 = json.load(f)
        artist = data4['Artist'] 
        song =  data4['Title'] 
        lyric = data4['Lyric'] 

    doc = nlp(lyric)
    polarity = doc._.blob.polarity
    return artist, song, polarity

In [114]:
# Score each of the four song files created in question 3
for file4 in ['Better_Now.json', 'Blame_It_On_Me.json',
              'Jonestown_(Interlude).json', 'Paranoid.json']:
    artist, song, polarity = sentiment_json(file4)
    print(artist, ':', song, ':', polarity)


Post Malone : Better Now : 0.12966269841269837
Post Malone : Blame It On Me : 0.02688492063492063
Post Malone : Jonestown (Interlude) : 0.0
Post Malone : Paranoid : 0.11839285714285715


Does the reported polarity match your understanding of the song's lyrics?

Yes, the polarity matches what I would expect.

Why or why not do you think that might be?

None of the songs is strongly positive or negative. Three of them score slightly above zero (about 0.03 to 0.13) and Jonestown (Interlude) scores exactly 0.0, so overall the sentiment leans only slightly more positive than negative.
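
One possible reason the scores sit so close to zero, including the exact 0.0 for Jonestown (Interlude), is that TextBlob's default sentiment analyzer is lexicon-based: words it does not recognize contribute nothing, and positive and negative words can offset each other. The sketch below is a deliberately simplified toy to show that effect, not TextBlob's actual implementation. The lexicon, its scores, the example sentences, and the name `toy_polarity` are all invented for illustration.

```python
# Toy word-to-polarity lexicon; the words and scores are invented for this example.
TOY_LEXICON = {"better": 0.5, "good": 0.7, "bad": -0.7, "lonely": -0.4}


def toy_polarity(text):
    """Average the scores of words found in the lexicon; 0.0 if none match."""
    words = [w.strip(".,!?'\"").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("You're doing better now"))   # one positive word matches
print(toy_polarity("Feeling good but lonely"))   # positive and negative partly cancel
print(toy_polarity("Driving down the highway"))  # no lexicon words match
```

Under this kind of scoring, a result of 0.0 can mean "no recognized sentiment words" rather than "perfectly balanced", so a short interlude with few recognized words could plausibly land there.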

In [117]:
import os
os.system('jupyter nbconvert --to html requests-json-nlp.ipynb')

0