<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#NLP-Workflow-in-AWS-Comprehend" data-toc-modified-id="NLP-Workflow-in-AWS-Comprehend-1">NLP Workflow in AWS Comprehend</a></span><ul class="toc-item"><li><span><a href="#Set-Up-Sentiment-Analysis-API" data-toc-modified-id="Set-Up-Sentiment-Analysis-API-1.1">Set Up Sentiment Analysis API</a></span></li><li><span><a href="#Read-in-text-files---Yelp-Reviews" data-toc-modified-id="Read-in-text-files---Yelp-Reviews-1.2">Read in text files - Yelp Reviews</a></span></li><li><span><a href="#Sentiment-Analysis-on-Entire-Document" data-toc-modified-id="Sentiment-Analysis-on-Entire-Document-1.3">Sentiment Analysis on Entire Document</a></span></li></ul></li><li><span><a href="#NLP-Workflow-in-Google-Cloud-Platform" data-toc-modified-id="NLP-Workflow-in-Google-Cloud-Platform-2">NLP Workflow in Google Cloud Platform</a></span></li></ul></div>

# NLP Workflow in AWS Comprehend

Reference: 
* Noah's notebook: https://github.com/noahgift/recommendations/blob/master/notebooks/NLP_AWS.ipynb
* Amazon Comprehend Developer Guide: https://docs.aws.amazon.com/comprehend/latest/dg/comprehend-dg.pdf

## Set Up Sentiment Analysis API

In [4]:
import pandas as pd
import boto3
import json

In [2]:
comprehend = boto3.client(service_name='comprehend')

To set up the credentials:

* Step 2: Set Up the AWS Command Line Interface (AWS CLI)
    * setup: https://docs.aws.amazon.com/cli/latest/userguide/cli-install-macos.html
        * the only difference: instead of running `ls -a ~`, I opened the shell profile with `sublime ~/.zshrc`
        * and added the line `export PATH=~/.local/bin:$PATH` at the beginning
    * configure
        * Access key ID: see evernote
        * Secret access key: see evernote
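
The configure step above stores the two keys in a shared credentials file, which boto3 reads when the client is created. A minimal sketch for checking that a profile has both keys, assuming the usual INI layout of `~/.aws/credentials`; the helper name `has_aws_credentials` is hypothetical and not part of boto3:

```python
import configparser
import os


def has_aws_credentials(path="~/.aws/credentials", profile="default"):
    """Return True if the profile in the shared credentials file has both keys."""
    parser = configparser.ConfigParser()
    # read() silently skips missing files, so a missing file yields False below
    parser.read(os.path.expanduser(path))
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )
```

Calling `has_aws_credentials()` before `boto3.client(...)` gives a quick sanity check, though boto3 can also pick up keys from environment variables, which this sketch does not inspect.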

In [3]:
text = "It is sunny today in San Francisco"
print('Calling DetectSentiment')
print(json.dumps(comprehend.detect_sentiment(Text=text, LanguageCode='en'), sort_keys=True, indent=4))
print('End of DetectSentiment\n')

Calling DetectSentiment
{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "connection": "keep-alive",
            "content-length": "163",
            "content-type": "application/x-amz-json-1.1",
            "date": "Wed, 07 Mar 2018 07:20:43 GMT",
            "x-amzn-requestid": "0c3a25ca-21d8-11e8-a9ff-3f6c9b5ef217"
        },
        "HTTPStatusCode": 200,
        "RequestId": "0c3a25ca-21d8-11e8-a9ff-3f6c9b5ef217",
        "RetryAttempts": 0
    },
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Mixed": 0.0008793864399194717,
        "Negative": 0.00784251093864441,
        "Neutral": 0.9757477641105652,
        "Positive": 0.015530400909483433
    }
}
End of DetectSentiment
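
Most of the response above is HTTP metadata; the useful parts are `Sentiment` and `SentimentScore`. A small sketch that pulls out the label and its score, using a trimmed copy of the response printed above (the helper name `summarize_sentiment` is hypothetical):

```python
def summarize_sentiment(response):
    """Return (label, confidence) from a DetectSentiment response dict."""
    label = response["Sentiment"]
    scores = response["SentimentScore"]
    # Score keys are capitalized ("Neutral") while the label is upper case ("NEUTRAL")
    confidence = scores[label.capitalize()]
    return label, confidence


# Trimmed copy of the response printed above
sample = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Mixed": 0.0008793864399194717,
        "Negative": 0.00784251093864441,
        "Neutral": 0.9757477641105652,
        "Positive": 0.015530400909483433,
    },
}
label, confidence = summarize_sentiment(sample)
print(label, round(confidence, 3))
```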



## Read in text files - Yelp Reviews

In [5]:
path = "/Users/Crystal/src/Data_Analytics_Projects/Yelp_Dataset_Challenge_NLP_Project/yelp_dataset_challenge_round9/yelp_academic_dataset_review.json"
# Each line of the file holds one JSON-encoded review
with open(path, "r") as doc1:
    output = doc1.readlines()

In [11]:
output[2]

'{"review_id":"wslW2Lu4NYylb1jEapAGsw","user_id":"r1NUhdNmL6yU9Bn-Yx6FTw","business_id":"2aFiy99vNLklCx3T_tGS9A","stars":5,"date":"2011-04-29","text":"Great service! Corey is very service oriented. Works fast and very effiecient with his time. Going to use him again real soon to do additional IT services. thanks Corey.","useful":0,"funny":0,"cool":0,"type":"review"}\n'

In [13]:
python_obj = json.loads(output[2])

In [14]:
python_obj['text']

'Great service! Corey is very service oriented. Works fast and very effiecient with his time. Going to use him again real soon to do additional IT services. thanks Corey.'
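
Because the file stores one JSON object per line, each line can be parsed on its own. A sketch of a generator that yields only the review text, which avoids indexing into `output` by hand; the name `iter_review_texts` is hypothetical, and an in-memory buffer stands in for the real file:

```python
import io
import json


def iter_review_texts(lines):
    """Yield the 'text' field of each JSON-encoded review line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield json.loads(line)["text"]


# Stand-in for the real file: any iterable of lines works, including an open file
fake_file = io.StringIO(
    '{"review_id": "a", "stars": 5, "text": "Great service!"}\n'
    "\n"
    '{"review_id": "b", "stars": 1, "text": "Too slow."}\n'
)
texts = list(iter_review_texts(fake_file))
```

Passing an open file handle instead of `fake_file` would read the reviews lazily, one line at a time.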

In [17]:
print(json.dumps(comprehend.detect_sentiment(Text=python_obj['text'], LanguageCode='en'), sort_keys=True, indent=4))

{
    "ResponseMetadata": {
        "HTTPHeaders": {
            "connection": "keep-alive",
            "content-length": "166",
            "content-type": "application/x-amz-json-1.1",
            "date": "Wed, 07 Mar 2018 07:45:21 GMT",
            "x-amzn-requestid": "7ceab1a3-21db-11e8-9342-67a4d2892763"
        },
        "HTTPStatusCode": 200,
        "RequestId": "7ceab1a3-21db-11e8-9342-67a4d2892763",
        "RetryAttempts": 0
    },
    "Sentiment": "POSITIVE",
    "SentimentScore": {
        "Mixed": 0.0017196304397657514,
        "Negative": 0.0004199223767500371,
        "Neutral": 0.017272725701332092,
        "Positive": 0.9805876612663269
    }
}


## Sentiment Analysis on Entire Document

In [18]:
whole_doc = ', '.join(map(str, output))

In [20]:
# print(json.dumps(comprehend.detect_sentiment(Text=whole_doc, LanguageCode='en'), sort_keys=True, indent=4))
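# To be fixed: the call above fails with "OverflowError: string longer than 2147483647 bytes",
# most likely because the joined reviews are too large to send in a single request.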
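
One possible workaround for the failing call is to split the text into smaller pieces and score each piece separately. A minimal sketch, assuming a per-request limit of 5,000 UTF-8 bytes (an assumption to verify against the current Comprehend limits); `chunk_text` is a hypothetical helper, not part of boto3:

```python
def chunk_text(text, max_bytes=5000):
    """Split text on whitespace into chunks of at most max_bytes UTF-8 bytes."""
    chunks, current, current_size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8"))
        if word_size > max_bytes:
            raise ValueError("a single word exceeds max_bytes")
        # One extra byte for the joining space when the chunk is not empty
        extra = word_size + (1 if current else 0)
        if current_size + extra > max_bytes:
            chunks.append(" ".join(current))
            current, current_size = [word], word_size
        else:
            current.append(word)
            current_size += extra
    if current:
        chunks.append(" ".join(current))
    return chunks


demo_text = "great service " * 1000
demo_chunks = chunk_text(demo_text, max_bytes=100)
```

Each chunk could then be passed to `comprehend.detect_sentiment(Text=chunk, LanguageCode='en')` as in the earlier cells. For reviews, scoring each `python_obj['text']` individually may be more meaningful than scoring arbitrary slices of the joined file.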

# NLP Workflow in Google Cloud Platform

Reference:
* Noah's notebook: https://github.com/noahgift/recommendations/blob/master/notebooks/NLP_GCP.ipynb

In [21]:
def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)

In [22]:
implicit()

[<Bucket: elevated-watch-181504>, <Bucket: practicum_zqy>]
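
The `implicit()` call works only if the client library can find credentials in the environment. A small sketch for checking the `GOOGLE_APPLICATION_CREDENTIALS` variable, which is one of the places the library looks; the helper name `explicit_credentials_path` is hypothetical:

```python
import os


def explicit_credentials_path(environ=None):
    """Return the service-account key path set in the environment, or None."""
    environ = os.environ if environ is None else environ
    # GOOGLE_APPLICATION_CREDENTIALS points at a service-account JSON key file
    path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    return path or None
```

A `None` result does not necessarily mean the call will fail, since the library can fall back to other sources such as credentials set up through the gcloud tool.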


In [26]:
#! pip install google-cloud-storage google-cloud-language

In [25]:
# Imports the Google Cloud client library
# Note: the enums and types submodules exist only in google-cloud-language releases before 2.0
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types