## Part 1: Existing Machine Learning Services

<a href="https://colab.research.google.com/github/peckjon/hosting-ml-as-microservice/blob/master/part1/score_reviews_via_service.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Obtain labelled reviews

To test any of the sentiment analysis APIs, we need a labelled dataset of reviews and their sentiment polarity. We'll use NLTK to download the `movie_reviews` corpus.

In [3]:
from nltk import download

download('movie_reviews')

[nltk_data] Downloading package movie_reviews to
[nltk_data]     C:\Users\spdf\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\movie_reviews.zip.


True

### Load the data

The files in movie_reviews have already been divided into two sets: positive ('pos') and negative ('neg'), so we can load the raw text of the reviews into two lists, one for each polarity.

In [1]:
from nltk.corpus import movie_reviews

# load the raw text of each review into one list per polarity

reviews_pos = []
for fileid in movie_reviews.fileids('pos'):
    review = movie_reviews.raw(fileid)
    reviews_pos.append(review)

reviews_neg = []
for fileid in movie_reviews.fileids('neg'):
    review = movie_reviews.raw(fileid)
    reviews_neg.append(review)

In [2]:
reviews_neg[1]

'the happy bastard\'s quick movie review \ndamn that y2k bug . \nit\'s got a head start in this movie starring jamie lee curtis and another baldwin brother ( william this time ) in a story regarding a crew of a tugboat that comes across a deserted russian tech ship that has a strangeness to it when they kick the power back on . \nlittle do they know the power within . . . \ngoing for the gore and bringing on a few action sequences here and there , virus still feels very empty , like a movie going for all flash and no substance . \nwe don\'t know why the crew was really out in the middle of nowhere , we don\'t know the origin of what took over the ship ( just that a big pink flashy thing hit the mir ) , and , of course , we don\'t know why donald sutherland is stumbling around drunkenly throughout . \nhere , it\'s just " hey , let\'s chase these people around with some robots " . \nthe acting is below average , even from the likes of curtis . \nyou\'re more likely to get a kick out of h

### Connect to the scoring API

Fill in this function with code that connects to one of these APIs, and uses it to score a single review:

* [Amazon Comprehend: Detect Sentiment](https://docs.aws.amazon.com/comprehend/latest/dg/API_DetectSentiment.html)
* [Google Natural Language: Analyzing Sentiment](https://cloud.google.com/natural-language/docs/analyzing-sentiment)
* [Azure Cognitive Services: Sentiment Analysis](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis)
* [Algorithmia: Sentiment Analysis](https://algorithmia.com/algorithms/nlp/SentimentAnalysis)

Your function must return either 'pos' or 'neg', so you'll need to decide how to map the results of the API call to one of these values. For example, Amazon Comprehend can return "NEUTRAL" or "MIXED" for the Sentiment; if this happens, you may wish to inspect the numeric values under the SentimentScore to see whether it leans toward positive or negative.
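As a rough illustration of that mapping, here is a minimal sketch that works on a dict shaped like an Amazon Comprehend DetectSentiment response (a `Sentiment` label plus a `SentimentScore` dict). The function name `label_from_comprehend`, the tie-breaking rule, and the sample response are assumptions made for this example; it does not call the service itself.

```python
def label_from_comprehend(response):
    """Map a DetectSentiment-style response dict to 'pos' or 'neg'."""
    sentiment = response['Sentiment']
    if sentiment == 'POSITIVE':
        return 'pos'
    if sentiment == 'NEGATIVE':
        return 'neg'
    # NEUTRAL or MIXED: fall back to comparing the numeric scores.
    scores = response['SentimentScore']
    # Ties go to 'pos'; this tie-break is an arbitrary choice for the sketch.
    return 'pos' if scores['Positive'] >= scores['Negative'] else 'neg'


# Hypothetical response, hand-written for illustration (not real API output).
example = {
    'Sentiment': 'MIXED',
    'SentimentScore': {
        'Positive': 0.30,
        'Negative': 0.45,
        'Neutral': 0.05,
        'Mixed': 0.20,
    },
}
print(label_from_comprehend(example))  # neg
```

Other services return differently shaped results (for instance a single signed score), so the mapping would need to be adapted for each one.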


In [7]:
import os

import Algorithmia

def score_review(review):
    # Read the API key from the environment instead of hard-coding it.
    # The variable name ALGORITHMIA_API_KEY is an assumption; use your own.
    api_key = os.environ['ALGORITHMIA_API_KEY']
    client = Algorithmia.client(api_key)
    algo = client.algo('nlp/SentimentAnalysis/1.0.5')
    algo.set_options(timeout=300)

    _input = {'document': review}
    print('getting result from remote API...')
    result = algo.pipe(_input).result
    print('result obtained!')

    # A negative sentiment score maps to 'neg'; anything else maps to 'pos'.
    if result[0]['sentiment'] < 0:
        return 'neg'
    return 'pos'

### Score each review

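Now we can use the function you defined to score each of the reviews. Each call is a separate request to the remote service, so scoring the full corpus takes a while and uses API credits.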

In [9]:
results_pos = []
for review in reviews_pos:
    result = score_review(review)  # each input is the raw text of one complete review
    results_pos.append(result)

results_neg = []
for review in reviews_neg:
    result = score_review(review)
    results_neg.append(result)

getting result from remote API...
result obtained!
...

AlgorithmException: "Account doesn't have any remaining credits"
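Because the run above stopped partway through, it can help to keep whatever was scored before the failure. The following is a minimal sketch of one way to do that; `score_all`, its `limit` parameter, and the stand-in scorer `make_flaky_scorer` are hypothetical names introduced here, and the stand-in simply simulates a service that fails after a fixed number of calls.

```python
def score_all(reviews, score_fn, limit=None):
    """Score reviews one by one, keeping partial results if a call fails.

    Returns (labels, error): the labels collected so far, and the exception
    that stopped the run (None if every call succeeded).
    """
    labels = []
    for review in reviews[:limit]:  # limit=None scores every review
        try:
            labels.append(score_fn(review))
        except Exception as exc:  # e.g. quota exhausted or network failure
            return labels, exc
    return labels, None


def make_flaky_scorer(fail_after):
    """Build a stand-in scorer that raises after `fail_after` successful calls."""
    calls = {'n': 0}

    def scorer(review):
        if calls['n'] >= fail_after:
            raise RuntimeError('no remaining credits')
        calls['n'] += 1
        return 'pos'

    return scorer


labels, error = score_all(['a', 'b', 'c', 'd'], make_flaky_scorer(2))
print(labels, error)
```

In the notebook, `score_fn` would be `score_review`, and passing a small `limit` is one way to try the pipeline on a handful of reviews before spending credits on the whole corpus.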

### Calculate accuracy

Among the known positive reviews, we count how many our function scored as 'pos' and use this to calculate the percentage accuracy. We repeat this for the negative reviews, and then compute the overall accuracy.

In [None]:
correct_pos = results_pos.count('pos')
accuracy_pos = float(correct_pos) / len(results_pos)
correct_neg = results_neg.count('neg')
accuracy_neg = float(correct_neg) / len(results_neg)
correct_all = correct_pos + correct_neg
accuracy_all = float(correct_all) / (len(results_pos)+len(results_neg))

print('Positive reviews: {}% correct'.format(accuracy_pos*100))
print('Negative reviews: {}% correct'.format(accuracy_neg*100))
print('Overall accuracy: {}% correct'.format(accuracy_all*100))
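If a scoring run ends early, one of the result lists may be empty, in which case the divisions above would raise `ZeroDivisionError`. A small helper along these lines guards against that; the name `accuracy` and the sample lists are assumptions for illustration, not results from the service.

```python
def accuracy(results, expected):
    """Return the fraction of results equal to `expected`, or None if empty."""
    if not results:
        return None
    return results.count(expected) / len(results)


# Hypothetical results for illustration only.
sample_pos = ['pos', 'pos', 'neg', 'pos']
sample_neg = ['neg', 'pos']

print(accuracy(sample_pos, 'pos'))  # 0.75
print(accuracy(sample_neg, 'neg'))  # 0.5
print(accuracy([], 'pos'))          # None
```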