## Part 1: Existing Machine Learning Services

<a href="https://colab.research.google.com/github/peckjon/hosting-ml-as-microservice/blob/master/part1/score_reviews_via_service.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Obtain labelled reviews

To test any of the sentiment analysis APIs, we need a labelled dataset of reviews with known sentiment polarity. We'll use NLTK to download the `movie_reviews` corpus.

In [1]:
from nltk import download

download('movie_reviews')

[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/jeff/nltk_data...
[nltk_data]   Package movie_reviews is already up-to-date!


True

### Load the data

The files in movie_reviews have already been divided into two sets: positive ('pos') and negative ('neg'), so we can load the raw text of the reviews into two lists, one for each polarity.

In [2]:
from nltk.corpus import movie_reviews

# load the raw text of each review into one list per polarity

reviews_pos = []
for fileid in movie_reviews.fileids('pos'):
    review = movie_reviews.raw(fileid)
    reviews_pos.append(review)

print(len(reviews_pos))

reviews_neg = []
for fileid in movie_reviews.fileids('neg'):
    review = movie_reviews.raw(fileid)
    reviews_neg.append(review)

print(len(reviews_neg))

1000
1000


### Connect to the scoring API

Fill in this function with code that connects to one of these APIs, and uses it to score a single review:

* [Amazon Comprehend: Detect Sentiment](https://docs.aws.amazon.com/comprehend/latest/dg/API_DetectSentiment.html)
* [Google Natural Language: Analyzing Sentiment](https://cloud.google.com/natural-language/docs/analyzing-sentiment)
* [Azure Cognitive Services: Sentiment Analysis](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-sentiment-analysis)
* [Algorithmia: Sentiment Analysis](https://algorithmia.com/algorithms/nlp/SentimentAnalysis)

Your function must return either 'pos' or 'neg', so you'll need to make some decisions about how to map the results of the API call to one of these values. For example, Amazon Comprehend can return "NEUTRAL" or "MIXED" for the Sentiment -- if this happens, you may wish to inspect the numeric values under the SentimentScore to see whether it leans toward positive or negative.
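As a rough sketch of the mapping described above, the function below takes a dictionary shaped like an Amazon Comprehend `DetectSentiment` response (a `Sentiment` label plus a `SentimentScore` with `Positive`, `Negative`, `Neutral`, and `Mixed` values) and reduces it to 'pos' or 'neg'. The name `label_from_comprehend`, the example values, and the choice to break ties toward 'pos' are all assumptions made for illustration; no API call is made here.

```python
def label_from_comprehend(response):
    """Map a Comprehend-style DetectSentiment response to 'pos' or 'neg'."""
    sentiment = response["Sentiment"]
    if sentiment == "POSITIVE":
        return "pos"
    if sentiment == "NEGATIVE":
        return "neg"
    # NEUTRAL or MIXED: fall back to comparing the numeric scores.
    scores = response["SentimentScore"]
    # Breaking ties toward 'pos' is an arbitrary choice for this sketch.
    return "pos" if scores["Positive"] >= scores["Negative"] else "neg"


# Made-up response values, for illustration only.
example = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {"Positive": 0.30, "Negative": 0.10, "Neutral": 0.55, "Mixed": 0.05},
}
print(label_from_comprehend(example))  # pos
```

The other services return differently shaped results, so a similar but separate mapping would be needed for each.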


In [15]:
import Algorithmia

# Replace the placeholder with your own Algorithmia API key
client = Algorithmia.client('YOUR_API_KEY')

review = "I like the rain in Seattle"

# The algorithm expects the text under the "document" key
payload = {"document": review}

algo = client.algo('nlp/SentimentAnalysis/1.0.5')

try:
    # Print the sentiment score returned for the single document
    print(algo.pipe(payload).result[0]['sentiment'])
except Exception as error:
    # Raised if, for example, the input is not correctly formatted
    print(error)


0.3612


In [20]:
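def score_review(review):
    # Call the API, then map the numeric score to a label:
    # zero or above counts as 'pos', below zero as 'neg'
    payload = {"document": review}
    score = algo.pipe(payload).result[0]['sentiment']
    return 'pos' if score >= 0 else 'neg'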

### Score each review

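Now we can use the function you defined to score each of the reviews. Reviews of 4,900 characters or more are skipped, and the number of reviews scored so far is printed before each one as a progress indicator.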

In [21]:
results_pos = []
for review in reviews_pos:
    print(len(results_pos))
    if (len(review) < 4900):
        result = score_review(review)
        results_pos.append(result)

results_neg = []
for review in reviews_neg:
    print(len(results_neg))
    if (len(review) < 4900):
        result = score_review(review)
        results_neg.append(result)

0
1
2
3
...
217
218

AlgorithmException: "Account doesn't have any remaining credits"
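The run above stopped partway through with an exception. One way to keep the partial results when that happens is to catch the error inside the loop. The following is a minimal sketch under that assumption: `score_all` and `flaky_scorer` are hypothetical names, and `flaky_scorer` is a stand-in that fails on its third call so the example runs without any API access.

```python
def score_all(reviews, score_fn, max_chars=4900):
    """Score reviews until done or until score_fn fails; return (labels, error)."""
    labels = []
    for review in reviews:
        if len(review) >= max_chars:
            continue  # same length filter as the scoring loop above
        try:
            labels.append(score_fn(review))
        except Exception as error:
            # Stop at the first failure but keep what was scored so far.
            return labels, error
    return labels, None


# Stand-in scorer for demonstration only: fails on the third call.
calls = {"n": 0}

def flaky_scorer(review):
    calls["n"] += 1
    if calls["n"] > 2:
        raise RuntimeError("no remaining credits")
    return "pos"

labels, error = score_all(["good", "fine", "great", "nice"], flaky_scorer)
print(labels, error)  # ['pos', 'pos'] no remaining credits
```

With the real scorer, this would be called as `score_all(reviews_pos, score_review)` and again for `reviews_neg`, which also removes the duplicated loop.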

### Calculate accuracy

For our known positive reviews, we count how many our function scored as 'pos' and use this to calculate the percentage accuracy. We repeat this for the negative reviews, then combine both counts for the overall accuracy.

In [5]:
correct_pos = results_pos.count('pos')
accuracy_pos = float(correct_pos) / len(results_pos)
correct_neg = results_neg.count('neg')
accuracy_neg = float(correct_neg) / len(results_neg)
correct_all = correct_pos + correct_neg
accuracy_all = float(correct_all) / (len(results_pos)+len(results_neg))

print('Positive reviews: {}% correct'.format(accuracy_pos*100))
print('Negative reviews: {}% correct'.format(accuracy_neg*100))
print('Overall accuracy: {}% correct'.format(accuracy_all*100))

ZeroDivisionError: float division by zero
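The `ZeroDivisionError` above occurs when a results list is empty, for example when the cell runs before any reviews have been scored. A small sketch of a guarded calculation follows; the helper names `accuracy` and `report` and the sample lists are illustrative assumptions, not part of the original notebook.

```python
def accuracy(results, expected_label):
    """Return the fraction of results equal to expected_label, or None if empty."""
    if not results:
        return None
    return results.count(expected_label) / len(results)


def report(name, value):
    """Format one accuracy line, handling the no-results case."""
    if value is None:
        return '{}: no results to score'.format(name)
    return '{}: {:.1f}% correct'.format(name, value * 100)


# Made-up results, for illustration only.
print(report('Positive reviews', accuracy(['pos', 'pos', 'neg', 'pos'], 'pos')))
# Positive reviews: 75.0% correct
print(report('Negative reviews', accuracy([], 'neg')))
# Negative reviews: no results to score
```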