<a href="https://colab.research.google.com/github/TheBlackRus/Manning_LP_deploy_ml_as_microservice/blob/main/LP_predict_sentiment_analysis_part_03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Part 3: Deploying as a FaaS

<a href="https://colab.research.google.com/github/peckjon/hosting-ml-as-microservice/blob/master/part3/predict_sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Download corpora

Since we won't be doing any model training in this step, we don't need the 'movie_reviews' corpus. However, we still need to extract features from our input before each prediction, so we make sure 'punkt' and 'stopwords' are available for tokenization and stopword removal. If you added any other corpora in Part 2, consider whether they'll be needed in the prediction step.

In [1]:
from nltk import download

download('punkt')
download('stopwords')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

### Define feature extractor and bag-of-words converter

IMPORTANT: your predictions will only work properly if you use the same feature extractor that you trained your model with, so copy your updated `extract_features` function over from Part 2, replacing the one below.

In [2]:
from nltk.corpus import stopwords
from string import punctuation

stopwords_eng = stopwords.words('english')
#better_words = ['boring', 'stupid', 'marvelous', 'wonderful', 'ludicrous', 'sucks', 'apparently', 'filmmakers', 'astounding', 'avoids', 'atrocious', 'worst', 'magnificent', 'best', 'stark', 'strengths', 'outstanding', 'imaginative', 'strongest', 'headache', 'insulting', 'breathtaking', 'finest', 'illogical', 'dream', 'accessible', 'effective', 'regard', 'team', 'elliot', 'controversial', 'wasted', 'great', 'palpable', 'keen', 'unbearable', 'fascination', 'seamless', 'sans', 'conveys', 'hilarious', 'exquisite', 'country', 'questioning', 'embarrassing', 'matches', 'profit', 'literal', 'fairness', 'shannon', 'wonderfully', 'saddled', 'mcconaughey', 'sundance', 'secondly', 'memorable', 'idiotic', 'depicted', 'uninvolving', 'vulnerability', 'gaining', 'manipulation', 'shoddy', 'hatred', 'yawn', 'naval', 'ugh', 'treasure', 'frances', 'fishing', 'farther', 'anger', 'bad', 'chilling', 'best', 'insipid', 'uplifting', 'everyday', 'bore', 'commanding', 'quite', 'feeble', 'rude', 'detract', 'goo', 'aliens', 'nasa', 'mark', 'bad', 'mixing', '4', 'gump', 'surrealistic', 'gives', 'dread', 'liam', 'kurt', 'apocalyptic', 'poetry', 'glasses']
def extract_features(words):
    return [w for w in words if w not in stopwords_eng and w not in punctuation]  # and (w in better_words)]

def bag_of_words(words):
    bag = {}
    for w in words:
        bag[w] = bag.get(w,0)+1
    return bag
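
To see what these two helpers hand to the model, the sketch below runs the same logic on a short token list. It is only illustrative: `demo_stopwords` is a tiny hypothetical stand-in for NLTK's English stopword list (so the snippet runs without any downloads), and the `demo_` functions are renamed copies so they don't overwrite the notebook's own helpers.

```python
from collections import Counter
from string import punctuation

# Hypothetical stand-in for stopwords.words('english'); the real list is much longer.
demo_stopwords = {'this', 'is', 'a', 'and', 'the'}

def demo_extract_features(words):
    # Same filtering rule as extract_features, but against the stand-in list.
    return [w for w in words if w not in demo_stopwords and w not in punctuation]

def demo_bag_of_words(words):
    # Counter builds the same word -> count mapping as the explicit loop.
    return dict(Counter(words))

tokens = ['this', 'movie', 'is', 'great', ',', 'really', 'great', '!']
features = demo_extract_features(tokens)
bag = demo_bag_of_words(features)
print(features)  # ['movie', 'great', 'really', 'great']
print(bag)       # {'movie': 1, 'great': 2, 'really': 1}
```

Because `punctuation` is a plain string, `w not in punctuation` is a substring test: single punctuation characters are dropped, but a multi-character token such as `...` is kept.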

### Import your pickled model file (non-Colab version)

In Part 2, we saved the trained model as "sa_classifier.pickle". Now we'll unpickle that file to get it back into memory. Either copy that file into the same folder as this notebook ("part3"), or adjust the path below to "../part2/sa_classifier.pickle" so it reads the file from the folder where it was saved.

In [3]:
import os

In [4]:
os.listdir(".")

['.config',
 'sa_classifier_better.pickle',
 'sa_classifier.pickle',
 'sample_data']

In [5]:
import pickle
import sys

#if not 'google.colab' in sys.modules:
#model_file = open('./sa_classifier_better.pickle', 'rb')
with open('./sa_classifier.pickle', 'rb') as model_file:
    model = pickle.load(model_file)
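
If the pickle file may live in more than one place (this notebook's folder or "../part2/"), a small helper can try each location in turn. This is a minimal sketch and not part of the original notebook: `load_first_available` is a hypothetical name, and the demonstration uses a throwaway pickle written to a temporary folder rather than the real classifier.

```python
import os
import pickle
import tempfile

def load_first_available(candidate_paths):
    """Unpickle and return the first file in candidate_paths that exists."""
    for path in candidate_paths:
        if os.path.isfile(path):
            # The context manager closes the file even if unpickling fails.
            with open(path, 'rb') as f:
                return pickle.load(f)
    raise FileNotFoundError('No model file found in: ' + ', '.join(candidate_paths))

# Demonstration with a dummy object standing in for the trained classifier.
with tempfile.TemporaryDirectory() as tmp:
    real_path = os.path.join(tmp, 'sa_classifier.pickle')
    with open(real_path, 'wb') as f:
        pickle.dump({'demo': 'model'}, f)
    missing_path = os.path.join(tmp, 'missing', 'sa_classifier.pickle')
    loaded = load_first_available([missing_path, real_path])

print(loaded)  # {'demo': 'model'}
```

In this notebook, the call might look like `model = load_first_available(['./sa_classifier.pickle', '../part2/sa_classifier.pickle'])`. Only unpickle files you trust, since loading a pickle can execute arbitrary code.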

### Import your pickled model file (Colab version)

If you're running this notebook on Colab, we need to retrieve the pickled model from [Google Drive](https://drive.google.com) before we can unpickle it. This code looks for "sa_classifier.pickle" in a folder called "Colab Output"; if you have moved the file elsewhere, change the path below.

In [None]:
import pickle
import sys

if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount('/content/gdrive')
    !ls '/content/gdrive/My Drive/Colab Output'
    with open('/content/gdrive/My Drive/Colab Output/sa_classifier.pickle', 'rb') as model_file:
        model = pickle.load(model_file)
    print('Model loaded from /content/gdrive/My Drive/Colab Output')

### Define a method for prediction

In the prediction step, we'll be taking a single piece of text input and asking the model to classify it. Models need the input for the prediction step to have the same format as the data provided during training -- so we must tokenize the input, run the same `extract_features` method that we used during training, and convert it to a bag of words before sending it to the model's `classify` method.

Note: if you have (from Part 2) changed your `extract_features` method to accept the full text instead of a tokenized list, then you can omit the tokenization step here.

In [6]:
from nltk.tokenize import word_tokenize

def get_sentiment(review):
    words = word_tokenize(review)
    words = extract_features(words)
    words = bag_of_words(words)
    return model.classify(words)
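
To make the data flow concrete without needing the trained model, here is a sketch of the same three steps wired to a stub classifier. Everything in it is an assumption made for illustration: `StubModel` is a hypothetical stand-in exposing only a `classify` method, the regex tokenizer only approximates `word_tokenize`, the word lists are made up, and the lowercasing is this sketch's own simplification.

```python
import re

class StubModel:
    """Hypothetical stand-in for the unpickled classifier: it only mimics classify()."""
    POSITIVE = {'amazing', 'great', 'witty', 'beautiful'}
    NEGATIVE = {'hated', 'mess', 'boring', 'awful'}

    def classify(self, featureset):
        # featureset is a dict of word -> count, like bag_of_words() returns.
        score = sum(n for w, n in featureset.items() if w in self.POSITIVE)
        score -= sum(n for w, n in featureset.items() if w in self.NEGATIVE)
        return 'pos' if score >= 0 else 'neg'

SKETCH_STOPWORDS = {'this', 'is', 'i', 'about', 'with', 'and'}

def sketch_get_sentiment(review, model):
    # 1. Tokenize (a rough regex approximation of word_tokenize).
    words = re.findall(r"\w+|[^\w\s]", review.lower())
    # 2. Extract features (here: drop punctuation and a few stopwords).
    words = [w for w in words if w.isalnum() and w not in SKETCH_STOPWORDS]
    # 3. Convert to a bag of words and classify.
    bag = {}
    for w in words:
        bag[w] = bag.get(w, 0) + 1
    return model.classify(bag)

print(sketch_get_sentiment('This movie is amazing, with witty dialog.', StubModel()))  # pos
print(sketch_get_sentiment('I hated everything about this mess.', StubModel()))        # neg
```

Passing the model in as an argument, rather than reading a global, makes the function easy to exercise with a stub like this before the real pickle is loaded.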

### Run a prediction

Test your `get_sentiment` function on some sample inputs of your own: try altering the two reviews below and see how your model performs. It won't be 100% correct, and the main goal here is to confirm that it runs at all. However, if it seems to *always* be wrong, you may have missed a critical step above (e.g. you haven't copied over all the changes to your feature extractor from Part 2, you've loaded the wrong model file, or you've passed un-tokenized text where a list of words was expected).

In [7]:
positive_review = 'This movie is amazing, with witty dialog and beautiful shots.'
print('positive_review: '+get_sentiment(positive_review))

negative_review = 'I hated everything about this unimaginitive mess. Two thumbs down!'
print('negative_review: '+get_sentiment(negative_review))

positive_review: pos
negative_review: neg
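
When this logic is deployed as a function-as-a-service, the platform typically calls a single entry point with the request payload; Algorithmia's standard Python template, for example, names that entry point `apply`. The sketch below shows one way such a wrapper might validate its input before delegating to the prediction function. The stub predictor, the extra `predict` parameter, the error messages, and the validation rules are all assumptions made for illustration, not the deployed algorithm's actual code.

```python
def stub_get_sentiment(review):
    # Stand-in for the notebook's get_sentiment so this sketch runs on its own.
    return 'pos' if 'great' in review.lower() else 'neg'

def apply(text, predict=stub_get_sentiment):
    """Entry point in the style of an Algorithmia Python algorithm."""
    # Reject anything that is not a non-empty string before touching the model.
    if not isinstance(text, str):
        raise TypeError('Expected the review text as a string')
    review = text.strip()
    if not review:
        raise ValueError('Review text is empty')
    return predict(review)

print(apply('This is a great movie!'))  # pos
```

In a real deployment, `predict` would default to the notebook's `get_sentiment`, with the model unpickled once at load time rather than on every call.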


In [16]:
#%pip install Algorithmia

Collecting Algorithmia
  Downloading https://files.pythonhosted.org/packages/77/e0/5e8d8794e0f07ac993524e2bce3c4168c73269b0ec2d030118cb46d1e9e6/algorithmia-1.8.0-py2.py3-none-any.whl
Collecting enum-compat
  Downloading https://files.pythonhosted.org/packages/55/ae/467bc4509246283bb59746e21a1a2f5a8aecbef56b1fa6eaca78cd438c8b/enum_compat-0.0.3-py3-none-any.whl
Collecting algorithmia-adk<1.1,>=1.0
  Downloading https://files.pythonhosted.org/packages/ba/7a/bde356f95cb4c10b92b7af4ffa8a137876bdff8f437b9b827ad318310aed/algorithmia_adk-1.0.2-py2.py3-none-any.whl
Collecting argparse
  Downloading https://files.pythonhosted.org/packages/f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py3-none-any.whl
Collecting algorithmia-api-client==1.3.1
  Downloading https://files.pythonhosted.org/packages/63/1b/9e8d578c72863b8bef58ec4d62d6fd654417e476498fa39b67c77be65c8a/algorithmia_api_client-1.3.1-py2.py3-none-any.whl (151kB)

In [10]:
import Algorithmia

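# API_KEY must already hold your Algorithmia API key; define it before running
# this cell, and avoid committing the key with the notebook.
review = "This is a great movie!"  # named `review` to avoid shadowing the builtin input()
client = Algorithmia.client(API_KEY)
algo = client.algo('The_Black_Rus/lp_alg/1.0.0')
algo.set_options(timeout=300)  # optional
print(algo.pipe(review).result)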

pos
