# Imports

In [1]:
from sklearn.utils import shuffle
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
import joblib
import pandas as pd
import os
import sagemaker
import re

# Model Deploy

Using the artifacts created by the model, SageMaker can deploy the model as a running service.
In this part we create an endpoint (a running instance of the model) from the estimator defined and fit before. Next, we process a single tweet, send it to the endpoint (the deployed model), and wait for its answer.

## Processing of a single tweet

First, since we will process a single tweet, which should be a line of text, we have to preprocess it. It is important to remember that the model handles a matrix of numbers representing the text (word vectorization).

In [2]:
tweet = "@user why not @user mocked obama for being black.  @user @user @user @user #brexit" 
# This tweet is from the database; it is labelled 1, which means it is categorized as violent.

To transform this single tweet we should use the same function we used in the data exploration step. Since that function is defined in a different notebook, we copy it and redefine it here.

In [3]:
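def tweet_to_words(tweet):
    # Punctuation that is deleted outright, and separators that become a space.
    # Raw strings avoid invalid-escape warnings for sequences such as "\." and "\s".
    REPLACE_NO_SPACE = re.compile(r"(\.)|(\;)|(\:)|(\!)|(\')|(\?)|(\,)|(\")|(\()|(\))|(\[)|(\])")
    REPLACE_WITH_SPACE = re.compile(r"(<br\s*/><br\s*/>)|(\-)|(\/)")
    # Replace URLs with a space.
    words = re.sub(r"(?P<url>https?://[^\s]+)", " ", tweet)
    # Drop @mentions and every character that is not a letter, digit, space, or tab.
    words = re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", "", words)
    # Lowercase the text, then apply the two punctuation rules.
    words = REPLACE_NO_SPACE.sub("", words.lower())
    words = REPLACE_WITH_SPACE.sub(" ", words)
    return words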

In [4]:
test_tweet = tweet_to_words(tweet)
print(test_tweet)

 why not  mocked obama for being black      brexit
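The cleaned text still contains runs of spaces where the mentions and the hashtag symbol were removed. This should be harmless for the encoding step, because `str.split()` called without arguments splits on any run of whitespace and ignores leading and trailing blanks. A minimal check, using the cleaned string printed above:

```python
# The cleaned tweet as printed above, including the repeated spaces.
cleaned = " why not  mocked obama for being black      brexit"

# split() with no separator treats any run of whitespace as a single delimiter.
tokens = cleaned.split()
print(tokens)
```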


Now we need to transform this cleaned tweet into a bag of words. For this, we need the same vocabulary created by the CountVectorizer in the data preparation step, which we saved as an object in the vocabulary.json file.

In [5]:
import json
with open("data/vocabulary.json", "r") as file:
    vocabulary = json.load(file)
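The encoding function defined further down uses `vocabulary[word]` as a position in a list of length `len(vocabulary)`, so it is worth checking that the loaded mapping really covers the indices 0 to n-1 exactly once. The sketch below assumes the JSON file holds a word-to-index dictionary; `check_vocabulary` and the toy dictionaries are hypothetical names used only for illustration.

```python
def check_vocabulary(vocabulary):
    """Return True if the word -> index mapping uses each index 0..n-1 exactly once."""
    indices = sorted(vocabulary.values())
    return indices == list(range(len(vocabulary)))

# Hypothetical toy mappings standing in for the real 5000-word vocabulary.
toy_vocabulary = {"obama": 0, "brexit": 1, "black": 2}
broken_vocabulary = {"obama": 0, "brexit": 5}  # index 5 would overflow a 2-element list

print(check_vocabulary(toy_vocabulary))     # True
print(check_vocabulary(broken_vocabulary))  # False
```

Running the same check on the loaded `vocabulary` before encoding would surface an `IndexError` risk early.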

We define a function that takes the processed tweet and the vocabulary as parameters and returns a list.
This list can be understood as one row of a dataframe. It has the length of the vocabulary (5000): each element (column) represents one word in the vocabulary, and the number in that element is the count of that word in the new tweet.

In [6]:
def bow_encoding(tweet, vocabulary):  # Bag-of-words encoding.
    bow = [0] * len(vocabulary)  # One zero count per vocabulary word (a list of 5000 zeros here).
    for word in tweet.split():  # For each word in the string
        if word in vocabulary:  # Only words that occur in the vocabulary are counted.
            bow[vocabulary[word]] += 1  # Increment the count at the word's vocabulary index each time the word appears in the tweet.
    return bow

In [7]:
test_bow = bow_encoding(test_tweet,vocabulary)
print(test_bow)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

In [8]:
len(test_bow)

5000
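With 5000 columns the printed vector is hard to read, so the same encoding can be illustrated on a small scale. The sketch below repeats `bow_encoding` so it runs on its own, and uses a hypothetical four-word vocabulary and sentence that are not part of the real data.

```python
def bow_encoding(tweet, vocabulary):
    """Count how often each vocabulary word occurs in the tweet."""
    bow = [0] * len(vocabulary)
    for word in tweet.split():
        if word in vocabulary:
            bow[vocabulary[word]] += 1
    return bow

# Hypothetical toy vocabulary: word -> column index.
toy_vocabulary = {"why": 0, "not": 1, "obama": 2, "brexit": 3}

# "why" appears twice, "today" is not in the vocabulary and is ignored.
toy_bow = bow_encoding("why not why brexit today", toy_vocabulary)
print(toy_bow)  # [2, 1, 0, 1]
```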

## Deployment

For the deployment, we first recreate the estimator fit in the last part of the modeling step.
To do this, we use the "attach" method of the SageMaker Estimator, which links a new estimator object to the training job run before. We only need to pass the name of that training job.
This name is the one we saved in the last line of the model notebook.

In [15]:
import json
with open("data/training_job_AWS.json","r") as file:
    training_job = json.load(file)
    training_job = training_job["training_job"]

print(training_job)

xgboost-2022-04-15-13-27-03-050


Now, with the training job name, we can attach the new estimator to the training job saved before. That is to say, we recreate the estimator.

In [16]:
xgb_combined = sagemaker.estimator.Estimator.attach(training_job)


2022-04-15 13:46:08 Starting - Preparing the instances for training
2022-04-15 13:46:08 Downloading - Downloading input data
2022-04-15 13:46:08 Training - Training image download completed. Training in progress.
2022-04-15 13:46:08 Uploading - Uploading generated training model
2022-04-15 13:46:08 Completed - Training job completed


The next line deploys the model fit before. The deploy call creates the endpoint where the service runs and returns a predictor object bound to that endpoint.

In [17]:
xgb_combined_predictor = xgb_combined.deploy(initial_instance_count = 1, instance_type = 'ml.m4.xlarge')

----------!

Now, a library is needed to interact with Amazon services (now and for the lambda function that will be needed later). This library communicates with AWS and gives us access to the SageMaker environment. We need this access to send information to the deployed endpoint and get information back from it.
The library is boto3.

In [18]:
import boto3
runtime = boto3.Session().client('sagemaker-runtime')

The runtime variable holds a client for the SageMaker runtime. We will use one of its methods to get a response from the deployed model.
That method needs the name of the endpoint, which is available as an attribute of the predictor (xgb_combined_predictor):

In [19]:
print(xgb_combined_predictor.endpoint_name)

xgboost-2022-04-15-18-09-59-588


Now a response variable is created by calling (invoking) the endpoint.

In [24]:
response = runtime.invoke_endpoint(EndpointName = xgb_combined_predictor.endpoint_name, # The name of the endpoint we created
                                       ContentType = 'text/csv',                     # The data format that is expected
                                       Body = ','.join([str(val) for val in test_bow]).encode('utf-8'))
# The bag-of-words list of integers is joined into one comma-separated string and encoded as UTF-8 bytes so the endpoint can read it.
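The request body is just one CSV line of counts. A minimal sketch of that serialization, together with the inverse step for checking it locally, could look like the following; `to_csv_payload` and `from_csv_payload` are hypothetical helper names, and the short list stands in for the 5000-element vector.

```python
def to_csv_payload(row):
    """Serialize one feature row as a single CSV line encoded as UTF-8 bytes."""
    return ",".join(str(value) for value in row).encode("utf-8")

def from_csv_payload(payload):
    """Decode a CSV payload back into a list of integers (local sanity check only)."""
    return [int(value) for value in payload.decode("utf-8").split(",")]

payload = to_csv_payload([0, 2, 0, 1])
print(payload)                    # b'0,2,0,1'
print(from_csv_payload(payload))  # [0, 2, 0, 1]
```

Round-tripping the payload this way confirms that nothing is lost before the bytes are sent to the endpoint.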

In [25]:
print(response)

{'ResponseMetadata': {'RequestId': 'cbc75595-9d43-4a6b-8edc-13f588974c0c', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'cbc75595-9d43-4a6b-8edc-13f588974c0c', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Fri, 15 Apr 2022 18:16:36 GMT', 'content-type': 'text/csv; charset=utf-8', 'content-length': '18'}, 'RetryAttempts': 0}, 'ContentType': 'text/csv; charset=utf-8', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x7f418159a0f0>}


The body of the response returned by the endpoint is a byte stream (here with content type text/csv), so we need to read it and decode it.

In [26]:
response = response['Body'].read().decode('utf-8')

In [29]:
print(response)

0.9954820871353149


In [33]:
print(round(float(response)))

1


We can see that the response rounds to 1 (the score is close to 1), meaning that the model successfully classified the example.
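Rounding works for this score, but Python's `round` rounds ties to the nearest even number, so `round(0.5)` gives 0. A small helper with an explicit cutoff makes the decision rule visible. This is only a sketch: `to_label` is a hypothetical name, and the 0.5 threshold is an assumption rather than a value taken from the model.

```python
def to_label(body, threshold=0.5):
    """Convert the endpoint's decoded text response into a 0/1 label."""
    score = float(body.strip())
    # Assumed rule: scores at or above the threshold count as class 1.
    return 1 if score >= threshold else 0

print(to_label("0.9954820871353149"))  # 1
print(to_label("0.5"))                 # 1, whereas round(0.5) gives 0
```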

Now that we have tested that the endpoint is running, we can set up the rest of the app. For now, we delete the endpoint while it is not in use; otherwise charges might apply.

In [34]:
xgb_combined_predictor.delete_endpoint()