# TensorFlow/Keras Script Mode: NLP/Sentiment Analysis Example

## Key Notes

- Use TensorFlow framework version 2.2 or later when working with LSTM layers: TF 2.1 and earlier have a bug that prevents an LSTM/GRU model from being served on SageMaker.
- In this example, part of the preprocessing happens inside the training script. In general, it is better practice to preprocess the dataset first and upload the processed data to S3, which gives a clearer picture of the pipeline.

```python
import re

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
```

# Data Reading and Preprocessing

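```python
df = pd.read_csv('Sentiment.csv')
df = df[['sentiment', 'text']]

# Keep only positive and negative sentiments
df = df[df.sentiment != "Neutral"].copy()

# Lowercase, then drop everything except letters, digits, underscores
# (kept so Twitter handles stay intact), and whitespace
df['text'] = df['text'].str.lower()
df['text'] = df['text'].apply(lambda x: re.sub(r'[^a-z0-9_\s]', '', x))

# Remove the retweet marker "rt" as a whole word. Assigning to `row` inside an
# iterrows() loop would not update df, so operate on the column directly.
df['text'] = df['text'].str.replace(r'\brt\b', '', regex=True).str.strip()
df.head()
```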

| | sentiment | text |
|---|---|---|
| 1 | Positive | rt scottwalker didnt catch the full gopdebate ... |
| 3 | Positive | rt robgeorge that carly fiorina is trending h... |
| 4 | Positive | rt danscavino gopdebate w realdonaldtrump deli... |
| 5 | Positive | rt gregabbott_tx tedcruz on my first day i wil... |
| 6 | Negative | rt warriorwoman91 i liked her and was happy wh... |
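
The cleaning above runs as pandas column operations. The same rules can be written as one plain-Python function, which is handy when a training script or an inference client has to apply identical cleaning to raw tweets. A minimal sketch: `clean_text` is a hypothetical helper name, keeping the underscore is an assumption (so handles survive), and unlike the notebook cell it also collapses repeated whitespace.

```python
import re

# After lowercasing, drop anything that is not a-z, 0-9, underscore, or whitespace
_PUNCT = re.compile(r"[^a-z0-9_\s]")
# The retweet marker "rt" as a whole word, so words like "start" are untouched
_RT = re.compile(r"\brt\b")


def clean_text(text: str) -> str:
    """Apply the notebook's cleaning steps to a single tweet (hypothetical helper)."""
    text = text.lower()
    text = _PUNCT.sub("", text)
    text = _RT.sub("", text)
    # Collapse the whitespace left behind by the removals
    return " ".join(text.split())


if __name__ == "__main__":
    print(clean_text("RT @GregAbbott_TX: Start the #GOPDebate!"))
```

Running it prints `gregabbott_tx start the gopdebate`: the `@`, `:`, `#` and `!` are removed, the leading `rt` is dropped, and the `rt` inside `start` is kept.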


```python
print(len(df))
```

```
10729
```


# Upload Data to S3

```python
# Create a SageMaker session, used below to upload data to S3
import boto3
import sagemaker

sagemaker_session = sagemaker.Session()
```

```python
# Split into train and test data: the first 8000 rows for training, the rest for testing
train = df.iloc[:8000]
test = df.iloc[8000:]
train.to_csv('train.csv', index=False)
test.to_csv('test.csv', index=False)
```

```python
# Upload both files to the session's default S3 bucket
prefix = "tf-sentiment-data"
training_input_path = sagemaker_session.upload_data('train.csv', key_prefix=prefix + '/training')
test_input_path = sagemaker_session.upload_data('test.csv', key_prefix=prefix + '/test')
```

```python
# Verify the upload by reading the files back from S3 (pandas needs the s3fs package for s3:// paths)
train = pd.read_csv(training_input_path)
print(len(train))
test = pd.read_csv(test_input_path)
print(len(test))
```

```
8000
2729
```
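
The split above is by row position, so if `Sentiment.csv` is ordered (for example by time or topic) the test set may not resemble the training set. A minimal sketch of a shuffled split with a fixed seed, in plain Python; `shuffled_split` and the 80/20 ratio are illustrative assumptions, not part of the original example. With pandas, `df.sample(frac=1, random_state=42)` followed by the same `iloc` slicing has the same effect.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffled_split(rows: Sequence[T], train_fraction: float = 0.8,
                   seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `rows` with a fixed seed, then cut it into train/test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train_rows, test_rows = shuffled_split(list(range(10729)))
    print(len(train_rows), len(test_rows))
```

For 10729 rows this prints `8583 2146`. Fixing the seed keeps the split identical across notebook reruns, so the files uploaded to S3 do not change unexpectedly.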


```python
training_input_path
```

```
's3://sagemaker-us-east-1-906815961619/tf-sentiment-data/training/train.csv'
```

# Create TensorFlow Estimator

```python
role = sagemaker.get_execution_role()
role
```

```
'arn:aws:iam::906815961619:role/service-role/AmazonSageMaker-ExecutionRole-20200704T205931'
```

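- `epochs` is set to 2 below only to keep the demo short; increase it to a more sensible value for real training.
- Other model hyperparameters can be passed through the same `hyperparameters` dictionary and read with `argparse` in the training script.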
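
In script mode, SageMaker passes each entry of `hyperparameters` to the entry point as a command-line argument (for example `--epochs 2`) and exposes paths such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` as environment variables. The original `sentiment.py` is not shown here, so this is only a sketch of how it might parse them; `--batch_size` and `--learning_rate` are hypothetical extra hyperparameters.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse script-mode hyperparameters (sketch; the extra flags are hypothetical)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as --name value pairs
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=32)        # hypothetical
    parser.add_argument("--learning_rate", type=float, default=1e-3)  # hypothetical
    # Container paths that SageMaker provides through environment variables
    parser.add_argument("--sm-model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    # parse_known_args tolerates extra arguments, such as the --model_dir the TF container adds
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Inside the training script, `args = parse_args()` reads `sys.argv`; with the estimator below, `args.epochs` would be `2`, and the `train` channel's CSV files would be found under `args.train`.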

```python
# Use a SageMaker TensorFlow estimator to train the model defined in sentiment.py
from sagemaker.tensorflow import TensorFlow

tf_estimator = TensorFlow(entry_point='sentiment.py',
                          role=role,
                          instance_count=1,
                          instance_type='ml.c5.18xlarge',
                          framework_version='2.2',
                          py_version='py37',
                          script_mode=True,
                          hyperparameters={
                              'epochs': 2
                          })
```

# Training

```python
tf_estimator.fit({'train': training_input_path})
```

```
2021-07-26 19:33:05 Starting - Starting the training job...
2021-07-26 19:33:32 Starting - Launching requested ML instances
ProfilerReport-1627327985: InProgress
......
2021-07-26 19:34:32 Starting - Preparing the instances for training.........
2021-07-26 19:36:04 Downloading - Downloading input data...
2021-07-26 19:36:33 Training - Downloading the training image..
2021-07-26 19:36:41.569246: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:425] Initializing the SageMaker Profiler.
2021-07-26 19:36:41.573223: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:106] SageMaker Profiler is not enabled. The timeline writer thread will not be started, future recorded events will be dropped.
2021-07-26 19:36:41.699852: W tensorflow/core/profiler/internal/smprofiler_timeline.cc:425] Initializing the SageMaker Profiler.
2021-07-26 19:36:45,321 sagemaker-training-toolkit INFO     Imported framework sagemaker_tensorflow_container.training
```

# Deployment

```python
import time
from sagemaker.deserializers import JSONDeserializer
from sagemaker.serializers import CSVSerializer

# Deploy a real-time endpoint; requests are sent as CSV and responses parsed as JSON
tf_endpoint_name = 'tf-sentiment-model-' + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
tf_predictor = tf_estimator.deploy(initial_instance_count=1,
                                   instance_type='ml.m5.4xlarge',
                                   endpoint_name=tf_endpoint_name,
                                   serializer=CSVSerializer(),
                                   deserializer=JSONDeserializer())
```

```
update_endpoint is a no-op in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
-----------!
```
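
With `CSVSerializer`, a 2D array is sent as comma-separated text with one row per line, and the TensorFlow Serving container answers with JSON holding a `predictions` list, which `JSONDeserializer` turns into a Python dict. The sketch below imitates both sides in plain Python to show the data shapes; `to_csv_body` and `parse_response` are hypothetical helpers, not the SageMaker SDK's implementation, and the numbers are made up for illustration.

```python
import json
from typing import List, Sequence


def to_csv_body(rows: Sequence[Sequence[int]]) -> str:
    """One comma-separated line per padded sequence (the shape of the CSV request body)."""
    return "\n".join(",".join(str(v) for v in row) for row in rows)


def parse_response(body: str) -> List[List[float]]:
    """Extract the per-example scores from a TensorFlow Serving style JSON response."""
    return json.loads(body)["predictions"]


if __name__ == "__main__":
    print(to_csv_body([[0, 0, 12, 7], [0, 3, 45, 9]]))
    print(parse_response('{"predictions": [[0.9, 0.1], [0.2, 0.8]]}'))
```

Each row of padded token ids becomes one CSV line, and each entry of `predictions` holds one score per class for the corresponding row.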

# Inference
- The model expects numeric input, so the test text has to be tokenized and padded before it is sent to the endpoint.
- To score the entire test dataset, use Batch Transform rather than the real-time endpoint.
- Batch Transform example: https://github.com/aws-samples/amazon-sagemaker-script-mode/blob/master/tf-sentiment-script-mode/sentiment-analysis.ipynb
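
The test text must be mapped with the *training* vocabulary and padded to the training length; fitting a fresh tokenizer on the test rows silently reassigns word indices. The sketch below shows the effect with a simplified, pure-Python stand-in for Keras' `Tokenizer` (rank words by frequency, 1-based indices, keep indices below `num_words`) and `pad_sequences` (left-pad with zeros, keep the last `maxlen` tokens). It is an approximation for illustration, not the Keras implementation, and assumes text that is already lowercased and stripped of punctuation.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fit_vocab(texts: Sequence[str]) -> Dict[str, int]:
    """Map words to 1-based indices by descending frequency (ties keep first-seen order)."""
    counts = Counter(word for text in texts for word in text.split())
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return {word: i for i, (word, _) in enumerate(ranked, start=1)}


def to_sequences(texts: Sequence[str], vocab: Dict[str, int],
                 num_words: int) -> List[List[int]]:
    """Replace words by indices, dropping unknown words and indices >= num_words."""
    return [[vocab[w] for w in t.split() if w in vocab and vocab[w] < num_words]
            for t in texts]


def pad(seqs: Sequence[Sequence[int]], maxlen: int) -> List[List[int]]:
    """Left-pad with zeros; keep only the last `maxlen` tokens of longer sequences."""
    out = []
    for s in seqs:
        kept = list(s)[-maxlen:] if maxlen > 0 else []
        out.append([0] * (maxlen - len(kept)) + kept)
    return out


if __name__ == "__main__":
    train_vocab = fit_vocab(["good good movie", "bad movie"])
    test_vocab = fit_vocab(["bad movie"])
    print(to_sequences(["bad movie"], train_vocab, 2000))  # indices the model was trained on
    print(to_sequences(["bad movie"], test_vocab, 2000))   # different indices, same words
```

The same sentence becomes `[3, 2]` under the training vocabulary but `[1, 2]` under a vocabulary fitted on the test text, so the model would see different words than the ones actually present.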

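```python
# Take 20 test examples for a quick real-time inference check
testDF = pd.read_csv('test.csv')
print(testDF[:20])
testX = testDF['text'][:20].fillna('').astype(str)
testY = testDF['sentiment'][:20]

# The model expects integer sequences, so tokenize with the *training* vocabulary.
# Assumption: sentiment.py fits a Tokenizer(num_words=2000) on the training text and
# pads without an explicit maxlen; adjust both if the script differs.
max_features = 2000
trainText = pd.read_csv('train.csv')['text'].fillna('').astype(str)
tokenizer = Tokenizer(num_words=max_features, split=' ')
tokenizer.fit_on_texts(trainText.values)
maxlen = pad_sequences(tokenizer.texts_to_sequences(trainText.values)).shape[1]
testX = pad_sequences(tokenizer.texts_to_sequences(testX.values), maxlen=maxlen)

# Inference with the deployed endpoint
sampPredictions = tf_predictor.predict(testX)

# Each prediction is a [negative, positive] score pair; report the more likely label
print("------------")
print("Model Predictions")
print("------------")
for pred, actual in zip(sampPredictions['predictions'], testY):
    label = "Negative" if pred[0] > 0.5 else "Positive"
    print(f"{label} (actual: {actual})")
```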

```
   sentiment                                               text
0   Negative  rt puestoloco cenkuygur\n cancel primaries fox...
1   Negative  rt rwsurfergirl it is very disappointing that ...
2   Positive  rt rwsurfergirl tedcruz and realdonaldtrump ne...
3   Negative  rt jordiemojordie yall better not lie about go...
4   Negative  rt rwsurfergirl it is very disappointing that ...
5   Negative  rt rwsurfergirl jeb bush reminds me of elevato...
6   Negative  rt rwsurfergirl why doesnt chris wallace ask t...
7   Negative  rt monaeltahawy has any candidate received a w...
8   Positive  kasich i do believe in miracles restore common...
9   Negative  rt lattisaw no minority in their right mind sh...
10  Positive  i know god hes a friend of mine im glad to cou...
11  Negative  rt rwsurfergirl fox news is obviously trying t...
12  Negative  rt jamiaw the purpose of the military is to ki...
13  Positive  rt usveteran2 foxnews shutting sentedcruz out ...
14  Positive  rt rwsurfergirl ask trump 
```
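
The inference cell prints one label per example. To score the predictions against the matching entries of `testY`, a small helper can turn each score pair into a label and compute the fraction that match. A sketch: `score_predictions` is a hypothetical name, and the `[negative, positive]` score ordering is the same assumption the inference cell makes.

```python
from typing import List, Sequence, Tuple

LABELS = ("Negative", "Positive")  # assumed order of the two output scores


def score_predictions(preds: Sequence[Sequence[float]],
                      actual: Sequence[str]) -> Tuple[List[str], float]:
    """Return predicted labels and accuracy; zip stops at the shorter input."""
    pairs = list(zip(preds, actual))
    if not pairs:
        raise ValueError("no predictions to score")
    labels = [LABELS[0] if p[0] > 0.5 else LABELS[1] for p, _ in pairs]
    correct = sum(label == truth for label, (_, truth) in zip(labels, pairs))
    return labels, correct / len(pairs)
```

In the notebook this would be called as `labels, acc = score_predictions(sampPredictions['predictions'], list(testY))`. Twenty examples are far too few for a reliable accuracy estimate; the Batch Transform route above is the better fit for scoring the full test set.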