# IMDB Sentiment Analyser

This is the first project of Udacity's Machine Learning Engineer Nanodegree program
(https://www.udacity.com/course/machine-learning-engineer-nanodegree--nd009t).

# First part

This first part downloads the data and prepares it, following the course's code in https://github.com/udacity/sagemaker-deployment/blob/master/Tutorials/IMDB%20Sentiment%20Analysis%20-%20XGBoost%20-%20Web%20App.ipynb

### Downloading data

In [1]:
%mkdir -p data/all_data
!wget -O data/aclImdb_v1.tar.gz http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
!tar -zxf data/aclImdb_v1.tar.gz -C data

data_dir="data/all_data"

--2020-09-13 09:21:37--  http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
Resolving ai.stanford.edu (ai.stanford.edu)... 171.64.68.10
Connecting to ai.stanford.edu (ai.stanford.edu)|171.64.68.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 84125825 (80M) [application/x-gzip]
Saving to: “data/aclImdb_v1.tar.gz”


2020-09-13 09:21:48 (7.70 MB/s) - “data/aclImdb_v1.tar.gz” saved [84125825/84125825]



# Preparing 

In [2]:
import os
import glob

def read_imdb_data(data_dir='data/aclImdb'):
    data = {}
    labels = {}
    
    for data_type in ['train', 'test']:
        data[data_type] = {}
        labels[data_type] = {}
        
        for sentiment in ['pos', 'neg']:
            data[data_type][sentiment] = []
            labels[data_type][sentiment] = []
            
            path = os.path.join(data_dir, data_type, sentiment, '*.txt')
            files = glob.glob(path)
            
            for f in files:
                with open(f) as review:
                    data[data_type][sentiment].append(review.read())
                    # Here we represent a positive review by '1' and a negative review by '0'
                    labels[data_type][sentiment].append(1 if sentiment == 'pos' else 0)
                    
            assert len(data[data_type][sentiment]) == len(labels[data_type][sentiment]), \
                    "{}/{} data size does not match labels size".format(data_type, sentiment)
                
    return data, labels

In [3]:
data, labels = read_imdb_data()
print("IMDB reviews: train = {} pos / {} neg, test = {} pos / {} neg".format(
            len(data['train']['pos']), len(data['train']['neg']),
            len(data['test']['pos']), len(data['test']['neg'])))

IMDB reviews: train = 12500 pos / 12500 neg, test = 12500 pos / 12500 neg


In [4]:
from sklearn.utils import shuffle

def prepare_imdb_data(data, labels):
    """Prepare training and test sets from IMDb movie reviews."""
    
    #Combine positive and negative reviews and labels
    data_train = data['train']['pos'] + data['train']['neg']
    data_test = data['test']['pos'] + data['test']['neg']
    labels_train = labels['train']['pos'] + labels['train']['neg']
    labels_test = labels['test']['pos'] + labels['test']['neg']
    
    #Shuffle reviews and corresponding labels within training and test sets
    data_train, labels_train = shuffle(data_train, labels_train)
    data_test, labels_test = shuffle(data_test, labels_test)
    
    # Return unified training data, test data, training labels, and test labels
    return data_train, data_test, labels_train, labels_test
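
The key property of the shuffle step is that every review keeps its own label. The standard-library sketch below illustrates that idea with a single shared permutation; it is not how `sklearn.utils.shuffle` is implemented, and `shuffle_together` and the toy data are hypothetical names made up for this example.

```python
import random


def shuffle_together(items, labels, seed=None):
    """Shuffle two equally long lists with one shared permutation."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    return [items[i] for i in order], [labels[i] for i in order]


# Hypothetical toy data, not taken from the IMDb set.
reviews = ["loved it", "hated it", "fine", "awful"]
targets = [1, 0, 1, 0]
shuffled_reviews, shuffled_targets = shuffle_together(reviews, targets, seed=0)

# Each review still sits next to its own label after shuffling.
print(list(zip(shuffled_reviews, shuffled_targets)))
```

Shuffling the index list once and applying it to both lists is what keeps the pairs aligned; shuffling each list separately would break the pairing.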

In [5]:
train_X, test_X, train_y, test_y = prepare_imdb_data(data, labels)
print("IMDb reviews (combined): train = {} test = {}".format(len(train_X), len(test_X)))
print(train_X[540])
print(train_y[540])

IMDb reviews (combined): train = 25000 test = 25000
This film is probably pro-Muslimization. <br /><br />Why do I write that? The main character has a Muslim father and a Christian mother. He lives his first 20 years in a Christian village. In the end of the film he seemingly is a Muslim because of his head-wear, that he has kept his amulet, and his general clothing. He has a six year old child, who wears the same head-wear and therefore is probably a Muslim, although the mother is a Christian. The main character thus chooses to, it seems, to be a Muslim and his child becomes a Muslim. No one of the other male main characters, which are Christians, seems to breed a child. There are more Muslims in the world of this movie at the end of it, it therefore seems.
0


# Filtering the data

In [6]:
import re
import nltk
nltk.download('stopwords')
nltk.download('punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize 

def remove_html_tags(text):
    """Remove HTML tags from a string."""
    clean = re.compile(r'<.*?>')
    return re.sub(clean, '', text)

def filter_text_list(text_list):
    """Lower-case each review, strip HTML and punctuation, tokenize, and drop English stop words."""
    cleanr = re.compile(r"[^a-zA-Z0-9]")
    stop_words = set(stopwords.words('english'))

    filtered_sentence = []
    for text in text_list:
        # Lower-case, remove tags, then replace every non-alphanumeric character with a space
        word_tokens = word_tokenize(re.sub(cleanr, ' ', remove_html_tags(text.lower())))
        filtered_sentence.append([w for w in word_tokens if w not in stop_words])

    return filtered_sentence

[nltk_data] Downloading package stopwords to /home/robson/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /home/robson/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
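
To see what the filtering step does without downloading the NLTK corpora, the sketch below applies the same three stages (strip HTML tags, replace non-alphanumeric characters with spaces, drop stop words) to one sentence. It is a simplified stand-in for the cell above, not a copy of it: `TINY_STOP_WORDS` is a hypothetical hand-picked subset of the NLTK English stop-word list, `filter_review` is a name made up for this example, and `str.split` replaces `word_tokenize`.

```python
import re

# Hypothetical stand-in for stopwords.words('english'); the real list is much longer.
TINY_STOP_WORDS = {"this", "is", "a", "it", "the", "but", "i"}

TAG_RE = re.compile(r"<.*?>")               # same non-greedy tag pattern as remove_html_tags
NON_ALNUM_RE = re.compile(r"[^a-zA-Z0-9]")  # same character filter as filter_text_list


def filter_review(text):
    """Lower-case, strip HTML tags and punctuation, then drop stop words."""
    no_tags = TAG_RE.sub("", text.lower())
    cleaned = NON_ALNUM_RE.sub(" ", no_tags)
    # str.split is an assumption made to avoid the NLTK download; the notebook uses word_tokenize.
    return [w for w in cleaned.split() if w not in TINY_STOP_WORDS]


print(filter_review("This is a pretty strange movie.<br /><br />It works, but barely!"))
# ['pretty', 'strange', 'movie', 'works', 'barely']
```

Because tags are replaced by an empty string, "movie.<br /><br />It" first becomes "movie.it"; the two words are only separated again when the period is turned into a space by the second pattern.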


In [7]:
train_filtered_X = filter_text_list(train_X)
# Just checking
print(('Original train_X length = {} - filtered_train_X length = {}').format(len(train_X),len(train_filtered_X)))

test_filtered_X = filter_text_list(test_X)
# Just checking
print(('Original test_X length = {} - filtered_test_X length = {}').format(len(test_X),len(test_filtered_X)))

Original train_X length = 25000 - filtered_train_X length = 25000
Original test_X length = 25000 - filtered_test_X length = 25000


In [8]:
print(train_filtered_X[540])
print(train_y[540])

['film', 'probably', 'pro', 'muslimization', 'write', 'main', 'character', 'muslim', 'father', 'christian', 'mother', 'lives', 'first', '20', 'years', 'christian', 'village', 'end', 'film', 'seemingly', 'muslim', 'head', 'wear', 'kept', 'amulet', 'general', 'clothing', 'six', 'year', 'old', 'child', 'wears', 'head', 'wear', 'therefore', 'probably', 'muslim', 'although', 'mother', 'christian', 'main', 'character', 'thus', 'chooses', 'seems', 'muslim', 'child', 'becomes', 'muslim', 'one', 'male', 'main', 'characters', 'christians', 'seems', 'breed', 'child', 'muslims', 'world', 'movie', 'end', 'therefore', 'seems']
0


In [9]:
# Just looking at the difference
before = train_X[37]
after = train_filtered_X[37]
print("### Before ###")
print(before)
print("### After ###")
print(after)

### Before ###
This is a pretty strange movie. It does comes across as an exploitation film with over-the-top violence and unrealistic situations, but unusual for being constructed around rural characters at war with each other, as opposed to an invading 'other'.<br /><br />The movie is an excessive stereotype of Vietnam veterans, in a long line of films that portrayed the vets of that war as dangerous psycopaths. Kris Kristofferson's last line is 'I ain't lost a war yet', as he meets his demise after wreaking a long trail of murder and destruction, including the town's chief of police and his brother's girlfriend in a particularly chilling scene. However, Kristofferson is a good enough actor, and charismatic enough, to carry this villain with a surprising depth. Vincent is clearly the golden boy, but with enough intensity layered over his clean cut goodness. The movie bears some plot resemblance to Winchester 73 where Jimmy Stewart tries to tolerate a criminal brother until being forc

In [10]:
from collections import Counter
import numpy as np

# Copy as plain lists: reviews have different lengths, so np.copy would build a ragged array
train_sentences = [list(sentence) for sentence in train_filtered_X]

count = 0
wordsDic = Counter()  # Maps each word to the number of times it appears in the training sentences
for i, sentence in enumerate(train_sentences):
    # Each sentence is stored as a list of words/tokens
    train_sentences[i] = []
    for word in nltk.word_tokenize(' '.join(sentence)):  # Tokenizing the words
        train_sentences[i].append(word)
        wordsDic.update([word])
        count = count + 1


### Top 5 used words 

In [11]:
# Counter.most_common returns (word, count) pairs sorted by descending count
dicSorted = dict(wordsDic.most_common(5))
    
print("Top 5 of words : {}".format(list(dicSorted)))


Top 5 of words : ['movie', 'film', 'one', 'like', 'good']


In [12]:
wordsList = {k:v for k,v in wordsDic.items() if v>1}
wordsList = sorted(wordsList, key=wordsList.get, reverse=True)
# Reserve index 0 for padding ('FILL_') and index 1 for unknown words ('UNKN_')
wordsList = ['FILL_','UNKN_'] + wordsList
words2index = {o:i for i,o in enumerate(wordsList)}

In [13]:
# Replace each word by its index; words outside the vocabulary map to 'UNKN_'
for i, sentence in enumerate(train_filtered_X):
    train_filtered_X[i] = [words2index[word] if word in words2index else words2index['UNKN_'] for word in sentence]
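
The vocabulary logic of these cells can be tried on a toy corpus. The sketch below follows the same recipe (keep words seen more than once, reserve index 0 for padding and index 1 for unknown words); `toy_reviews` and the helper names `build_vocab` and `encode` are made up for the illustration.

```python
from collections import Counter

PAD, UNK = "FILL_", "UNKN_"  # same reserved tokens as the notebook


def build_vocab(tokenized_reviews, min_count=2):
    """Map each word seen at least min_count times to an index, most frequent first."""
    counts = Counter(word for review in tokenized_reviews for word in review)
    kept = [w for w, c in counts.most_common() if c >= min_count]
    return {word: index for index, word in enumerate([PAD, UNK] + kept)}


def encode(review, vocab):
    """Replace each word by its index, falling back to the unknown index."""
    return [vocab.get(word, vocab[UNK]) for word in review]


# Hypothetical toy data, not taken from the IMDb set.
toy_reviews = [["good", "movie"], ["bad", "movie"], ["good", "film", "movie"]]
vocab = build_vocab(toy_reviews)
print(vocab)                                     # {'FILL_': 0, 'UNKN_': 1, 'movie': 2, 'good': 3}
print(encode(["good", "plot", "movie"], vocab))  # [3, 1, 2]
```

Words that occur only once ("bad", "film") are left out of the vocabulary, so they are encoded as 1 just like a word never seen in training ("plot").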
    

In [14]:
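for i, sentence in enumerate(test_filtered_X):
    # str(sentence) turns the token list back into text; this regex also strips digits, unlike the training cell
    sentence = re.sub("[^a-zA-Z]",  " ", str(sentence))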
    test_filtered_X[i] = [words2index[word] if word in words2index else words2index['UNKN_'] for word in nltk.word_tokenize(sentence)]


In [15]:
# Just checking
print(train_filtered_X[45])
print(train_y[45])
print(test_filtered_X[45])
print(test_y[45])

[3, 824, 1, 2866, 154, 1197, 3514, 7140, 3, 967, 2585, 678, 710, 3, 1078, 680, 547, 390, 1272, 3514, 7140, 80, 242, 285, 1011, 168, 24, 227, 2230, 619, 53, 1410, 54, 3, 14119, 536, 125, 65, 1142, 43, 4989, 570, 786, 4314, 12, 3, 547, 451, 22, 392, 224, 26, 4105, 76, 535, 24, 227, 163, 908, 460, 3, 242, 60, 242, 315, 3191, 315, 799, 2606, 9, 5, 12, 1, 7140, 3, 29, 4, 21, 1620, 168, 61, 115]
0
[3, 402, 101, 684, 181, 467, 1678, 1, 108, 915, 100, 773, 771, 31, 94, 79, 6, 887, 55, 3562, 4040, 685, 395, 181, 6, 1697, 4856, 101, 684, 4, 691, 60, 43, 11, 5, 1635, 5698, 91, 1, 2524, 136, 16401, 1157, 3, 2924, 86, 2872, 15551, 228, 2294, 63, 1011, 1094, 100, 26, 10, 722, 8, 3538, 638, 462, 842, 733, 10, 11265, 5698, 72, 2428, 1297, 1823, 8758, 440, 9245, 6201, 199, 108, 432, 994, 1, 40157, 62, 6201, 276, 8133, 8758, 83, 14, 829, 127, 842, 733, 20, 3, 15, 1392, 3667]
1


In [16]:
def redefineSentences(sentences, seq_len):
    redefinedSentences = np.zeros((len(sentences), seq_len),dtype=int)
    for ii, review in enumerate(sentences):
        if len(review) != 0:
            redefinedSentences[ii, -len(review):] = np.array(review)[:seq_len]
    return redefinedSentences
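
`redefineSentences` left-pads short reviews with the padding index 0 and keeps only the first `seq_len` tokens of long ones. A pure-Python sketch of the same behaviour, using a hypothetical helper name `pad_left`, makes both cases easy to check by hand:

```python
def pad_left(review, seq_len, pad_index=0):
    """Return exactly seq_len indices: zeros on the left, or the first seq_len tokens."""
    truncated = list(review)[:seq_len]
    return [pad_index] * (seq_len - len(truncated)) + truncated


print(pad_left([20, 3, 647], 5))           # short review: [0, 0, 20, 3, 647]
print(pad_left([7, 8, 9, 10, 11, 12], 5))  # long review:  [7, 8, 9, 10, 11]
```

Padding on the left keeps the real tokens at the end of the sequence, so they are the last things the LSTM reads before it produces its output.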

In [17]:
seq_len = 200  # Every review is padded or truncated to this many tokens

train_filtered_X = redefineSentences(train_filtered_X, seq_len)
test_filtered_X = redefineSentences(test_filtered_X, seq_len)

train_y = np.array(train_y)
test_y = np.array(test_y)

In [18]:
# Just checking
print(train_filtered_X[45])
print(train_y[45])
print(test_filtered_X[45])
print(test_y[45])

[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     3   824
     1  2866   154  1197  3514  7140     3   967  2585   678   710     3
  1078   680   547   390  1272  3514  7140    80   242   285  1011   168
    24   227  2230   619    53  1410    54     3 14119   536   125    65
  1142    43  4989   570   786  4314    12     3   

# Set up and upload the data

In [19]:
%%time

import os
import boto3
import re
import sagemaker
from sagemaker import get_execution_role

# Don't forget the get (get_execution_role)
role = get_execution_role()
region = boto3.Session().region_name
sagemaker_session = sagemaker.Session(boto3.Session())
print(region)
bucket = sagemaker_session.default_bucket()
print(bucket)
prefix = 'sagemaker/IMDB-data'

us-east-2
sagemaker-us-east-2-214237513994
CPU times: user 830 ms, sys: 85.4 ms, total: 915 ms
Wall time: 3.76 s


In [20]:
def write_to_s3(local_directory, work_directory):
    return sagemaker_session.upload_data(local_directory, key_prefix=work_directory)

In [21]:
import numpy as np
from sklearn.model_selection import train_test_split

# Split the original test data into validation (15%) and test (85%) sets
X_valid, X_test, y_valid, y_test = train_test_split(test_filtered_X, test_y, test_size=0.85, random_state=42)

# Just checking
print(('X_test length = {} - y_test length = {}').format(len(X_test),len(y_test)))
print(('X_valid length = {} - y_valid length = {}').format(len(X_valid),len(y_valid)))

X_test length = 21250 - y_test length = 21250
X_valid length = 3750 - y_valid length = 3750


In [48]:
# Split off a 5% subset of the training data (X_train); the remaining 95% goes to X_train1
X_train, X_train1, y_train, y_train1 = train_test_split(train_filtered_X, train_y, test_size=0.95, random_state=42)

print(('X_train length = {} - y_train length = {}').format(len(X_train),len(y_train)))
print(('X_train1 length = {} - y_train1 length = {}').format(len(X_train1),len(y_train1)))

X_train length = 1250 - y_train length = 1250
X_train1 length = 23750 - y_train1 length = 23750


In [104]:
import pandas as pd
import torch
    
pd.concat([pd.DataFrame(train_y), pd.DataFrame(train_filtered_X)], axis=1) \
        .to_csv(os.path.join(data_dir, 'train'), header=False, index=False)

#pd.concat([pd.DataFrame(y_train), pd.DataFrame(X_train)], axis=1) \
#        .to_csv(os.path.join(data_dir, 'train'), header=False, index=False)



pd.concat([pd.DataFrame(y_test), pd.DataFrame(X_test)], axis=1) \
        .to_csv(os.path.join(data_dir, 'test'), header=False, index=False)


pd.concat([pd.DataFrame(y_valid), pd.DataFrame(X_valid)], axis=1) \
        .to_csv(os.path.join(data_dir, 'valid'), header=False, index=False)

with open(os.path.join(data_dir, 'dictionary.dic'), 'wb') as f:
    torch.save(words2index, f)              

# Upload everything in data_dir (train, test, valid and the dictionary) to S3
s3_input_train = write_to_s3(data_dir, prefix)


In [82]:
%load_ext autoreload

%autoreload 2

from RNN.RNN import IMDBClassifier

modelCfg = {}    
model_cfg = './model.cfg'
with open(model_cfg, 'rb') as f:                                
    modelCfg = torch.load(f)  

print("### MODEL CONFIG {}".format(modelCfg))

completeModel = IMDBClassifier(modelCfg['vocab_size'], modelCfg['output_size'], modelCfg['embedding_dim'], modelCfg['hidden_dim'], modelCfg['n_layers'])    

complete_model = './model.pth'
with open(complete_model, 'rb') as f:
    completeModel.load_state_dict(torch.load(f))

complete_dict = './model.dic'
with open(complete_dict, 'rb') as f:
    completeModelDict = torch.load(f)

completeModel.to(torch.device(modelCfg['device']))
completeModel.dictionary = completeModelDict

print("Dictionary {}".format(completeModel.dictionary))
print("Batch size {}".format(completeModel.batch_size))

print(type(completeModel))
print('____')
print(completeModel)
print('____')

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
### MODEL CONFIG {'embedding_dim': 400, 'hidden_dim': 512, 'vocab_size': 46816, 'output_size': 1, 'n_layers': 2, 'device': 'cpu', 'batch_size': 400}
Batch size 400
<class 'RNN.RNN.IMDBClassifier'>
____
IMDBClassifier(
  (embedding): Embedding(46816, 400, padding_idx=0)
  (lstm): LSTM(400, 512, num_layers=2, batch_first=True, dropout=0.5)
  (dropout): Dropout(p=0.2, inplace=False)
  (fc): Linear(in_features=512, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)
____


In [103]:
from RNN.data import Dataset

dataset = Dataset('')
h = completeModel.init_hidden(1)
h = tuple([each.data for each in h])
    
test_review = 'The simplest pleasures in life are the best, and this film is one of them. Combining a rather basic storyline of love and adventure this movie transcends the usual weekend fair with wit and unmitigated charm.'

x = dataset.transformRawData(completeModel.dictionary, [test_review], 200)   

print("### X {}".format(x))

counter = 0
for inp in x:
    print("### inp {}".format(inp))
    counter +=1
    inputs = inp 
    
completeModel.eval()

device = torch.device("cpu")

inputs.to(device)

im = next(iter(x))

output, h = completeModel(im, h)

print(" ### H {}".format(h))
print(" ### Output {}".format(output))
print(" ### Output {}".format(int(np.round(output.detach().numpy()))))

['Great film, but I hate it.']
[['great', 'film', 'hate']]
[[20, 3, 647]]
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  20
    3 647]]
<torch.utils.data.dataloader.DataLoader object at 0x7fddd34aaaf0>
### X <torch.utils.data.dataloader

In [105]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(source_dir ='./RNN',
                    entry_point='train.py',
                    role=role,
                    sagemaker_session = sagemaker_session,
                    framework_version='1.5.0',
                    train_instance_count=1,
                    py_version='py3',
                    train_instance_type='ml.m5.xlarge',
                    hyperparameters={
                        'epochs'               : 1,
                        'n_layers'             : 2,
                        'embedding_dim'        : 400,
                        'hidden_dim'           : 512,
                        'vocab_size'           : len(words2index)+1,
                        'dictionary_file_name' : 'dictionary.dic',
                    })



In [106]:
estimator.fit({'training': s3_input_train})


'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


2020-09-13 23:18:02 Starting - Starting the training job...
2020-09-13 23:18:05 Starting - Launching requested ML instances...
2020-09-13 23:19:09 Starting - Preparing the instances for training......
2020-09-13 23:19:59 Downloading - Downloading input data
2020-09-13 23:19:59 Training - Downloading the training image..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-09-13 23:20:26,815 sagemaker-containers INFO     Imported framework sagemaker_pytorch_container.training
2020-09-13 23:20:26,818 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2020-09-13 23:20:26,832 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-09-13 23:20:27,065 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-09-13 23:20:27,283 sagemaker-containers INFO     Module default_user_module_nam


2020-09-13 23:20:26 Training - Training image download completed. Training in progress.
Installing collected packages: regex, nltk
Successfully installed nltk-3.5 regex-2020.7.14
GPU not available, CPU used
Using device cpu.
Train dataset len = 25000
Tensor train dataset <torch.utils.data.dataset.TensorDataset object at 0x7effb239b2b0>
Initializing training.
[2020-09-13 23:20:38.351 algo-1:44 INFO json_config.py:90] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.
[2020-09-13 23:20:38.351 algo-1:44 INFO hook.py:183] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2020-09-13 23:20:38.351 algo-1:44 INFO hook.py:228] Saving to /opt/ml/output/tensors
[2020-09-13 23:20:38.352 algo-1:44 INFO hook.py:364] Monitoring the collections: losses
[2020-09-13 23:20:38.353 algo-1:44 INFO hook.py:422] Hook is writing fro

Result = tensor([0.1260, 0.6284, 0.6713, 0.0260, 0.7326, 0.5912, 0.7225, 0.4744, 0.6746,
        0.3237, 0.7610, 0.0524, 0.4998, 0.0354, 0.4948, 0.3586, 0.0985, 0.5834,
        0.6743, 0.0901, 0.5971, 0.7577, 0.7660, 0.7695, 0.7883, 0.6253, 0.7564,
        0.7245, 0.4626, 0.7049, 0.7267, 0.1760, 0.6981, 0.6308, 0.6832, 0.4997,
        0.3835, 0.4387, 0.5681, 0.7451, 0.6663, 0.4700, 0.8135, 0.5541, 0.0754,
        0.7469, 0.3309, 0.6933, 0.3535, 0.0898, 0.2701, 0.3605, 0.7350, 0.8024,
        0.5682, 0.7418, 0.1716, 0.0407, 0.7426, 0.0239, 0.7939, 0.6960, 0.6098,
        0.7118, 0.6830, 0.7176, 0.5394, 0.7176, 0.5643, 0.0331, 0.5823, 0.7122,
        0.7190, 0.6504, 0.5527, 0.4735, 0.5251, 0.7965, 0.5319, 0.6892, 0.5099,
        0.6703, 0.6511, 0.7112, 0.7098, 0.7599, 0.7402, 0.0362, 0.7540, 0.2317,
        0.2000, 0.6900, 0.7153, 0.1105, 0.2693, 0.6916, 0.7870, 0.5280, 0.1512,
        0.7828, 0.6006, 0.6829, 0.3717, 0.1723, 0.7097, 0.6653, 0.3518, 0.5646,
        0.5026, 0.8000, 0.

Result = tensor([0.2125, 0.0202, 0.3261, 0.5467, 0.5262, 0.2265, 0.0582, 0.7154, 0.7221,
        0.3642, 0.7656, 0.7530, 0.0939, 0.5502, 0.1114, 0.0342, 0.4038, 0.6406,
        0.7717, 0.2014, 0.7583, 0.6827, 0.5575, 0.2018, 0.0702, 0.5317, 0.7068,
        0.3890, 0.0917, 0.2016, 0.6524, 0.7361, 0.5981, 0.5753, 0.4921, 0.3484,
        0.7043, 0.5077, 0.0574, 0.7527, 0.4732, 0.2533, 0.0173, 0.4138, 0.6944,
        0.6416, 0.2577, 0.8179, 0.7422, 0.6665, 0.3601, 0.7201, 0.0303, 0.7782,
        0.7379, 0.5316, 0.5972, 0.2090, 0.6407, 0.0149, 0.7191, 0.6287, 0.6329,
        0.6161, 0.7017, 0.7153, 0.6951, 0.6084, 0.6307, 0.6476, 0.6362, 0.5922,
        0.7333, 0.2629, 0.6714, 0.7372, 0.7790, 0.6536, 0.1751, 0.6058, 0.6831,
        0.2117, 0.6505, 0.6782, 0.7092, 0.6613, 0.6640, 0.3894, 0.6986, 0.6237,
        0.5281, 0.7059, 0.0653, 0.4013, 0.0383, 0.4676, 0.7793, 0.1256, 0.4600,
        0.7534, 0.7577, 0.1998, 0.3495, 0.5857, 0.3367, 0.3395, 0.7160, 0.7475,
        0.5690, 0.6250, 0.

Result = tensor([0.3602, 0.2430, 0.7635, 0.6805, 0.0485, 0.5670, 0.1474, 0.4332, 0.0258,
        0.7143, 0.7276, 0.0551, 0.1588, 0.8149, 0.4742, 0.5741, 0.5163, 0.4141,
        0.0232, 0.7128, 0.2006, 0.0618, 0.6025, 0.7415, 0.6327, 0.5011, 0.6796,
        0.5061, 0.8329, 0.3557, 0.6563, 0.0689, 0.2129, 0.4228, 0.7651, 0.8500,
        0.6586, 0.6984, 0.7556, 0.7410, 0.0758, 0.5817, 0.0954, 0.0768, 0.5336,
        0.6207, 0.8426, 0.5622, 0.2748, 0.0314, 0.6197, 0.2603, 0.6666, 0.7302,
        0.7257, 0.7337, 0.5067, 0.6373, 0.3330, 0.3772, 0.4637, 0.3193, 0.5925,
        0.6860, 0.6690, 0.7133, 0.7212, 0.7731, 0.6866, 0.2644, 0.6529, 0.7263,
        0.1536, 0.6507, 0.7268, 0.2912, 0.7594, 0.4709, 0.2437, 0.7038, 0.4636,
        0.5922, 0.4593, 0.1061, 0.4277, 0.7389, 0.6090, 0.7485, 0.6236, 0.0534,
        0.4221, 0.3629, 0.8168, 0.0834, 0.7387, 0.6737, 0.1251, 0.6743, 0.6501,
        0.0599, 0.7216, 0.6714, 0.7125, 0.8173, 0.3253, 0.8081, 0.6541, 0.7049,
        0.7200, 0.5790, 0.

Result = tensor([0.5600, 0.6815, 0.2341, 0.3820, 0.6520, 0.6843, 0.4815, 0.7862, 0.0617,
        0.6737, 0.4608, 0.4436, 0.5100, 0.4681, 0.2160, 0.7185, 0.1205, 0.5673,
        0.3019, 0.0485, 0.1744, 0.0799, 0.0749, 0.2470, 0.3312, 0.6704, 0.6281,
        0.7732, 0.7410, 0.4824, 0.0883, 0.7318, 0.2456, 0.0825, 0.0745, 0.5744,
        0.0419, 0.4573, 0.6553, 0.6365, 0.6978, 0.7286, 0.6105, 0.6839, 0.3304,
        0.0590, 0.7827, 0.4267, 0.6001, 0.5449, 0.6962, 0.0592, 0.5399, 0.0643,
        0.5862, 0.6014, 0.8172, 0.6416, 0.2980, 0.4236, 0.1728, 0.6171, 0.5379,
        0.6742, 0.7556, 0.7095, 0.6375, 0.7890, 0.1347, 0.6921, 0.4060, 0.1393,
        0.1928, 0.7354, 0.6270, 0.0528, 0.7277, 0.7621, 0.0237, 0.3583, 0.3684,
        0.8262, 0.0952, 0.5376, 0.6405, 0.7660, 0.3269, 0.7523, 0.3966, 0.2285,
        0.7220, 0.7009, 0.3908, 0.0209, 0.2396, 0.7689, 0.1162, 0.7276, 0.7217,
        0.6440, 0.7885, 0.6839, 0.6218, 0.4932, 0.7421, 0.3940, 0.6853, 0.3436,
        0.6166, 0.7397, 0.

Result = tensor([0.7362, 0.8460, 0.7351, 0.5834, 0.1473, 0.5732, 0.0318, 0.2873, 0.6196,
        0.7000, 0.6590, 0.6459, 0.7637, 0.7254, 0.1817, 0.4929, 0.0571, 0.7846,
        0.0888, 0.8115, 0.7431, 0.4894, 0.7447, 0.3775, 0.4599, 0.8152, 0.5454,
        0.4337, 0.6672, 0.8265, 0.5218, 0.7697, 0.4875, 0.7073, 0.6918, 0.5770,
        0.6208, 0.6663, 0.0264, 0.7470, 0.6708, 0.7753, 0.7777, 0.6912, 0.6777,
        0.7090, 0.0859, 0.6842, 0.6298, 0.3003, 0.6680, 0.7675, 0.4903, 0.6854,
        0.5512, 0.8136, 0.2622, 0.2128, 0.7756, 0.7156, 0.1189, 0.0708, 0.6084,
        0.7704, 0.4230, 0.7583, 0.0306, 0.3223, 0.0487, 0.7795, 0.5750, 0.7570,
        0.8442, 0.8187, 0.5105, 0.7438, 0.7303, 0.5012, 0.0247, 0.5353, 0.7976,
        0.2374, 0.0746, 0.5585, 0.3133, 0.2789, 0.6090, 0.7278, 0.7176, 0.7387,
        0.3292, 0.2808, 0.2008, 0.7551, 0.5914, 0.7537, 0.8001, 0.5974, 0.5881,
        0.6159, 0.7115, 0.0315, 0.7310, 0.6885, 0.7235, 0.7406, 0.5182, 0.6628,
        0.6856, 0.5619, 0.

Result = tensor([0.3449, 0.6661, 0.6563, 0.2532, 0.4555, 0.7376, 0.7404, 0.6914, 0.8145,
        0.5742, 0.6803, 0.7509, 0.7193, 0.4411, 0.5706, 0.1608, 0.7560, 0.7314,
        0.3311, 0.7172, 0.4308, 0.7727, 0.6047, 0.7400, 0.5145, 0.5181, 0.7697,
        0.5937, 0.6591, 0.1343, 0.1537, 0.2185, 0.0753, 0.2087, 0.4891, 0.0978,
        0.7466, 0.2717, 0.5167, 0.7134, 0.7321, 0.6353, 0.2223, 0.2481, 0.7176,
        0.2370, 0.7461, 0.6475, 0.5959, 0.4079, 0.6589, 0.5842, 0.7591, 0.2098,
        0.1694, 0.6691, 0.2566, 0.6246, 0.7402, 0.8018, 0.5873, 0.5210, 0.1925,
        0.6815, 0.5014, 0.7428, 0.7280, 0.2374, 0.7235, 0.3955, 0.4399, 0.7952,
        0.7091, 0.2548, 0.4268, 0.7532, 0.6844, 0.1537, 0.4739, 0.7581, 0.7146,
        0.6925, 0.0282, 0.7187, 0.3970, 0.5851, 0.7756, 0.5644, 0.7672, 0.6500,
        0.7446, 0.7839, 0.7104, 0.6591, 0.4350, 0.7353, 0.5607, 0.6041, 0.5668,
        0.6937, 0.3581, 0.7461, 0.4141, 0.7407, 0.3711, 0.0328, 0.3385, 0.1748,
        0.6916, 0.6665, 0.

Result = tensor([0.6636, 0.7751, 0.1119, 0.1311, 0.2029, 0.5420, 0.5174, 0.0850, 0.1014,
        0.7219, 0.3547, 0.5062, 0.4280, 0.5821, 0.5631, 0.6947, 0.7815, 0.0210,
        0.7190, 0.7566, 0.5909, 0.7541, 0.5127, 0.6948, 0.7845, 0.5410, 0.1313,
        0.6660, 0.7882, 0.7070, 0.4644, 0.5415, 0.8022, 0.2250, 0.7709, 0.2045,
        0.4344, 0.0369, 0.7331, 0.4743, 0.4392, 0.5066, 0.5439, 0.4297, 0.6227,
        0.5763, 0.6888, 0.6402, 0.6374, 0.7734, 0.3521, 0.4607, 0.0184, 0.4026,
        0.0578, 0.7134, 0.5068, 0.7091, 0.8267, 0.6989, 0.2943, 0.1542, 0.5121,
        0.6717, 0.2526, 0.1696, 0.7633, 0.3842, 0.0258, 0.0575, 0.7719, 0.3139,
        0.7030, 0.5102, 0.3558, 0.6560, 0.1953, 0.5800, 0.1590, 0.7468, 0.7633,
        0.7636, 0.6333, 0.3116, 0.5954, 0.2596, 0.5627, 0.7288, 0.5848, 0.4816,
        0.7751, 0.7684, 0.5782, 0.7250, 0.5186, 0.6897, 0.7442, 0.5664, 0.7703,
        0.6564, 0.6669, 0.2576, 0.2970, 0.0742, 0.1093, 0.6406, 0.2744, 0.4413,
        0.7581, 0.5821, 0.

Result = tensor([0.7574, 0.5091, 0.1245, 0.6645, 0.7581, 0.0535, 0.6661, 0.1550, 0.7319,
        0.5368, 0.2253, 0.5602, 0.8052, 0.7467, 0.6328, 0.7188, 0.4792, 0.6427,
        0.0568, 0.0257, 0.6884, 0.7498, 0.4286, 0.2761, 0.4349, 0.0578, 0.1020,
        0.0902, 0.7799, 0.7122, 0.7289, 0.4580, 0.7182, 0.7484, 0.7394, 0.6752,
        0.8149, 0.0197, 0.4575, 0.7110, 0.6827, 0.1192, 0.6716, 0.6546, 0.6610,
        0.6825, 0.0188, 0.7579, 0.5716, 0.5494, 0.6962, 0.6340, 0.5021, 0.6491,
        0.0164, 0.0718, 0.7672, 0.1043, 0.7467, 0.7589, 0.1910, 0.0416, 0.6830,
        0.3092, 0.4444, 0.7652, 0.7255, 0.1959, 0.5440, 0.6040, 0.7995, 0.7439,
        0.0469, 0.2684, 0.8603, 0.7694, 0.6556, 0.0933, 0.6684, 0.7384, 0.7307,
        0.6698, 0.4216, 0.1757, 0.5701, 0.0611, 0.4162, 0.4225, 0.3275, 0.6903,
        0.6557, 0.6060, 0.7006, 0.7217, 0.0770, 0.0147, 0.7138, 0.7657, 0.2121,
        0.5698, 0.1783, 0.2417, 0.7939, 0.6559, 0.7328, 0.1449, 0.6728, 0.4574,
        0.2251, 0.7468, 0.

Result = tensor([0.7357, 0.5715, 0.6346, 0.6678, 0.0808, 0.7347, 0.8208, 0.3865, 0.6810,
        0.0181, 0.7570, 0.0802, 0.2692, 0.3935, 0.1318, 0.6967, 0.7654, 0.8182,
        0.7595, 0.6622, 0.8247, 0.3580, 0.7580, 0.5377, 0.7243, 0.5532, 0.6163,
        0.7087, 0.6567, 0.1836, 0.7073, 0.1827, 0.2017, 0.4086, 0.5553, 0.6947,
        0.1695, 0.5770, 0.3823, 0.7569, 0.6605, 0.7342, 0.7376, 0.1537, 0.5575,
        0.6802, 0.7439, 0.6388, 0.5540, 0.7173, 0.3443, 0.5924, 0.2421, 0.7172,
        0.3666, 0.3378, 0.5508, 0.3669, 0.4145, 0.5537, 0.5284, 0.7254, 0.1203,
        0.6085, 0.4398, 0.4517, 0.7498, 0.7810, 0.3023, 0.6741, 0.4243, 0.4830,
        0.1961, 0.0258, 0.1754, 0.5184, 0.7356, 0.6004, 0.6622, 0.2359, 0.6255,
        0.2768, 0.7647, 0.7811, 0.4866, 0.5225, 0.5561, 0.7323, 0.7048, 0.5212,
        0.2522, 0.4514, 0.3127, 0.1200, 0.4921, 0.4901, 0.7745, 0.7012, 0.7536,
        0.5950, 0.5432, 0.7329, 0.7693, 0.6845, 0.0125, 0.6994, 0.0294, 0.6037,
        0.7013, 0.5950, 0.

Result = tensor([0.8121, 0.4895, 0.6123, 0.0485, 0.5613, 0.5086, 0.6775, 0.3106, 0.6828,
        0.1418, 0.5448, 0.7222, 0.5545, 0.6676, 0.2225, 0.5904, 0.5372, 0.2730,
        0.1052, 0.0481, 0.4230, 0.7000, 0.7196, 0.1044, 0.6551, 0.4967, 0.5584,
        0.1407, 0.1154, 0.0525, 0.7448, 0.4914, 0.6263, 0.3036, 0.4492, 0.7016,
        0.2002, 0.1940, 0.4744, 0.7766, 0.1908, 0.6546, 0.5927, 0.4696, 0.7376,
        0.7106, 0.4536, 0.7033, 0.3279, 0.4179, 0.8239, 0.7534, 0.0618, 0.0329,
        0.7544, 0.6759, 0.4249, 0.7485, 0.6681, 0.6013, 0.7372, 0.7503, 0.6470,
        0.0937, 0.6806, 0.4449, 0.7344, 0.3407, 0.7283, 0.6663, 0.3047, 0.8158,
        0.7560, 0.5507, 0.1246, 0.0589, 0.4247, 0.5830, 0.1390, 0.6321, 0.2410,
        0.5001, 0.0224, 0.7150, 0.7915, 0.7788, 0.2170, 0.0226, 0.7259, 0.6682,
        0.7758, 0.7632, 0.7168, 0.6759, 0.7401, 0.5773, 0.1426, 0.3097, 0.6366,
        0.7011, 0.2036, 0.6930, 0.6323, 0.3107, 0.4428, 0.6912, 0.7421, 0.2946,
        0.5605, 0.6767, 0.

In [53]:
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


--------------!

In [61]:
print(X_test[37])
print(y_test[37])

x = [[ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 
0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 ,0,  0,  
0,  0,  0,  0,  0,  0,  0,  0,  0,  0 ,0,  0,  0,  0,  
0,  0,  0,  0,  0,  0,  0,  0 ,0,  0,  0,  0,  0,  0, 
 0,  0,  0,  0,  0,  0 ,0,  0,  0,  0,  0,  0,  0,  0,  
 0,  0,  0,  0 ,0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 
  0,  0 ,0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 ,0,  
  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 ,0,  0,  0,  0, 
   0,  0,  0,  0,  0,  0,  0,  0 ,0,  0,  0,  0,  0,  0,  0,  
   0,  0,  0,  0,  0 ,0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 
    0,  0 ,0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,0,  
0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0 ,0,  0,  0,  
0,  0,  0,  0,  0,  0,  0,  0,  0,10756,8498, 34, 40,  3,  
      4,8119, 139, 975, 630, 38, 994, 2,9852, 509,2284,1108,2055, 20259,1236]]

#print(np.array(x).shape[0])  # ndarray.size is an int, so .size(0) raises "'int' object is not callable"
print(type(x))

print(type(X_test[37]))

[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0   167
     3  4202   229    52     5    10   397   724  6779   698   550 14533
     7   433   804    12    94    72    12  1132     2     9   432   804
 46190   890   299   359  1478   784  3081     2  2451  1290  8392  6408
     7   142  6408    93   128 23891   982   702   

In [55]:
predictor.predict(X_test[37])

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "forward() missing 1 required positional argument: 'hidden'". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-training-2020-09-11-23-58-06-472 in account 214237513994 for more information.
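
The message above means `forward()` was called with the input only, without the `hidden` argument it requires. One possible way around this is to make the hidden state optional and create it inside `forward()` when the caller omits it. The sketch below shows that pattern without PyTorch so it runs anywhere; `TinyRecurrentModel`, `init_hidden`, and the toy arithmetic are hypothetical stand-ins, not the code in `./RNN`.

```python
class TinyRecurrentModel:
    """Framework-free stand-in for a recurrent network (hypothetical)."""

    def __init__(self, hidden_size):
        self.hidden_size = hidden_size

    def init_hidden(self, batch_size):
        # One zero vector per sequence in the batch.
        return [[0.0] * self.hidden_size for _ in range(batch_size)]

    def forward(self, batch, hidden=None):
        # Callers that only pass the input get a fresh zero state.
        if hidden is None:
            hidden = self.init_hidden(len(batch))
        # Toy update: add each sequence's token count to its state.
        new_hidden = [[h + len(seq) for h in state]
                      for seq, state in zip(batch, hidden)]
        outputs = [sum(state) for state in new_hidden]
        return outputs, new_hidden

    __call__ = forward


model = TinyRecurrentModel(hidden_size=2)
outputs, hidden = model([[1, 2, 3], [4, 5]])  # no hidden argument needed
```

In a real PyTorch module the same idea would apply to `forward(self, x, hidden=None)`, with the zero state created on the same device as `x`.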

In [15]:
from sagemaker.predictor import RealTimePredictor, json_serializer, json_deserializer

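# Predictor that serializes requests to JSON and parses JSON responses,
# instead of the default NumPy (application/x-npy) format.
class JSONPredictor(RealTimePredictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super().__init__(endpoint_name, sagemaker_session, json_serializer, json_deserializer)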

In [113]:
from sagemaker.pytorch import PyTorchModel

training_job_name = estimator.latest_training_job.name
print(training_job_name)
desc = sagemaker_session.sagemaker_client.describe_training_job(TrainingJobName=training_job_name)
trained_model_location = desc['ModelArtifacts']['S3ModelArtifacts']
print(trained_model_location)
model = PyTorchModel(model_data=trained_model_location,
                     role=role,
                     framework_version='1.5.0',
                     entry_point='api.py',
                     sagemaker_session=sagemaker_session,
                     source_dir='./RNN') #,
                     #predictor_cls=JSONPredictor)

pytorch-training-2020-09-13-23-18-00-046


Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


s3://sagemaker-us-east-2-214237513994/pytorch-training-2020-09-13-23-18-00-046/output/model.tar.gz


In [None]:
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


--------------

In [None]:
test_review = 'The simplest pleasures in life are the best, and this film is one of them. Combining a rather basic storyline of love and adventure this movie transcends the usual weekend fair with wit and unmitigated charm.'

In [63]:
text_filtered = filter_text_list([test_review])

In [64]:
print(text_filtered)

[['simplest', 'pleasures', 'life', 'best', 'film', 'one', 'combining', 'rather', 'basic', 'storyline', 'love', 'adventure', 'movie', 'transcends', 'usual', 'weekend', 'fair', 'wit', 'unmitigated', 'charm']]


In [65]:
rawData = [test_review]

clean = re.compile(r'<.*?>')

print(rawData)

cleanr = re.compile(r"[^a-zA-Z0-9]")
stop_words = set(stopwords.words('english')) 

filtered_sentence = []

for text in rawData:
    notTagData = re.sub(clean, '', text)
    word_tokens = word_tokenize(re.sub(cleanr, ' ', notTagData.lower())) # All words in lower case
    filtered_sentence.append([w for w in word_tokens if w not in stop_words])

print(filtered_sentence)

result = [[]]

for i, sentence in enumerate(filtered_sentence):
    sentence = re.sub("[^a-zA-Z]",  " ", str(sentence))
    result[i] = [words2index[word] if word in words2index else words2index['UNKN_'] for word in nltk.word_tokenize(sentence)]

print(result)

redefinedText = np.zeros((len(result), seq_len),dtype=int)
for ii, review in enumerate(result):
    if len(review) != 0:
        redefinedText[ii, -len(review):] = np.array(review)[:seq_len]
        
print(redefinedText)

['The simplest pleasures in life are the best, and this film is one of them. Combining a rather basic storyline of love and adventure this movie transcends the usual weekend fair with wit and unmitigated charm.']
[['simplest', 'pleasures', 'life', 'best', 'film', 'one', 'combining', 'rather', 'basic', 'storyline', 'love', 'adventure', 'movie', 'transcends', 'usual', 'weekend', 'fair', 'wit', 'unmitigated', 'charm']]
[[10734, 8560, 34, 40, 3, 4, 8085, 139, 975, 629, 38, 996, 2, 9723, 508, 2291, 1108, 2061, 20355, 1239]]
[[    0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0
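
The padding step in the cell above can be read as: keep at most `seq_len` indices per review and fill the left side of the row with zeros. A minimal sketch of the same idea with plain lists follows; `left_pad` and `pad_value` are hypothetical names, and the notebook itself uses a NumPy array.

```python
def left_pad(sequences, seq_len, pad_value=0):
    """Return one fixed-length row per sequence, padded on the left."""
    rows = []
    for seq in sequences:
        kept = list(seq)[:seq_len]  # long reviews keep their first seq_len indices
        padding = [pad_value] * (seq_len - len(kept))
        rows.append(padding + kept)
    return rows


padded = left_pad([[10734, 8560, 34], []], seq_len=5)
print(padded)
```

An empty review stays a row of zeros, which matches the `if len(review) != 0` guard in the cell.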

In [75]:
# 180 zeros of left padding followed by the 20 word indices
x = [[0] * 180 + [10756, 8498, 34, 40, 3, 4, 8119, 139, 975, 630,
                  38, 994, 2, 9852, 509, 2284, 1108, 2055, 20259, 1236]]
print(len(x[0]))


200


In [66]:
text_filtered_redefined = [[]]

for i, sentence in enumerate(filtered_sentence):
    sentence = re.sub("[^a-zA-Z]",  " ", str(sentence))
    text_filtered_redefined[i] = [words2index[word] if word in words2index else words2index['UNKN_'] for word in nltk.word_tokenize(sentence)]


In [67]:
print(text_filtered_redefined)

[[10734, 8560, 34, 40, 3, 4, 8085, 139, 975, 629, 38, 996, 2, 9723, 508, 2291, 1108, 2061, 20355, 1239]]
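
The lookup in the cell above maps each token to its index and falls back to the `UNKN_` entry for words missing from `words2index`. A small sketch of that fallback, assuming a toy vocabulary: `encode_tokens` and `toy_vocab` are hypothetical, and the index used for `UNKN_` is made up for illustration.

```python
def encode_tokens(tokens, vocab, unknown_token="UNKN_"):
    """Map tokens to indices, using the unknown-token index as a fallback."""
    unknown_index = vocab[unknown_token]
    return [vocab.get(token, unknown_index) for token in tokens]


# Toy vocabulary; the real words2index is built from the training data.
toy_vocab = {"UNKN_": 1, "movie": 2, "film": 3, "one": 4}
encoded = encode_tokens(["film", "one", "zzz", "movie"], toy_vocab)
print(encoded)
```

Unlike the notebook cell, this sketch looks tokens up directly and does not strip non-letter characters first.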


In [72]:
r = redefineSentences(text_filtered_redefined, seq_len)
print(r.shape)
print(type(r))


(1, 200)
<class 'numpy.ndarray'>


TypeError: 'int' object is not callable

In [33]:
%load_ext autoreload

%autoreload 2

from RNN.data import Dataset

test = Dataset()

text = test.transformRawData(words2index,[test_review], seq_len)

text.shape
print(text)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
['The simplest pleasures in life are the best, and this film is one of them. Combining a rather basic storyline of love and adventure this movie transcends the usual weekend fair with wit and unmitigated charm.']
[['simplest', 'pleasures', 'life', 'best', 'film', 'one', 'combining', 'rather', 'basic', 'storyline', 'love', 'adventure', 'movie', 'transcends', 'usual', 'weekend', 'fair', 'wit', 'unmitigated', 'charm']]
[[10821, 8591, 34, 40, 3, 4, 8091, 139, 975, 629, 38, 996, 2, 9889, 509, 2280, 1108, 2061, 20105, 1234]]
[[    0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0     0     0     0     0     0     0     0     0     0     0
      0     0    

In [31]:
predictor.predict(text)

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "Requested unsupported ContentType in content_type: application/x-npy". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-inference-2020-09-12-14-28-15-875 in account 214237513994 for more information.
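
The error above shows the endpoint rejecting the `application/x-npy` content type used for the NumPy array. If the entry point accepts JSON instead (an assumption about `api.py`), the padded batch could be turned into a JSON body along these lines; `to_json_body` is a hypothetical helper.

```python
import json


def to_json_body(batch):
    """Serialize a 2-D batch of token indices to a JSON string."""
    # NumPy arrays expose tolist(); plain nested lists are used as they are.
    rows = batch.tolist() if hasattr(batch, "tolist") else batch
    return json.dumps([[int(value) for value in row] for row in rows])


body = to_json_body([[0, 0, 10821, 8591, 34]])
print(body)
```

The resulting string would be sent with `ContentType='application/json'`.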

In [68]:
#predictor.predict(test)
print(predictor.endpoint)

pytorch-inference-2020-09-07-21-58-38-757


In [None]:
import json

import boto3
from botocore.config import Config
from sagemaker.session import Session

# Allow slow responses and disable automatic retries.
config = Config(
    read_timeout=800,
    retries={
        'max_attempts': 0
    }
)
runtime_client = boto3.Session().client('sagemaker-runtime', config=config)

# json.dumps escapes quotes and backslashes in the review text.
payload = json.dumps({"text": test_review})

response = runtime_client.invoke_endpoint(EndpointName=predictor.endpoint,
                                          ContentType='application/json',
                                          Body=payload)
result = response['Body'].read()

print(result)

In [19]:
prd = predictor.predict(X_valid)

print(prd)

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "list indices must be integers or slices, not str". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-inference-2020-08-26-23-04-51-920 in account 214237513994 for more information.
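
The message "list indices must be integers or slices, not str" suggests that the server-side code indexed the decoded request with a string key while the request body was a list (here, the rows of `X_valid`). A sketch of a more explicit check, assuming the handler expects a JSON object with a `text` field; `extract_text` is hypothetical and not taken from `api.py`.

```python
import json


def extract_text(request_body):
    """Return the review text from a JSON request body (hypothetical helper)."""
    data = json.loads(request_body)
    if isinstance(data, dict):
        return data["text"]
    # Anything else (for example a list of index rows) is rejected clearly.
    raise ValueError(
        'expected a JSON object like {"text": "..."}, got ' + type(data).__name__
    )


print(extract_text('{"text": "great film"}'))
```

With a check like this, a list payload produces a readable error instead of an opaque 500 from the indexing itself.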