<div style="text-align: right"> &uarr;   Ensure Kernel is set to  &uarr;  </div><br><div style="text-align: right"> 
conda_python3  </div>

# PyTorch Estimator Bring your own Script

In this notebook we train and deploy a PyTorch model that classifies the junction images from the data-prep notebook as priority, signal or roundabout.

The outline of this notebook is:

1. Prepare a training script (provided).

2. Use the AWS-provided PyTorch container and pass our script to it.

3. Run training.

4. Deploy the model to an endpoint.

5. Test the endpoint with images in a couple of possible ways.

Upgrade the SageMaker Python SDK so we can access the latest containers.

In [None]:
!pip install -U sagemaker

Next we import the libraries and set up the initial variables used in this lab.

In [1]:
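import os

import boto3
import numpy as np
import sagemaker
from sagemaker.pytorch import PyTorch

# Set to True on a SageMaker notebook instance, where the execution role
# can be looked up automatically.
ON_SAGEMAKER_NOTEBOOK = False

sagemaker_session = sagemaker.Session()
if ON_SAGEMAKER_NOTEBOOK:
    role = sagemaker.get_execution_role()
else:
    # Outside SageMaker, supply the ARN of an IAM role that SageMaker can assume
    role = "arn:aws:iam::099295524168:role/service-role/AmazonSageMaker-ExecutionRole-20220209T134488"

# Runtime client used later to invoke the deployed endpoint
client = boto3.client('sagemaker-runtime')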



In the cell below, set `bucket` to the name of the bucket you created in the data-prep notebook.

In [2]:
bucket = "dxhub-svvsd-labeled-images"
# key = "data-folder"   (in case you structure your data as your-bucket/data-folder) 
training_data_uri="s3://{}".format(bucket)
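If your data does sit under a prefix, as the commented-out `key` line allows for, the training URI has to include it, because SageMaker copies every object under that URI into the training container. A minimal sketch of a helper for building it; `build_s3_uri` is a hypothetical name of our own, not a SageMaker SDK function.

```python
def build_s3_uri(bucket: str, key: str = "") -> str:
    """Return s3://bucket or s3://bucket/key (hypothetical helper, not part of the SDK)."""
    key = key.strip("/")  # tolerate leading/trailing slashes in the prefix
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


print(build_s3_uri("dxhub-svvsd-labeled-images"))                 # s3://dxhub-svvsd-labeled-images
print(build_s3_uri("dxhub-svvsd-labeled-images", "data-folder"))  # s3://dxhub-svvsd-labeled-images/data-folder
```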

### PyTorch Estimator

We use the open-source PyTorch containers provided by AWS. You can extend them either by writing a Dockerfile that starts from the AWS-provided image and adds extra installs, or by placing a requirements.txt in `source_dir` so the additional libraries are installed when the job starts.

The code below creates the PyTorch estimator.


In [3]:
estimator = PyTorch(entry_point='ptModelCode.py',   # training script run inside the container
                    role=role,
                    framework_version='1.8',
                    instance_count=1,
                    instance_type='ml.m5.2xlarge',     # CPU instance
                    py_version='py3',
                    # Hyperparameters for ptModelCode.py can be passed as hyperparameters={...};
                    # they reach the script as command-line arguments.
                    )
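How do settings reach the training script? Values passed to the estimator as `hyperparameters={...}` are handed to the entry point as command-line flags, and the container exposes paths through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAINING` (the channel is named `training` when `fit` is given a single S3 URI). ptModelCode.py is not shown here, so the sketch below is only an illustration of how such a script might read them; the flag names `--epochs`, `--batch_size`, `--lr` and their defaults are assumptions.

```python
import argparse
import os


def parse_args(argv=None, env=None):
    """Read hyperparameters and SageMaker paths (flag names are hypothetical)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # Hyperparameters from PyTorch(hyperparameters={...}) arrive as --name value flags
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--lr", type=float, default=0.001)
    # Where to save the model, and where the 'training' channel was downloaded
    parser.add_argument("--model_dir", default=env.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--data_dir",
                        default=env.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"))
    return parser.parse_args(argv)


# Simulate what the container would pass for hyperparameters={"epochs": 5}
args = parse_args(["--epochs", "5"], env={})
print(args.epochs, args.batch_size, args.lr, args.model_dir, args.data_dir)
```

With a script like this, adding `hyperparameters={"epochs": 5}` to the estimator above would change the epoch count without editing the script.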

Now we call the estimator's `fit` method with the URI of the training data to start the training job. <br>
**Note:** This cell takes approximately **20 mins** to run.

In [None]:
%%time
estimator.fit(training_data_uri)

2022-10-13 18:45:01 Starting - Starting the training job...
2022-10-13 18:45:30 Starting - Preparing the instances for trainingProfilerReport-1665686701: InProgress
.........
2022-10-13 18:46:58 Downloading - Downloading input data......
2022-10-13 18:48:03 Training - Training image download completed. Training in progress..bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-10-13 18:48:05,463 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-10-13 18:48:05,465 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-10-13 18:48:05,475 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-10-13 18:48:05,482 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-10-13 18:48:05,984 sagemaker-training-toolkit INFO     No GPUs

## **NOTE:** <br>
If at this point your kernel disconnects from the server (you can tell because the kernel name in the top right-hand corner will say **No Kernel**),<br>you can reattach to the training job so you don't have to start it again.<br>Follow the steps below:
1. Scroll to the top of your notebook and set the kernel to the recommended kernel shown in the top right-hand corner.
2. Go to the SageMaker console, open Training Jobs and copy the name of the training job you were disconnected from.
3. Scroll to the bottom of this notebook and paste your training job name in place of **your-training-job-name** in the cell.
4. Replace **your-unique-bucket-name** with the name of the bucket you created in the data-prep notebook.
5. Run the edited cell.
6. Return to this cell and continue executing the rest of this notebook.
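Instead of copying the job name from the console in step 2, you can look it up with boto3's `list_training_jobs`. A minimal sketch, assuming the job kept the SDK's default `pytorch-training` name prefix (as in the artifact path printed further down); `latest_training_job_name` is our own hypothetical helper.

```python
def latest_training_job_name(sm_client, name_contains="pytorch-training"):
    """Return the newest training job whose name contains name_contains, or None."""
    response = sm_client.list_training_jobs(
        NameContains=name_contains,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response.get("TrainingJobSummaries", [])
    return summaries[0]["TrainingJobName"] if summaries else None


# Usage (requires AWS credentials):
#   import boto3
#   print(latest_training_job_name(boto3.client("sagemaker")))
```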

The estimator's `model_data` property gives the S3 location of the trained model artifacts.

In [11]:
# S3 URI of the model.tar.gz produced by the training job
latest_model = estimator.model_data

In [13]:
latest_model

's3://sagemaker-us-west-2-099295524168/pytorch-training-2022-10-13-18-45-00-645/output/model.tar.gz'
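To download or inspect the artifacts yourself you need the bucket and key separately. A small sketch using only the standard library; `split_s3_uri` is a hypothetical helper of our own.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


bucket_name, model_key = split_s3_uri(
    "s3://sagemaker-us-west-2-099295524168/pytorch-training-2022-10-13-18-45-00-645/output/model.tar.gz"
)
print(bucket_name)  # sagemaker-us-west-2-099295524168
print(model_key)    # pytorch-training-2022-10-13-18-45-00-645/output/model.tar.gz
```

With the pair you could fetch the archive via `boto3.client('s3').download_file(bucket_name, model_key, 'model.tar.gz')`.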

#### Deploying a model
Once trained, deploying a model is a simple call.

**Note:** The cell below deploys `latest_model`, the URI from the cell above. To deploy a different model, set `model_data` to that model's S3 URI instead.

In [21]:
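from sagemaker.pytorch import PyTorchModel

# Wrap the trained artifacts in a model served by ptInfCode.py.
# Note: training used framework_version='1.8'; consider serving with the same version.
pytorch_model = PyTorchModel(model_data=latest_model,
                             role=role,
                             entry_point='ptInfCode.py',
                             framework_version='1.7',
                             py_version='py3')
# Create a real-time endpoint backed by one ml.m5.2xlarge instance
predictor = pytorch_model.deploy(instance_type='ml.m5.2xlarge', initial_instance_count=1)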

-----!
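The endpoint runs `ptInfCode.py` inside the SageMaker PyTorch serving container, which calls the handler functions the script defines (`model_fn`, and optionally `input_fn`, `predict_fn` and `output_fn`). That script isn't reproduced here, but the test loop below reads `result[0]['Fight']` and `result[0]['No Fight']`, so its response must be a JSON list of per-image dictionaries. A minimal sketch of an `output_fn` producing that shape, assuming the class order is `["Fight", "No Fight"]` and that `prediction` has already been converted to plain lists of probabilities (for example with `tensor.tolist()`).

```python
import json

# Assumed order of the model's output classes; check it against ptInfCode.py
CLASS_NAMES = ["Fight", "No Fight"]


def output_fn(prediction, accept="application/json"):
    """Serialize probabilities as [{"Fight": p0, "No Fight": p1}, ...].

    prediction: list of per-image probability lists (assumption, e.g. probs.tolist()).
    This sketch always returns JSON, whatever `accept` is.
    """
    rows = [dict(zip(CLASS_NAMES, map(float, probs))) for probs in prediction]
    return json.dumps(rows)


body = output_fn([[0.2, 0.8]])
result = json.loads(body)
print(result[0]["Fight"], result[0]["No Fight"])  # 0.2 0.8
```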

Now let's get the endpoint name from the predictor.

In [22]:
print(predictor.endpoint_name)

pytorch-inference-2022-10-13-22-53-20-516


Now that our endpoint is up and running, let's test it on all of our unseen test images and see how well it does.


In [18]:
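s3_client = boto3.client('s3')

# list_objects_v2 returns at most 1,000 keys per call, so page through all results
test_files = []
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix='test'):
    for item in page.get('Contents', []):
        # skip zero-byte "folder" placeholder keys
        if not item['Key'].endswith('/'):
            test_files.append(item['Key'])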
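The inference cell below takes each image's label from the second path component of its key (`split('/')[1]`), which assumes a layout like `test/<Category>/<image>`. If the bucket may also hold folder placeholders or non-image files, it helps to filter the keys first. A sketch of two small helpers under those assumptions; the extension list and example keys are made up for illustration.

```python
import posixpath

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumption: adjust to your data


def is_image_key(key):
    """True for keys that look like image files (folder keys like 'test/Fight/' fail)."""
    return posixpath.splitext(key)[1].lower() in IMAGE_EXTENSIONS


def category_from_key(key):
    """Return the label folder, assuming keys look like 'test/<Category>/<file>'."""
    parts = key.split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected key layout: {key}")
    return parts[1]


# Hypothetical example keys
keys = ["test/", "test/Fight/", "test/Fight/img_001.jpg", "test/No Fight/img_002.PNG"]
images = [k for k in keys if is_image_key(k)]
print([(k, category_from_key(k)) for k in images])
```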




In [23]:
import json

import pandas as pd

s3 = boto3.resource('s3', region_name='us-west-2')
s3_bucket = s3.Bucket(bucket)

endpoint_name = predictor.endpoint_name

# one row per image: labeled category, fight probability, no-fight probability
inference_data = []

for object_key in test_files:
    # read the image bytes straight from S3 (no temporary file needed)
    image_bytes = s3_bucket.Object(object_key).get()['Body'].read()

    # send the raw image to the endpoint and parse its JSON response
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/x-image',
        Body=image_bytes)
    result = json.loads(response['Body'].read().decode("utf-8"))

    # keys look like test/<Category>/<image>, so the label is the second path component
    inf_data_row = [object_key.split('/')[1], result[0]['Fight'], result[0]['No Fight']]
    inference_data.append(inf_data_row)

df = pd.DataFrame(inference_data, columns=['Category', 'FProb', 'NoFProb'])

# clean up the inference endpoint
predictor.delete_endpoint()



In [None]:
df.info()

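Now let's score the predictions. An image labeled Fight counts as a true positive when its fight probability reaches the positive threshold, and an image labeled No Fight counts as a true negative when its no-fight probability reaches the negative threshold. From these counts we compute precision, recall and accuracy.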

In [24]:
# decision thresholds on the predicted probabilities
POS_THRESHOLD = 0.213   # a Fight image is detected if FProb >= this
NEG_THRESHOLD = 0.735   # a No Fight image is detected if NoFProb >= this

# convert probabilities to float
df['FProb'] = pd.to_numeric(df['FProb'], errors='coerce')
df['NoFProb'] = pd.to_numeric(df['NoFProb'], errors='coerce')

# split the rows into images labeled Fight and images labeled No Fight
fight_df = df[df['Category'] == 'Fight']
no_fight_df = df[df['Category'] == 'No Fight']

# correctly detected images in each group
fight_detected_df = fight_df[fight_df['FProb'] >= POS_THRESHOLD]
no_fight_detected_df = no_fight_df[no_fight_df['NoFProb'] >= NEG_THRESHOLD]

true_positive = fight_detected_df['Category'].count()
true_negative = no_fight_detected_df['Category'].count()
false_positive = no_fight_df['Category'].count()-true_negative
false_negative = fight_df['Category'].count()-true_positive

print("Labeled fights:", fight_df['Category'].count(), "Labeled No Fights:", no_fight_df['Category'].count())
print("True Positive:",true_positive,"True Negative:",true_negative, "False Negative:",false_negative, "False Positive:",false_positive)

print("Precision:",true_positive/(true_positive+false_positive))
print("Recall:",true_positive/(true_positive+false_negative))
print("Accuracy:",(true_positive+true_negative)/(df['Category'].count()))

Labeled fights: 172 Labeled No Fights: 600
True Positive: 121 True Negative: 556 False Negative: 51 False Positive: 44
Precision: 0.7333333333333333
Recall: 0.7034883720930233
Accuracy: 0.8769430051813472
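As a check on the numbers above: precision is TP / (TP + FP), recall is TP / (TP + FN) and accuracy is (TP + TN) divided by all 772 labeled images. A small standalone helper (our own, not a library function) reproduces them from the printed counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


m = classification_metrics(tp=121, tn=556, fp=44, fn=51)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
# precision: 0.7333, recall: 0.7035, accuracy: 0.8769
```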


In [None]:
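from IPython.display import display

# show up to 226 rows before pandas truncates the output
pd.set_option('display.max_rows', 226)
# display() each result; otherwise only the last expression in a cell is shown
display(no_fight_df)
display(no_fight_detected_df.count())
display(fight_df)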
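The thresholds 0.213 and 0.735 trade precision against recall, and the notebook does not record how they were chosen. One way to explore that trade-off is to sweep a single threshold on the fight probability and recompute precision and recall at each value. Note that this simplified rule (call an image Fight when `FProb >= t`) differs from the two-threshold scoring above, and the probabilities in the example are toy values.

```python
def sweep_thresholds(fight_probs, no_fight_probs, thresholds):
    """For each threshold t, call an image Fight when its fight probability >= t.

    fight_probs: FProb values of images labeled Fight.
    no_fight_probs: FProb values of images labeled No Fight.
    Returns a list of (t, precision, recall).
    """
    results = []
    for t in thresholds:
        tp = sum(p >= t for p in fight_probs)     # Fight images called Fight
        fp = sum(p >= t for p in no_fight_probs)  # No Fight images called Fight
        precision = tp / (tp + fp) if tp + fp else float("nan")
        recall = tp / len(fight_probs) if fight_probs else float("nan")
        results.append((t, precision, recall))
    return results


# Toy probabilities, not results from this model
for t, p, r in sweep_thresholds([0.9, 0.6, 0.3, 0.1], [0.7, 0.2, 0.1, 0.05], [0.5, 0.25]):
    print(f"t={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On the real results you would pass `fight_df['FProb'].tolist()` and `no_fight_df['FProb'].tolist()`.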

### Clean up

When we're done with the endpoint, we can delete it and the backing instances will be released. The inference cell above already does this with `predictor.delete_endpoint()`; if you skipped that cell, run `predictor.delete_endpoint()` now.
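Deleting the endpoint stops the instance charges, but the SageMaker model resource created by `deploy` stays in your account. A sketch of a helper that removes both through the SDK's `Predictor.delete_endpoint` and `Model.delete_model`; `clean_up` is our own hypothetical name. Since the inference cell already deleted the endpoint, here you would call it as `clean_up(model=pytorch_model)`.

```python
def clean_up(predictor=None, model=None):
    """Delete the endpoint (with its endpoint config) and the SageMaker model, if given."""
    if predictor is not None:
        # also removes the endpoint configuration
        predictor.delete_endpoint(delete_endpoint_config=True)
    if model is not None:
        # removes the model resource created by deploy()
        model.delete_model()
```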

### Attach to a training job that has been left to run 

If your kernel becomes disconnected and your training has already started, you can reattach to the training job.<br>
In the cell below, replace **your-unique-bucket-name** with the name of the bucket you created in the data-prep notebook.<br>
Look up the training job name, replace **your-training-job-name** with it, and run the cell.<br>
Once the training job has finished, continue running the cells after the training cell.

In [None]:
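import sagemaker
import boto3
from sagemaker.pytorch import PyTorch

sess = sagemaker.Session()
role = sagemaker.get_execution_role()
client = boto3.client('sagemaker-runtime')

bucket = "your-unique-bucket-name"

training_job_name = 'your-training-job-name'

# only attach once the placeholder has been replaced with a real job name
if 'your-training' not in training_job_name:
    # reattach so estimator.model_data and the later cells work as before
    estimator = PyTorch.attach(training_job_name=training_job_name, sagemaker_session=sess)
else:
    print("Replace 'your-training-job-name' with the name of your training job first.")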
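Before attaching, it can help to check whether the job is still running or has already failed. A minimal sketch using boto3's `describe_training_job`; `training_job_status` is our own hypothetical helper, and real use needs AWS credentials.

```python
def training_job_status(sm_client, job_name):
    """Return (TrainingJobStatus, SecondaryStatus) for a SageMaker training job."""
    desc = sm_client.describe_training_job(TrainingJobName=job_name)
    return desc["TrainingJobStatus"], desc.get("SecondaryStatus")


# Usage (requires AWS credentials):
#   import boto3
#   print(training_job_status(boto3.client("sagemaker"), training_job_name))
```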