Load the required libraries below.

In [1]:
# data 
import pandas as pd 
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

%matplotlib inline

## SageMaker Resources

The cells below import the SageMaker SDK, store the SageMaker session and IAM role (used to create estimators and models), and create a default S3 bucket. Once the bucket exists, locally stored data can be uploaded to S3.

In [1]:
# sagemaker
import boto3
import sagemaker
from sagemaker import get_execution_role

In [2]:
# SageMaker session and role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# default S3 bucket
bucket = sagemaker_session.default_bucket()
prefix='cnn-wendy-data'
prefix_output='cnn-wendy-model'

Here we retrieve the image dataset and upload it to S3.


In [None]:
!wget -nc https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames/wendy_cnn_frames_data.zip
!unzip -qq -n wendy_cnn_frames_data.zip -d wendy_cnn_frames_data 

In [3]:

# upload to S3. Skip if already uploaded. This can take a while.
print('Uploading data to s3://{}/{}'.format(bucket, prefix))  # input_data is not defined until the upload returns
input_data = sagemaker_session.upload_data(path='wendy_cnn_frames_data', bucket=bucket, key_prefix=prefix)
print('Data uploaded to {}'.format(input_data))

Model uploaded to s3://sagemaker-eu-central-1-283211002347/cnn-wendy-data


In [4]:
# location to input data can be written down here, if known
# input_data='s3://sagemaker-eu-central-1-283211002347/cnn-wendy-data'
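
The "skip if already uploaded" step can be automated by checking whether anything already exists under the prefix before uploading. The following is a minimal sketch, not part of the original notebook: `prefix_has_objects` and `upload_if_missing` are hypothetical helper names, and the S3 client is passed in as a parameter (for example `boto3.client('s3')`).

```python
def prefix_has_objects(s3_client, bucket, prefix):
    """Return True if at least one object exists under s3://bucket/prefix."""
    # MaxKeys=1 keeps the request cheap: we only need to know whether anything is there.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get('KeyCount', 0) > 0


def upload_if_missing(session, s3_client, local_path, bucket, prefix):
    """Upload local_path via session.upload_data unless the prefix already holds objects."""
    if prefix_has_objects(s3_client, bucket, prefix):
        # Assumption: a non-empty prefix means the data set was uploaded earlier.
        return 's3://{}/{}'.format(bucket, prefix)
    return session.upload_data(path=local_path, bucket=bucket, key_prefix=prefix)
```

In this notebook it could be called as `input_data = upload_if_missing(sagemaker_session, boto3.client('s3'), 'wendy_cnn_frames_data', bucket, prefix)`. Note that a non-empty prefix does not prove the earlier upload was complete, so an interrupted upload would still need to be repeated by hand.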

After uploading the images to S3, we can define and train the estimator.


In [4]:
# import a PyTorch wrapper
from sagemaker.pytorch import PyTorch

# specify an output path

output_path = 's3://{}/{}'.format(bucket, prefix_output)
print('Output path for models is {}'.format(output_path))

# instantiate a pytorch estimator
estimator = PyTorch(entry_point='train.py',
                    source_dir='letsplay_classifier',
                    role=role,
                    framework_version='1.6',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    train_volume_size=10,  # size of the training volume, in GB
                    output_path=output_path,
                    sagemaker_session=sagemaker_session,
                    hyperparameters={
                        'img-width': 128,
                        'img-height': 72,
                        'batch-size': 32,
                        'layer-cfg': 'D',
                        'epochs': 8
                    })

Uploading model to s3://sagemaker-eu-central-1-283211002347/cnn-wendy-model
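
The entries in `hyperparameters` reach the entry-point script as command-line arguments (for example `--img-width 128`), and the container exposes the model directory and the `train` channel through the `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN` environment variables. The actual `train.py` is not shown in this notebook, so the following is only a sketch of how such a script might read them. The defaults are assumptions that mirror the values above, `parse_args` is a hypothetical name, and the `model` and `data` fallbacks are placeholders for running outside SageMaker.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the hyperparameters and container paths a training script would receive."""
    parser = argparse.ArgumentParser()
    # Hyperparameters: argument names mirror the keys of the estimator's hyperparameters dict.
    parser.add_argument('--img-width', type=int, default=128)
    parser.add_argument('--img-height', type=int, default=72)
    parser.add_argument('--batch-size', type=int, default=32)
    parser.add_argument('--layer-cfg', type=str, default='D')
    parser.add_argument('--epochs', type=int, default=8)
    # Paths set by the training container; the fallbacks are placeholders for local runs.
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', 'model'))
    parser.add_argument('--data-dir', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN', 'data'))
    return parser.parse_args(argv)
```

Passing `argv` explicitly, as in `parse_args(['--epochs', '2'])`, makes it possible to try the parsing locally before launching a paid training job.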


## Train the Estimator

After instantiating the estimator, we train it with a call to `.fit()`. 

In [5]:
%%time 
# train the estimator on S3 training data
estimator.fit({'train': input_data})

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


2020-10-20 01:50:47 Starting - Starting the training job...
2020-10-20 01:50:50 Starting - Launching requested ML instances......
2020-10-20 01:52:12 Starting - Preparing the instances for training......
2020-10-20 01:53:15 Downloading - Downloading input data....................................
2020-10-20 01:59:12 Training - Downloading the training image...
2020-10-20 01:59:33 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-10-20 01:59:33,401 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2020-10-20 01:59:33,425 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-10-20 01:59:39,673 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-10-20 01:59:40,034 sagemaker-training-toolkit INFO     

In [6]:
print(estimator.model_data)
model_data = estimator.model_data
# model_data =

s3://sagemaker-eu-central-1-283211002347/cnn-wendy-model/pytorch-training-2020-10-20-01-50-47-679/output/model.tar.gz
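
To inspect the trained artifact locally, the `model_data` URI first has to be split into a bucket name and an object key. This is a small sketch using only the standard library; `split_s3_uri` is a hypothetical helper, not part of the SageMaker SDK.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('Not an S3 URI: {}'.format(uri))
    # netloc holds the bucket name; the path keeps a leading slash that S3 keys do not have.
    return parsed.netloc, parsed.path.lstrip('/')
```

The resulting pair could then be passed to `boto3.client('s3').download_file(bucket_name, key, 'model.tar.gz')` to fetch the archive.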


We set up a model that can predict the class of an image.

### Deploy the trained model

We deploy our model to create a predictor. We'll use this to make predictions on our data and evaluate the model.

In [7]:
# importing PyTorchModel
from sagemaker.pytorch import PyTorchModel

# Create a model from the trained estimator data
# And point to the prediction script
model = PyTorchModel(model_data=model_data,
                     role = role,
                     framework_version='1.6',
                     entry_point='predict.py',
                     source_dir='letsplay_classifier')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


In [8]:
%%time
# deploy and create a predictor
              
predictor = model.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge')


'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-------------------!
CPU times: user 39.5 s, sys: 6.04 s, total: 45.6 s
Wall time: 10min 14s


In [11]:
# the endpoint where the predictor is located
endpoint_name = predictor.endpoint


Now that the model is deployed, we check how the predictor performs on our full dataset,
ensuring that the predictions make sense.


In [12]:
print(endpoint_name)

pytorch-inference-2020-10-20-02-20-28-656


In [13]:
#endpoint_name='pytorch-inference-2020-10-20-02-20-28-656'

from sklearn.metrics import classification_report
from letsplay_classifier.endpoint import evaluate
y_true, y_pred = evaluate(endpoint_name, 'wendy_cnn_frames_data', 0.1)
report = classification_report(y_true=y_true, y_pred=y_pred)
print(report)

tensor([0])
tensor([[ 4.9770, -0.2632,  0.8564,  2.3913, -3.3477, -1.9263, -2.0825, -0.3487]])
tensor([[ 5.6273, -2.2510,  1.2142, -0.9868, -2.4720, -0.1441, -0.7309,  0.1467]])
tensor([[ 3.2973, -1.5174,  0.9901,  0.9161, -1.1704, -1.1699, -0.9104, -0.4632]])
tensor([[ 3.8077, -2.1421,  1.2140, -1.3024,  1.4097, -1.9534,  0.0448, -0.9545]])
tensor([[ 6.8445, -2.5595,  2.8151, -2.6666, -4.0882, -2.4108,  3.6524, -1.1497]])
tensor([[ 4.7674, -2.1832,  1.2508, -0.9580, -1.9708,  0.0258, -0.6753,  0.2292]])
tensor([[ 4.0341, -2.7485,  0.9825, -1.4354, -1.2906, -0.2366,  1.0813,  0.0326]])
tensor([[ 6.2370, -3.3291,  2.8694, -0.8198, -3.9561, -1.7213,  2.1996, -0.7584]])
tensor([[ 5.9262, -1.1719,  2.4512, -0.1314, -3.4442, -1.7409, -1.2321, -0.5522]])
tensor([[ 7.3244, -0.1036,  1.4149,  0.2778, -1.9335, -2.8178, -3.3480, -0.7711]])
tensor([[ 6.6896, -2.6056,  1.5058, -0.9973, -2.7607, -0.3219, -1.0526,  0.0910]])
tensor([[ 4.1942, -1.1235,  1.6363, -0.1398, -3.5385, -2.1461,  1.9984, -0.

NameError: name 'image_total' is not defined
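
Each eight-element row printed above looks like a vector of raw class scores, one per class (an assumption, since `evaluate` is not shown here). Under that assumption, the sketch below turns one row into a predicted class index and a probability distribution using plain Python; `softmax` and `predicted_class` are illustrative helpers, not functions from `letsplay_classifier`.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first avoids overflow in exp() without changing the result.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(logits):
    """Return the index of the largest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


# First score row from the output above.
logits = [4.9770, -0.2632, 0.8564, 2.3913, -3.3477, -1.9263, -2.0825, -0.3487]
probs = softmax(logits)
print(predicted_class(logits))  # prints 0
```

Applying softmax does not change which class wins; it only makes the scores easier to read as confidences.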

## Delete the Endpoint

Finally, I've added a convenience function that deletes a prediction endpoint once we're done with it. If you've finished evaluating the model, delete its endpoint so that it stops incurring charges.

In [24]:
# Accepts a predictor endpoint as input
# And deletes the endpoint by name
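def delete_endpoint(predictor):
    try:
        # use the predictor's own endpoint name, not the notebook-level endpoint_name
        boto3.client('sagemaker').delete_endpoint(EndpointName=predictor.endpoint)
        print('Deleted {}'.format(predictor.endpoint))
    except Exception as error:
        # most often the endpoint no longer exists; show the error in case it is something else
        print('Already deleted: {} ({})'.format(predictor.endpoint, error))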

In [25]:
# delete the predictor endpoint 
delete_endpoint(predictor)

Deleted pytorch-inference-2020-10-19-12-53-25-499
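
To confirm that nothing is left running, the remaining endpoints can be listed by name. This is a sketch only: `endpoints_matching` is a hypothetical helper, the SageMaker client is passed in (for example `boto3.client('sagemaker')`), and only the first page of results is read.

```python
def endpoints_matching(sm_client, name_contains):
    """Return (name, status) pairs for endpoints whose name contains the given text."""
    # Only the first page of results is inspected; pagination via NextToken is omitted here.
    response = sm_client.list_endpoints(NameContains=name_contains)
    return [(e['EndpointName'], e['EndpointStatus']) for e in response['Endpoints']]
```

For example, `endpoints_matching(boto3.client('sagemaker'), 'pytorch-inference')` should return an empty list once every inference endpoint created by this notebook has been deleted.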
