First, load the required libraries.

In [1]:
# data 
import pandas as pd 
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

%matplotlib inline

## SageMaker Resources

The cells below store the SageMaker session and execution role (needed to create estimators and models) and create a default S3 bucket. Once the bucket exists, locally stored data can be uploaded to S3.

In [2]:
# sagemaker
import boto3
import sagemaker
from sagemaker import get_execution_role

In [3]:
# SageMaker session and role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# default S3 bucket
bucket = sagemaker_session.default_bucket()
prefix='cnn-wendy-data'
prefix_output='cnn-wendy-model'

Next, we download the image dataset and upload it to S3.


In [None]:
!wget -nc https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames/wendy_cnn_frames_data.zip
!unzip -qq -n wendy_cnn_frames_data.zip -d wendy_cnn_frames_data 

In [3]:

# upload to S3. Skip if already uploaded. This can take a while.
# (input_data is not defined until upload_data returns, so print the target location instead)
print('Uploading data to s3://{}/{}'.format(bucket, prefix))
input_data = sagemaker_session.upload_data(path='wendy_cnn_frames_data', bucket=bucket, key_prefix=prefix)
print('Data uploaded to {}'.format(input_data))

Model uploaded to s3://sagemaker-eu-central-1-283211002347/cnn-wendy-data
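
The upload cell's comment suggests skipping the step when the data is already in S3. A minimal sketch of such a check, assuming a boto3 S3 client is available; the helper name `prefix_has_objects` is hypothetical and not part of the SageMaker SDK:

```python
def prefix_has_objects(s3_client, bucket, prefix):
    """Return True if at least one object exists under s3://bucket/prefix."""
    # Requesting a single key is enough to detect a non-empty prefix.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get('KeyCount', 0) > 0

# Usage in this notebook (assumes AWS credentials are configured):
# import boto3
# if not prefix_has_objects(boto3.client('s3'), bucket, prefix):
#     input_data = sagemaker_session.upload_data(
#         path='wendy_cnn_frames_data', bucket=bucket, key_prefix=prefix)
```

Note that the check only tells us that something exists under the prefix, not that the upload is complete.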


In [4]:
# if the S3 location of the input data is already known, it can be set here directly
input_data='s3://sagemaker-eu-central-1-283211002347/cnn-wendy-data'

After uploading the images to S3, we can define and train the estimator.


In [8]:
# import a PyTorch wrapper
from sagemaker.pytorch import PyTorch

# specify an output path

output_path = 's3://{}/{}'.format(bucket, prefix_output)
print('Output path for models is {}'.format(output_path))

# instantiate a pytorch estimator
estimator = PyTorch(entry_point='train.py',
                    source_dir='letsplay_classifier',
                    role=role,
                    framework_version='1.6',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    train_volume_size=10,  # size of the training volume in GB
                    output_path=output_path,
                    sagemaker_session=sagemaker_session,
                    hyperparameters={
                        'img-width': 128,
                        'img-height': 72,
                        'batch-size': 32,
                        'layer-cfg': 'D',
                        'epochs': 8
                    })

Output path for models is s3://sagemaker-eu-central-1-283211002347/cnn-wendy-model
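
In SageMaker script mode, each entry in `hyperparameters` is passed to the entry-point script as a command-line argument (for example `--img-width 128`), and the local path of the `train` channel is exposed through the `SM_CHANNEL_TRAIN` environment variable. The contents of `train.py` are not shown here, so the following is only a sketch of how such a script might read these values; the defaults and the `--data-dir` and `--model-dir` argument names are assumptions.

```python
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters from the estimator arrive as command-line arguments.
    parser.add_argument('--img-width', type=int, default=128)
    parser.add_argument('--img-height', type=int, default=72)
    parser.add_argument('--batch-size', type=int, default=32)
    parser.add_argument('--layer-cfg', type=str, default='D')
    parser.add_argument('--epochs', type=int, default=8)
    # Locations provided by the training container as environment variables.
    parser.add_argument('--data-dir', default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--model-dir', default=os.environ.get('SM_MODEL_DIR'))
    return parser.parse_args(argv)

args = parse_args(['--img-width', '128', '--img-height', '72', '--epochs', '8'])
print(args.img_width, args.img_height, args.epochs)  # prints: 128 72 8
```

argparse converts the dashes in argument names to underscores, which is why `'img-width'` is read back as `args.img_width`.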


## Train the Estimator

After instantiating the estimator, we train it with a call to `.fit()`. 

In [9]:
%%time 
# train the estimator on S3 training data
estimator.fit({'train': input_data})

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


2020-10-21 01:16:53 Starting - Starting the training job...
2020-10-21 01:16:55 Starting - Launching requested ML instances......
2020-10-21 01:18:01 Starting - Preparing the instances for training.........
2020-10-21 01:19:27 Downloading - Downloading input data....................................
2020-10-21 01:25:53 Training - Downloading the training image...
2020-10-21 01:26:14 Training - Training image download completed. Training in progress.
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-10-21 01:26:15,089 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2020-10-21 01:26:15,116 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-10-21 01:26:15,741 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-10-21 01:26:16,162 sagemaker-training-toolkit INFO  

In [10]:
print(estimator.model_data)
model_data = estimator.model_data
# alternatively, set model_data to the S3 URI of a previously trained model.tar.gz

s3://sagemaker-eu-central-1-283211002347/cnn-wendy-model/pytorch-training-2020-10-21-01-16-52-935/output/model.tar.gz
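
The printed value is an S3 URI pointing at the `model.tar.gz` artifact. If the bucket and key are needed separately, for example to download the artifact with boto3, a small helper along these lines can split the URI; `split_s3_uri` is a hypothetical name used for illustration, and the URI in the example is a placeholder.

```python
from urllib.parse import urlparse

def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != 's3' or not parsed.netloc:
        raise ValueError('Not an S3 URI: {}'.format(uri))
    return parsed.netloc, parsed.path.lstrip('/')

bucket_name, key = split_s3_uri('s3://example-bucket/some/prefix/model.tar.gz')
print(bucket_name, key)  # prints: example-bucket some/prefix/model.tar.gz

# Usage in this notebook (assumes AWS credentials are configured):
# import boto3
# bucket_name, key = split_s3_uri(model_data)
# boto3.client('s3').download_file(bucket_name, key, 'model.tar.gz')
```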


We now set up a model that can predict the class of an image.

## Deploy the Trained Model

We deploy our model to create a predictor. We'll use this to make predictions on our data and evaluate the model.

In [11]:
# importing PyTorchModel
from sagemaker.pytorch import PyTorchModel

# create a model from the trained estimator data
# and point to the prediction script
model = PyTorchModel(model_data=model_data,
                     role=role,
                     framework_version='1.6',
                     entry_point='predict.py',
                     source_dir='letsplay_classifier')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
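
The SageMaker PyTorch serving container looks for a small set of handler functions in the entry-point script: `model_fn` to load the model, and optionally `input_fn`, `predict_fn`, and `output_fn` to decode requests, run inference, and encode responses. `predict.py` itself is not shown here, so the sketch below only illustrates the request and response handlers, assuming (hypothetically) JSON payloads; the real script may well use a different content type.

```python
import json

JSON_CONTENT_TYPE = 'application/json'

def input_fn(request_body, content_type=JSON_CONTENT_TYPE):
    """Decode the request payload into a Python object (here: nested lists)."""
    if content_type != JSON_CONTENT_TYPE:
        raise ValueError('Unsupported content type: {}'.format(content_type))
    return json.loads(request_body)

def output_fn(prediction, accept=JSON_CONTENT_TYPE):
    """Encode the prediction for the HTTP response."""
    if accept != JSON_CONTENT_TYPE:
        raise ValueError('Unsupported accept type: {}'.format(accept))
    return json.dumps(prediction)
```

In a complete script, `model_fn(model_dir)` would load the trained weights from `model_dir`, and `predict_fn(input_object, model)` would turn the decoded input into a tensor and run the model on it.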


In [12]:
%%time
# deploy and create a predictor
predictor = model.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge')


'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-------------------!CPU times: user 39.5 s, sys: 5.32 s, total: 44.9 s
Wall time: 10min 17s


In [13]:
# the endpoint where the predictor is located
endpoint_name = predictor.endpoint


Now that the model is deployed, we check how the predictor performs on our full dataset,
ensuring that the predictions make sense. We produce a classification report.


In [14]:
print(endpoint_name)

pytorch-inference-2020-10-21-03-11-03-490


In [15]:
# optionally reuse an existing endpoint by name:
# endpoint_name = 'pytorch-inference-2020-10-20-02-20-28-656'

from sklearn.metrics import classification_report
from letsplay_classifier.endpoint import evaluate
y_true, y_pred = evaluate(endpoint_name, 'wendy_cnn_frames_data', 0.1)
report = classification_report(y_true=y_true, y_pred=y_pred)
print(report)

50 processed up to 251
100 processed up to 492
150 processed up to 697
200 processed up to 1001
250 processed up to 1246
300 processed up to 1527
350 processed up to 1769
400 processed up to 2006
450 processed up to 2349
500 processed up to 2521
550 processed up to 2833
600 processed up to 3081
650 processed up to 3290
700 processed up to 3533
750 processed up to 3721
800 processed up to 3934
850 processed up to 4144
900 processed up to 4334
950 processed up to 4566
1000 processed up to 4856
1050 processed up to 5146
1100 processed up to 5353
1150 processed up to 5610
1200 processed up to 5828
1250 processed up to 6059
1300 processed up to 6279
1350 processed up to 6451
1400 processed up to 6720
1450 processed up to 6917
1500 processed up to 7201
1550 processed up to 7439
1600 processed up to 7681
1650 processed up to 8058
1700 processed up to 8344
1750 processed up to 8617
1800 processed up to 8941
1850 processed up to 9207
1900 processed up to 9390
1950 processed up to 9667
2000 proc
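
`classification_report` summarizes, for each class, precision (the share of predictions of that class that are correct) and recall (the share of true members of that class that are found). As a small illustration on made-up labels, not the results of this notebook, the same two quantities can be computed by hand:

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for a single class label."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    predicted = sum(1 for p in y_pred if p == label)  # times the label was predicted
    actual = sum(1 for t in y_true if t == label)     # times the label is correct
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall

# toy labels for illustration only
y_true_toy = [0, 0, 1, 1, 1, 2]
y_pred_toy = [0, 1, 1, 1, 2, 2]
for label in (0, 1, 2):
    print(label, precision_recall(y_true_toy, y_pred_toy, label))
```

A class with high precision but low recall is predicted rarely but reliably; the opposite pattern means the class is over-predicted.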

## Delete the Endpoint

Finally, I've added a convenience function to delete prediction endpoints after we're done with them. 

In [16]:
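# Accepts a predictor as input
# and deletes its endpoint by name
from botocore.exceptions import ClientError

def delete_endpoint(predictor):
    # use the predictor's own endpoint name rather than a global variable
    name = predictor.endpoint
    try:
        boto3.client('sagemaker').delete_endpoint(EndpointName=name)
        print('Deleted {}'.format(name))
    except ClientError:
        # usually raised when the endpoint no longer exists
        print('Already deleted: {}'.format(name))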

In [17]:
# delete the predictor endpoint 
delete_endpoint(predictor)

Already deleted: pytorch-inference-2020-10-21-03-11-03-490
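
Because a running endpoint continues to incur charges, it can be worth confirming that nothing is left behind. A minimal sketch, assuming a boto3 SageMaker client; `find_endpoints` is a hypothetical helper built on the client's `list_endpoints` call:

```python
def find_endpoints(sm_client, name_contains):
    """Return the names of all endpoints whose name contains the given string."""
    names = []
    kwargs = {'NameContains': name_contains}
    while True:
        response = sm_client.list_endpoints(**kwargs)
        names.extend(e['EndpointName'] for e in response.get('Endpoints', []))
        # Results are paginated; keep going while a continuation token is returned.
        token = response.get('NextToken')
        if not token:
            return names
        kwargs['NextToken'] = token

# Usage (assumes AWS credentials are configured):
# import boto3
# print(find_endpoints(boto3.client('sagemaker'), 'pytorch-inference'))
```

An empty list means no matching endpoints remain in the current region.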
