Load the required libraries below.

In [1]:
# data 
import numpy as np
from sklearn.model_selection import train_test_split


## SageMaker Resources

The below cell stores the SageMaker session and role (for creating estimators and models), and creates a default S3 bucket. After creating this bucket, locally stored data can be uploaded to S3.

In [2]:
# sagemaker
import boto3
import sagemaker
from sagemaker import get_execution_role

In [3]:
# SageMaker session and role
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()

# default S3 bucket
bucket = sagemaker_session.default_bucket()
prefix = 'cnn-wendy-data-5'
prefix_output = 'cnn-wendy-model-5'

Here we retrieve the image dataset and upload it to S3.


In [7]:
!wget -nc https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames/wendy_cnn_frames_data_5.zip
!unzip -qq -n wendy_cnn_frames_data_5.zip -d wendy_cnn_frames_data_5 

--2020-10-31 03:14:09--  https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames/wendy_cnn_frames_data_5.zip
Resolving da-youtube-ml.s3.eu-central-1.amazonaws.com (da-youtube-ml.s3.eu-central-1.amazonaws.com)... 52.219.74.21
Connecting to da-youtube-ml.s3.eu-central-1.amazonaws.com (da-youtube-ml.s3.eu-central-1.amazonaws.com)|52.219.74.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1219145869 (1.1G) [application/zip]
Saving to: ‘wendy_cnn_frames_data_5.zip’


2020-10-31 03:14:25 (73.3 MB/s) - ‘wendy_cnn_frames_data_5.zip’ saved [1219145869/1219145869]



In [8]:

# upload to S3. Skip if already uploaded. This can take a while.
print('Uploading data to {}'.format(prefix))
input_data = sagemaker_session.upload_data(path='wendy_cnn_frames_data_5', bucket=bucket, key_prefix=prefix)
print('Data uploaded to {}'.format(input_data))

Uploading data to cnn-wendy-data-5
Data uploaded to s3://sagemaker-eu-central-1-283211002347/cnn-wendy-data-5


In [9]:
# location of the input data can be set here, if already known
#input_data='s3://sagemaker-eu-central-1-283211002347/cnn-wendy-data-5'

After uploading the images to S3, we can define and train the estimator.


In [10]:
# import a PyTorch wrapper
from sagemaker.pytorch import PyTorch

# specify an output path

output_path = 's3://{}/{}'.format(bucket, prefix_output)
print('Output path for models is {}'.format(output_path))

# instantiate a pytorch estimator
estimator = PyTorch(entry_point='train.py',
                    source_dir='letsplay_classifier',
                    role=role,
                    framework_version='1.6',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    output_path=output_path,
                    sagemaker_session=sagemaker_session,
                    hyperparameters={
                        'img-width': 320,
                        'img-height': 180,
                        'batch-size': 16,
                        'layer-cfg': 'B',
                        'epochs': 6
                    })

Output path for models is s3://sagemaker-eu-central-1-283211002347/cnn-wendy-model-5
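
In SageMaker script mode, the entries of the `hyperparameters` dictionary reach the entry point as command-line arguments, and each input channel is exposed through an environment variable named `SM_CHANNEL_<NAME>` (so a channel called `train` becomes `SM_CHANNEL_TRAIN`). The contents of `train.py` are not shown in this notebook, so the following is only a minimal sketch of how such a script might read these values. The function name `parse_args`, the argument defaults, and the local fallback directories `model` and `data` are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters the way a script-mode entry point typically does.

    Hypothetical helper: the real train.py may be organised differently.
    """
    parser = argparse.ArgumentParser()

    # Hyperparameters: flag names mirror the keys of the estimator's
    # hyperparameters dict; the defaults here are illustrative assumptions.
    parser.add_argument('--img-width', type=int, default=320)
    parser.add_argument('--img-height', type=int, default=180)
    parser.add_argument('--batch-size', type=int, default=16)
    parser.add_argument('--layer-cfg', type=str, default='B')
    parser.add_argument('--epochs', type=int, default=6)

    # Locations supplied by the training container via environment variables.
    # The fallbacks ('model', 'data') are assumptions that allow local runs.
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', 'model'))
    parser.add_argument('--data-dir', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN', 'data'))

    return parser.parse_args(argv)


# Simulate the arguments the container would pass for the estimator above.
args = parse_args(['--img-width', '320', '--img-height', '180',
                   '--batch-size', '16', '--layer-cfg', 'B', '--epochs', '6'])
print(args.img_width, args.img_height, args.batch_size, args.layer_cfg, args.epochs)
```

Passing an explicit list to `parse_args` makes it possible to exercise the argument handling locally before launching a training job.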


## Train the Estimator

After instantiating the estimator, we train it with a call to `.fit()`. 

In [11]:
%%time 
# train the estimator on S3 training data
estimator.fit({'train': input_data})

'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


2020-10-31 03:59:53 Starting - Starting the training job...
2020-10-31 03:59:55 Starting - Launching requested ML instances......
2020-10-31 04:01:06 Starting - Preparing the instances for training.........
2020-10-31 04:02:28 Downloading - Downloading input data.................................
2020-10-31 04:08:23 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2020-10-31 04:08:24,631 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2020-10-31 04:08:24,655 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2020-10-31 04:08:30,894 sagemaker_pytorch_container.training INFO     Invoking user training script.
2020-10-31 04:08:31,400 sagemaker-training-toolkit INFO     Installing dependencies from requirements.txt:
/opt/con

In [12]:
print(estimator.model_data)
model_data = estimator.model_data
#model_data = 's3://sagemaker-eu-central-1-283211002347/cnn-wendy-model-2b/pytorch-training-2020-10-26-00-49-31-414/output/model.tar.gz'

s3://sagemaker-eu-central-1-283211002347/cnn-wendy-model-5/pytorch-training-2020-10-31-03-59-53-628/output/model.tar.gz



We set up a model that can predict the class of an image.

## Deploy the Trained Model

We deploy our model to create a predictor. We'll use this to make predictions on our data and evaluate the model.

In [15]:
# importing PyTorchModel
from sagemaker.pytorch import PyTorchModel

# Create a model from the trained estimator data
# And point to the prediction script
model = PyTorchModel(model_data=model_data,
                     role=role,
                     framework_version='1.6',
                     entry_point='predict.py',
                     source_dir='letsplay_classifier')

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


In [16]:
%%time
# deploy and create a predictor
              
predictor = model.deploy(initial_instance_count=1, instance_type='ml.p2.xlarge')


'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.


-------------------!CPU times: user 38.8 s, sys: 4.6 s, total: 43.4 s
Wall time: 10min 14s


In [17]:
# the endpoint where the predictor is located
endpoint_name = predictor.endpoint
print(endpoint_name)

pytorch-inference-2020-10-31-06-46-05-193


In [18]:
from letsplay_classifier.endpoint import evaluate

y_true,  y_pred = evaluate(predictor, 'wendy_cnn_frames_data_5')

5000 processed up to 2500
10000 processed up to 5000
15000 processed up to 7500
20000 processed up to 10000
25000 processed up to 12500
30000 processed up to 15000
35000 processed up to 17500
40000 processed up to 20000
45000 processed up to 22500
50000 processed up to 25000
55000 processed up to 27500
60000 processed up to 30000
65000 processed up to 32500
70000 processed up to 35000
75000 processed up to 37500
80000 processed up to 40000
85000 processed up to 42500
90000 processed up to 45000
95000 processed up to 47500
100000 processed up to 50000


Now that the model is deployed, we check how the predictor performs on our full dataset,
ensuring that the predictions make sense. We produce a classification report.


In [1]:

#endpoint_name='pytorch-inference-2020-10-26-04-51-38-837'

In [20]:

from sklearn.metrics import classification_report
report = classification_report(y_true=y_true, y_pred=y_pred)
print(report)

              precision    recall  f1-score   support

           0       0.98      0.99      0.98      7198
           1       0.98      0.98      0.98      1163
           2       0.99      0.99      0.99     35425
           3       0.96      0.99      0.98       634
           4       0.98      0.97      0.98      6796

    accuracy                           0.99     51216
   macro avg       0.98      0.98      0.98     51216
weighted avg       0.99      0.99      0.99     51216



In [21]:
from sklearn.metrics import confusion_matrix
confusion_matrix(y_true, y_pred)

array([[ 7110,    11,    56,     4,    17],
       [    2,  1140,     4,    17,     0],
       [  167,    11, 35155,     1,    91],
       [    0,     3,     2,   628,     1],
       [    7,     3,   187,     1,  6598]])
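
As a cross-check, the per-class figures in the classification report can be recomputed directly from the confusion matrix above. This is a minimal sketch using only NumPy. The matrix values are copied from the output above, and the variable names are chosen for illustration. Rows are true classes and columns are predicted classes, which is the convention `confusion_matrix` uses.

```python
import numpy as np

# Confusion matrix copied from the cell output above
# (rows: true class, columns: predicted class).
cm = np.array([[ 7110,    11,    56,     4,    17],
               [    2,  1140,     4,    17,     0],
               [  167,    11, 35155,     1,    91],
               [    0,     3,     2,   628,     1],
               [    7,     3,   187,     1,  6598]])

correct = np.diag(cm)            # correctly classified frames per class
support = cm.sum(axis=1)         # number of true frames per class
predicted = cm.sum(axis=0)       # number of frames predicted as each class

recall = correct / support       # share of each true class that was found
precision = correct / predicted  # share of each predicted class that was right
f1 = 2 * precision * recall / (precision + recall)
accuracy = correct.sum() / cm.sum()

for label in range(len(cm)):
    print('class {}: precision={:.2f} recall={:.2f} f1={:.2f} support={}'.format(
        label, precision[label], recall[label], f1[label], support[label]))
print('accuracy={:.2f}'.format(accuracy))
```

The off-diagonal entries show where the errors concentrate: most misclassifications involve class 2, with 167 of its frames predicted as class 0 and 187 frames of class 4 predicted as class 2.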

In [22]:
!wget -nc https://da-youtube-ml.s3.eu-central-1.amazonaws.com/wendy-cnn/frames/wendy_cnn_frames_E69.zip
!unzip -qq -n wendy_cnn_frames_E69.zip -d wendy_cnn_frames_E69

File ‘wendy_cnn_frames_E69.zip’ already there; not retrieving.



In [23]:
from letsplay_classifier.interval.predict_intervals_endpoint import evaluate
evaluate(predictor, 'wendy_cnn_frames_E69/E69', class_names= ['Battle', 'Hideout', 'Other', 'Siege', 'Tournament'])

01:44-03:10 | Battle : 92% , Other : 7% 
06:12-06:14 | ????? 
19:50-21:56 | Battle : 91% , Other : 7% 
35:42-35:44 | ????? 
36:50-38:10 | Battle : 90% , Other : 9% 
41:10-41:42 | Battle : 77% , Other : 12% 
52:30-52:34 | Other : 85% , Tournament : 15% 


## Delete the Endpoint

Finally, I've added a convenience function to delete prediction endpoints after we're done with them. 

In [17]:
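from botocore.exceptions import ClientError

# Accepts a predictor as input
# and deletes its endpoint by name
def delete_endpoint(predictor):
    try:
        # use the predictor's own endpoint name rather than a global variable
        boto3.client('sagemaker').delete_endpoint(EndpointName=predictor.endpoint)
        print('Deleted {}'.format(predictor.endpoint))
    except ClientError:
        # raised by boto3 when the endpoint no longer exists
        print('Already deleted: {}'.format(predictor.endpoint))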

In [18]:
# delete the predictor endpoint 
delete_endpoint(predictor)

Already deleted: pytorch-inference-2020-10-30-23-46-17-619
