# Hello! Welcome to the Carpl training workspace

## Contents

1. [Background](#Background)
1. [Requirements](#Requirements)
1. [Setup](#Setup)
1. [Data](#Data)
1. [Train](#Train)
1. [Host](#Host)

---

## Background
In this training workspace you have access to an S3 bucket containing the dataset and annotations you selected on the Carpl platform.
You can fetch that data into your current workspace to begin. We have provided template code to help you get started with training.


The `code/` folder contains `script.py` and `mnist.py`.<br>
`script.py` contains the network architecture, data loaders, training code, and testing code.<br>


It is also the entry point for model deployment, through four functions:
1. `input_fn`: custom input function that preprocesses data received from the API
2. `model_fn`: loads the model into memory
3. `predict_fn`: runs inference on the output of `input_fn`
4. `output_fn`: custom post-processing that sends data back to Carpl
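
The four functions above can be sketched as a chain. This is a minimal sketch, not the contents of `script.py`: the `"inputs"` payload key and the stand-in model (a plain Python callable instead of a PyTorch network) are assumptions made so the example runs without any dependencies.

```python
import json


def model_fn(model_dir):
    """Load the model into memory.

    Hypothetical stand-in: a real script would rebuild the network and
    load its weights from model_dir. Here the "model" sums each row.
    """
    return lambda batch: [[sum(row)] for row in batch]


def input_fn(request_body, content_type="application/json"):
    """Turn the raw request body into model input."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    # The "inputs" key is an assumption made for this sketch.
    return json.loads(request_body)["inputs"]


def predict_fn(input_data, model):
    """Run inference on the result of input_fn."""
    return model(input_data)


def output_fn(prediction, accept="application/json"):
    """Serialize the prediction for the caller."""
    if accept != "application/json":
        raise ValueError(f"Unsupported accept type: {accept}")
    return json.dumps(prediction)


# Local dry run of the whole chain, in the order the functions are called.
model = model_fn("model")
body = json.dumps({"inputs": [[1, 2, 3]]})
print(output_fn(predict_fn(input_fn(body), model)))  # prints [[6]]
```

Running the chain locally like this is a cheap way to check the pre- and post-processing before deploying an endpoint.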


For more information about PyTorch in SageMaker, see the [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers) and [sagemaker-python-sdk](https://github.com/aws/sagemaker-python-sdk) GitHub repositories.




![SAGE.drawio](SAGE.drawio.png)

---

## Setup

_This notebook was created and tested on an ml.m4.xlarge notebook instance._

Let's start by creating a SageMaker session and specifying:

- The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the notebook instance, training, and hosting.
- The IAM role ARN used to give training and hosting access to your data. See the documentation for how to create these. Note that if more than one role is required for notebook instances, training, and/or hosting, replace `sagemaker.get_execution_role()` with the appropriate full IAM role ARN string(s).


## Requirements

Install all the requirements for training your model here.

In [9]:
!pip install --upgrade sagemaker==2.110.0


In [2]:
!pip uninstall -y torchvision
!pip install -qU torchvision
!pip install pillow
!pip install requests

# Training using PyTorch

In [58]:
import sagemaker

sagemaker_session = sagemaker.Session()

bucket = sagemaker_session.default_bucket()
prefix = "sagemaker/DEMO-pytorch-mnist"

role = sagemaker.get_execution_role()

In [64]:
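import json

# Read the resource metadata; this cell uses the user profile name as the bucket name.
with open("/opt/ml/metadata/resource-metadata.json", "r") as f:
    metadata = json.load(f)
bucket = metadata["UserProfileName"]
bucket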

'carpl-uat-sagemaker-domain'

In [59]:
bucket

'sagemaker-ap-south-1-023180687239'

## Data
### Getting the data



In [None]:
from boto3 import client

conn = client("s3")  # assumes AWS credentials are already configured
# "Contents" is missing when the bucket is empty, so fall back to an empty list
for key in conn.list_objects(Bucket=bucket).get("Contents", []):
    print(key["Key"])
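
A single listing call returns at most 1,000 keys, so larger datasets need pagination. The helper below is a minimal sketch under that assumption. `list_keys` is a hypothetical name, not part of the workspace template, and it takes the S3 client as an argument so it can be exercised without AWS access.

```python
def list_keys(s3_client, bucket, prefix=""):
    """Yield every object key under prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage in the notebook (requires configured AWS credentials):
#   import boto3
#   for key in list_keys(boto3.client("s3"), bucket):
#       print(key)
```

Because the function is a generator, keys are printed as each page arrives rather than after the whole bucket has been listed.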

In [1]:
from torchvision.datasets import MNIST
from torchvision import transforms

# Load your data here


### Uploading the data to S3
We use the `sagemaker.Session.upload_data` function to upload our datasets to an S3 location. The return value, `inputs`, identifies that location; we will use it later when we start the training job.


In [7]:
inputs = sagemaker_session.upload_data(path="data", bucket=bucket, key_prefix=prefix)
print("input spec (in this case, just an S3 path): {}".format(inputs))

input spec (in this case, just an S3 path): s3://sagemaker-ap-south-1-023180687239/sagemaker/DEMO-pytorch-mnist


## Train
### Training script
The `script.py` script provides all the code we need for training and hosting a SageMaker model (including the `model_fn` function that loads a model).
The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, such as:

* `SM_MODEL_DIR`: A string representing the path to the directory to write model artifacts to.
  These artifacts are uploaded to S3 for model hosting.
* `SM_NUM_GPUS`: The number of GPUs available in the current container.
* `SM_CURRENT_HOST`: The name of the current container on the container network.
* `SM_HOSTS`: A JSON-encoded list containing all the hosts.

Supposing one input channel, 'training', was used in the call to the PyTorch estimator's `fit()` method, the following will be set, following the format `SM_CHANNEL_[channel_name]`:

* `SM_CHANNEL_TRAINING`: A string representing the path to the directory containing data in the 'training' channel.

For more information about training environment variables, please visit [SageMaker Containers](https://github.com/aws/sagemaker-containers).

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to `model_dir` so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an `argparse.ArgumentParser` instance.

Because SageMaker imports the training script, put your training code in a main guard (`if __name__ == '__main__':`) if you use the same script to host your model, as we do in this example, so that SageMaker does not inadvertently run your training code at the wrong point in execution.
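
The environment variables, the `argparse` handling, and the main guard described above fit together roughly as follows. This is a hedged sketch rather than the contents of `script.py`: the argument names, the local fallback values, and the placeholder `train` function are assumptions.

```python
import argparse
import json
import os


def parse_args(argv=None):
    """Collect hyperparameters and SageMaker environment settings."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments, e.g. --epochs 1 --backend gloo
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--backend", type=str, default=None)
    # Environment variables set by SageMaker; the fallbacks are assumptions for local runs
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "data"))
    parser.add_argument("--num-gpus", type=int, default=int(os.environ.get("SM_NUM_GPUS", "0")))
    parser.add_argument(
        "--hosts",
        type=json.loads,
        default=json.loads(os.environ.get("SM_HOSTS", '["localhost"]')),
    )
    # parse_known_args ignores arguments this sketch does not declare
    args, _unknown = parser.parse_known_args(argv)
    return args


def train(args):
    """Placeholder for the real training loop."""
    print(f"training for {args.epochs} epoch(s) on {len(args.hosts)} host(s)")
    print(f"data: {args.data_dir}, model output: {args.model_dir}")


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported for hosting
    train(parse_args())
```

With this layout, importing the module to reach `model_fn` never triggers training, while running it directly does.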

For example, the script run by this notebook:

In [2]:
!pygmentize code/script.py

import argparse
import json
import logging
import os
import sys
#import sagemaker_containers
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import to

### Run training in SageMaker

The `PyTorch` class allows us to run our training function as a training job on SageMaker infrastructure. We need to configure it with our training script, an IAM role, the number of training instances, the training instance type, and hyperparameters. In this case we run our training job on 2 `ml.c5.2xlarge` instances, but this example can be run on one or multiple CPU or GPU instances ([full list of available instances](https://aws.amazon.com/sagemaker/pricing/instance-types/)). The `hyperparameters` parameter is a dict of values that will be passed to your training script; you can see how to access these values in the `mnist.py` script.


In [11]:
!pip show sagemaker | grep Version

Version: 2.110.0


In [12]:
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="mnist.py",
    role=role,
    py_version="py38",
    framework_version="1.11.0",
    instance_count=2,
    instance_type="ml.c5.2xlarge",
    hyperparameters={"epochs": 1, "backend": "gloo"},
    dependencies=['code/requirements.txt'],
    source_dir="code",
)

After we've constructed our `PyTorch` object, we can fit it using the data we uploaded to S3. SageMaker makes sure our data is available in the local filesystem, so our training script can simply read the data from disk.


In [13]:
estimator.fit({"training": inputs})

2022-10-11 07:14:12 Starting - Starting the training job...
2022-10-11 07:14:36 Starting - Preparing the instances for trainingProfilerReport-1665472452: InProgress
......
2022-10-11 07:15:36 Downloading - Downloading input data...
2022-10-11 07:15:56 Training - Downloading the training image...
2022-10-11 07:16:43 Training - Training image download completed. Training in progress..
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
2022-10-11 07:16:44,908 sagemaker-training-toolkit INFO     Imported framework sagemaker_pytorch_container.training
2022-10-11 07:16:44,910 sagemaker-training-toolkit INFO     No GPUs detected (normal if no gpus installed)
2022-10-11 07:16:44,916 sagemaker_pytorch_container.training INFO     Block until all host DNS lookups succeed.
2022-10-11 07:16:44,922 sagemaker_pytorch_container.training INFO     Invoking user training script.
2022-10-

## Host
### Create endpoint
After training, we use the `PyTorch` estimator object to build and deploy a `PyTorchPredictor`. This creates a SageMaker endpoint: a hosted prediction service that we can use to perform inference.

As mentioned above, the `mnist.py` script contains the required implementation of `model_fn`. We use the default implementations of `input_fn`, `predict_fn`, `output_fn` and `transform_fn` defined in [sagemaker-pytorch-containers](https://github.com/aws/sagemaker-pytorch-containers).

The arguments to the `deploy` function let us set the number and type of instances used for the endpoint. These do not need to match the values we used for the training job. For example, you can train a model on a set of GPU-based instances and then deploy the endpoint to a fleet of CPU-based instances, but you need to make sure that you return or save your model as a CPU model, similar to what we did in `mnist.py`. Here we deploy the model to a single `ml.c5.2xlarge` instance.

In [14]:
estimator.__dict__


{'framework_version': '1.11.0',
 'py_version': 'py38',
 'role': 'arn:aws:iam::023180687239:role/service-role/AmazonSageMaker-ExecutionRole-20220906T142944',
 'instance_count': 2,
 'instance_type': 'ml.c5.2xlarge',
 'keep_alive_period_in_seconds': None,
 'instance_groups': None,
 'volume_size': 30,
 'volume_kms_key': None,
 'max_run': 86400,
 'input_mode': 'File',
 'metric_definitions': None,
 'model_uri': None,
 'model_channel_name': 'model',
 'code_uri': None,
 'code_channel_name': 'code',
 'source_dir': 'code',
 'git_config': None,
 'container_log_level': 20,
 '_hyperparameters': {'epochs': 1,
  'backend': 'gloo',
  'sagemaker_submit_directory': 's3://sagemaker-ap-south-1-023180687239/pytorch-training-2022-10-11-07-14-12-085/source/sourcedir.tar.gz',
  'sagemaker_program': 'mnist.py',
  'sagemaker_container_log_level': 20,
  'sagemaker_job_name': 'pytorch-training-2022-10-11-07-14-12-085',
  'sagemaker_region': 'ap-south-1'},
 'code_location': None,
 'entry_point': 'mnist.py',
 'depe

In [22]:
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.c5.2xlarge")

-----!

### Evaluate

You can use the test images to evaluate the endpoint. The accuracy of the model depends on how many epochs it is trained for.

### Cleanup

Once you have finished with this example, including the invocation cells that follow, delete the prediction endpoint to release the instance(s) associated with it.

In [None]:
sagemaker_session.delete_endpoint(endpoint_name=predictor.endpoint_name)

In [24]:
predictor.endpoint_name 

'pytorch-training-2022-10-11-08-07-06-661'

In [4]:
data_url = ""  # add the URL of sample test data

In [43]:
import json

import boto3

# Named runtime_client to avoid shadowing the `client` function imported from boto3 earlier.
runtime_client = boto3.client("sagemaker-runtime")

custom_attributes = "c000b4f9-df62-4c85-a0bf-7c525f9104a4"  # An example of a trace ID.
endpoint_name = predictor.endpoint_name  # Your endpoint name.
content_type = "application/json"  # The MIME type of the input data in the request body.
accept = "application/json"  # The desired MIME type of the inference in the response.
payload = json.dumps({"url": data_url})  # Payload for inference.
response = runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    CustomAttributes=custom_attributes,
    ContentType=content_type,
    Accept=accept,
    Body=payload,
)

print(response)

{'ResponseMetadata': {'RequestId': '0957ab6d-569b-477a-aef0-7481a935eecb', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '0957ab6d-569b-477a-aef0-7481a935eecb', 'x-amzn-invoked-production-variant': 'AllTraffic', 'date': 'Tue, 11 Oct 2022 08:23:34 GMT', 'content-type': 'application/json', 'content-length': '207'}, 'RetryAttempts': 0}, 'ContentType': 'application/json', 'InvokedProductionVariant': 'AllTraffic', 'Body': <botocore.response.StreamingBody object at 0x7efc7f467550>}


In [44]:
from pprint import pprint

In [45]:
pprint(response) 

{'Body': <botocore.response.StreamingBody object at 0x7efc7f467550>,
 'ContentType': 'application/json',
 'InvokedProductionVariant': 'AllTraffic',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '207',
                                      'content-type': 'application/json',
                                      'date': 'Tue, 11 Oct 2022 08:23:34 GMT',
                                      'x-amzn-invoked-production-variant': 'AllTraffic',
                                      'x-amzn-requestid': '0957ab6d-569b-477a-aef0-7481a935eecb'},
                      'HTTPStatusCode': 200,
                      'RequestId': '0957ab6d-569b-477a-aef0-7481a935eecb',
                      'RetryAttempts': 0}}


In [46]:
r = json.load(response["Body"])

In [47]:
r

[[-3.0102062225341797,
  -2.48018217086792,
  -1.5776013135910034,
  -2.6531476974487305,
  -3.039762020111084,
  -1.8981196880340576,
  -2.262784481048584,
  -4.358467102050781,
  -1.3877885341644287,
  -3.6531260013580322]]
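
The ten values above look like log-probabilities, one per class, since their exponentials sum to roughly 1. Under that assumption, and assuming each index corresponds to a class label, the predicted class can be read off as sketched below. `top_class` is a hypothetical helper, not part of the template code.

```python
import math


def top_class(log_probs):
    """Return (index, probability) of the highest-scoring class."""
    best = max(range(len(log_probs)), key=log_probs.__getitem__)
    # exp() converts a log-probability back into a probability
    return best, math.exp(log_probs[best])


# The response shown above, copied here so the sketch runs on its own.
r = [[-3.0102062225341797, -2.48018217086792, -1.5776013135910034,
      -2.6531476974487305, -3.039762020111084, -1.8981196880340576,
      -2.262784481048584, -4.358467102050781, -1.3877885341644287,
      -3.6531260013580322]]

label, confidence = top_class(r[0])
print(f"predicted class index: {label}, probability: {confidence:.3f}")
```

For this response the highest score sits at index 8, with a probability of about 0.25, which is consistent with a model trained for a single epoch.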