# Amazon SageMaker

<img src="https://miro.medium.com/max/1838/1*zl6cMc74o25AIT5LxYVtVQ.png" height="400">

> "Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can quickly and easily build and train machine learning models, and then directly deploy them into a production/hosted environment. With native support for bring-your-own-algorithms and frameworks, Amazon SageMaker offers flexible distributed training options that adjust to your specific workflows. Deploy a model into a secure and scalable environment by launching it with a single click from the Amazon SageMaker console. Training and hosting are billed by minutes of usage, with no minimum fees and no upfront commitments."

https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html

SageMaker can also be driven from code through its own Python library. Below, we show how to use your own AWS Educate account to create training and testing jobs that can be deployed on AWS. But first, let's install the library and import the other libraries we need.

In [4]:
!pip install sagemaker



In [0]:
import os
import pandas as pd

import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.estimator import SKLearn

import boto3

## Access to AWS Services

Below is the list of access keys that the libraries require to interact with AWS services like S3 and SageMaker. You can get the first three keys via the Vocareum portal by clicking the "Account Details" button and then the "Show" button.

<br>
<img src="https://scontent-lhr3-1.xx.fbcdn.net/v/t1.15752-9/68541886_907291666271323_6721776512917307392_n.png?_nc_cat=104&_nc_oc=AQleXYtWLruOkDKgYsmMQXBt68ikF5jdkF4xZrPueSIgEpIircpjOHbNKGFvM1Zfr2g&_nc_ht=scontent-lhr3-1.xx&oh=e3ddeb63c6bc163a3eb115cc8b51e011&oe=5E155B3D" alt="Vocareum account details button screenshot">
<br><br>

The Role ARN, however, requires a role to be created and the S3 and SageMaker policies to be assigned to that role. To do this, follow the instructions below.

1. In the AWS console navigate to the IAM service by going to the console home page and searching in the "Find Service" search bar, or by clicking the "Services" drop down and searching for IAM.

2. Click on Roles in the left navigation bar.

3. Click "Create role".

4. Select "SageMaker" from the list of services and click "Next".

5. Search for and select the AmazonS3FullAccess and AmazonSageMakerFullAccess policies, and click "Next".

6. Click "Next"; the tags are optional.

7. Give your role a name and click "Create role".

8. Click on the role you have created in the roles list to bring up the summary where you should see the Role ARN key.
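
Once you have copied the Role ARN, a quick format check can catch copy-and-paste mistakes before any AWS call is made. The sketch below assumes the usual `arn:<partition>:iam::<account-id>:role/<name>` layout; the helper name `parse_role_arn` and the example ARN are hypothetical and used only for illustration.

```python
import re

# Pattern for an IAM role ARN:
# arn:<partition>:iam::<12-digit account id>:role/<name or path/name>
ROLE_ARN_PATTERN = re.compile(
    r"^arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@\-/]+)$"
)


def parse_role_arn(arn):
    """Return (account_id, role_name) or raise ValueError if the ARN is malformed."""
    match = ROLE_ARN_PATTERN.match(arn.strip())
    if match is None:
        raise ValueError("Not a valid IAM role ARN: {!r}".format(arn))
    return match.group(2), match.group(3)


# Hypothetical ARN used only for illustration
account_id, role_name = parse_role_arn("arn:aws:iam::123456789012:role/my-sagemaker-role")
print(account_id, role_name)
```

This only checks the shape of the string; it does not confirm that the role exists or has the right policies attached.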

In [0]:
# WHAT YOU NEED TO RUN
# Access these from the Vocareum "Account Details" section
AWS_ACCESS_KEY_ID = "<YOUR_ACCESS_KEY_ID>"
AWS_SECRET_ACCESS_KEY = "<YOUR_SECRET_ACCESS_KEY>"
AWS_SESSION_TOKEN = "<YOUR_SESSION_TOKEN>"

# Create a role and assign the AmazonS3FullAccess and AmazonSageMakerFullAccess
# policies to that role, and view the role summary to get the Role ARN
AWS_SAGEMAKER_ROLE_ARN = "arn:aws:iam::782330743537:role/awsrole"
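
If you would rather not paste keys into the notebook itself, one option is to read them from environment variables. The following is a minimal sketch under that assumption; the helper `load_aws_credentials` is hypothetical, and the variable names are the standard ones that boto3 also looks for.

```python
import os

REQUIRED_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def load_aws_credentials(env=None):
    """Collect the three temporary credentials from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


# Example with placeholder values; in the notebook you would call load_aws_credentials()
example = load_aws_credentials({
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_SESSION_TOKEN": "example-token",
})
print(sorted(example))
```

Failing early with the names of the missing keys is usually easier to debug than an authentication error from AWS later on.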

## Boto
Boto is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3. Boto provides an easy to use, object-oriented API, as well as low-level access to AWS services.

In [0]:
# Create a session using the AWS keys provided above
session = boto3.Session(
    aws_access_key_id =      AWS_ACCESS_KEY_ID,
    aws_secret_access_key =  AWS_SECRET_ACCESS_KEY,
    aws_session_token=       AWS_SESSION_TOKEN,
    region_name='us-east-1',
)

In [0]:
# Create a SageMaker dedicated session
sagemaker_session = sagemaker.Session(boto_session=session)
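
Before going further it can help to confirm that the keys are valid and have not expired. A minimal sketch, assuming the boto3 `session` created above: the hypothetical helper `current_account_id` asks the STS service which account the credentials belong to.

```python
def current_account_id(boto_session):
    """Ask STS which account the credentials belong to.

    The call raises an error if the keys are invalid or the session token has expired.
    """
    sts = boto_session.client("sts")
    identity = sts.get_caller_identity()
    return identity["Account"]


# In the notebook (assumes the `session` object from the cell above):
# print(current_account_id(session))
```

The account ID returned should match the number that appears in your Role ARN.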

## Data

As usual, we need some data to train a model with. It needs to be uploaded to an S3 bucket that SageMaker can access. But guess what, ***you've already been using S3 buckets!*** Every time we fetch some data using a URL that begins with `https://ai-camp-content.s3.amazonaws.com/`, we are hitting a public S3 bucket called `ai-camp-content`.

We have placed another Amazon review CSV file at the link below. This one contains reviews and a label saying whether each is real or fake. Read in this CSV file and look at some of its content.

```"https://ai-camp-content.s3.amazonaws.com/reviews.csv"```


In [17]:
# TODO: Load and inspect data
df = pd.read_csv("https://ai-camp-content.s3.amazonaws.com/reviews.csv")
df.head()

Unnamed: 0,data,label
0,The Bose system has bluetooth packages i got i...,fake
1,Nice set of speakers. The sound quality is am...,fake
2,"Have had these for almost a yeara now, the spe...",fake
3,I bought this a few days ago and I am so happy...,fake
4,I bought this a month ago and I am so happy i ...,fake


## Upload SageMaker Data

Now that you have the data that SageMaker needs, you need to upload it to your own S3 bucket on AWS. We want to upload a folder called aicamp-test and within it a folder called data that contains our reviews.csv file.

> `/aicamp-test/data/reviews.csv`

In [0]:
DATA_DIR = 'data'
prefix = 'aicamp-test'
full_path = "{}/{}".format(prefix, DATA_DIR)



We need to create this locally first, so create a folder called data and save the current dataframe to a new CSV file in the data folder.

In [0]:
# Create a folder called data
!mkdir -p data

# Save the dataframe into that folder
df.to_csv(path_or_buf = '/content/data/reviews.csv', index=False)

Now we can upload the data and the folder to a bucket which SageMaker can create automatically. After running the code below you should see the bucket name and the path to your bucket data folder. If you open the AWS console, you should also see the bucket listed under S3.

In [27]:
# Uploading the data folder to a bucket
data = sagemaker_session.upload_data('/content/data', key_prefix=full_path)

print ("Bucket: " + sagemaker_session.default_bucket())
print ("Data: " + data)

Bucket: sagemaker-us-east-1-782330743537
Data: s3://sagemaker-us-east-1-782330743537/aicamp-test/data


<img src="https://scontent-lhr3-1.xx.fbcdn.net/v/t1.15752-9/68263554_349963875927198_6086539317851717632_n.png?_nc_cat=105&_nc_oc=AQkN1qgZirq43MzqkwyOK4XJB1eUqs2LaO6aaptWMik3KY2wFsmWlHyXD30WYM5jzY0&_nc_ht=scontent-lhr3-1.xx&oh=e1513abb927b9e36b5fd59720d1f04b0&oe=5DD82905" alt="Bucket List">
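
The value printed as `Data` is an S3 URI made up of the bucket name followed by the key prefix we chose. As a small sketch of how the two parts relate, the hypothetical helper `split_s3_uri` below separates them using only the standard library.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Not an S3 URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


# URI taken from the upload output shown above
bucket, key_prefix = split_s3_uri("s3://sagemaker-us-east-1-782330743537/aicamp-test/data")
print(bucket)
print(key_prefix)
```

The second part should equal the `full_path` value defined earlier, which is a quick way to check that the upload went where you intended.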

## SageMaker File

To tell SageMaker exactly what to do with the data to train, and then test, we need to provide a Python script. We have seen the `%%writefile file.py` command once before; it writes the contents of the cell to a new file. Run the code below and verify that it created a new file.

### Start/Training/Fitting

Start by looking at the line `if __name__ == '__main__':` halfway down the file. This is the entry point into the program, so this is where the SageMaker training process starts. This block of code does some setting up and then calls the training function defined above it. The training function reads the data that SageMaker copied from the bucket we just uploaded to, trains a machine learning model like we've been doing before, and finally saves the model file to the model directory, which SageMaker then uploads to S3 to be used later.
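
The setting up mostly consists of reading directory paths that SageMaker passes in through `SM_*` environment variables. The sketch below shows the same idea with fallbacks so that the script could also run outside SageMaker; the function name `parse_training_args` and the local fallback folders are assumptions, not part of the original script.

```python
import argparse
import os


def parse_training_args(argv=None, env=None):
    """Parse the training arguments, falling back to SM_* variables, then to local folders."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    # The local defaults ("output", "model", "data") are illustrative assumptions
    parser.add_argument("--output-data-dir", type=str, default=env.get("SM_OUTPUT_DATA_DIR", "output"))
    parser.add_argument("--model-dir", type=str, default=env.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", type=str, default=env.get("SM_CHANNEL_TRAIN", "data"))
    return parser.parse_args(argv)


# Simulate the variables a training container would provide (paths are illustrative)
args = parse_training_args(
    [],
    env={"SM_MODEL_DIR": "/opt/ml/model", "SM_CHANNEL_TRAIN": "/opt/ml/input/data/train"},
)
print(args.model_dir, args.train, args.output_data_dir)
```

Using `env.get(...)` instead of `os.environ[...]` avoids a `KeyError` when the variables are not set, which is the case on your own machine.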

### Model_fn

The `model_fn` function is defined below the entry point code and is one of the four functions SageMaker calls when serving the model. It loads the model that was trained in the training function.

### Input_fn

Now that the model is loaded, we receive some data to predict against.  This is what the `input_fn` is for.  This function checks what format the input data is in before then deciding the best way to unpack it. JSON format might be the easiest to use for now.

### Predict_fn

We need to use our trained model to predict against the data received from the input function. The `predict_fn` does this and returns the prediction(s).

### Output_fn

Finally, the predictions need to be returned to the place the request came from. We use a SageMaker Response object to send our reply.
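
To see how these functions fit together, here is a simplified local simulation of one request. It is a sketch, not SageMaker's own code: `EchoModel` is a hypothetical stand-in for the trained pipeline, `handle_request` is an invented helper, and `output_fn` returns a plain JSON string instead of a SageMaker Response.

```python
import json


class EchoModel:
    """Stand-in for the trained pipeline: labels every input 'fake' (illustration only)."""

    def predict(self, texts):
        return ["fake" for _ in texts]


def input_fn(request_body, request_content_type):
    """Unpack the JSON request into a list holding one text."""
    if request_content_type != "application/json":
        raise ValueError("{} not supported".format(request_content_type))
    payload = json.loads(request_body)
    return [payload["instances"][0]["features"][0]]


def predict_fn(input_data, model):
    """Run the model and wrap the first prediction in a dictionary."""
    return {"predicted-value": model.predict(input_data)[0]}


def output_fn(prediction, accept):
    """Serialize the prediction for the caller."""
    if accept != "application/json":
        raise ValueError("{} not supported".format(accept))
    return json.dumps(prediction)


def handle_request(model, body, content_type="application/json", accept="application/json"):
    """Chain the per-request functions in the order described above."""
    data = input_fn(body, content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, accept)


model = EchoModel()  # in SageMaker this object would come from model_fn(model_dir)
body = json.dumps({"instances": [{"features": ["hello world"]}]})
print(handle_request(model, body))
```

The model is loaded once by `model_fn`, while the other three functions run for every request.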

In [28]:
%%writefile entry_point.py
import argparse
import json
import os
from io import BytesIO, StringIO

import numpy as np
import pandas as pd

# Note: sklearn.externals.joblib was removed in scikit-learn 0.23;
# on newer containers use `import joblib` instead.
from sklearn.externals import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

from sagemaker_containers.beta.framework import encoders, worker

# method to train and save the model
def training():

    # Take the set of files and read them all into a single pandas dataframe
    input_files = [ os.path.join(args.train, file) for file in os.listdir(args.train) ]
    if len(input_files) == 0:
        raise ValueError(('There are no files in {}.\n' +
                          'This usually indicates that the channel ({}) was incorrectly specified,\n' +
                          'the data specification in S3 was incorrectly specified or the role specified\n' +
                          'does not have permission to access the data.').format(args.train, "train"))

    raw_data = [ pd.read_csv(file, engine="python") for file in input_files ]
    train_data = pd.concat(raw_data)
    
    # Select the label and text columns by name
    train_y = train_data['label']
    train_X = train_data['data']
    
    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(analyzer='word', 
                                  max_df=0.75,
                                  max_features=None, 
                                  ngram_range=(1,3), 
                                  norm=None,
                                  smooth_idf=True, 
                                  stop_words=None,
                                  sublinear_tf=False)),
        ('clf', MLPClassifier(alpha = 0.001, 
                              hidden_layer_sizes = 5, 
                              learning_rate = 'constant', 
                              random_state = 42)),
    ])

    pipeline.fit(train_X.tolist(), train_y.tolist())
    
    joblib.dump(pipeline, os.path.join(args.model_dir, "model.joblib"))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    
    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])

    args = parser.parse_args()
    training()

# [REQUIRED]: method to load model
def model_fn(model_dir):
    """Deserialize and return the fitted model.

    Note that the file name must match the one used when saving the model in training().
    """
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf

# [REQUIRED]: method to handle input of requests....
def input_fn(request_body, request_content_type):
    """Deserialize the request body based on its content type."""
    if request_content_type == "application/python-pickle":
        array = np.load(BytesIO((request_body)))
        return array
    elif request_content_type == 'application/json':
        jsondata = json.load(StringIO(request_body))
        return [jsondata['instances'][0]['features'][0]]
    else:
        # Handle other content-types here or raise an Exception
        # if the content type is not supported.
        raise ValueError("{} not supported by script!".format(request_content_type))

# [REQUIRED]: method to run prediction on the model
def predict_fn(input_data, model):
    prediction = model.predict(input_data)
    return {'predicted-value': prediction[0]}

# [REQUIRED]: method to handle output of requests....
def output_fn(prediction, accept):
    if accept == "application/json":
        return worker.Response(json.dumps(prediction), accept, mimetype=accept)
    elif accept == 'text/csv':
        return worker.Response(encoders.encode(prediction, accept), accept, mimetype=accept)
    else:
        raise ValueError("{} accept type is not supported by this script.".format(accept)) 

Overwriting entry_point.py


We now give SageMaker this entry point file and tell it how powerful a machine to run the training on.  Below we choose one of the large instances because anything smaller will not be powerful enough for training.  You can see a list of [instance types](https://aws.amazon.com/sagemaker/pricing/instance-types/) and their [prices](https://aws.amazon.com/sagemaker/pricing/) on the Amazon website.

In [0]:
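script_path = 'entry_point.py'
# Note: `train_instance_type` is the argument name in version 1 of the SageMaker
# Python SDK; version 2 renamed it to `instance_type` and requires `framework_version`.
sklearn = SKLearn(
    entry_point=script_path,
    train_instance_type="ml.m5.large",
    role=AWS_SAGEMAKER_ROLE_ARN,
    sagemaker_session=sagemaker_session)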

Now we are ready to begin training.  The following function kicks off the `entry_point.py` script which will run the training function.

In [30]:
sklearn.fit({'train': data})

2019-08-23 13:14:09 Starting - Starting the training job...
2019-08-23 13:14:10 Starting - Launching requested ML instances...
2019-08-23 13:15:11 Starting - Preparing the instances for training......
2019-08-23 13:16:13 Downloading - Downloading input data
2019-08-23 13:16:13 Training - Downloading the training image.
2019-08-23 13:16:25,899 sagemaker-containers INFO     Imported framework sagemaker_sklearn_container.training
2019-08-23 13:16:25,902 sagemaker-containers INFO     No GPUs detected (normal if no gpus installed)
2019-08-23 13:16:25,913 sagemaker_sklearn_container.training INFO     Invoking user training script.

2019-08-23 13:16:25 Training - Training image download completed. Training in progress.
2019-08-23 13:16:57,518 sagemaker-containers INFO     Module entry_point does not provide a setup.py.
Generating setup.py
2019-08-23 13:16:57,518 sagemaker-containers INFO     Generating setup.cfg
2019-08-23 13:16:57,51

You should now have a completed training job, which you can check in the AWS console under training jobs.

<img src="https://scontent-lhr3-1.xx.fbcdn.net/v/t1.15752-9/68759217_2373273856229987_8843558449552818176_n.png?_nc_cat=108&_nc_oc=AQnzU8JsbYJaU3u57x5JCCfQUXcnUny-bt1eX4xrmHDAUuzCzv8h8gQiguSP025wFWQ&_nc_ht=scontent-lhr3-1.xx&oh=2ceeabed584fea0c9c73565e16ccc82d&oe=5DD49355">
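
The same check can be made from code instead of the console. This is a minimal sketch, assuming a boto3 SageMaker client created from the `session` above; the helper name `training_job_status` is hypothetical, and the job name is whatever appears in your training jobs list.

```python
def training_job_status(sagemaker_client, job_name):
    """Return the status string of a training job, e.g. 'InProgress', 'Completed' or 'Failed'."""
    description = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    return description["TrainingJobStatus"]


# In the notebook (assumes the boto3 `session` created earlier):
# client = session.client("sagemaker")
# print(training_job_status(client, "<name shown under Training jobs>"))
```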

## Endpoint for Testing

To test against this model we need to deploy an endpoint that someone can hit. This will allow us to make use of the `model_fn`, `input_fn`, `predict_fn`, and `output_fn` functions.

In [31]:
predictor = sklearn.deploy(initial_instance_count=1, instance_type="ml.t2.medium", endpoint_name='google-colab-aicamp-test')

-----------------------------------------------------------------------------------------------!



After the above code completes you should have a created endpoint and endpoint configuration. Go to your AWS console to check. Go to the "Endpoint" tab in the SageMaker section to see that there is an endpoint there. Click on it to load up the details including the endpoint URL.

<img src="https://scontent-lhr3-1.xx.fbcdn.net/v/t1.15752-9/68754979_472474233534174_3288858798751481856_n.png?_nc_cat=100&_nc_oc=AQl0ApMkpo_2RzEIVsloFYCuLH367rEIdVsRnRzFHtN3P5gqpvddrXtHm62erB-HOn0&_nc_ht=scontent-lhr3-1.xx&oh=bb193a1154f97e8939495e5dde8a6758&oe=5DD90E21">

## Hit the Endpoint

Now that your endpoint is deployed you can hit it as long as you have the correct credentials.  You can form your POST request however you like but we find it easiest using Postman.

<img src="https://scontent-lhr3-1.xx.fbcdn.net/v/t1.15752-9/68678979_363066761041475_6797681650328141824_n.png?_nc_cat=110&_nc_oc=AQny9QrQRmaw98auTipqZYdtT1LzVxp0fPVvGSEjZSLpR5gCJOzSsRJmSkXDxdE_lW4&_nc_ht=scontent-lhr3-1.xx&oh=396b6864ad65b33a90cbc1bd4b6262be&oe=5DCAF196">

[Download](https://www.getpostman.com/downloads/) and install Postman.  Make a POST request with the following body format.

```
{
    "instances": [
        {
            "features": [
                "hello world"
            ]
        }
    ]
}
```

Next go into the Authorization tab and select the "AWS Signature" type.  Finally, fill out all of the inputs on the right including AccessKey, SecretKey, AWS Region, Service Name, and Session Token.  You should be able to figure out what these should be based on the keys we have set at the beginning of this notebook.

<br>

Once you have filled in the authorization details, click on the "Headers" tab and create a key called Content-Type and set its value to application/json. We need to set this because the data that we are sending is in the form of JSON.

<br>

***Note***: make sure the Service Name is in all lower case letters.

Now, hit send and make sure you get back a reasonable response based on the contents of the `output_fn` in the entry_point.py file.
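
If you prefer to stay in Python, the same request can be sent with boto3's `sagemaker-runtime` client, which signs the request for you. This is a sketch under the assumption that the endpoint from this notebook is still running; the helper names `build_payload` and `invoke` are hypothetical.

```python
import json


def build_payload(text):
    """Build the JSON body in the format that the entry point's input_fn expects."""
    return json.dumps({"instances": [{"features": [text]}]})


def invoke(runtime_client, endpoint_name, text):
    """Send one review to the endpoint and decode the JSON reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=build_payload(text),
    )
    return json.loads(response["Body"].read())


print(build_payload("hello world"))

# In the notebook (assumes the boto3 `session` created earlier):
# runtime = session.client("sagemaker-runtime")
# print(invoke(runtime, "google-colab-aicamp-test", "hello world"))
```

The `ContentType` and `Accept` values play the same role as the headers set in Postman.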

## Cleanup Your Mess

If you have finished with this endpoint and you no longer need it, you'd better shut it down or you might incur more cost over time! The following code removes the endpoint and its configuration from service. After running these, make sure that they are not in the Endpoint list in the AWS console.

In [0]:
sagemaker_session.delete_endpoint('google-colab-aicamp-test')

In [0]:
sagemaker_session.delete_endpoint_config('google-colab-aicamp-test')
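
You can also confirm the cleanup from code rather than the console. A minimal sketch, assuming a boto3 SageMaker client created from the `session` above; the helper `endpoint_exists` is hypothetical and only inspects the first page of results.

```python
def endpoint_exists(sagemaker_client, endpoint_name):
    """Return True if an endpoint with exactly this name is still listed.

    Only the first page of results is checked, which is enough for a handful of endpoints.
    """
    response = sagemaker_client.list_endpoints(NameContains=endpoint_name)
    return any(item["EndpointName"] == endpoint_name for item in response["Endpoints"])


# In the notebook (assumes the boto3 `session` created earlier):
# client = session.client("sagemaker")
# print(endpoint_exists(client, "google-colab-aicamp-test"))
```

Deletion can take a little while, so the endpoint may still be listed for a short time after the delete call returns.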

## Challenge

Now it's your turn: edit the `entry_point.py` file to do some of your own machine learning!

## Debugging

If you run into any difficulties, remember you can see the 

## More Reading

- [AWS 10-minute SageMaker Tutorial](https://aws.amazon.com/getting-started/tutorials/build-train-deploy-machine-learning-model-sagemaker/?sc_icampaign=pac-sagemaker-console-tutorial&sc_ichannel=ha&sc_icontent=awssm-2276&sc_iplace=console-body&trk=ha_awssm-2276)

- [Blog for handwritten numbers clustering using SageMaker](https://towardsdatascience.com/aws-sagemaker-ais-next-game-changer-480d79e252a8)

- [AWS Educate Permissions](https://s3.amazonaws.com/awseducate-starter-account-services/AWS_Educate_Starter_Accounts_and_AWS_Services.pdf)