# AWS Marketplace Product Usage Demonstration - Algorithms

## Using Algorithm ARN with Amazon SageMaker APIs

This sample notebook shows how to use the mAdvisor-AutoML algorithm through its Algorithm ARN to run a training job and then use the trained model for inference.

## Compatibility
This notebook is compatible only with the [mAdvisor-AutoML]("URL") algorithm published on AWS Marketplace.

***Prerequisite:*** Subscribe to this product before proceeding with this notebook.

## Set up the environment

In [None]:
import sagemaker as sage
from sagemaker import get_execution_role

role = get_execution_role()

# S3 prefixes
common_prefix = "DEMO-mAdvisor-byo-Monster"
training_input_prefix = common_prefix + "/training-input-data"
batch_inference_input_prefix = common_prefix + "/batch-inference-input-data"

### Create the session

The session remembers our connection parameters to Amazon SageMaker. We'll use it to perform all of our Amazon SageMaker operations.

In [None]:
sagemaker_session = sage.Session()

## Upload the data for training

When training large models with huge amounts of data, you'll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we're using the classic [Iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), which we have included.

We can use the tools provided by the Amazon SageMaker Python SDK to upload the data to a default bucket. Set `training_input` in the next cell to the S3 location of the uploaded training dataset.
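
As a minimal sketch of that upload step, the helper below wraps the SDK's `Session.upload_data` call, which uploads a local file or directory and returns its S3 URI. The helper name `upload_training_data` and the local path `data/training` are assumptions made for illustration; they are not part of the original notebook.

```python
def upload_training_data(session, local_path, key_prefix):
    """Upload local_path to the session's default bucket and return its S3 URI.

    `session` is expected to be a sagemaker.Session (or any object that
    exposes a compatible upload_data method).
    """
    # No bucket is passed, so the session's default bucket is used.
    return session.upload_data(local_path, key_prefix=key_prefix)


# Possible usage in this notebook (the local path is hypothetical):
# training_input = upload_training_data(
#     sagemaker_session, "data/training", training_input_prefix
# )
```

If the data already lives in S3, this step can be skipped and the existing S3 URI assigned to `training_input` directly.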

In [None]:
# Replace the placeholder with the S3 URI of your training dataset
training_input = "Provide S3 Bucket training dataset location"
print("Training Data Location " + training_input)

## Creating Training Job using Algorithm ARN

Enter the ARN of the algorithm you want to use below. This can be either an AWS Marketplace algorithm you subscribed to or an algorithm you created in your own account.

The algorithm ARN listed below belongs to the ["mAdvisor-AutoML"]("AWS market place URL") product.

In [None]:
#from src.scikit_product_arns import ScikitArnProvider

algorithm_arn = "Please Provide Algorithm ARN"
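
Because an ARN embeds a region, it can help to check that the ARN you pasted matches the region of your SageMaker session before starting a training job. The sketch below relies only on the general ARN layout (`arn:partition:service:region:account-id:resource`); the function names and the example ARN are hypothetical.

```python
def arn_region(arn):
    """Return the region field of an ARN."""
    # General layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError("Not a valid ARN: {!r}".format(arn))
    return parts[3]


def check_arn_region(arn, session_region):
    """Raise ValueError if the ARN's region differs from session_region."""
    region = arn_region(arn)
    if region != session_region:
        raise ValueError(
            "ARN region {!r} does not match session region {!r}".format(
                region, session_region
            )
        )
    return arn


# Hypothetical ARN, for illustration only.
example_arn = "arn:aws:sagemaker:us-east-2:123456789012:algorithm/example-algorithm"
print(arn_region(example_arn))
```

In this notebook the check could be called as `check_arn_region(algorithm_arn, sagemaker_session.boto_region_name)`.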

In [None]:
# Replace the placeholder with the name of the target column in your dataset
hyperparameters = {"Target": "Target column name"}

In [None]:
import json
import time
from sagemaker.algorithm import AlgorithmEstimator

# Estimator backed by the algorithm identified by algorithm_arn
algo = AlgorithmEstimator(
    algorithm_arn=algorithm_arn,
    role=role,
    instance_count=1,
    instance_type='ml.c4.xlarge',
    hyperparameters=hyperparameters,
    base_job_name='jobname-from-aws-marketplace',
)

## Run Training Job

In [None]:
print("Now run the training job using algorithm arn %s in region %s" % (algorithm_arn, sagemaker_session.boto_region_name))
# The training data is passed on the 'train' input channel
algo.fit({'train': training_input})

## Batch Transform Job

Now let's use the model built to run a batch inference job and verify it works.

### Batch Transform Input Preparation

The batch transform input is the dataset with the "label" column (the column at index 0) removed and the remaining columns retained. Set `transform_input` in the next cell to the S3 location of that prepared dataset.

***NOTE:*** This is the same training data, which is a no-no from an ML science perspective. But the aim of this notebook is to demonstrate how things work end-to-end.
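
One way to prepare such an input is sketched below using only the standard library. It assumes a headerless CSV with the label in column 0, as described above; the function name `drop_label_column` and the sample rows are illustrative and not taken from the original notebook.

```python
import csv
import io


def drop_label_column(csv_text, label_index=0):
    """Return csv_text with the column at label_index removed from every row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow(row[:label_index] + row[label_index + 1:])
    return out.getvalue()


# Illustrative rows: label first, then four feature values.
sample = "setosa,5.1,3.5,1.4,0.2\nversicolor,7.0,3.2,4.7,1.4\n"
print(drop_label_column(sample))
```

The resulting text can be written to a file and uploaded to S3, and that location used as `transform_input`.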

In [None]:
transform_input = "s3 bucket Test dataset location"
print("Transform input location " + transform_input)

In [None]:
transformer = algo.transformer(1, 'ml.m4.xlarge')
transformer.transform(transform_input, content_type='text/csv')
transformer.wait()

print("Batch Transform output saved to " + transformer.output_path)

#### Inspect the Batch Transform Output in S3

In [None]:
from urllib.parse import urlparse

# Split the transform output path (s3://bucket/prefix) into bucket and key prefix
parsed_url = urlparse(transformer.output_path)
bucket_name = parsed_url.netloc
# Each result is stored under the input file name with an ".out" suffix
file_key = '{}/{}.out'.format(parsed_url.path[1:], "Test dataset Name ex: iris_test.csv")

s3_client = sagemaker_session.boto_session.client('s3')

# Read from the bucket named in the output path instead of assuming the default bucket
response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
output_text = response['Body'].read().decode('utf-8')
print(output_text)