### Using the SageMaker Autopilot SDK
The Amazon SageMaker SDK includes a simple API for SageMaker Autopilot. You can find its documentation at https://sagemaker.readthedocs.io/en/stable/automl.html. In this section, you'll learn how to use this API to train a model on the same dataset as in the previous section.    

### Launching a job
The SageMaker SDK makes it extremely easy to launch an Autopilot job: just upload your data to S3 and call a single API! Let's see how:

1. First, we import the SageMaker SDK:

In [1]:
import sagemaker
sess = sagemaker.Session()

2. Then, we download the dataset and keep a random 70% sample of it:

In [2]:
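# The original download link no longer works, so these commands are commented out
# and the dataset is read from another URL in the next cell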

# %%sh
# wget -N https://sagemaker-sample-data-us-west-2.s3-us-west-2.amazonaws.com/autopilot/direct_marketing/bank-additional.zip
# unzip -o bank-additional.zip

In [3]:
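# Read the dataset from the h2o-2 GitHub repository (fields are separated by ';')
import pandas as pd
url = "https://github.com/h2oai/h2o-2/raw/master/smalldata/bank-additional-full.csv"
data = pd.read_csv(url, sep=';')
#display(data.head(3))

# Randomly sample 70% of the rows
df = data.sample(frac=0.7)
display(df.head(3))

# Save the sample in the current directory; index=False stops the DataFrame
# index from being written as an extra unnamed column that Autopilot would treat as a feature
df.to_csv('bank-additional-full.csv', index=False)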

| | age | job | marital | education | default | housing | loan | contact | month | day_of_week | ... | campaign | pdays | previous | poutcome | emp.var.rate | cons.price.idx | cons.conf.idx | euribor3m | nr.employed | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 28537 | 38 | technician | divorced | university.degree | no | no | no | cellular | apr | wed | ... | 2 | 999 | 2 | failure | -1.8 | 93.075 | -47.1 | 1.415 | 5099.1 | no |
| 3377 | 47 | blue-collar | married | unknown | unknown | yes | no | telephone | may | thu | ... | 1 | 999 | 0 | nonexistent | 1.1 | 93.994 | -36.4 | 4.86 | 5191.0 | no |
| 29349 | 26 | technician | single | professional.course | no | yes | no | telephone | apr | fri | ... | 2 | 999 | 0 | nonexistent | -1.8 | 93.075 | -47.1 | 1.405 | 5099.1 | no |
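Before uploading, it can be worth checking the header of the file Autopilot will actually read. The sketch below is a hypothetical helper, `check_autopilot_csv`, written for illustration (it is not part of the SageMaker SDK). It uses only the standard `csv` module to confirm that the target column exists and that no column name is blank, a blank name being the telltale sign of a pandas index written to disk.

```python
import csv
import io


def check_autopilot_csv(lines, target="y"):
    """Return a list of problems found in a CSV header (hypothetical helper).

    `lines` is any iterable of text lines, such as an open file object.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    if target not in header:
        problems.append("target column {!r} not found".format(target))
    # pandas writes its unnamed index under an empty header when index=True
    if any(name.strip() == "" for name in header):
        problems.append("blank column name (was the DataFrame index saved?)")
    if len(set(header)) != len(header):
        problems.append("duplicate column names")
    return problems


# Usage: a file with a leaked index column versus a clean file
leaky = io.StringIO(",age,job,y\n0,38,technician,no\n")
clean = io.StringIO("age,job,y\n38,technician,no\n")
print(check_autopilot_csv(leaky))
print(check_autopilot_csv(clean))
```

On the real file, you could run `with open('bank-additional-full.csv', newline='') as f: print(check_autopilot_csv(f))`.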


3. Next, we upload the dataset to S3:

In [4]:
bucket = sess.default_bucket()
prefix = 'sagemaker/DEMO-automl-dm'

In [5]:
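# Upload the CSV file; upload_data() returns the S3 URI of the uploaded file
s3_input_data = sess.upload_data(path="./bank-additional-full.csv",
                                 key_prefix=prefix + '/input')
# The file is now stored at
# s3://sagemaker-us-east-1-603012210694/sagemaker/DEMO-automl-dm/input/bank-additional-full.csv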
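S3 keys are plain strings, so gluing prefixes together without a separator silently produces names like `DEMO-automl-dminput`. Here is a small sketch with hypothetical helpers, `s3_key` and `s3_uri`, that join key parts with exactly one `/` between them.

```python
def s3_key(*parts):
    """Join S3 key components with single '/' separators (hypothetical helper)."""
    # Strip slashes at both ends of each part and drop empty parts,
    # so 'a/' and '/b' become 'a/b' rather than 'a//b'
    cleaned = [p.strip("/") for p in parts]
    return "/".join(p for p in cleaned if p)


def s3_uri(bucket, *parts):
    """Build an s3:// URI from a bucket name and key components."""
    return "s3://{}/{}".format(bucket, s3_key(*parts))


prefix = "sagemaker/DEMO-automl-dm"
print(prefix + "input")         # separator missing
print(s3_key(prefix, "input"))  # one '/' between the parts
print(s3_uri("my-bucket", prefix, "output"))
```

In the notebook, this would be used as `key_prefix=s3_key(prefix, 'input')`.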

In [6]:
# One extra step: sagemaker.get_execution_role() doesn't work when running locally
# (for example with IAM user credentials), so we look up the SageMaker execution
# role through IAM instead.
# Code link: https://github.com/aws/sagemaker-python-sdk/issues/300 

import boto3
region = boto3.Session().region_name

def resolve_sm_role():
    """Return the ARN of the first IAM role named AmazonSageMaker-ExecutionRole-*."""
    client = boto3.client('iam', region_name=region)
    # list_roles is paginated, so walk every page instead of only the first one
    paginator = client.get_paginator('list_roles')
    for page in paginator.paginate(PathPrefix='/'):
        for role in page['Roles']:
            if role['RoleName'].startswith('AmazonSageMaker-ExecutionRole-'):
                return role['Arn']
    raise Exception('Could not resolve what should be the SageMaker role to be used')

role = resolve_sm_role()
#print(role)
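`resolve_sm_role()` returns the first matching role, which is arbitrary if the account holds several `AmazonSageMaker-ExecutionRole-*` roles. A sketch of a deterministic alternative is to prefer the most recently created match. It assumes role dictionaries shaped like the `Roles` entries of an IAM `list_roles` response (with `RoleName`, `Arn`, and `CreateDate`). The function name `pick_execution_role` and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def pick_execution_role(roles, prefix="AmazonSageMaker-ExecutionRole-"):
    """Return the ARN of the newest role whose name starts with `prefix`.

    `roles` is a list of dicts shaped like IAM list_roles 'Roles' entries.
    Raises LookupError if nothing matches. Hypothetical helper.
    """
    matches = [r for r in roles if r["RoleName"].startswith(prefix)]
    if not matches:
        raise LookupError("no IAM role name starts with {!r}".format(prefix))
    # Pick the most recently created matching role
    newest = max(matches, key=lambda r: r["CreateDate"])
    return newest["Arn"]


# Usage with hand-written sample data (not real roles)
roles = [
    {"RoleName": "AmazonSageMaker-ExecutionRole-20200101T000000",
     "Arn": "arn:aws:iam::111122223333:role/service-role/old",
     "CreateDate": datetime(2020, 1, 1, tzinfo=timezone.utc)},
    {"RoleName": "AmazonSageMaker-ExecutionRole-20210301T000000",
     "Arn": "arn:aws:iam::111122223333:role/service-role/new",
     "CreateDate": datetime(2021, 3, 1, tzinfo=timezone.utc)},
    {"RoleName": "SomeOtherRole",
     "Arn": "arn:aws:iam::111122223333:role/other",
     "CreateDate": datetime(2022, 1, 1, tzinfo=timezone.utc)},
]
print(pick_execution_role(roles))
```

With boto3, `roles` could be collected from every page of `client.get_paginator('list_roles').paginate()`.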

4. We then configure the AutoML job, which only takes one line of code. We define the target attribute (remember, that column is named y) and where to store training artifacts. Optionally, we can also set a maximum run time for the whole job, a maximum run time per training job, or reduce the number of candidate models that will be tuned. Please note that restricting the job's duration too much is likely to hurt its accuracy. For development purposes, this isn't a problem, so let's cap the whole job at one hour or 5 candidates (the original example used 250), whichever limit it hits first, and each training job at 10 minutes:

In [7]:
from sagemaker.automl.automl import AutoML
auto_ml_job = AutoML(role=role,  # or sagemaker.get_execution_role() when running on SageMaker
                     sagemaker_session=sess,
                     target_attribute_name='y',
                     output_path='s3://{}/{}/output'.format(bucket, prefix),
                     max_runtime_per_training_job_in_seconds=600,  # 10 minutes per training job
                     max_candidates=5,                             # the original example used 250
                     total_job_runtime_in_seconds=3600)            # one hour for the whole job
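These three limits correspond to the completion criteria of the underlying `CreateAutoMLJob` API: `MaxCandidates`, `MaxRuntimePerTrainingJobInSeconds`, and `MaxAutoMLJobRuntimeInSeconds` inside `AutoMLJobConfig.CompletionCriteria`. The SDK builds this structure for you. The sketch below uses a hypothetical `completion_criteria` helper only to show how the limits map onto that dictionary, omitting unset limits and rejecting non-positive values.

```python
def completion_criteria(max_candidates=None,
                        max_runtime_per_training_job_in_seconds=None,
                        total_job_runtime_in_seconds=None):
    """Build an AutoMLJobConfig 'CompletionCriteria' dict (hypothetical helper).

    Limits left as None are omitted, so Autopilot's defaults would apply.
    """
    values = {
        "MaxCandidates": max_candidates,
        "MaxRuntimePerTrainingJobInSeconds": max_runtime_per_training_job_in_seconds,
        "MaxAutoMLJobRuntimeInSeconds": total_job_runtime_in_seconds,
    }
    criteria = {}
    for key, value in values.items():
        if value is None:
            continue
        if not isinstance(value, int) or value <= 0:
            raise ValueError("{} must be a positive integer, got {!r}".format(key, value))
        criteria[key] = value
    return criteria


# The limits used above: 5 candidates, 10 minutes per training job, 1 hour overall
config = {"CompletionCriteria": completion_criteria(5, 600, 3600)}
print(config)
```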

5. Next, we launch the Autopilot job, passing it the location of the training set. We turn logs off (who wants to read hundreds of tuning logs?), and we set the call to non-blocking, as we'd like to query the job status in the next cells:

In [8]:
auto_ml_job.fit(inputs=s3_input_data, logs=False, wait=False)

The job starts right away. Now let's see how we can monitor its status.

### Monitoring a job
While the job is running, we can use the describe_auto_ml_job() API to monitor its progress:

1. For example, the following code will check the job's status every 30 seconds until the data analysis step completes:

In [13]:
from time import sleep
job = auto_ml_job.describe_auto_ml_job()
job_status = job['AutoMLJobStatus']
job_sec_status = job['AutoMLJobSecondaryStatus']

In [14]:
if job_status not in ('Stopped', 'Failed'):
    # Poll every 30 seconds while Autopilot is still analyzing the data
    while job_status == 'InProgress' and job_sec_status == 'AnalyzingData':
        sleep(30)
        job = auto_ml_job.describe_auto_ml_job()
        job_status = job['AutoMLJobStatus']
        job_sec_status = job['AutoMLJobSecondaryStatus']
        print(job_status, job_sec_status)

InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress AnalyzingData
InProgress FeatureEngineering
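A Python detail worth knowing when writing such status checks: parentheses alone don't create a tuple. `('InProgress')` is just the string `'InProgress'`, so `in` performs a substring test rather than a membership test. A quick demonstration:

```python
status = "Progress"

# ('InProgress') is a str, so `in` checks for a substring and matches here
print(status in ("InProgress"))    # True
# ('InProgress',) is a one-element tuple, so `in` checks for an exact element
print(status in ("InProgress",))   # False
# Comparing with == is the clearest option for a single value
print(status == "InProgress")      # False
```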


2. Once the data analysis is complete, the two auto-generated notebooks are available. We can find their location using the same API:

In [36]:
job = auto_ml_job.describe_auto_ml_job()
job_candidate_notebook = job['AutoMLJobArtifacts']['CandidateDefinitionNotebookLocation']
job_data_notebook = job['AutoMLJobArtifacts']['DataExplorationNotebookLocation']
print(job_candidate_notebook)
print()
print(job_data_notebook)

s3://sagemaker-us-east-1-603012210694/sagemaker/DEMO-automl-dm/output/automl-2021-03-11-15-18-06-196/sagemaker-automl-candidates/pr-1-a51af9647a40452f929f64b8aa8a01056a2e1ab7043e4653abefbd0fdd/notebooks/SageMakerAutopilotCandidateDefinitionNotebook.ipynb

s3://sagemaker-us-east-1-603012210694/sagemaker/DEMO-automl-dm/output/automl-2021-03-11-15-18-06-196/sagemaker-automl-candidates/pr-1-a51af9647a40452f929f64b8aa8a01056a2e1ab7043e4653abefbd0fdd/notebooks/SageMakerAutopilotDataExplorationNotebook.ipynb


3. Using the AWS CLI, we can copy the two notebooks locally. We'll take a look at them later in this chapter:

In [None]:
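%%sh -s $job_candidate_notebook $job_data_notebook
# %%sh must be the very first line of the cell, and the AWS CLI must be installed
# and configured; the two notebook URIs are passed to the shell as $1 and $2
aws s3 cp $1 .
aws s3 cp $2 .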
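If the AWS CLI isn't available, the same copy can be done from Python with boto3. This sketch assumes default AWS credentials are configured. `split_s3_uri` and `download_s3_uri` are hypothetical helpers, and the download relies on boto3's `S3.Client.download_file(Bucket, Key, Filename)`.

```python
import os
import posixpath


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not uri.startswith("s3://"):
        raise ValueError("not an S3 URI: {!r}".format(uri))
    bucket, _, key = uri[len("s3://"):].partition("/")
    if not bucket or not key:
        raise ValueError("S3 URI needs both a bucket and a key: {!r}".format(uri))
    return bucket, key


def download_s3_uri(uri, local_dir="."):
    """Download one S3 object into local_dir and return the local file path."""
    import boto3  # imported here so split_s3_uri can be used without boto3

    bucket, key = split_s3_uri(uri)
    # S3 keys always use '/', so take the file name with posixpath
    local_path = os.path.join(local_dir, posixpath.basename(key))
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


# Usage in the notebook:
# for uri in (job_candidate_notebook, job_data_notebook):
#     print(download_s3_uri(uri))
```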

4. While the feature engineering runs, we can wait for it to complete by reusing the polling loop from step 1, this time looping while job_sec_status is equal to FeatureEngineering.
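Rather than copying the loop for each stage, the polling can be wrapped in a function. Below is a minimal sketch that assumes only a callable returning a dictionary with `AutoMLJobStatus` and `AutoMLJobSecondaryStatus` keys, such as `auto_ml_job.describe_auto_ml_job`. The name `wait_for_stage` and its optional timeout are hypothetical additions, and `sleep` is a parameter so the function can be exercised without waiting.

```python
import time


def wait_for_stage(describe, stage, poll_seconds=30, timeout_seconds=None,
                   sleep=time.sleep):
    """Poll `describe()` while the job is InProgress in secondary status `stage`.

    Returns the last description so callers can inspect the new status.
    Raises TimeoutError if `timeout_seconds` of polling elapse first.
    """
    waited = 0
    job = describe()
    while (job["AutoMLJobStatus"] == "InProgress"
           and job["AutoMLJobSecondaryStatus"] == stage):
        if timeout_seconds is not None and waited >= timeout_seconds:
            raise TimeoutError("still in {} after {}s".format(stage, waited))
        sleep(poll_seconds)
        waited += poll_seconds
        job = describe()
        print(job["AutoMLJobStatus"], job["AutoMLJobSecondaryStatus"])
    return job


# Usage in the notebook:
# job = wait_for_stage(auto_ml_job.describe_auto_ml_job, 'FeatureEngineering')
```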

5. Once the feature engineering is complete, the model tuning starts. While it's running, we can use SageMaker Experiments to keep track of jobs. We'll cover SageMaker Experiments in detail in a later chapter, but here, the code is simple enough to give you a sneak peek! All it takes is to pass the experiment name (the Autopilot job name followed by -aws-auto-ml-job) to the ExperimentAnalytics object from the SageMaker SDK. Then, we can retrieve information on all of the job's trial components so far (processing, training, and transform jobs) in a pandas DataFrame. From then on, it's business as usual: we can display the number of jobs that have already run, and the top 5 jobs so far ranked by their objective metric:

In [56]:
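import pandas as pd
from sagemaker.analytics import ExperimentAnalytics

# Autopilot records its jobs in an experiment named <AutoML job name>-aws-auto-ml-job
exp = ExperimentAnalytics(sagemaker_session=sess,
                          experiment_name=job['AutoMLJobName'] + '-aws-auto-ml-job')
df = exp.dataframe()
print("Number of jobs: ", len(df))

# Move the objective metric column to the front
df = pd.concat([df['ObjectiveMetric - Max'],
                df.drop(['ObjectiveMetric - Max'], axis=1)], axis=1)

# Show the five best jobs (rows without a metric sort last)
df.sort_values('ObjectiveMetric - Max', ascending=False).head(5)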

Number of jobs:  5


| | ObjectiveMetric - Max | TrialComponentName | DisplayName | SourceArn | SageMaker.ImageUri | SageMaker.InstanceCount | SageMaker.InstanceType | SageMaker.VolumeSizeInGB | _tuning_objective_metric | alpha | ... | code - MediaType | code - Value | input_channel_mode | job_name | label_col | max_dataset_size | SageMaker.ImageUri - MediaType | SageMaker.ImageUri - Value | ds - MediaType | ds - Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.76126 | tuning-job-1-46dfc77b76b643acbc-002-3a820be7-a... | tuning-job-1-46dfc77b76b643acbc-002-3a820be7-a... | arn:aws:sagemaker:us-east-1:603012210694:train... | 683313688378.dkr.ecr.us-east-1.amazonaws.com/s... | 1.0 | ml.m5.4xlarge | 50.0 | validation:f1 | 1e-05 | ... | | | | | | | | | | |
| 0 | 0.71679 | tuning-job-1-46dfc77b76b643acbc-001-ba22ca75-a... | tuning-job-1-46dfc77b76b643acbc-001-ba22ca75-a... | arn:aws:sagemaker:us-east-1:603012210694:train... | 683313688378.dkr.ecr.us-east-1.amazonaws.com/s... | 1.0 | ml.m5.4xlarge | 50.0 | validation:f1 | 0.0338 | ... | | | | | | | | | | |
| 2 | | automl-202-dpp0-csv-1-a68308d3550443d59a939816... | automl-202-dpp0-csv-1-a68308d3550443d59a939816... | arn:aws:sagemaker:us-east-1:603012210694:trans... | | 1.0 | ml.m5.4xlarge | | | | ... | | | | | | | | | | |
| 3 | | automl-202-dpp0-1-24ee556126e04ed1a1359ecbf194... | automl-202-dpp0-1-24ee556126e04ed1a1359ecbf194... | arn:aws:sagemaker:us-east-1:603012210694:train... | 683313688378.dkr.ecr.us-east-1.amazonaws.com/s... | 1.0 | ml.m5.4xlarge | 50.0 | | | ... | application/x-code | s3://sagemaker-us-east-1-603012210694/sagemake... | | | | | | | | |
| 4 | | db-1-83b4cfe79c0a4060ad5512f53c014c22ea7dd12aa... | db-1-83b4cfe79c0a4060ad5512f53c014c22ea7dd12aa... | arn:aws:sagemaker:us-east-1:603012210694:proce... | | 1.0 | ml.m5.2xlarge | 250.0 | | | ... | | | Pipe | automl-2021-03-11-15-18-06-196 | y | 5.0 | | 120479346908.dkr.ecr.us-east-1.amazonaws.com/d... | | s3://sagemaker-us-east-1-603012210694/sagemake... |
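As the output shows, only the tuning jobs report a value in `ObjectiveMetric - Max`. The processing, transform, and pre-processing training components have none (NaN in the DataFrame), yet they still appear in the top 5 because missing values sort last. To keep only scored jobs, you can rank plain records, for example those returned by `df.to_dict('records')`. Here is a plain-Python sketch with a hypothetical `top_jobs` helper.

```python
import math


def top_jobs(records, metric="ObjectiveMetric - Max", n=5):
    """Return the n records with the highest metric, skipping missing values.

    A value counts as missing if the key is absent, None, or float NaN.
    Hypothetical helper for illustration.
    """
    def score(record):
        value = record.get(metric)
        if value is None:
            return None
        value = float(value)
        return None if math.isnan(value) else value

    scored = [(score(r), r) for r in records]
    scored = [(s, r) for s, r in scored if s is not None]
    # Highest metric first; Python's sort is stable, so ties keep their order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:n]]


# Usage with sample values taken from the output above (labels shortened)
rows = [
    {"TrialComponentName": "tuning job 001", "ObjectiveMetric - Max": 0.71679},
    {"TrialComponentName": "tuning job 002", "ObjectiveMetric - Max": 0.76126},
    {"TrialComponentName": "processing job", "ObjectiveMetric - Max": float("nan")},
]
for row in top_jobs(rows):
    print(row["TrialComponentName"], row["ObjectiveMetric - Max"])
```

Staying in pandas, `df.dropna(subset=['ObjectiveMetric - Max']).nlargest(5, 'ObjectiveMetric - Max')` gives the same ranking.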


6. Once the model tuning is complete, we can very easily find the best candidate:

In [57]:
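# In this run, the call below raised KeyError: 'BestCandidate' because the job
# description had no best candidate yet; call it once the job has completed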

job_best_candidate = auto_ml_job.best_candidate()
print(job_best_candidate['CandidateName'])
print(job_best_candidate['FinalAutoMLJobObjectiveMetric'])

KeyError: 'BestCandidate'
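The `KeyError` above means that the job description didn't contain a `BestCandidate` entry yet, typically because the job hadn't finished. Below is a hedged sketch of a guard that works on the same dictionary returned by `describe_auto_ml_job()`. `best_candidate_or_none` is a hypothetical helper, not part of the SageMaker SDK, and it assumes the candidate's `FinalAutoMLJobObjectiveMetric` carries `MetricName` and `Value` keys.

```python
def best_candidate_or_none(job):
    """Return (candidate name, metric name, metric value), or None if unavailable.

    `job` is a dict shaped like a DescribeAutoMLJob response. Hypothetical helper.
    """
    candidate = job.get("BestCandidate")
    if candidate is None:
        print("No best candidate yet; status:",
              job.get("AutoMLJobStatus"), job.get("AutoMLJobSecondaryStatus"))
        return None
    metric = candidate["FinalAutoMLJobObjectiveMetric"]
    return candidate["CandidateName"], metric["MetricName"], metric["Value"]


# Usage in the notebook:
# print(best_candidate_or_none(auto_ml_job.describe_auto_ml_job()))
```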

Then, we can deploy and test the model using the SageMaker SDK. We've covered a lot of ground already, so let's save that for future chapters, where we'll revisit this example.

### Cleaning up
SageMaker Autopilot creates many underlying artifacts, such as dataset splits, pre-processing scripts, pre-processed datasets, and models. If you'd like to clean up completely, the following code snippet deletes every S3 object stored under the job's output prefix (the uploaded training data isn't touched). Of course, you could also use the AWS CLI:

In [19]:
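import boto3
# All artifacts of this Autopilot job are stored under <prefix>/output/<job name>
job_outputs_prefix = '{}/output/{}'.format(prefix, job['AutoMLJobName'])
s3_bucket = boto3.resource('s3').Bucket(bucket)
# Batch-delete every object whose key starts with that prefix
s3_bucket.objects.filter(Prefix=job_outputs_prefix).delete()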

[{'ResponseMetadata': {'RequestId': '2SJA41B3388T9BNQ',
   'HostId': 'ZoGxa+A/y2men3HndTLyXuTco0x5uH1+5K72LNdOc9b76cEhl5/l/sw46QVB+l4Xkd/rNASVF28=',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'ZoGxa+A/y2men3HndTLyXuTco0x5uH1+5K72LNdOc9b76cEhl5/l/sw46QVB+l4Xkd/rNASVF28=',
    'x-amz-request-id': '2SJA41B3388T9BNQ',
    'date': 'Thu, 11 Mar 2021 11:11:33 GMT',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3',
    'connection': 'close'},
   'RetryAttempts': 0},
  'Deleted': [{'Key': 'sagemaker/DEMO-automl-dm/output/automl-2021-03-11-10-45-36-874/transformed-data/dpp2/rpb/train/chunk_58.csv.out'},
   {'Key': 'sagemaker/DEMO-automl-dm/output/automl-2021-03-11-10-45-36-874/sagemaker-automl-candidates/pr-1-7a9a812d63984f8d803b90a734e5407a14a8621e165c4b2e8d40991e9c/notebooks/sagemaker_automl/steps.py'},
   {'Key': 'sagemaker/DEMO-automl-dm/output/automl-2021-03-11-10-45-36-874/transformed-data/dpp3/csv/train/chunk_57.csv.out'},