# Using the SageMaker SDK with built-in algorithms (page 118)

## Preparing data

In [4]:
import pandas as pd

In [5]:
!ls -ltr

total 792
drwxr-xr-x 2 root root   6144 Jan 15  2020 bank-additional
-rw-r--r-- 1 root root 432828 Jan 15  2020 bank-additional.zip
drwxr-xr-x 2 root root   6144 Apr  4 23:18 xai
-rw-r--r-- 1 root root 271549 Apr  4 23:23 Model-Metrics-Parameters.png
-rw-r--r-- 1 root root   2053 Apr  4 23:24 Notes.txt
-rw-r--r-- 1 root root  22541 Apr  5 01:19 AutoPilot.ipynb
drwxr-xr-x 3 root root   6144 Apr  5 01:25 output-artifacts
-rw-r--r-- 1 root root     58 Apr  5 01:28 README.md
-rw-r--r-- 1 root root  18968 Apr  5 08:44 SageMakerSDK-BuiltIn-Algorithms.ipynb
-rw-r--r-- 1 root root  35101 Apr  5 08:45 housing.csv


In [6]:
dataset = pd.read_csv('housing.csv')

In [7]:
print(dataset.shape)

(506, 13)


In [8]:
dataset[:5]

Unnamed: 0,crim,zn,indus,chas,nox,age,rm,dis,rad,tax,ptratio,lstat,medv
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296.0,15.3,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242.0,17.8,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242.0,17.8,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222.0,18.7,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222.0,18.7,5.33,36.2


The dataset has 12 feature columns; the target (output) column is medv.

According to the algorithm documentation (https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html), Amazon SageMaker built-in algorithms expect a CSV training file to have no header record and to have the target variable in the first column. Accordingly, we move the medv column to the front of the dataframe:


In [9]:
dataset = pd.concat([dataset['medv'], dataset.drop(['medv'], axis=1)], axis=1)

In [31]:
dataset[:5]

Unnamed: 0,medv,crim,zn,indus,chas,nox,age,rm,dis,rad,tax,ptratio,lstat
0,24.0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296.0,15.3,4.98
1,21.6,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242.0,17.8,9.14
2,34.7,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242.0,17.8,4.03
3,33.4,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222.0,18.7,2.94
4,36.2,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222.0,18.7,5.33


In [11]:
from sklearn.model_selection import train_test_split

In [12]:
training_dataset, validation_dataset = train_test_split(dataset, test_size=0.1)
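
With `test_size=0.1` and 506 rows, the two parts cannot be exact fractions, since row counts are integers. The sketch below computes the sizes to expect, assuming scikit-learn rounds the test count up and gives the remaining rows to the training set (worth confirming against the installed version). `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives the remaining rows.
    """
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(506, 0.1)
print(n_train, n_test)  # 455 51
```

Because the split is random, the rows in each file change from run to run; passing an integer `random_state` to `train_test_split` makes it reproducible.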

In [13]:
training_dataset.to_csv('training_dataset.csv', index=False, header=False)
validation_dataset.to_csv('validation_dataset.csv', index=False, header=False)
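
Since the built-in algorithms expect no header record, it can be worth checking the files before uploading them. The following is a minimal sketch using only the standard library: it confirms that every value parses as a number (a header such as medv would not) and that all rows have the same width. `check_headerless_csv` is a hypothetical helper name.

```python
import csv


def check_headerless_csv(path):
    """Return (row_count, column_count) for a purely numeric CSV file.

    Raises ValueError if a value is not numeric (for example a header
    record) or if the rows do not all have the same number of columns.
    Whether column 0 really is the target cannot be verified from the
    values alone.
    """
    rows = 0
    width = None
    with open(path, newline='') as f:
        for row in csv.reader(f):
            for value in row:
                float(value)  # raises ValueError on text such as 'medv'
            if width is None:
                width = len(row)
            elif len(row) != width:
                raise ValueError(
                    'row {} has {} columns, expected {}'.format(
                        rows + 1, len(row), width))
            rows += 1
    return rows, width
```

Calling `check_headerless_csv('training_dataset.csv')` on the file written above should report 13 columns: medv followed by the 12 features.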

In [14]:
import sagemaker

In [15]:
sess = sagemaker.Session()

In [16]:
bucket = sess.default_bucket()

In [17]:
bucket

'sagemaker-us-west-2-076084266064'

In [18]:
prefix='boston-housing'

In [19]:
training_data_path = sess.upload_data(path='training_dataset.csv', key_prefix=prefix + '/input/training')

In [20]:
validation_data_path = sess.upload_data(path='validation_dataset.csv', key_prefix=prefix + '/input/validation')

In [21]:
print(training_data_path)

s3://sagemaker-us-west-2-076084266064/boston-housing/input/training/training_dataset.csv


In [30]:
print(validation_data_path)

s3://sagemaker-us-west-2-076084266064/boston-housing/input/validation/validation_dataset.csv
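
The two printed URIs follow the pattern `s3://<bucket>/<key_prefix>/<file name>`. The sketch below reproduces that pattern with plain string handling, which can help when a later step needs the location without uploading again. It mirrors the output shown above rather than the SDK's internal implementation, and `expected_s3_uri` is a hypothetical helper.

```python
import posixpath


def expected_s3_uri(bucket, key_prefix, local_path):
    """Compose s3://<bucket>/<key_prefix>/<file name>."""
    file_name = posixpath.basename(local_path)
    return 's3://' + posixpath.join(bucket, key_prefix, file_name)


uri = expected_s3_uri(
    'sagemaker-us-west-2-076084266064',
    'boston-housing' + '/input/training',
    'training_dataset.csv',
)
print(uri)
# s3://sagemaker-us-west-2-076084266064/boston-housing/input/training/training_dataset.csv
```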


## Configuring a training job

In [25]:
import boto3
from sagemaker import image_uris

In [26]:
region = boto3.Session().region_name
container = image_uris.retrieve('linear-learner', region)

In [27]:
container

'174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1'
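
The image URI is specific to a region, which is why the region is passed to `image_uris.retrieve`. As an illustration, the sketch below splits the URI shown above into its parts, assuming the usual Amazon ECR layout `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. `parse_ecr_image_uri` is a hypothetical helper.

```python
def parse_ecr_image_uri(image_uri):
    """Split an ECR image URI into account, region, repository and tag."""
    registry, _, repository_and_tag = image_uri.partition('/')
    repository, _, tag = repository_and_tag.partition(':')
    # registry looks like <account>.dkr.ecr.<region>.amazonaws.com
    parts = registry.split('.')
    return {
        'account': parts[0],
        'region': parts[3],
        'repository': repository,
        'tag': tag,
    }


info = parse_ecr_image_uri(
    '174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1')
print(info['region'], info['repository'], info['tag'])
# us-west-2 linear-learner 1
```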

In [28]:
# Configure the training job
from sagemaker.estimator import Estimator

In [40]:
ll_estimator = Estimator(
    container, 
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.large',
    output_path='s3://{}/{}/output'.format(bucket,prefix)
)

Couldn't call 'get_role' to get Role ARN from role name AmazonSageMaker-ExecutionRole-20210403T113990 to get Role path.
Assuming role was created in SageMaker AWS console, as the name contains `AmazonSageMaker-ExecutionRole`. Defaulting to Role ARN with service-role in path. If this Role ARN is incorrect, please add IAM read permissions to your role or supply the Role Arn directly.


In [41]:
sagemaker.get_execution_role()

Couldn't call 'get_role' to get Role ARN from role name AmazonSageMaker-ExecutionRole-20210403T113990 to get Role path.
Assuming role was created in SageMaker AWS console, as the name contains `AmazonSageMaker-ExecutionRole`. Defaulting to Role ARN with service-role in path. If this Role ARN is incorrect, please add IAM read permissions to your role or supply the Role Arn directly.


'arn:aws:iam::076084266064:role/service-role/AmazonSageMaker-ExecutionRole-20210403T113990'
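
The warning suggests supplying the role ARN directly. A minimal sketch of that approach follows: the ARN printed above is stored once and passed to `Estimator` as a plain string, with the same settings as the cell above. `ROLE_ARN`, `output_location` and `build_estimator` are hypothetical names, and the SDK import sits inside the function so that the snippet loads even where the SDK is not installed.

```python
# Copied from the get_execution_role() output above.
ROLE_ARN = ('arn:aws:iam::076084266064:role/service-role/'
            'AmazonSageMaker-ExecutionRole-20210403T113990')


def output_location(bucket, prefix):
    """S3 location where the training job writes its artifacts."""
    return 's3://{}/{}/output'.format(bucket, prefix)


def build_estimator(image_uri, bucket, prefix, role_arn=ROLE_ARN):
    """Create the estimator with an explicit role ARN (not run here)."""
    from sagemaker.estimator import Estimator  # needs the SageMaker SDK
    return Estimator(
        image_uri,
        role=role_arn,
        instance_count=1,
        instance_type='ml.m5.large',
        output_path=output_location(bucket, prefix),
    )
```

With the variables defined earlier, `build_estimator(container, bucket, prefix)` would stand in for the `Estimator(...)` call.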

In [42]:
ll_estimator

<sagemaker.estimator.Estimator at 0x7fc81032d110>

In [43]:
ll_estimator.set_hyperparameters(
    predictor_type='regressor',
    mini_batch_size=32)
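
The training log later in this notebook prints the algorithm's default configuration, including a `mini_batch_size` of 1000, whereas this job sets it to 32. As a rough illustration of how job-level hyperparameters take precedence over defaults, the sketch below merges two dictionaries and stores the values as strings, as they appear in the log. It uses only a few defaults visible in that log and is not the container's actual code; `effective_hyperparameters` is a hypothetical helper.

```python
# A few of the defaults printed in the training log.
DEFAULTS_FROM_LOG = {
    'mini_batch_size': '1000',
    'epochs': '15',
    'feature_dim': 'auto',
    'use_bias': 'true',
}


def effective_hyperparameters(defaults, overrides):
    """Return the defaults updated with the overrides, all as strings."""
    merged = dict(defaults)
    merged.update({name: str(value) for name, value in overrides.items()})
    return merged


params = effective_hyperparameters(
    DEFAULTS_FROM_LOG,
    {'predictor_type': 'regressor', 'mini_batch_size': 32},
)
print(params['mini_batch_size'], params['epochs'])  # 32 15
```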

In [37]:
training_data_channel = sagemaker.TrainingInput(
    s3_data=training_data_path, content_type='text/csv')

In [38]:
validation_data_channel = sagemaker.TrainingInput(
    s3_data=validation_data_path, content_type='text/csv'
)

## Launching a training job

In [44]:
ll_estimator.fit({'train': training_data_channel, 
                 'validation': validation_data_channel})

2021-04-05 09:08:40 Starting - Starting the training job...
2021-04-05 09:09:03 Starting - Launching requested ML instances
ProfilerReport-1617613720: InProgress
......
2021-04-05 09:10:07 Starting - Preparing the instances for training......
2021-04-05 09:11:04 Downloading - Downloading input data...
2021-04-05 09:11:27 Training - Downloading the training image.
Docker entrypoint called with argument(s): train
Running default environment configuration script
[04/05/2021 09:11:46 INFO 140498629949248] Reading default configuration from /opt/amazon/lib/python3.7/site-packages/algorithm/resources/default-input.json: {'mini_batch_size': '1000', 'epochs': '15', 'feature_dim': 'auto', 'use_bias': 'true', 'binary_classifier_model_selection_criteria': 'accuracy', 'f_beta': '1.0', 'target_recall': '0.8', 'target_precision': '0.8', 'num_models': 'auto', 'num_calibration_samples': '10000000', 'init_method': 'uniform', 'init_scale': '0.07', 'init_sigma': '0.01', 'init_bias': 
