## Importing Important Libraries

Steps to be followed:
1. Importing libraries
2. Creating an S3 bucket
3. Mapping train and test data in S3
4. Mapping the path of models in S3

In [13]:
import sagemaker
import boto3
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.session import Session  # s3_input was removed in sagemaker>=2; TrainingInput is used below instead

In [35]:
bucket_name="naaztest456"
my_region=boto3.session.Session().region_name
print(my_region)
print(bucket_name)

us-east-1
naaztest456


In [36]:
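s3 = boto3.resource('s3')
try:
    if my_region == 'us-east-1':
        # us-east-1 is the default region and must not be passed as a LocationConstraint
        s3.create_bucket(Bucket=bucket_name)
    else:
        # every other region has to be named explicitly
        s3.create_bucket(Bucket=bucket_name,
                         CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    print('S3 error:',e)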
    

S3 bucket created successfully


In [38]:
#Output path where trained model is saved

prefix= 'xgboost-as-a-built-in-algo'
output_path='s3://{}/{}/output'.format(bucket_name,prefix)
print(output_path)

s3://naaztest456/xgboost-as-a-built-in-algo/output
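
The same `s3://bucket/prefix/...` pattern is needed again for the train and test locations, so it can help to build it in one place. This is a minimal sketch; `s3_uri` is a hypothetical helper name, not part of boto3 or the SageMaker SDK.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI (hypothetical helper)."""
    # S3 keys always use forward slashes, regardless of the local OS
    cleaned = [p.strip("/") for p in parts if p]
    return "s3://" + "/".join([bucket, *cleaned])


bucket_name = "naaztest456"
prefix = "xgboost-as-a-built-in-algo"

output_path = s3_uri(bucket_name, prefix, "output")
train_path = s3_uri(bucket_name, prefix, "train")
print(output_path)
print(train_path)
```

Stripping stray slashes from each part avoids accidental double slashes when a prefix is written as `"data/"`.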


Downloading the dataset and storing it in S3

In [43]:
import pandas as pd
import urllib.request  # the request submodule must be imported explicitly
try:
    urllib.request.urlretrieve("http://d1.awsstatic.com/tmt/build-train-deploy-machine-learning-model-sagemaker/bank_clean.27f01fbbdf43271788427f3682996ae29ceca05d.csv","bank_clean.csv")
    print("Success : Downloaded bank_clean.csv")
except Exception as e:
    print("Data load Error",e)

try:
    model_data=pd.read_csv('./bank_clean.csv',index_col=0)
    print("Success Data Loaded into DataFrame")
except Exception as e:
    print("Data Load Error ",e)
    
    
    

Success : Downloaded bank_clean.csv
Success Data Loaded into DataFrame


In [45]:
##Train-Test-Split
import numpy as np
train_data,test_data=np.split(model_data.sample(frac=1,random_state=1729),[int(0.7 *len(model_data))])
print(train_data.shape,test_data.shape)

(28831, 61) (12357, 61)
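
The shapes above follow from the cut point: the two parts add up to 41188 rows, and `int(0.7 * 41188)` is 28831, which leaves 12357 rows for testing. Calling `np.split` on a DataFrame may emit a deprecation warning in recent pandas versions, so the same shuffle-then-cut split can also be sketched with `iloc`. The toy DataFrame below is made up for illustration and stands in for `model_data`.

```python
import pandas as pd

# Toy stand-in for model_data (made-up values, for illustration only)
toy = pd.DataFrame({"feature": range(10), "y_yes": [0, 1] * 5})

# Shuffle all rows reproducibly, then cut at the 70% mark
shuffled = toy.sample(frac=1, random_state=1729)
cut = int(0.7 * len(shuffled))
train_part = shuffled.iloc[:cut]
test_part = shuffled.iloc[cut:]

print(train_part.shape, test_part.shape)
```

Every row ends up in exactly one of the two parts, because the slices `[:cut]` and `[cut:]` do not overlap.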


In [52]:
## Saving train and test data in the S3 bucket
## We start with the train data

import os
pd.concat([train_data['y_yes'],train_data.drop(['y_no','y_yes'],axis=1)],axis=1).to_csv('train.csv',index=False,header=False)

boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix,"train/train.csv")).upload_file('train.csv')
s3_input_train=sagemaker.TrainingInput(s3_data='s3://{}/{}/train'.format(bucket_name,prefix), content_type='csv')

In [53]:
#Test Data into buckets
pd.concat([test_data['y_yes'],test_data.drop(['y_no','y_yes'],axis=1)],axis=1).to_csv('test.csv',index=False,header=False)
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix,"test/test.csv")).upload_file('test.csv')
s3_input_test=sagemaker.TrainingInput(s3_data='s3://{}/{}/test'.format(bucket_name,prefix), content_type='csv')
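
The built-in XGBoost algorithm expects CSV input with the label in the first column and no header row, which is why `y_yes` is moved to the front and `y_no` is dropped before writing the files. A small sketch with made-up data (the `age` column and its values are illustrative only) shows the resulting layout:

```python
import io

import pandas as pd

# Made-up rows with the same one-hot label columns as the bank dataset
toy = pd.DataFrame({"age": [30, 45], "y_no": [1, 0], "y_yes": [0, 1]})

# Label first, then every feature; the redundant y_no column is dropped
ordered = pd.concat([toy["y_yes"], toy.drop(["y_no", "y_yes"], axis=1)], axis=1)

# Write to an in-memory buffer instead of a file to inspect the result
buffer = io.StringIO()
ordered.to_csv(buffer, index=False, header=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Each output line starts with the 0/1 label followed by the feature values, with no column names.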

# Building the XGBoost Model

In [54]:
# Look up the URI of the built-in XGBoost container image for the current region
# Pick the version that suits your needs
# (image_uris.retrieve replaces get_image_uri in sagemaker>=2)

container=sagemaker.image_uris.retrieve('xgboost',boto3.Session().region_name,version='1.0-1')


In [55]:
#Initialise hyperparameters
hyperparameters={
    "max_depth":"5",
    "eta":"0.2",
    "gamma":"4",
    "min_child_weight":"6",
    "subsample":"0.7",  # XGBoost spells this parameter "subsample"
    "objective":"binary:logistic",
}

In [58]:
# Construct a SageMaker estimator that runs the XGBoost container
# (argument names follow sagemaker>=2, which dropped the train_ prefix)
estimator=sagemaker.estimator.Estimator(image_uri=container,
                                        hyperparameters=hyperparameters,
                                        role=sagemaker.get_execution_role(),
                                        instance_count=1,
                                        instance_type="ml.m5.2xlarge",
                                        volume_size=5,
                                        output_path=output_path,
                                        use_spot_instances=True,
                                        max_run=300,
                                        max_wait=600)
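
SageMaker Python SDK v2 dropped the `train_` prefix from these estimator arguments. When porting a notebook written for v1, the renames can be applied mechanically. The sketch below is one way to do that; `V2_RENAMES` and `to_v2_kwargs` are hypothetical names, not SDK functions.

```python
# Old (v1) argument name -> new (v2) argument name
V2_RENAMES = {
    "train_instance_count": "instance_count",
    "train_instance_type": "instance_type",
    "train_volume_size": "volume_size",
    "train_use_spot_instances": "use_spot_instances",
    "train_max_run": "max_run",
    "train_max_wait": "max_wait",
}


def to_v2_kwargs(kwargs):
    """Return a copy of kwargs with v1 names replaced by their v2 equivalents."""
    return {V2_RENAMES.get(name, name): value for name, value in kwargs.items()}


v1_kwargs = {
    "train_instance_count": 1,
    "train_instance_type": "ml.m5.2xlarge",
    "train_volume_size": 5,
    "train_use_spot_instances": True,
    "train_max_run": 300,
    "train_max_wait": 600,
}
v2_kwargs = to_v2_kwargs(v1_kwargs)
print(v2_kwargs)
```

Arguments that were not renamed pass through unchanged, so the result can be unpacked into the estimator constructor alongside `image_uri`, `role`, and the rest.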


In [60]:
estimator.fit({'train':s3_input_train,'validation':s3_input_test})

2022-10-23 13:16:20 Starting - Starting the training job...
2022-10-23 13:16:44 Starting - Preparing the instances for training
ProfilerReport-1666530980: InProgress
.........
2022-10-23 13:18:20 Downloading - Downloading input data...
2022-10-23 13:18:45 Training - Downloading the training image...
2022-10-23 13:19:21 Training - Training image download completed. Training in progress..
INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value binary:logistic to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
ERROR:sagemaker-containers:Reporting training FAILURE
ERROR:sagemaker-containers:framework error: 
Traceback (most recent call last):
  File "/miniconda3/lib/python3.6/site-packa

UnexpectedStatusException: Error for Training job sagemaker-xgboost-2022-10-23-13-16-20-769: Failed. Reason: AlgorithmError: framework error: 
Traceback (most recent call last):
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_containers/_trainer.py", line 84, in train
    entrypoint()
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/training.py", line 94, in main
    train(framework.training_env())
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/training.py", line 90, in train
    run_algorithm_mode()
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/training.py", line 68, in run_algorithm_mode
    checkpoint_config=checkpoint_config
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py", line 110, in sagemaker_train
    validated_train_config = hyperparameters.validate(train_config)
  File "/miniconda3/lib/python3.6/site-packages/sagemaker_algorithm_toolkit/hyperparameter_validation.py", line 270, in validate
    raise exc.UserError("Missing required hyperparameter: {}".format(hp)
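
The traceback ends in `Missing required hyperparameter`, raised by the container's hyperparameter validation before any training starts. The hyperparameters passed to the estimator never set `num_round`, which AWS documents as required for the built-in XGBoost algorithm, so that is the most likely culprit (the truncated log does not name the key). A quick local check before calling `fit` can catch this earlier. The sketch below assumes `num_round` is the only required key; `missing_required` is a hypothetical helper and the value `"50"` is an illustrative choice, not taken from the original run.

```python
# Assumption: num_round is the only required hyperparameter for built-in XGBoost
REQUIRED_HYPERPARAMETERS = ("num_round",)


def missing_required(hyperparameters, required=REQUIRED_HYPERPARAMETERS):
    """List the required hyperparameter names that are absent from the dict."""
    return [name for name in required if name not in hyperparameters]


hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "objective": "binary:logistic",
}

# Reports the gap before a training job is launched
print(missing_required(hyperparameters))

# Illustrative value only; tune the number of boosting rounds for your data
hyperparameters["num_round"] = "50"
print(missing_required(hyperparameters))
```

Once the list comes back empty, the dict can be passed to the estimator and `fit` called again.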