## Steps to follow:
    1. Import the necessary libraries.
    2. Create an S3 bucket.
    3. Upload the train and test data to S3.
    4. Set the S3 path where the trained model is saved.
    5. Train, deploy, and test the built-in XGBoost model, then clean up.

## We use XGBoost, one of the algorithms built into SageMaker. AWS already provides its Docker container image, so get_image_uri only has to look up that image's URI for our region; SageMaker pulls the image when training starts.

## boto3: the AWS SDK for Python. With AWS credentials configured, it lets us create, read, and write S3 buckets from Python, including from a local environment.

## Sessions: boto3 and SageMaker sessions hold the region and credentials used for API calls, so we create one before working with S3 or SageMaker resources.

In [1]:
import sagemaker
import boto3
# get_image_uri is deprecated in SageMaker SDK v2 (replaced by sagemaker.image_uris.retrieve)
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.session import Session

## How an S3 bucket is created depends on the region (outside us-east-1 a LocationConstraint is required), so we check the region first

In [16]:
bucket_name = 'bankappdemo'
my_region = boto3.session.Session().region_name
print(my_region)

us-east-2


In [17]:
s3 = boto3.resource('s3')
try:
    # us-east-1 is the default region and must not be given a LocationConstraint
    if my_region == 'us-east-1':
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(Bucket=bucket_name, CreateBucketConfiguration={'LocationConstraint': my_region})
    print('S3 bucket created successfully')
except Exception as e:
    # e.g. the name is already taken: bucket names are globally unique
    print('S3 error: ', e)

S3 bucket created successfully
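The region rule in the cell above can be factored into a small helper that only builds the arguments for `create_bucket`, so the rule can be checked without touching AWS. This is a sketch; the function name `create_bucket_kwargs` is hypothetical and not part of boto3.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for boto3's S3 create_bucket call.

    Hypothetical helper: a bucket in us-east-1 is created without a
    LocationConstraint, while every other region needs one.
    """
    if not region:
        # boto3 reports region_name as None when no default region is configured
        raise ValueError("No AWS region configured")
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


print(create_bucket_kwargs("bankappdemo", "us-east-2"))
```

With credentials configured, the call would become `boto3.resource('s3').create_bucket(**create_bucket_kwargs(bucket_name, my_region))`. Because bucket names are globally unique, a name such as bankappdemo may already belong to another account.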


### The prefix groups this project's objects (data and model artifacts) inside the bucket, which also helps with versioning. Here the prefix is the model name, xgboost-as-a-built-in-algo.
### Once the model is trained, its artifacts are stored in the output path below

In [18]:
prefix = 'xgboost-as-a-built-in-algo'
output_path = 's3://{}/{}/output'.format(bucket_name, prefix)
print(output_path)

s3://bankappdemo/xgboost-as-a-built-in-algo/output
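The output path here, and the train and test paths later on, are all `s3://bucket/prefix/...` strings. A small helper (the name `s3_uri` is hypothetical) can assemble them with `/` separators on any operating system: S3 keys always use forward slashes, whereas `os.path.join` uses the local separator, which is a backslash on Windows.

```python
def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI using '/' separators."""
    # Drop leading/trailing slashes from each part and skip empty parts
    key = "/".join(p.strip("/") for p in parts if p.strip("/"))
    return "s3://{}/{}".format(bucket, key) if key else "s3://{}".format(bucket)


print(s3_uri("bankappdemo", "xgboost-as-a-built-in-algo", "output"))
# s3://bankappdemo/xgboost-as-a-built-in-algo/output
```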


## Downloading the dataset and loading it into a pandas DataFrame

In [19]:
import pandas as pd
import urllib.request

try:
    urllib.request.urlretrieve("https://d1.awsstatic.com/tmt/build-train-deploy-machine-learning-model-sagemaker/bank_clean.27f01fbbdf43271788427f3682996ae29ceca05d.csv", "bank_clean.csv")
    print('Success: downloaded bank_clean.csv.')
except Exception as e:
    print('Data load error: ',e)

try:
    model_data = pd.read_csv('./bank_clean.csv',index_col=0)
    print('Success: Data loaded into dataframe.')
except Exception as e:
    print('Data load error: ',e)

Success: downloaded bank_clean.csv.
Success: Data loaded into dataframe.


### Splitting the data into train (70%) and test (30%) sets after shuffling the rows; both sets are also stored in our S3 bucket so they can be reused later

In [20]:
import numpy as np
train_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [int(0.7 * len(model_data))])
print(train_data.shape, test_data.shape)

(28831, 61) (12357, 61)
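What the split does: `sample(frac=1, random_state=1729)` shuffles all rows reproducibly, and cutting at `int(0.7 * n)` puts the first 70% of the shuffled rows in the training set and the rest in the test set. The sketch below shows the same idea on a toy DataFrame. It slices with `iloc` instead of `np.split`, an equivalent choice that keeps both pieces as DataFrames; `split_train_test` is a hypothetical name.

```python
import pandas as pd


def split_train_test(df, train_frac=0.7, seed=1729):
    """Shuffle rows reproducibly, then cut at train_frac of the length."""
    shuffled = df.sample(frac=1, random_state=seed)
    cut = int(train_frac * len(df))
    return shuffled.iloc[:cut], shuffled.iloc[cut:]


toy = pd.DataFrame({"x": range(10), "y_yes": [0, 1] * 5})
train, test = split_train_test(toy)
print(train.shape, test.shape)  # (7, 2) (3, 2)
```

Applied to the bank data, int(0.7 × 41,188) = 28,831 training rows, leaving 12,357 test rows, which matches the shapes printed above.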


### SageMaker's built-in XGBoost expects CSV input with the target (dependent variable) in the first column and no header row
### Creating train.csv with y_yes first and the label columns y_no and y_yes removed from the features
### Using a boto3 session to upload the training data to the S3 bucket
### The training channel is then defined by the S3 path of that data

In [26]:
import os
# Target (y_yes) first, then the features; no index or header row
pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('train.csv', index=False, header=False)
# Upload to s3://<bucket>/<prefix>/train/train.csv
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'train/train.csv')).upload_file('train.csv')
# The training channel points at the S3 "folder" that contains train.csv
s3_input_train = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/train'.format(bucket_name, prefix), content_type='csv')
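The reshaping in the cell above (target first, both label columns removed from the features, no header or index) can be checked on a toy frame. `to_xgboost_csv` is a hypothetical helper that returns the CSV text instead of writing a file, so the result is easy to inspect.

```python
import pandas as pd


def to_xgboost_csv(df, target="y_yes", drop=("y_no", "y_yes")):
    """Return CSV text with the target in the first column and no header or index."""
    features = df.drop(columns=list(drop))
    ordered = pd.concat([df[target], features], axis=1)
    return ordered.to_csv(index=False, header=False)


toy = pd.DataFrame({"age": [30, 45], "y_no": [1, 0], "y_yes": [0, 1], "job_admin": [1, 0]})
print(to_xgboost_csv(toy).splitlines())  # ['0,30,1', '1,45,0']
```

Passing a file name to `to_csv`, as the notebook does, writes the same text to disk.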

### Doing the same for the test data

In [27]:
pd.concat([test_data['y_yes'], test_data.drop(['y_no', 'y_yes'], axis=1)], axis=1).to_csv('test.csv', index=False, header=False)
boto3.Session().resource('s3').Bucket(bucket_name).Object(os.path.join(prefix, 'test/test.csv')).upload_file('test.csv')
s3_input_test = sagemaker.inputs.TrainingInput(s3_data='s3://{}/{}/test'.format(bucket_name, prefix), content_type='csv')

### Building the model with XGBoost, a SageMaker built-in algorithm

In [31]:
container = get_image_uri(boto3.Session().region_name, 'xgboost', repo_version='1.0-1')

The method get_image_uri has been renamed in sagemaker>=2.
See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.


In [34]:
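# Built-in XGBoost hyperparameter names use underscores, and the objective is 'binary:logistic'.
# This dict is for reference only: the estimator cell below sets its own values via
# set_hyperparameters() (including subsample=0.8 and num_round=100).
hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "objective": "binary:logistic",
}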

## Constructing the SageMaker estimator that runs the XGBoost container

In [None]:
sess = sagemaker.Session()
# get_execution_role() works inside SageMaker notebook instances; elsewhere, pass an IAM role ARN instead
xgb = sagemaker.estimator.Estimator(container, sagemaker.get_execution_role(), instance_count=1, instance_type='ml.m4.xlarge', output_path=output_path, sagemaker_session=sess)
xgb.set_hyperparameters(max_depth=5, eta=0.2, gamma=4, min_child_weight=6, subsample=0.8, silent=0, objective='binary:logistic', num_round=100)

### Running training by calling fit() on the estimator

In [None]:
xgb.fit({'train': s3_input_train})

## Deploying the ML model to a real-time endpoint

In [None]:
xgb_predictor = xgb.deploy(initial_instance_count=1,instance_type='ml.m4.xlarge')

### Predicting on the test data

In [None]:
from sagemaker.serializers import CSVSerializer

test_data_array = test_data.drop(['y_no', 'y_yes'], axis=1).values  # features only, as a NumPy array
xgb_predictor.serializer = CSVSerializer()  # send the rows to the endpoint as CSV
predictions = xgb_predictor.predict(test_data_array).decode('utf-8')  # the response body is bytes
predictions_array = np.fromstring(predictions[1:], sep=',')  # drop the first character, then parse the comma-separated scores
print(predictions_array.shape)
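Two possible refinements to the prediction cell, sketched under assumptions. Real-time endpoints limit the size of a single request, so sending every test row at once may fail on larger data; and the response can be parsed by splitting on commas and whitespace instead of slicing off its first character. `predict_in_batches` and `parse_predictions` are hypothetical helpers, the default `batch_size=500` is an arbitrary choice, and `predict_fn` is assumed to behave like `xgb_predictor.predict` with a `CSVSerializer` attached: it takes a 2-D array and returns the raw response body as bytes.

```python
import re

import numpy as np


def parse_predictions(body):
    """Parse floats separated by commas and/or whitespace (including newlines)."""
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    tokens = [t for t in re.split(r"[,\s]+", body.strip()) if t]
    return np.array([float(t) for t in tokens])


def predict_in_batches(predict_fn, rows, batch_size=500):
    """Call predict_fn on consecutive slices of rows and join the parsed scores."""
    parts = [
        parse_predictions(predict_fn(rows[i:i + batch_size]))
        for i in range(0, len(rows), batch_size)
    ]
    return np.concatenate(parts) if parts else np.array([])
```

In the notebook this would be called as `predictions_array = predict_in_batches(xgb_predictor.predict, test_data_array)`. Smaller batches mean more requests but keep each payload well under the endpoint's limit.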

## Cleaning up: deleting the endpoint and the S3 objects

In [None]:
xgb_predictor.delete_endpoint(delete_endpoint_config=True)

In [35]:
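bucket_to_delete = boto3.resource('s3').Bucket(bucket_name)
# Deletes every object in the bucket; call bucket_to_delete.delete() afterwards to remove the (now empty) bucket itself
bucket_to_delete.objects.all().delete()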

[{'ResponseMetadata': {'RequestId': '6SE6G38M5GZDB08D',
   'HostId': 'rz7ib2vdT4w8VJZag7RQD+ON6jpG+dorf+0AnUSFC3GVwLjprCrAYz3EwJcSCZltyv4F14qoBYI=',
   'HTTPStatusCode': 200,
   'HTTPHeaders': {'x-amz-id-2': 'rz7ib2vdT4w8VJZag7RQD+ON6jpG+dorf+0AnUSFC3GVwLjprCrAYz3EwJcSCZltyv4F14qoBYI=',
    'x-amz-request-id': '6SE6G38M5GZDB08D',
    'date': 'Sun, 05 Sep 2021 18:06:02 GMT',
    'content-type': 'application/xml',
    'transfer-encoding': 'chunked',
    'server': 'AmazonS3',
    'connection': 'close'},
   'RetryAttempts': 0},
  'Deleted': [{'Key': 'xgboost-as-a-built-in-algo/train/train.csv'},
   {'Key': 'xgboost-as-a-built-in-algo/test/test.csv'}]}]