# Regression with Amazon SageMaker XGBoost algorithm
_**Distributed training for regression with Amazon SageMaker XGBoost script mode**_

---

## Contents
1. [Introduction](#Introduction)
2. [Setup](#Setup)
  1. [Fetching the dataset](#Fetching-the-dataset)
  2. [Data Ingestion](#Data-ingestion)
3. [Training the XGBoost model](#Training-the-XGBoost-model)
3. [Deploying the XGBoost model](#Deploying-the-XGBoost-model)

---

## Introduction

This notebook demonstrates the use of Amazon SageMaker XGBoost to train and host a regression model. [XGBoost (eXtreme Gradient Boosting)](https://xgboost.readthedocs.io) is a popular and efficient machine learning algorithm used for regression and classification tasks on tabular datasets. It implements a technique know as gradient boosting on trees, and performs remarkably well in machine learning competitions, and gets a lot of attention from customers. 

We use the [Abalone data](https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html), originally from the [UCI data repository](https://archive.ics.uci.edu/ml/datasets/abalone). More details about the original dataset can be found [here](https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.names).  In this libsvm converted version, the nominal feature (Male/Female/Infant) has been converted into a real valued feature as required by XGBoost. Age of abalone is to be predicted from eight physical measurements.  

---
## Setup


This notebook was created and tested on an ml.m5.2xlarge notebook instance.

Let's start by specifying:
1. The S3 bucket and prefix that you want to use for training and model data. This should be within the same region as the Notebook Instance, training, and hosting.
1. The IAM role arn used to give training and hosting access to your data. See the documentation for how to create these. Note, if more than one role is required for notebook instances, training, and/or hosting, please replace the boto regexp with a the appropriate full IAM role arn string(s).

In [113]:
import sys
!{sys.executable} -m pip install -qU awscli boto3 "sagemaker>=1.71.0,<2.0.0"

You should consider upgrading via the '/home/ec2-user/anaconda3/envs/python3/bin/python -m pip install --upgrade pip' command.[0m


In [161]:
%%time

import os
import boto3
import re
import sagemaker
import pandas as pd

# Get a SageMaker-compatible role used by this Notebook Instance.
role = sagemaker.get_execution_role()
region = boto3.Session().region_name

### update below values appropriately ###
bucket = 'sagemaker-rul-xgboost'
prefix = 'DEMO-xgboost-dist-script-RUL-libsvm-1'
#### 

print(region)

ap-southeast-1
CPU times: user 39.9 ms, sys: 0 ns, total: 39.9 ms
Wall time: 1.08 s


### Fetching the dataset

Following methods split the data into train/test/validation datasets and upload files to S3.

In [115]:
%%time

import io
import boto3
import random

def data_split(FILE_DATA, DATA_DIR, FILE_TRAIN_BASE, FILE_TRAIN_1, FILE_VALIDATION, FILE_TEST, 
               PERCENT_TRAIN_0, PERCENT_TRAIN_1, PERCENT_VALIDATION, PERCENT_TEST):
    data = [l for l in open(FILE_DATA, 'r')]
    train_file_0 = open(DATA_DIR + "/" + FILE_TRAIN_0, 'w')
    train_file_1 = open(DATA_DIR + "/" + FILE_TRAIN_1, 'w')
    valid_file = open(DATA_DIR + "/" + FILE_VALIDATION, 'w')
    tests_file = open(DATA_DIR + "/" + FILE_TEST, 'w')

    num_of_data = len(data)
    num_train_0 = int((PERCENT_TRAIN_0/100.0)*num_of_data)
    num_train_1 = int((PERCENT_TRAIN_1/100.0)*num_of_data)
    num_valid = int((PERCENT_VALIDATION/100.0)*num_of_data)
    num_tests = int((PERCENT_TEST/100.0)*num_of_data)

    data_fractions = [num_train_0, num_train_1, num_valid, num_tests]
    split_data = [[],[],[],[]]

    rand_data_ind = 0

    for split_ind, fraction in enumerate(data_fractions):
        for i in range(fraction):
            rand_data_ind = random.randint(0, len(data)-1)
            split_data[split_ind].append(data[rand_data_ind])
            data.pop(rand_data_ind)

    for l in split_data[0]:
        train_file_0.write(l)

    for l in split_data[1]:
        train_file_1.write(l)
        
    for l in split_data[2]:
        valid_file.write(l)

    for l in split_data[3]:
        tests_file.write(l)

    train_file_0.close()
    train_file_1.close()
    valid_file.close()
    tests_file.close()

def write_to_s3(fobj, bucket, key):
    return boto3.Session(region_name=region).resource('s3').Bucket(bucket).Object(key).upload_fileobj(fobj)

def upload_to_s3(bucket, channel, filename):
    fobj=open(filename, 'rb')
    key = prefix+'/'+channel
    url = 's3://{}/{}/{}'.format(bucket, key, filename)
    print('Writing to {}'.format(url))
    write_to_s3(fobj, bucket, key)

CPU times: user 37 µs, sys: 0 ns, total: 37 µs
Wall time: 40.8 µs


### Data ingestion

Next, we read the dataset from the existing repository into memory, for preprocessing prior to training. This processing could be done *in situ* by Amazon Athena, Apache Spark in Amazon EMR, Amazon Redshift, etc., assuming the dataset is present in the appropriate location. Then, the next step would be to transfer the data to S3 for use in training. For small datasets, such as this one, reading into memory isn't onerous, though it would be for larger datasets.

In [116]:
%%time
import urllib.request

# Load the dataset
FILE_DATA = 'abalone'
urllib.request.urlretrieve("https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/abalone", FILE_DATA)

#split the downloaded data into train/test/validation files
FILE_TRAIN_0 = 'abalone.train_0'
FILE_TRAIN_1 = 'abalone.train_1'
FILE_VALIDATION = 'abalone.validation'
FILE_TEST = 'abalone.test'
PERCENT_TRAIN_0 = 35
PERCENT_TRAIN_1 = 35
PERCENT_VALIDATION = 15
PERCENT_TEST = 15

DATA_DIR = 'data'

if not os.path.exists(DATA_DIR):
    os.mkdir(DATA_DIR)

data_split(FILE_DATA, DATA_DIR, FILE_TRAIN_0, FILE_TRAIN_1, FILE_VALIDATION, FILE_TEST, 
           PERCENT_TRAIN_0, PERCENT_TRAIN_1, PERCENT_VALIDATION, PERCENT_TEST)


CPU times: user 26.2 ms, sys: 0 ns, total: 26.2 ms
Wall time: 1.4 s


In [117]:
#upload the files to the S3 bucket
upload_to_s3(bucket, 'train/train_0.libsvm', DATA_DIR + "/" + FILE_TRAIN_0)
upload_to_s3(bucket, 'train/train_1.libsvm', DATA_DIR + "/" + FILE_TRAIN_1)
upload_to_s3(bucket, 'validation/validation.libsvm', DATA_DIR + "/" + FILE_VALIDATION)
upload_to_s3(bucket, 'test/test.libsvm', DATA_DIR + "/" + FILE_TEST)

Writing to s3://sagemaker-rul-xgboost/DEMO-xgboost-dist-script-RUL/train/train_0.libsvm/data/abalone.train_0
Writing to s3://sagemaker-rul-xgboost/DEMO-xgboost-dist-script-RUL/train/train_1.libsvm/data/abalone.train_1
Writing to s3://sagemaker-rul-xgboost/DEMO-xgboost-dist-script-RUL/validation/validation.libsvm/data/abalone.validation
Writing to s3://sagemaker-rul-xgboost/DEMO-xgboost-dist-script-RUL/test/test.libsvm/data/abalone.test


In [160]:
traindf=pd.read_csv('Data/train-01.csv', index_col=0)
evaldf=pd.read_csv('Data/test-02.csv', index_col=0)
traindf1=pd.read_csv('Data/train-02.csv', index_col=0)
testdf=pd.read_csv('Data/test-01.csv', index_col=0)
col = ['RUL', 'id', 'cycle', 'setting1', 'setting2', 'setting3', 's1', 's2', 's3',
       's4', 's5', 's6', 's7', 's8', 's9', 's10', 's11', 's12', 's13', 's14',
       's15', 's16', 's17', 's18', 's19', 's20', 's21']

traindf = traindf[col]
testdf = testdf[col]
traindf1 = traindf1[col]
evaldf = evaldf[col]
#train = traindf.drop(columns='id','cycle','setting1', 'setting2', 'setting3')
#test = testdf.drop(columns='id','cycle','setting1', 'setting2', 'setting3')
print(traindf1['setting1'].dtype)


float64


In [119]:
traindf.to_csv('traindf.csv', header=False, index=False)
traindf1.to_csv('traindf1.csv', header=False, index=False)
evaldf.to_csv('valdf.csv', header=False, index=False)
testdf.to_csv('testdf.csv',header=False, index=False)

In [162]:
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train_0.libsvm')).upload_file('traindflibsvm.libsvm')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train/train_1.libsvm')).upload_file('traindflibsvm1.libsvm')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'validation/validation.libsvm')).upload_file('valdflibsvm.libsvm')
boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'test/test.libsvm')).upload_file('testdflibsvm.libsvm')

## Create a XGBoost script to train with 

SageMaker can now run an XGboost script using the XGBoost estimator. When executed on SageMaker a number of helpful environment variables are available to access properties of the training environment, such as:

- `SM_MODEL_DIR`: A string representing the path to the directory to write model artifacts to. Any artifacts saved in this folder are uploaded to S3 for model hosting after the training job completes.
- `SM_OUTPUT_DIR`: A string representing the filesystem path to write output artifacts to. Output artifacts may include checkpoints, graphs, and other files to save, not including model artifacts. These artifacts are compressed and uploaded to S3 to the same S3 prefix as the model artifacts.

Supposing two input channels, 'train' and 'validation', were used in the call to the XGBoost estimator's fit() method, the following environment variables will be set, following the format `SM_CHANNEL_[channel_name]`:

`SM_CHANNEL_TRAIN`: A string representing the path to the directory containing data in the 'train' channel
`SM_CHANNEL_VALIDATION`: Same as above, but for the 'validation' channel.

A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to model_dir so that it can be hosted later. Hyperparameters are passed to your script as arguments and can be retrieved with an argparse.ArgumentParser instance. For example, the script that we will run in this notebook is provided as the accompanying file (`abalone.py`) and also shown below:

```python

import argparse
import json
import logging
import os
import pandas as pd
import pickle as pkl

from sagemaker_containers import entry_point
from sagemaker_xgboost_container.data_utils import get_dmatrix
from sagemaker_xgboost_container import distributed

import xgboost as xgb


def _xgb_train(params, dtrain, evals, num_boost_round, model_dir, is_master):
    """Run xgb train on arguments given with rabit initialized.

    This is our rabit execution function.

    :param args_dict: Argument dictionary used to run xgb.train().
    :param is_master: True if current node is master host in distributed training, or is running single node training job. Note that rabit_run will include this argument.
    """
    booster = xgb.train(params=params, dtrain=dtrain, evals=evals, num_boost_round=num_boost_round)

    if is_master:
        model_location = model_dir + '/xgboost-model'
        pkl.dump(booster, open(model_location, 'wb'))
        logging.info("Stored trained model at {}".format(model_location))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    # Hyperparameters are described here. In this simple example we are just including one hyperparameter.
    parser.add_argument('--max_depth', type=int,)
    parser.add_argument('--eta', type=float)
    parser.add_argument('--gamma', type=int)
    parser.add_argument('--min_child_weight', type=int)
    parser.add_argument('--subsample', type=float)
    parser.add_argument('--verbose', type=int)
    parser.add_argument('--objective', type=str)
    parser.add_argument('--num_round', type=int)

    # Sagemaker specific arguments. Defaults are set in the environment variables.
    parser.add_argument('--output_data_dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
    parser.add_argument('--model_dir', type=str, default=os.environ['SM_MODEL_DIR'])
    parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
    parser.add_argument('--validation', type=str, default=os.environ['SM_CHANNEL_VALIDATION'])
    parser.add_argument('--sm_hosts', type=str, default=os.environ['SM_HOSTS'])
    parser.add_argument('--sm_current_host', type=str, default=os.environ['SM_CURRENT_HOST'])

    args, _ = parser.parse_known_args()

    # Get SageMaker host information from runtime environment variables
    sm_hosts = json.loads(os.environ['SM_HOSTS'])
    sm_current_host = args.sm_current_host

    dtrain = get_dmatrix(args.train, 'libsvm')
    dval = get_dmatrix(args.validation, 'libsvm')
    watchlist = [(dtrain, 'train'), (dval, 'validation')] if dval is not None else [(dtrain, 'train')]

    train_hp = {
        'max_depth': args.max_depth,
        'eta': args.eta,
        'gamma': args.gamma,
        'min_child_weight': args.min_child_weight,
        'subsample': args.subsample,
        'verbose': args.verbose,
        'objective': args.objective}

    xgb_train_args = dict(
        params=train_hp,
        dtrain=dtrain,
        evals=watchlist,
        num_boost_round=args.num_round,
        model_dir=args.model_dir)

    if len(sm_hosts) > 1:
        # Wait until all hosts are able to find each other
        entry_point._wait_hostname_resolution()

        # Execute training function after initializing rabit.
        distributed.rabit_run(
            exec_fun=_xgb_train,
            args=xgb_train_args,
            include_in_training=(dtrain is not None),
            hosts=sm_hosts,
            current_host=sm_current_host,
            update_rabit_args=True
        )
    else:
        # If single node training, call training method directly.
        if dtrain:
            xgb_train_args['is_master'] = True
            _xgb_train(**xgb_train_args)
        else:
            raise ValueError("Training channel must have data to train model.")


def model_fn(model_dir):
    """Deserialized and return fitted model.

    Note that this should have the same name as the serialized model in the _xgb_train method
    """
    model_file = 'xgboost-model'
    booster = pkl.load(open(os.path.join(model_dir, model_file), 'rb'))
    return booster
```



Because the container imports your training script, always put your training code in a main guard `(if __name__=='__main__':)` so that the container does not inadvertently run your training code at the wrong point in execution.

For more information about training environment variables, please visit https://github.com/aws/sagemaker-containers.

## Training the XGBoost model

After setting training parameters, we kick off training, and poll for status until training is completed, which in this example, takes between few minutes.

To run our training script on SageMaker, we construct a sagemaker.xgboost.estimator.XGBoost estimator, which accepts several constructor arguments:

* __entry_point__: The path to the Python script SageMaker runs for training and prediction.
* __role__: Role ARN
* __train_instance_type__ *(optional)*: The type of SageMaker instances for training. __Note__: Because Scikit-learn does not natively support GPU training, Sagemaker Scikit-learn does not currently support training on GPU instance types.
* __sagemaker_session__ *(optional)*: The session used to train on Sagemaker.
* __hyperparameters__ *(optional)*: A dictionary passed to the train function as hyperparameters.

In [163]:
hyperparams = {
        "max_depth":"5",
        "eta":"0.2",
        "gamma":"4",
        "min_child_weight":"6",
        "subsample":"0.7",
        "verbose":"1",
        "objective":"reg:squarederror",
        "num_round":"50"}

instance_type = "ml.m5.2xlarge"
output_path = 's3://{}/{}/{}/output'.format(bucket, prefix, 'abalone-dist-xgb')
content_type = "libsvm"

In [164]:
# Open Source distributed script mode
from sagemaker.session import s3_input, Session
from sagemaker.xgboost.estimator import XGBoost

boto_session = boto3.Session(region_name=region)
session = Session(boto_session=boto_session)
script_path = 'abalone-Copy1.py'

xgb_script_mode_estimator = XGBoost(
    entry_point=script_path,
    framework_version='0.90-1', # Note: framework_version is mandatory prevversion=1.0-1
    hyperparameters=hyperparams,
    role=role,
    train_instance_count=2, 
    train_instance_type=instance_type,
    output_path=output_path)

train_input = s3_input("s3://{}/{}/{}/".format(bucket, prefix, 'train'), content_type=content_type)
validation_input = s3_input("s3://{}/{}/{}/".format(bucket, prefix, 'validation'), content_type=content_type)

's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.
's3_input' class will be renamed to 'TrainingInput' in SageMaker Python SDK v2.


### Train XGBoost Estimator on abalone data 


Training is as simple as calling `fit` on the Estimator. This will start a SageMaker Training job that will download the data, invoke the entry point code (in the provided script file), and save any model artifacts that the script creates.

In [166]:
xgb_script_mode_estimator.fit({'train': train_input, 'validation': validation_input})
#xgb_script_mode_estimator.fit({'train': train_input})

2020-10-05 10:44:08 Starting - Starting the training job...
2020-10-05 10:44:10 Starting - Launching requested ML instances......
2020-10-05 10:45:13 Starting - Preparing the instances for training...
2020-10-05 10:45:46 Downloading - Downloading input data...
2020-10-05 10:46:24 Training - Training image download completed. Training in progress.[34mINFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training[0m
[34mINFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)[0m
[34mINFO:sagemaker_xgboost_container.training:Invoking user training script.[0m
[34mINFO:sagemaker-containers:Module abalone-Copy1 does not provide a setup.py. [0m
[34mGenerating setup.py[0m
[34mINFO:sagemaker-containers:Generating setup.cfg[0m
[34mINFO:sagemaker-containers:Generating MANIFEST.in[0m
[34mINFO:sagemaker-containers:Installing module with the following command:[0m
[34m/miniconda3/bin/python -m pip install . [0m
[34mProcessing /opt/ml/code[0m
[34

[35m[10:46:34] Tree method is automatically selected to be 'approx' for distributed training.[0m
[34m[10:46:34] Tree method is automatically selected to be 'approx' for distributed training.[0m

2020-10-05 10:46:54 Uploading - Uploading generated training model
2020-10-05 10:46:54 Completed - Training job completed
Training seconds: 136
Billable seconds: 136


## Deploying the XGBoost model

After training, we can use the estimator to create an Amazon SageMaker endpoint – a hosted and managed prediction service that we can use to perform inference.

You can also optionally specify other functions to customize the behavior of deserialization of the input request (`input_fn()`), serialization of the predictions (`output_fn()`), and how predictions are made (`predict_fn()`). The defaults work for our current use-case so we don’t need to define them.

In [179]:
predictor = xgb_script_mode_estimator.deploy(initial_instance_count=1, 
                                             instance_type="ml.m5.2xlarge")
#predictor.serializer = str

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
Using already existing model: sagemaker-xgboost-2020-10-05-10-44-08-296


-----------!

In [187]:
#predictor.serializer = str
#endpoint_name='sagemaker-xgboost-2020-10-05-10-44-08-296'
from sklearn.datasets import load_svmlight_file

def get_data():
    data = load_svmlight_file("testdflibsvm.libsvm")
    return data

X = get_data()
print(X)




predictor.predict(X)


(<13096x26 sparse matrix of type '<class 'numpy.float64'>'
	with 340496 stored elements in Compressed Sparse Row format>, array([142., 141., 140., ...,  22.,  21.,  20.]))


ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
". See https://ap-southeast-1.console.aws.amazon.com/cloudwatch/home?region=ap-southeast-1#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-xgboost-2020-10-05-10-44-08-296 in account 018166606076 for more information.

In [188]:

%%time
import json
from itertools import islice
import math
import struct

test_file = 'testdflibsvm.libsvm'

with open(test_file, 'r') as f:
    payload = f.read()
print(test_file)

#predictor.predict(X)

testdflibsvm.libsvm
CPU times: user 5.62 ms, sys: 58 µs, total: 5.68 ms
Wall time: 5.32 ms


In [192]:

import pandas as pd
runtime_client = boto3.client('runtime.sagemaker', region_name=region)
response = runtime_client.invoke_endpoint(EndpointName=predictor.endpoint, 
                                          ContentType='text/x-libsvm', 
                                          Body=payload)
result = response['Body'].read().decode('ascii')
print(result)
result1=list(result)
res=pd.DataFrame(result1)
res.to_csv('resdist.csv')

import csv 
  
# data to be written row-wise in csv fil 
#data = [['Geeks'], [4], ['geeks !']] 
  
# opening the csv file in 'w+' mode 
file = open('resultdist.csv', 'w+', newline ='') 
  
# writing the data into the file 
with file:     
    write = csv.writer(file) 
    write.writerows(result1)
'''
result = response['Body'].read()
result = result.decode("utf-8")
result = result.split(',')
result = [math.ceil(float(i)) for i in result]
label = payload.strip(' ').split()[0]
#print ('Label: ',label,'\nPrediction: ', result[0])
#result1=list(result)
#res=pd.DataFrame(result1)
#res.to_csv('resdist.csv')
#print('Predicted values are {}.'.format(result))
'''

[186.40768432617188, 188.06385803222656, 169.426513671875, 184.88504028320312, 180.43524169921875, 201.8377685546875, 188.56129455566406, 182.78067016601562, 188.93959045410156, 205.02073669433594, 199.96800231933594, 182.4867706298828, 176.23831176757812, 186.23739624023438, 186.6061553955078, 185.9101104736328, 177.7836456298828, 162.9558868408203, 172.3774871826172, 140.43846130371094, 174.41773986816406, 166.49880981445312, 182.61012268066406, 162.59066772460938, 161.9212646484375, 186.1192626953125, 166.80690002441406, 187.9276580810547, 153.79515075683594, 150.68682861328125, 172.04141235351562, 152.28082275390625, 164.85040283203125, 155.88633728027344, 165.90081787109375, 149.8717803955078, 147.1006317138672, 169.74745178222656, 173.63803100585938, 164.21240234375, 159.7097625732422, 171.5538330078125, 161.61631774902344, 175.07794189453125, 158.5606689453125, 150.1730194091797, 171.18826293945312, 162.85638427734375, 169.00070190429688, 155.5899658203125, 146.2104034423828, 14

'\nresult = response[\'Body\'].read()\nresult = result.decode("utf-8")\nresult = result.split(\',\')\nresult = [math.ceil(float(i)) for i in result]\nlabel = payload.strip(\' \').split()[0]\n#print (\'Label: \',label,\'\nPrediction: \', result[0])\n#result1=list(result)\n#res=pd.DataFrame(result1)\n#res.to_csv(\'resdist.csv\')\n#print(\'Predicted values are {}.\'.format(result))\n'

In [182]:
import sys
import math
def do_predict(data, endpoint_name, content_type):
    payload = '\n'.join(data)
    response = runtime_client.invoke_endpoint(EndpointName=endpoint_name, 
                                   ContentType=content_type, 
                                   Body=payload)
    result = response['Body'].read()
    result = result.decode("utf-8")
    result = result.split(',')
    preds = [float((num)) for num in result]
    preds = [math.ceil(num) for num in preds]
    return preds

def batch_predict(data, batch_size, endpoint_name, content_type):
    items = len(data)
    arrs = []
    
    for offset in range(0, items, batch_size):
        if offset+batch_size < items:
            results = do_predict(data[offset:(offset+batch_size)], endpoint_name, content_type)
            arrs.extend(results)
        else:
            arrs.extend(do_predict(data[offset:items], endpoint_name, content_type))
        sys.stdout.write('.')
    return(arrs)

In [184]:
%%time
import json
import numpy as np
import pandas as pd
test_file = 'testdflibsvm.libsvm'
endpoint_name='sagemaker-xgboost-2020-10-05-10-44-08-296'
with open(test_file, 'r') as f:
    payload = f.read().strip()

labels = [int(line.split(' ')[0]) for line in payload.split('\n')]
test_data = [line for line in payload.split('\n')]
preds = batch_predict(test_data, 100, endpoint_name, 'libsvm')
res['pred']=pd.DataFrame(preds)
res['label']=pd.DataFrame(labels)
res.to_csv('res.csv')
print('\n Median Absolute Percent Error (MdAPE) = ', np.median(np.abs(np.array(labels) - np.array(preds)) / np.array(labels)))

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>
". See https://ap-southeast-1.console.aws.amazon.com/cloudwatch/home?region=ap-southeast-1#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-xgboost-2020-10-05-10-44-08-296 in account 018166606076 for more information.

In [191]:
output_path = 's3://{}/{}/output'.format(bucket, prefix)
print(output_path)
compiled_model = xgb_script_mode_estimator.compile_model(target_instance_family='rasp3b',
                                   target_platform_os="LINUX",
                                   target_platform_arch="ARM_EABIHF",
                                   input_shape={'data':[1, 26]},
                                   role=role,
                                   framework='xgboost',
                                   framework_version='0.90-1',
                                   output_path=output_path)

#input_shape={'data':[1, 26]},
#framework_version='1.2.0',
#framework_version='1.0-1'

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.


s3://sagemaker-rul-xgboost/DEMO-xgboost-dist-script-RUL-libsvm-1/output
?..!

The instance type rasp3b is not supported for deployment via SageMaker.Please deploy the model manually.


### (Optional) Delete the Endpoint

If you're done with this exercise, please run the delete_endpoint line in the cell below.  This will remove the hosted endpoint and avoid any charges from a stray instance being left on.

In [178]:
xgb_script_mode_estimator.delete_endpoint()

estimator.delete_endpoint() will be deprecated in SageMaker Python SDK v2. Please use the delete_endpoint() function on your predictor instead.
