# Predict all cause 30-day hospital readmission risk

## Thinking about Data

Understanding how the tables relate to one another, and what each contains, helps you identify the information that is relevant to the prediction. The tool that you used to generate the data created several CSV files, which you will upload to an S3 bucket. The diagram below shows the relationships between the tables in the generated data set. If you are using your own data for this notebook, visualizing it will help you understand these relationships.

## Cleaning and Visualizing Your Data

<img src="EHR.png">
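
Before joining tables, it can help to confirm that the relationships in the diagram actually hold in the files. The sketch below is a minimal, offline illustration of such a check: it looks for child rows whose foreign key has no matching parent row. The two inline CSV snippets and their column names (`Id`, `PATIENT`, and so on) are hypothetical stand-ins, not the real generated files.

```python
import csv
import io

# Hypothetical, tiny stand-ins for two of the generated CSV files.
PATIENTS_CSV = """Id,BIRTHDATE,GENDER
p1,1957-02-11,F
p2,1990-07-30,M
"""
ENCOUNTERS_CSV = """Id,PATIENT,ENCOUNTERCLASS
e1,p1,ambulatory
e2,p1,inpatient
e3,p3,emergency
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def orphaned_keys(child_rows, child_key, parent_rows, parent_key):
    """Return child key values that have no matching parent row."""
    parent_ids = {row[parent_key] for row in parent_rows}
    return sorted({row[child_key] for row in child_rows} - parent_ids)


patients = read_rows(PATIENTS_CSV)
encounters = read_rows(ENCOUNTERS_CSV)
orphans = orphaned_keys(encounters, "PATIENT", patients, "Id")
print(orphans)  # encounter e3 points at p3, which is not in the patients table
```

An empty result suggests the join key is safe to use; any values returned are rows that an inner join would silently drop.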

## Steps involved in this machine learning project

1. Understanding of your data 
2. Storing and converting your data into parquet for optimized performance and storage
3. Feature selection and feature engineering using Spark
4. Data pre-processing - StringIndexer and OneHotEncoding to convert categorical variables into required training data
5. Train Spark ML model for data pre-processing and serialize using MLeap library to be used during inference pipeline
6. Convert the data set into XGBoost supported format i.e. CSV from Spark Data Frame
7. Split the data set into training and validation for model training and validation
8. Train XGBoost Model using SageMaker XGBoost algorithm and validate model prediction using validation data set
9. Tune the trained model using Hyperparameter tuning jobs for required HPO parameters
10. Get the best tuned model and create inference pipeline which includes Spark ML model and XGBoost Model
11. Create the end point configuration to deploy the inference pipeline
12. Deploy the inference pipeline for real time prediction
13. Invoke real time prediction API for a request.
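
Step 6 above deserves a concrete picture: for CSV training data, the SageMaker built-in XGBoost algorithm expects the label in the first column and no header row. The following is a small plain-Python sketch of that layout, not the Spark code used by the Glue job. The helper name `to_xgboost_csv` and the two sample rows are made up for illustration, and in the real pipeline the feature columns come from the one-hot-encoded feature vector.

```python
import csv
import io


def to_xgboost_csv(rows, label_key, feature_keys):
    """Serialize rows as header-less CSV with the label in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow([row[label_key]] + [row[key] for key in feature_keys])
    return buffer.getvalue()


# Made-up rows, for illustration only.
rows = [
    {"readmission": 0, "age": 63.0, "encounters_total_claim_cost": 129.16},
    {"readmission": 1, "age": 71.0, "encounters_total_claim_cost": 880.5},
]
csv_text = to_xgboost_csv(rows, "readmission", ["age", "encounters_total_claim_cost"])
print(csv_text)
```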

Update the S3 bucket and KMS key ID below with the values for your environment. This notebook requires certain resources to exist; a CloudFormation template is provided to create them.

In [None]:
import boto3
import botocore 
import time

bucket = '' # Update this to the bucket that was created in your lab account as part of this environment.
sse_kms_id = '' ## Update this value from Cloud Formation template
glue_crawler_db = '' ## Update this value from Cloud Formation template
s3 = boto3.resource('s3')

# Preprocessing using Apache Spark in AWS Glue

### Upload Glue Scripts to S3

In [None]:
%%bash

# Download Dependencies
wget https://s3-us-west-2.amazonaws.com/sparkml-mleap/0.9.6/python/python.zip
wget https://s3-us-west-2.amazonaws.com/sparkml-mleap/0.9.6/jar/mleap_spark_assembly.jar

In [None]:
# Uploading Glue scripts and dependencies to S3
from sagemaker import Session as Sess

# SageMaker session
sess = Sess()

result = sess.upload_data(path='glue_scripts/convert_to_parquet', bucket=bucket, key_prefix='scripts', extra_args={"ServerSideEncryption": "aws:kms",'SSEKMSKeyId':sse_kms_id })
print(result)
result = sess.upload_data(path='glue_scripts/produce_training_data', bucket=bucket, key_prefix='scripts', extra_args={"ServerSideEncryption": "aws:kms",'SSEKMSKeyId':sse_kms_id })
print(result)
result = sess.upload_data(path='python.zip', bucket=bucket, key_prefix='scripts', extra_args={"ServerSideEncryption": "aws:kms",'SSEKMSKeyId':sse_kms_id })
print(result)
result = sess.upload_data(path='mleap_spark_assembly.jar', bucket=bucket, key_prefix='scripts', extra_args={"ServerSideEncryption": "aws:kms",'SSEKMSKeyId':sse_kms_id })
print(result)

Your S3 Bucket is now ready with raw data and required scripts

### Create and run AWS Glue Preprocessing Job

Next, we create a Glue client via Boto3 so that we can invoke Glue's `start_job_run` API. This API creates an immutable run (execution) of the job definition created by the CloudFormation template. We need the `job_run_id` of that execution to check its status. We pass the data and model locations as job execution parameters.

Finally, we check the job status to see whether it has succeeded, failed, or stopped. Once the job succeeds, the transformed data is in S3 in the required format. If the job fails, open the AWS Glue console, choose Jobs in the left pane, select this job, and follow the link under Logs to the CloudWatch logs, which show what went wrong during the job execution.
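
The status check described above can be wrapped in a small helper that also gives up after a maximum wait and treats the `TIMEOUT` and `ERROR` run states as terminal. This is a sketch under stated assumptions rather than the notebook's own code: `wait_for_glue_job` and the `FakeGlue` class are hypothetical names, and `FakeGlue` only exists so the example runs without AWS. In the notebook you would pass the real `glue` client instead.

```python
import time

# Run states after which the job will not change any further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_glue_job(glue_client, job_name, run_id,
                      poll_seconds=60, max_wait_seconds=3600, sleep=time.sleep):
    """Poll get_job_run until a terminal state; return (state, error_message)."""
    waited = 0
    while True:
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state, run.get("ErrorMessage")
        if waited >= max_wait_seconds:
            raise TimeoutError("Gave up waiting for {} run {}".format(job_name, run_id))
        sleep(poll_seconds)
        waited += poll_seconds


class FakeGlue:
    """Offline stand-in that replays a fixed sequence of run states."""

    def __init__(self, states):
        self._states = iter(states)

    def get_job_run(self, JobName, RunId):
        return {"JobRun": {"JobRunState": next(self._states)}}


state, error = wait_for_glue_job(
    FakeGlue(["RUNNING", "RUNNING", "SUCCEEDED"]),
    "example-job", "example-run", sleep=lambda seconds: None)
print(state, error)
```

Returning the error message alongside the state means a failed run can be reported without opening the console first.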

#### Start CSV to Parquet conversion Glue Job

This Glue job is set up to use Spark 2.4 and Python 3. It requires a Python script, which is uploaded to the S3 bucket and supplied to the job when the CloudFormation template creates it. You can generate such scripts in the Glue console, so that you do not have to write them from scratch, and then modify the generated script for your use case. In this case, we generated a script that reads the data from the Glue crawler database and selects only the columns that domain knowledge suggests are needed for pre-processing and model training. The script drops null values and converts the data types to ones supported by our machine learning algorithm. Finally, it saves the data set to the S3 bucket in Parquet format with partition keys.

In [None]:
### Create and run AWS Glue Preprocessing Job

# Define the Job in AWS Glue
glue = boto3.client('glue')

try:
    glue.get_job(JobName='glue-etl-convert-to-parquet')
    print("Job already exists, continuing...")
except glue.exceptions.EntityNotFoundException:
    print('{}\n'.format("Job Not Found, Check the output of Cloud Formation template"))

# Run the job in AWS Glue
job_name = 'glue-etl-convert-to-parquet'
try:
    response = glue.start_job_run(JobName=job_name,
                                  Arguments={
                                            '--s3_bucket' : bucket,
                                            '--glue_crawler_db' : glue_crawler_db ## This value is from the CloudFormation template
                                    })
    job_run_id = response['JobRunId']
    print('{}\n'.format(response))
except glue.exceptions.ConcurrentRunsExceededException:
    print("Job run already in progress, continuing...")
    # Track the existing run so job_run_id is defined below
    # (assumes get_job_runs lists the most recent run first)
    job_run_id = glue.get_job_runs(JobName=job_name, MaxResults=1)['JobRuns'][0]['Id']

    
# Check on the job status
import time

job_run_status = glue.get_job_run(JobName=job_name,RunId=job_run_id)['JobRun']['JobRunState']
while job_run_status not in ('FAILED', 'SUCCEEDED', 'STOPPED'):
    job_run_status = glue.get_job_run(JobName=job_name,RunId=job_run_id)['JobRun']['JobRunState']
    print (job_run_status)
    time.sleep(300)

#### Start Glue Job to Produce Training Data

This Glue job is set up to use Spark 2.2 and Python 2. It requires a Python script, which is uploaded to the S3 bucket and supplied to the job when the CloudFormation template creates it. You can generate such scripts in the Glue console, so that you do not have to write them from scratch, and then modify the generated script for your use case. We use Spark 2.2 rather than Spark 2.4, the latest supported version, because the MLeap libraries used to serialize the Spark ML pre-processing model do not support Spark 2.4. See https://github.com/aws/sagemaker-sparkml-serving-container for details.

The script reads all partitions directly from the S3 bucket, but you can filter for specific partitions. Here the partitions are based on date, but you can define your own partitioning strategy. Based on your understanding of the data, you join multiple tables to produce the training data for the machine learning model, then drop the columns that are not required.

Because XGBoost is a supervised learning model, we need to provide label data. We calculate the label, 30-day readmission, by sorting each patient's encounters by timestamp and subtracting the previous encounter's stop time from the current encounter's start time. This gives the number of days since the last encounter, which tells us whether the encounter fell within 30 days of the previous one. We then impute some of the missing values in the data and, as a feature engineering step, convert birth date into age to better suit the model. Finally, you generate the feature vector using Spark ML OneHotEncoding and serialize the model with the MLeap serialization library.

Since the XGBoost algorithm supports CSV-formatted training data, you convert the Spark DataFrame into CSV files and save them to the S3 bucket.
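
To make the label and age calculations above concrete, here is a minimal plain-Python sketch of the same logic. It is not the Spark script that the Glue job runs. The function names, the dictionary keys (`patient`, `start`, `stop`), and the sample encounters are all hypothetical and chosen only for illustration.

```python
from datetime import date, datetime
from itertools import groupby
from operator import itemgetter


def label_readmissions(encounters, window_days=30):
    """Label each encounter 1 if it starts within window_days of the
    same patient's previous encounter stop, otherwise 0."""
    labeled = []
    ordered = sorted(encounters, key=itemgetter("patient", "start"))
    for _, group in groupby(ordered, key=itemgetter("patient")):
        previous_stop = None
        for enc in group:
            days = None if previous_stop is None else (enc["start"] - previous_stop).days
            flag = int(days is not None and days <= window_days)
            labeled.append({**enc, "days_since_last": days, "readmission": flag})
            previous_stop = enc["stop"]
    return labeled


def age_in_years(birth_date, on_date):
    """Whole years between birth_date and on_date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


# Made-up encounters for two patients.
sample = [
    {"patient": "p1", "start": datetime(2020, 3, 15), "stop": datetime(2020, 3, 16)},
    {"patient": "p1", "start": datetime(2020, 1, 1), "stop": datetime(2020, 1, 3)},
    {"patient": "p2", "start": datetime(2020, 2, 1), "stop": datetime(2020, 2, 2)},
    {"patient": "p1", "start": datetime(2020, 1, 20), "stop": datetime(2020, 1, 21)},
]
labeled = label_readmissions(sample)
print([(e["patient"], e["days_since_last"], e["readmission"]) for e in labeled])
print(age_in_years(date(1957, 2, 11), date(2020, 3, 13)))
```

A patient's first encounter has no previous stop time, so this sketch labels it 0. Whether that is the right choice for your data is a modeling decision.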

In [None]:
### Create and run AWS Glue Preprocessing Job

# Define the Job in AWS Glue
glue = boto3.client('glue')

try:
    glue.get_job(JobName='glue-etl-produce-traing-data')
    print("Job already exists, continuing...")
except glue.exceptions.EntityNotFoundException:
    print('{}\n'.format("Job Not Found, Check the output of Cloud Formation template"))

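# Run the job in AWS Glue
job_name = 'glue-etl-produce-traing-data'
try:
    response = glue.start_job_run(JobName=job_name,
                                  Arguments={
                                            '--sse_kms_id': sse_kms_id,
                                            '--s3_bucket' : bucket
                                    })
    job_run_id = response['JobRunId']
    print('{}\n'.format(response))
except glue.exceptions.ConcurrentRunsExceededException:
    print("Job run already in progress, continuing...")
    # Track the existing run so job_run_id is defined below
    # (assumes get_job_runs lists the most recent run first)
    job_run_id = glue.get_job_runs(JobName=job_name, MaxResults=1)['JobRuns'][0]['Id']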

    
# Check on the job status
import time

job_run_status = glue.get_job_run(JobName=job_name,RunId=job_run_id)['JobRun']['JobRunState']
while job_run_status not in ('FAILED', 'SUCCEEDED', 'STOPPED'):
    job_run_status = glue.get_job_run(JobName=job_name,RunId=job_run_id)['JobRun']['JobRunState']
    print (job_run_status)
    time.sleep(300)

### [OPTIONAL] Validate Spark ML prediction
You can use another Jupyter notebook, **Sparkml-model-test.ipynb**, provided in this GitHub repo, to understand and validate the output of the Spark ML pre-processing model.

# Training an Amazon SageMaker XGBoost Model

Now that we have our data preprocessed in a format that XGBoost recognizes, we can run a simple training job to train a binary classifier model on our data. We can run this entire process in our Jupyter notebook. Run the following cell, labeled Run Amazon SageMaker XGBoost Training Job. This will run our XGBoost training job in Amazon SageMaker, and monitor the progress of the job. Once the job is ‘Completed’, you can move on to the next cell.

This will train the model on the preprocessed data we created earlier. After a few minutes, usually less than 5, the job should complete successfully, and output our model artifacts to the S3 location we specified. Once this is done, we can deploy an inference pipeline that consists of pre-processing, inference and post-processing steps.

### Run Amazon SageMaker XGBoost Training Job

Now we will use the SageMaker XGBoost algorithm to train on this data set. The Glue job already uploaded the preprocessed training data to S3. Update `train_prefix` and `validation_prefix` with the S3 prefixes of the training and validation data sets.

#### We need to retrieve the XGBoost algorithm image
We will retrieve the XGBoost built-in algorithm image so that it can be used for the training job.

In [None]:
from sagemaker.amazon.amazon_estimator import get_image_uri
import boto3
import botocore
from botocore.exceptions import ClientError
from sagemaker import get_execution_role

from sagemaker import Session as Sess

# SageMaker session
sess = Sess()

# Boto3 session
session = boto3.session.Session()
role = get_execution_role()
region = session.region_name

training_image = get_image_uri(sess.boto_region_name, 'xgboost', repo_version="latest")
print (training_image)

##Get training and validation data set location on S3
train_prefix = "train-data/2020/3/13" ## Update S3 Prefix
validation_prefix = "validation-data/2020/3/13" ## Update S3 Prefix

#### Set the XGBoost model parameters and data set details
We have parameterized this notebook so that the same data location that was used in the PySpark script can now be passed to the XGBoost training job as well.

In [None]:
### Run Amazon SageMaker XGBoost Training Job
from sagemaker.amazon.amazon_estimator import get_image_uri

import random
import string


# Get XGBoost container image for current region
training_image = get_image_uri(region, 'xgboost', repo_version="latest")

# Create a unique training job name
training_job_name = 'xgboost-readmission-'+''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(8))

# Create the training job in Amazon SageMaker
sagemaker = boto3.client('sagemaker')
response = sagemaker.create_training_job(
    TrainingJobName=training_job_name,
    HyperParameters={
        'early_stopping_rounds': '5',
        'num_round': '10',
        'objective': 'binary:logistic', ## Binary classification since readmission will be Yes or NO. Get probability of binary classification
        'eval_metric': 'auc'

    },
    AlgorithmSpecification={
        'TrainingImage': training_image,
        'TrainingInputMode': 'File',
    },
    RoleArn=role,
    InputDataConfig=[
        {
            'ChannelName': 'train',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': 's3://{}'.format(bucket+'/'+train_prefix),
                    'S3DataDistributionType': 'FullyReplicated'
                }
            },
            'ContentType': 'text/csv',
            'CompressionType': 'None',
            'RecordWrapperType': 'None',
            'InputMode': 'File'
        },
        {
            'ChannelName': 'validation',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': 's3://{}'.format(bucket+'/'+validation_prefix),
                    'S3DataDistributionType': 'FullyReplicated'
                }
            },
            'ContentType': 'text/csv',
            'CompressionType': 'None',
            'RecordWrapperType': 'None',
            'InputMode': 'File'
        },
    ],
    OutputDataConfig={
        'S3OutputPath': 's3://{}/xgb'.format(bucket),
        'KmsKeyId' : sse_kms_id
    },
    ResourceConfig={
        'InstanceType': 'ml.m5.4xlarge', ## XGBoost loads the training data into memory, so choose an instance type with enough memory
        'InstanceCount': 2, ## Distributed training
        'VolumeSizeInGB': 10
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 3600
    },)

print('{}\n'.format(response))

# Monitor the status until completed
job_run_status = sagemaker.describe_training_job(TrainingJobName=training_job_name)['TrainingJobStatus']
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    job_run_status = sagemaker.describe_training_job(TrainingJobName=training_job_name)['TrainingJobStatus']
    print (job_run_status)
    time.sleep(30)



### Hyperparameter Tuning to find the best model

Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose.

In [None]:
### Define the training job used by the hyperparameter tuning job
from sagemaker.amazon.amazon_estimator import get_image_uri

# Get XGBoost container image for current region
training_image = get_image_uri(region, 'xgboost', repo_version="latest")

training_job_definition = {
    "AlgorithmSpecification": {
      "TrainingImage": training_image,
      "TrainingInputMode": "File"
    },
    "InputDataConfig": [
      {
        "ChannelName": "train",
        "CompressionType": "None",
        "ContentType": "text/csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": 's3://{}'.format(bucket+'/'+train_prefix)
          }
        }
      },
      {
        "ChannelName": "validation",
        "CompressionType": "None",
        "ContentType": "text/csv",
        "DataSource": {
          "S3DataSource": {
            "S3DataDistributionType": "FullyReplicated",
            "S3DataType": "S3Prefix",
            "S3Uri": 's3://{}'.format(bucket+'/'+validation_prefix)
          }
        }
      }
    ],
    "OutputDataConfig": {
      "S3OutputPath": "s3://{}/xgb".format(bucket),
      "KmsKeyId" : sse_kms_id
    },
    "ResourceConfig": {
      "InstanceCount": 2, ## Distributed training
      "InstanceType": "ml.m5.4xlarge",
      "VolumeSizeInGB": 10
    },
    "RoleArn": role,
    "StaticHyperParameters": {
      "eval_metric": "auc",
      "num_round": "100",
      "objective": "binary:logistic",
      "rate_drop": "0.3",
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 3600
    }
}

In [None]:
tuning_job_config = {
    "ParameterRanges": {
      "CategoricalParameterRanges": [],
      "ContinuousParameterRanges": [
        {
          "MaxValue": "1",
          "MinValue": "0",
          "Name": "eta",
        },
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "min_child_weight",
        },
        {
          "MaxValue": "2",
          "MinValue": "0",
          "Name": "alpha",            
        }
      ],
      "IntegerParameterRanges": [
        {
          "MaxValue": "10",
          "MinValue": "1",
          "Name": "max_depth",
        }
      ]
    },
    "ResourceLimits": {
      "MaxNumberOfTrainingJobs": 5,
      "MaxParallelTrainingJobs": 3
    },
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
      "MetricName": "validation:auc",
      "Type": "Maximize"
    }
  }

In [None]:
# Create a unique tuning job name
import random
import string

tuning_job_name = 'xgboost-readmission-'+''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(8))

smclient = boto3.Session().client('sagemaker')

smclient.create_hyper_parameter_tuning_job(HyperParameterTuningJobName = tuning_job_name,
                                            HyperParameterTuningJobConfig = tuning_job_config,
                                            TrainingJobDefinition = training_job_definition)

In [None]:
# Monitor the status until completed
job_run_status = smclient.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)['HyperParameterTuningJobStatus']
while job_run_status not in ('Failed', 'Completed', 'Stopped'):
    job_run_status = smclient.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)['HyperParameterTuningJobStatus']
    print (job_run_status)
    time.sleep(30)
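
Once the tuning job has finished, it can be useful to compare the individual training jobs instead of looking only at the best one. The sketch below uses the SageMaker `list_training_jobs_for_hyper_parameter_tuning_job` API, sorted by the final objective metric. The helper name `summarize_tuning_job` is hypothetical, and `FakeSageMaker` is an offline stand-in with a placeholder summary (not a real result) so the example runs without AWS. In the notebook you would pass `smclient` and `tuning_job_name`.

```python
def summarize_tuning_job(sm_client, tuning_job_name, top_n=3):
    """Return the top_n training jobs of a tuning job, best objective first."""
    response = sm_client.list_training_jobs_for_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuning_job_name,
        SortBy="FinalObjectiveMetricValue",
        SortOrder="Descending",
        MaxResults=top_n,
    )
    return [
        {
            "name": job["TrainingJobName"],
            "status": job["TrainingJobStatus"],
            "objective": job.get("FinalHyperParameterTuningJobObjectiveMetric", {}).get("Value"),
            "tuned": job.get("TunedHyperParameters", {}),
        }
        for job in response["TrainingJobSummaries"]
    ]


class FakeSageMaker:
    """Offline stand-in that returns one canned placeholder summary."""

    def list_training_jobs_for_hyper_parameter_tuning_job(self, **kwargs):
        return {"TrainingJobSummaries": [{
            "TrainingJobName": "example-job-001",
            "TrainingJobStatus": "Completed",
            "FinalHyperParameterTuningJobObjectiveMetric": {
                "MetricName": "validation:auc", "Value": 0.5},
            "TunedHyperParameters": {"max_depth": "6"},
        }]}


tuning_summary = summarize_tuning_job(FakeSageMaker(), "example-tuning-job")
print(tuning_summary)
```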

# Deploying an Amazon SageMaker Endpoint 

Now that we have a set of model artifacts, we can set up an inference pipeline that executes sequentially in Amazon SageMaker. We start by setting up a Model, which points to all of our model artifacts, then we set up an Endpoint configuration to specify our hardware, and finally we stand up an Endpoint. We pass raw data to this endpoint, so we no longer need to write pre-processing logic in our application code. The same pre-processing steps that ran for training are applied to inference input data, which improves consistency and ease of management.

Deploying a model in SageMaker requires two components:

* Docker image residing in ECR.
* Model artifacts residing in S3.

**SparkML**

For SparkML, the SageMaker team provides a Docker image for MLeap-based SparkML serving. For more information, see SageMaker SparkML Serving. The MLeap-serialized SparkML model was uploaded to S3 as part of the SparkML job we executed in AWS Glue.

**XGBoost**

For XGBoost, we will use the same Docker image we used for training. The model artifacts for XGBoost were uploaded as part of the training job we just ran.

# Building an Inference Pipeline consisting of SparkML & XGBoost models for a realtime inference endpoint

In [None]:
##Get the best training job name 
best_training_job = smclient.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)['BestTrainingJob']['TrainingJobName']
print ('Best training job : ' + best_training_job)

info = smclient.describe_training_job(TrainingJobName=best_training_job)
best_model_data_loc = info['ModelArtifacts']['S3ModelArtifacts']
print('Model Artifact Location : ' + best_model_data_loc)


### Passing the schema of the payload via environment variable
The SparkML serving container needs to know the schema of the request that will be passed to it when the `predict` method is called. To spare you from passing the schema with every request, `sagemaker-sparkml-serving` lets you pass it via an environment variable when creating the model definition. This schema definition is required in our next step, creating a model.

You can overwrite this schema on a per request basis by passing it as part of the individual request payload as well. If you would like to explore on how to specify schema for each request, you can visit - https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/sparkml_serving_emr_mleap_abalone/sparkml_serving_emr_mleap_abalone.ipynb

In [None]:
import json
schema = {"input":[{"type":"string","name":"encounters_encounterclass"},{"type":"string","name":"patient_gender"},{"type":"string","name":"patient_marital"},{"type":"string","name":"patient_ethnicity"},{"type":"string","name":"patient_race"},{"type":"string","name":"providers_speciality"},{"type":"string","name":"encounters_reasoncode"},{"type":"string","name":"encounters_code"},{"type":"string","name":"procedures_code"},{"type":"double","name":"patient_healthcare_expenses"},{"type":"double","name":"patient_healthcare_coverage"},{"type":"double","name":"encounters_total_claim_cost"},{"type":"double","name":"encounters_payer_coverage"},{"type":"double","name":"encounters_base_encounter_cost"},{"type":"double","name":"procedures_base_cost"},{"type":"long","name":"providers_utilization"},{"type":"double","name":"age"}],"output":{"type":"double","name":"features","struct":"vector"}}
schema_json = json.dumps(schema)
print(schema_json)
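
Because the schema fixes the order and type of every input column, a payload can be checked against it before it is sent to the endpoint. The following is a minimal sketch of such a check. The helper `validate_csv_payload` is a hypothetical name, it covers only the three types that appear in this notebook's schema (`string`, `double`, `long`), and `demo_schema` is a shortened stand-in for the full schema above.

```python
import csv
import io

# Only the three types used by this notebook's schema are covered here.
CASTS = {"string": str, "double": float, "long": int}


def validate_csv_payload(payload, schema):
    """Return a list of problems; an empty list means the row fits the schema."""
    values = next(csv.reader(io.StringIO(payload)))
    columns = schema["input"]
    if len(values) != len(columns):
        return ["expected {} columns, got {}".format(len(columns), len(values))]
    problems = []
    for value, column in zip(values, columns):
        cast = CASTS.get(column["type"])
        if cast is None:
            problems.append("{}: unsupported type {}".format(column["name"], column["type"]))
            continue
        try:
            cast(value)
        except ValueError:
            problems.append("{}: {!r} is not a valid {}".format(
                column["name"], value, column["type"]))
    return problems


# Shortened stand-in for the full schema defined in the cell above.
demo_schema = {"input": [
    {"type": "string", "name": "patient_gender"},
    {"type": "long", "name": "providers_utilization"},
    {"type": "double", "name": "age"},
]}
print(validate_csv_payload("F,14228,63", demo_schema))
print(validate_csv_payload("F,many,63", demo_schema))
print(validate_csv_payload("F,63", demo_schema))
```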

### Creating a `PipelineModel` which comprises of the SparkML and XGBoost model in the right order

Next we'll create a SageMaker `PipelineModel` with SparkML and XGBoost. The `PipelineModel` ensures that both containers are deployed behind a single API endpoint in the correct order. The same model could later be used for Batch Transform as well, so that a single job is sufficient to run predictions against the pipeline.

Here, during the `Model` creation for SparkML, we will pass the schema definition that we built in the previous cell.

In [None]:
from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel
from sagemaker.sparkml.model import SparkMLModel
from time import gmtime, strftime
import time

sparkml_model_prefix = 'spark-ml-model/2020/3/13'

timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

sparkml_data = 's3://{}/{}/{}'.format(bucket,sparkml_model_prefix,'model.tar.gz')
# passing the schema defined above by using an environment variable that sagemaker-sparkml-serving understands
sparkml_model = SparkMLModel(model_data=sparkml_data, env={'SAGEMAKER_SPARKML_SCHEMA' : schema_json})
xgb_model_data = '{}'.format(best_model_data_loc)
xgb_model = Model(model_data=xgb_model_data, image=training_image)

model_name = 'inference-pipeline-readmission-' + timestamp_prefix
sm_model = PipelineModel(name=model_name, role=role, models=[sparkml_model, xgb_model])

### Deploying the `PipelineModel` to an endpoint for realtime inference
Next we will deploy the model we just created with the `deploy()` method to start an inference endpoint and we will send some requests to the endpoint to verify that it works as expected.

In [None]:
endpoint_name = 'inference-pipeline-readmission-ep-' + timestamp_prefix
sm_model.deploy(initial_instance_count=1, instance_type='ml.m5.4xlarge', endpoint_name=endpoint_name)

In [None]:
# Monitor the status until completed
endpoint_status = sagemaker.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
while endpoint_status not in ('OutOfService','InService','Failed'):
    endpoint_status = sagemaker.describe_endpoint(EndpointName=endpoint_name)['EndpointStatus']
    print(endpoint_status)
    time.sleep(30)


### Invoking the newly created inference endpoint with a payload to transform the data
Now we will invoke the endpoint with a valid payload that SageMaker SparkML Serving can recognize. SparkML Serving accepts the input payload in more than one format; this notebook uses CSV:

* Pass it as a valid CSV string. In this case, the schema passed via the environment variable will be used to determine the schema. For CSV format, every column in the input has to be a basic datatype (e.g. int, double, string) and it can not be a Spark `Array` or `Vector`.


#### Passing the payload in CSV format
First, load the test data so that you can validate the predictions. Pick test records for both outcomes, readmission yes and readmission no, and use their values in the request. The cells below then show how the payload is passed to the endpoint in CSV format.

In [None]:
!pip install pyarrow

In [None]:
## Load the test data (Parquet files) into pandas
import pandas as pd 
import s3fs
import pyarrow.parquet as pq

fs = s3fs.S3FileSystem()
test_data_prefix = 'test-data/2020/3/13'

# Python 3.6 or later
p_dataset = pq.ParquetDataset(
    f"s3://{bucket}/{test_data_prefix}",
    filesystem=fs
)

test_data = p_dataset.read().to_pandas()


In [None]:
test_data_0 = test_data[(test_data['readmission'] == 0) & (test_data['encounters_reasoncode'] != 0) & (test_data['procedures_code'] != 0)]
test_data_0.head()



In [None]:
test_data_1 = test_data[(test_data['readmission'] == 1) & (test_data['encounters_reasoncode'] != 0) & (test_data['procedures_code'] != 0)]
test_data_1.head()


In [None]:
from sagemaker.predictor import json_serializer, csv_serializer, json_deserializer, RealTimePredictor
from sagemaker.content_types import CONTENT_TYPE_CSV, CONTENT_TYPE_JSON
## Payload schema = encounters_encounterclass,patient_gender,patient_marital,patient_ethnicity,patient_race,
## providers_speciality,encounters_reasoncode,encounters_code,procedures_code,patient_healthcare_expenses,
## patient_healthcare_coverage,encounters_total_claim_cost,encounters_payer_coverage,encounters_base_encounter_cost,
## procedures_base_cost,providers_utilization,age
payload = "ambulatory,F,M,nonhispanic,white,F,72892002,185349003,118001005,210792.63,52916.21,129.16,69.16,129.16,0,14228,63"
predictor = RealTimePredictor(endpoint=endpoint_name, sagemaker_session=sess, serializer=csv_serializer,
                                content_type=CONTENT_TYPE_CSV, accept=CONTENT_TYPE_CSV)
print(predictor.predict(payload))


#### Different payload
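Now let's change a value in the request and see how the prediction responds. The payload below changes `patient_ethnicity` from `nonhispanic` to `hispanic`; you can vary other fields, such as encounter class, procedure code, encounter code, gender, or patient_healthcare_expenses, in the same way.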

In [None]:
payload = "ambulatory,F,M,hispanic,white,F,72892002,185349003,118001005,210792.63,52916.21,129.16,69.16,129.16,0,14228,63"

print(predictor.predict(payload))
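
With the `binary:logistic` objective, the model returns a probability, so turning it into a yes/no readmission flag requires a threshold. The sketch below assumes the endpoint response is a CSV body containing a single probability, returned as bytes or text. The helper name, the 0.5 default threshold, and the sample inputs are illustrative assumptions, not actual endpoint output.

```python
def to_readmission_flag(raw_prediction, threshold=0.5):
    """Parse a CSV prediction body and compare the probability to a threshold."""
    if isinstance(raw_prediction, bytes):
        text = raw_prediction.decode("utf-8")
    else:
        text = str(raw_prediction)
    probability = float(text.strip().split(",")[0])
    return probability, probability >= threshold


# Illustrative inputs only; these are not real responses from the endpoint.
print(to_readmission_flag(b"0.8231\n"))
print(to_readmission_flag("0.12", threshold=0.3))
```

The right threshold depends on the relative cost of false positives and false negatives for your use case, so treat 0.5 as a starting point only.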


#### [Optional] Deleting the Endpoint
If you do not plan to use this endpoint, then it is a good practice to delete the endpoint so that you do not incur the cost of running it.

In [None]:
boto_session = sess.boto_session
sm_client = boto_session.client('sagemaker')
sm_client.delete_endpoint(EndpointName=endpoint_name)
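
Deleting the endpoint stops the charges for its instances, but the endpoint configuration and the model definition remain in your account. As an alternative to the cell above, the sketch below looks up both before deleting the endpoint and then removes all three. The helper name `delete_endpoint_resources` is hypothetical, and `RecordingClient` is an offline stand-in that only records calls. In the notebook you would pass `sm_client` and `endpoint_name`.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint together with its endpoint config and models."""
    config_name = sm_client.describe_endpoint(
        EndpointName=endpoint_name)["EndpointConfigName"]
    variants = sm_client.describe_endpoint_config(
        EndpointConfigName=config_name)["ProductionVariants"]
    model_names = [variant["ModelName"] for variant in variants]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)
    return config_name, model_names


class RecordingClient:
    """Offline stand-in that records delete calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def describe_endpoint(self, EndpointName):
        return {"EndpointConfigName": "example-config"}

    def describe_endpoint_config(self, EndpointConfigName):
        return {"ProductionVariants": [{"ModelName": "example-model"}]}

    def delete_endpoint(self, EndpointName):
        self.calls.append(("delete_endpoint", EndpointName))

    def delete_endpoint_config(self, EndpointConfigName):
        self.calls.append(("delete_endpoint_config", EndpointConfigName))

    def delete_model(self, ModelName):
        self.calls.append(("delete_model", ModelName))


client = RecordingClient()
deleted = delete_endpoint_resources(client, "example-endpoint")
print(deleted)
print(client.calls)
```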