# Spark ML Model Validation

#### Define schema for Spark ML model inference

The schema is derived from the features Vector that was used to train the model.

In [None]:
import json
schema = {"input":[{"type":"string","name":"encounters_encounterclass"},{"type":"string","name":"patient_gender"},{"type":"string","name":"patient_marital"},{"type":"string","name":"patient_ethnicity"},{"type":"string","name":"patient_race"},{"type":"string","name":"encounters_reasoncode"},{"type":"string","name":"encounters_code"},{"type":"string","name":"procedures_code"},{"type":"double","name":"patient_healthcare_expenses"},{"type":"double","name":"patient_healthcare_coverage"},{"type":"double","name":"encounters_total_claim_cost"},{"type":"double","name":"encounters_payer_coverage"},{"type":"double","name":"encounters_base_encounter_cost"},{"type":"double","name":"procedures_base_cost"},{"type":"long","name":"providers_utilization"},{"type":"double","name":"age"}],"output":{"type":"double","name":"features","struct":"vector"}}
schema_json = json.dumps(schema)
print(schema_json)


## Creating a SageMaker model from the model artifacts in the S3 bucket

Update **s3_model_bucket** and **s3_model_bucket_prefix** to match your environment.

In [None]:
s3_model_bucket = "" ## UPDATE with S3 bucket name


In [None]:
!aws s3 ls 's3://'$s3_model_bucket'/spark-ml-model' --recursive

In [None]:
## UPDATE the S3 prefix from the above output excluding /model.tar.gz
s3_model_bucket_prefix = "spark-ml-model/2020/4/9" 
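
The prefix can also be derived from an object key instead of being typed by hand. The following is a minimal sketch, assuming the key looks like the ones printed by the listing command; `prefix_from_key` and `example_key` are hypothetical names used only for illustration.

```python
import posixpath


def prefix_from_key(key, artifact_name="model.tar.gz"):
    """Return the S3 prefix of a key that ends in the model artifact name."""
    if posixpath.basename(key) != artifact_name:
        raise ValueError("key does not point at {}: {}".format(artifact_name, key))
    return posixpath.dirname(key)


# Hypothetical key, copied by hand from the `aws s3 ls` output
example_key = "spark-ml-model/2020/4/9/model.tar.gz"
derived_prefix = prefix_from_key(example_key)
print(derived_prefix)
```

Raising on an unexpected file name keeps a mistyped key from silently producing a prefix that points at the wrong object.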

In [None]:
from time import gmtime, strftime
import time

timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

import sagemaker
from sagemaker import get_execution_role
from sagemaker.sparkml.model import SparkMLModel

sess = sagemaker.Session()
role = get_execution_role()

# S3 location of the trained and serialized SparkML model artifact
sparkml_data = 's3://{}/{}/{}'.format(s3_model_bucket, s3_model_bucket_prefix, 'model.tar.gz')
model_name = 'sparkml-abalone-' + timestamp_prefix
sparkml_model = SparkMLModel(model_data=sparkml_data, 
                             role=role, 
                             sagemaker_session=sess, 
                             name=model_name,
                             # Pass the schema defined above through an environment
                             # variable that sagemaker-sparkml-serving understands
                             env={'SAGEMAKER_SPARKML_SCHEMA' : schema_json})




### Deploy SageMaker model for real-time prediction

In [None]:
endpoint_name = 'sparkml-abalone-ep-' + timestamp_prefix
sparkml_model.deploy(initial_instance_count=1, instance_type='ml.c4.large', endpoint_name=endpoint_name)

### Invoking the newly created inference endpoint with a payload to transform the data
Now we invoke the endpoint with a payload that SageMaker SparkML Serving recognizes. The input payload can be passed in three ways; this notebook uses the first:

* Pass it as a valid CSV string. In this case, the schema passed via the environment variable determines how the columns are interpreted. For CSV format, every column in the input has to be a basic datatype (e.g. int, double, string); it cannot be a Spark `Array` or `Vector`.

In [None]:
## Update the payload below with test data that matches the schema defined above
payload = "outpatient,M,S,hispanic,white,271737000,185347001,430193006,262241.40,2324.88,129.16,64.16,129.16,526.17,16,40"
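
Before sending a payload, it can help to check locally that it has one value per schema column and that each value converts to the declared type. The sketch below is only an approximation of such a check and not the parsing logic of the serving container. The `columns` list repeats the names and types from the schema defined at the top of the notebook; `CASTS` and `parse_csv_row` are hypothetical helpers.

```python
import csv
import io

# Column names and types, in schema order
columns = [
    ("encounters_encounterclass", "string"),
    ("patient_gender", "string"),
    ("patient_marital", "string"),
    ("patient_ethnicity", "string"),
    ("patient_race", "string"),
    ("encounters_reasoncode", "string"),
    ("encounters_code", "string"),
    ("procedures_code", "string"),
    ("patient_healthcare_expenses", "double"),
    ("patient_healthcare_coverage", "double"),
    ("encounters_total_claim_cost", "double"),
    ("encounters_payer_coverage", "double"),
    ("encounters_base_encounter_cost", "double"),
    ("procedures_base_cost", "double"),
    ("providers_utilization", "long"),
    ("age", "double"),
]

# Assumption: Python types standing in for the schema's basic types
CASTS = {"string": str, "double": float, "long": int}


def parse_csv_row(row, columns):
    """Split one CSV row and cast each field to its schema type."""
    fields = next(csv.reader(io.StringIO(row)))
    if len(fields) != len(columns):
        raise ValueError("expected {} fields, got {}".format(len(columns), len(fields)))
    parsed = {}
    for (name, type_name), raw in zip(columns, fields):
        try:
            parsed[name] = CASTS[type_name](raw)
        except ValueError:
            raise ValueError(
                "column {!r} expects {}, got {!r}".format(name, type_name, raw)
            ) from None
    return parsed


sample = "outpatient,M,S,hispanic,white,271737000,185347001,430193006,262241.40,2324.88,129.16,64.16,129.16,526.17,16,40"
record = parse_csv_row(sample, columns)
print(record["providers_utilization"], record["age"])
```

A field-count or type mismatch raises a `ValueError` that names the offending column, which is usually easier to read than an error returned by the endpoint.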


In [None]:
# These imports use version 1.x of the SageMaker Python SDK. Version 2.x renamed
# RealTimePredictor to Predictor and replaced csv_serializer with
# sagemaker.serializers.CSVSerializer.
from sagemaker.predictor import csv_serializer, RealTimePredictor
from sagemaker.content_types import CONTENT_TYPE_CSV
predictor = RealTimePredictor(endpoint=endpoint_name, sagemaker_session=sess, serializer=csv_serializer,
                                content_type=CONTENT_TYPE_CSV, accept=CONTENT_TYPE_CSV)
print(predictor.predict(payload))
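
The remaining two payload forms are JSON. The sketch below builds both locally and does not call the endpoint. The `data` and `schema` keys reflect our understanding of the request format accepted by the sagemaker-sparkml-serving container; treat the exact layout as an assumption and confirm it against that container's documentation. The variable names are hypothetical.

```python
import json

names = [
    "encounters_encounterclass", "patient_gender", "patient_marital",
    "patient_ethnicity", "patient_race", "encounters_reasoncode",
    "encounters_code", "procedures_code", "patient_healthcare_expenses",
    "patient_healthcare_coverage", "encounters_total_claim_cost",
    "encounters_payer_coverage", "encounters_base_encounter_cost",
    "procedures_base_cost", "providers_utilization", "age",
]
types = ["string"] * 8 + ["double"] * 6 + ["long", "double"]

# Same layout as the schema defined at the top of the notebook
inline_schema = {
    "input": [{"type": t, "name": n} for n, t in zip(names, types)],
    "output": {"type": "double", "name": "features", "struct": "vector"},
}

# The same sample row as the CSV payload, with values typed per the schema
data = [
    "outpatient", "M", "S", "hispanic", "white",
    "271737000", "185347001", "430193006",
    262241.40, 2324.88, 129.16, 64.16, 129.16, 526.17,
    16, 40.0,
]

# Form 2 (assumed layout): data only, schema taken from the environment variable
payload_env_schema = json.dumps({"data": data})

# Form 3 (assumed layout): schema sent with the request alongside the data
payload_inline_schema = json.dumps({"schema": inline_schema, "data": data})

print(payload_env_schema)
```

Sending the schema with the request is useful when one endpoint has to serve inputs that do not match the schema set at model creation.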

#### [Optional] Deleting the Endpoint
If you do not plan to keep using this endpoint, it is good practice to delete it so that you do not incur the cost of running it.

In [None]:
boto_session = sess.boto_session
sm_client = boto_session.client('sagemaker')
sm_client.delete_endpoint(EndpointName=endpoint_name)
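
Deleting the endpoint leaves its endpoint configuration and the model registered in the account. The sketch below removes those as well; `delete_endpoint_resources` is a hypothetical helper, and it assumes `sm_client` is a boto3 SageMaker client like the one created above.

```python
def delete_endpoint_resources(sm_client, endpoint_name, model_name=None):
    """Delete an endpoint, its endpoint configuration, and optionally its model."""
    # Look up the configuration name while the endpoint still exists
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    if model_name is not None:
        sm_client.delete_model(ModelName=model_name)
    return config_name


# Usage in this notebook, in place of the delete_endpoint call above:
# delete_endpoint_resources(sm_client, endpoint_name, model_name=model_name)
```

The configuration name is read before the endpoint is deleted because `describe_endpoint` fails once the endpoint no longer exists.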