# Launch GraphStorm Real-time Inference Endpoint

In the previous notebook, you trained a simple model on the GraphStorm training data, and optionally started a long-running advanced model run. 

Now you will use the trained model artifacts to deploy the model as a SageMaker real-time inference endpoint, so that you can perform online inference on single transactions in real time!

----

## Prerequisites

**This notebook is designed to run from within the SageMaker Notebook instance that you created as part of the CDK deployment.**

This is necessary, as we set up the roles and permissions to allow the SageMaker Notebook to post queries to the Neptune Database instance the CDK created.

If you used the included CDK to deploy the project and are running this notebook on the SageMaker Notebook instance we created as part of the CDK, you should have all roles already set up and can proceed with the notebook.

Otherwise, to launch a GraphStorm real-time inference endpoint on SageMaker, you will need to have an AWS account and access to the following AWS services.

- SageMaker service. Needed to launch the endpoint, and optionally to train models. Please refer to [Amazon SageMaker service](https://aws.amazon.com/pm/sagemaker/) for how to get access to Amazon SageMaker.
- Amazon ECR. Needed to store the GraphStorm SageMaker Docker images. Please refer to [Amazon Elastic Container Registry service](https://aws.amazon.com/ecr/) for how to get access to Amazon ECR.
- S3 service. Needed to store the inputs and outputs of SageMaker. Please refer to [Amazon S3 service](https://aws.amazon.com/s3/) for how to get access to Amazon S3.



You will use GraphStorm's SageMaker scripts in this notebook that depend on the SageMaker SDK and boto3 Python libraries.

In [None]:
# Clone graphstorm to get access to the endpoint launch script
import os

# Use os.environ['HOME'] so a missing HOME fails loudly instead of producing a "None/..." path
GS_HOME = f"{os.environ['HOME']}/SageMaker/graphstorm"

!git clone https://github.com/awslabs/graphstorm.git {GS_HOME}

### Step 1: Set up the GraphStorm Real-time Inference Docker Image

The GraphStorm real-time inference endpoint relies on SageMaker Bring Your Own Container (BYOC), using a Docker image that can be built locally and pushed to Amazon ECR with the GraphStorm-provided scripts. In this step you will build the GraphStorm endpoint image and push it to ECR, making it available for endpoint deployment.

In [None]:
# Retrieve resources created by CDK
import json
import os

with open(
    f"{os.environ['HOME']}/SageMaker/cdk_outputs.json", "r", encoding="utf-8"
) as f:
    cdk_outputs = json.load(f)

ACCOUNT_ID = cdk_outputs["ACCOUNT_ID"]
AWS_REGION = cdk_outputs["AWS_REGION"]
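If `cdk_outputs.json` is incomplete, later cells fail with a bare `KeyError`. As an optional sanity check, the sketch below reports which of the keys this notebook reads are absent or empty. The helper name `missing_cdk_keys` and the example dictionary are illustrative assumptions, not part of the CDK or GraphStorm.

```python
# Keys this notebook reads from cdk_outputs.json
REQUIRED_KEYS = (
    "ACCOUNT_ID",
    "AWS_REGION",
    "NDB_STACK_S3_BUCKET",
    "NDB_STACK_ENDPOINT_ROLE",
    "VPC_SUBNET_IDS",
    "VPC_SECURITY_GROUP_IDS",
)


def missing_cdk_keys(outputs, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `outputs` (hypothetical helper)."""
    return [key for key in required if not outputs.get(key)]


# Illustrative input only; in the notebook you would pass `cdk_outputs` instead
example_outputs = {"ACCOUNT_ID": "123456789012", "AWS_REGION": "us-east-1"}
missing = missing_cdk_keys(example_outputs)
if missing:
    print(f"cdk_outputs.json is missing: {', '.join(missing)}")
```

Running `missing_cdk_keys(cdk_outputs)` right after loading the file surfaces configuration problems before any Docker build or deployment starts.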

Building and pushing the image to ECR should take around 3 minutes.

In [None]:
# Build the GraphStorm real-time inference Docker image to be used on CPUs
!bash $GS_HOME/docker/build_graphstorm_image.sh --environment sagemaker-endpoint --device cpu > /dev/null 2>&1

In [None]:
# Will push an image to '<account_id>.dkr.ecr.<aws_region>.amazonaws.com/graphstorm:sagemaker-endpoint-cpu'
!bash $GS_HOME/docker/push_graphstorm_image.sh --environment sagemaker-endpoint --device cpu --region $AWS_REGION --account $ACCOUNT_ID
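The pushed image is addressed by a URI made up of the account ID, the region, the repository name, and the tag. The following minimal sketch composes that URI in the format shown in the comment above; the function name `endpoint_image_uri` is a hypothetical helper, and the account ID in the example is a placeholder.

```python
def endpoint_image_uri(account_id, region, repository="graphstorm", tag="sagemaker-endpoint-cpu"):
    """Compose the ECR image URI in the format used by this notebook (hypothetical helper)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID and region, for illustration only
print(endpoint_image_uri("123456789012", "us-east-1"))
# prints: 123456789012.dkr.ecr.us-east-1.amazonaws.com/graphstorm:sagemaker-endpoint-cpu
```

In the notebook you would call it as `endpoint_image_uri(ACCOUNT_ID, AWS_REGION)` to get the value passed to the endpoint launch script as the image URI.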

### Step 2: Launch a SageMaker Real-time Inference Endpoint

To launch a SageMaker real-time inference endpoint, you will need three model artifacts that were generated during graph construction (GConstruct/GSProcessing) and model training.

- The saved model prefix path that contains the ``model.bin`` file. This file contains the trained model weights, and GraphStorm creates one such file for every epoch.
- The updated graph construction JSON file, ``data_transform_new.json``. This JSON file is produced by the graph construction pipeline and carried through the model training pipeline. It contains updated information about feature transformations and feature dimensions.
- The updated model training configuration YAML file, ``GRAPHSTORM_RUNTIME_UPDATED_TRAINING_CONFIG.yaml``. This YAML file is one of the outputs of model training. It contains the original hyperparameters used to train the model, along with any runtime parameters provided during training, so that the deployed model architecture matches the trained one.

You can find these model artifacts under the ``./model-simple/`` folder produced in the previous notebook for model training.

First, load the saved path from the JSON file you created in the last notebook.

In [None]:
import json

with open("task_config.json", "r", encoding="utf-8") as f:
    task_config = json.load(f)

MODEL_PATH = task_config["MODEL_PATH"]

In [None]:
!ls -l $MODEL_PATH
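Beyond listing the directory, you can check programmatically that the three artifacts are present. This is a minimal sketch that assumes `MODEL_PATH` is a local directory and that the weights live under `epoch-1`, as in the launch command used in this notebook; `find_missing_artifacts` is a hypothetical helper, not a GraphStorm function.

```python
from pathlib import Path


def find_missing_artifacts(model_path, epoch_dir="epoch-1"):
    """Return the expected artifact paths that do not exist under `model_path`."""
    root = Path(model_path)
    expected = [
        root / epoch_dir / "model.bin",  # trained model weights for the chosen epoch
        root / "GRAPHSTORM_RUNTIME_UPDATED_TRAINING_CONFIG.yaml",  # updated training config
        root / "data_transform_new.json",  # updated graph construction config
    ]
    return [str(path) for path in expected if not path.is_file()]


# Usage in the notebook (assumes MODEL_PATH was loaded from task_config.json):
# missing = find_missing_artifacts(MODEL_PATH)
# if missing:
#     print("Missing artifacts:", *missing, sep="\n  ")
```

An empty list means all three files were found. If you want to deploy weights from a different epoch, pass that folder name as `epoch_dir`.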

Using the saved model artifacts, you can run the script below to launch a SageMaker real-time inference endpoint that hosts the trained GraphStorm model and accepts inference requests.

> Your account needs sufficient quota to deploy an endpoint on the suggested instance type, `ml.c6i.xlarge`.
  You can view your quotas and request increases at the [AWS Quotas console](https://us-east-1.console.aws.amazon.com/servicequotas/home/services/sagemaker/quotas).
    

To allow easier communication between NeptuneDB, the SageMaker notebook instance, and the SageMaker endpoint, we are deploying all of them in the same VPC.

To do that, we need to know which subnets to deploy the endpoint in, and which security group to attach to it so that it can communicate with other resources within the VPC (like this SageMaker Notebook instance).

In [None]:
# You need an S3 location to upload the model artifacts to
S3_BUCKET = cdk_outputs["NDB_STACK_S3_BUCKET"]
# The endpoint needs an execution role with a number of permissions that allow it to function
ENDPOINT_ROLE = cdk_outputs["NDB_STACK_ENDPOINT_ROLE"]
# We are deploying the endpoint within the same VPC as the NeptuneDB cluster, so we need that information at deployment time
VPC_SUBNET_IDS = " ".join(cdk_outputs["VPC_SUBNET_IDS"])
VPC_SECURITY_GROUP_IDS = " ".join(cdk_outputs["VPC_SECURITY_GROUP_IDS"])

# Build up the endpoint deployment command from variables
command = f"""python {GS_HOME}/sagemaker/launch/launch_realtime_endpoint.py \
        --image-uri "{ACCOUNT_ID}.dkr.ecr.{AWS_REGION}.amazonaws.com/graphstorm:sagemaker-endpoint-cpu" \
        --role {ENDPOINT_ROLE} \
        --region {AWS_REGION} \
        --instance-type ml.c6i.xlarge \
        --restore-model-path {MODEL_PATH}/epoch-1 \
        --model-yaml-config-file {MODEL_PATH}/GRAPHSTORM_RUNTIME_UPDATED_TRAINING_CONFIG.yaml \
        --graph-json-config-file {MODEL_PATH}/data_transform_new.json \
        --infer-task-type node_classification \
        --upload-tarfile-s3 s3://{S3_BUCKET}/model-artifacts \
        --model-name ieee-fraud-detection \
        --vpc-subnet-ids {VPC_SUBNET_IDS} \
        --vpc-security-group-ids {VPC_SECURITY_GROUP_IDS} \
        --async-execution false"""

The script deploys the endpoint using boto3, creating the three AWS resources that a SageMaker endpoint requires:

1. A [SageMaker Model](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html) that describes the kind of model we are planning to deploy, e.g. which image to use and
   which VPC to deploy the model into.
2. A [SageMaker Endpoint Config](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html) that describes the endpoint, e.g. which instance type to use during deployment.
3. A [SageMaker Endpoint](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpoint.html) which uses the previous two to finally launch and deploy the endpoint, making it 
   available for inference.
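To make the three steps concrete, here is a rough sketch of the corresponding boto3 calls. It is not the GraphStorm launch script: the function name, the `-config` and `-endpoint` name suffixes, the variant name, and the single-instance setup are illustrative assumptions, and the real script handles naming, artifact upload, and waiting for you.

```python
def deploy_endpoint(sm_client, name, image_uri, model_data_url, role_arn,
                    subnet_ids, security_group_ids, instance_type="ml.c6i.xlarge"):
    """Create a Model, an Endpoint Config, and an Endpoint, in that order (illustrative sketch)."""
    config_name = f"{name}-config"      # assumed naming, not GraphStorm's
    endpoint_name = f"{name}-endpoint"  # assumed naming, not GraphStorm's

    # 1. Model: which image and model artifacts to use, and which VPC to run in
    sm_client.create_model(
        ModelName=name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
        VpcConfig={
            "Subnets": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    )

    # 2. Endpoint Config: which instance type and how many instances serve the model
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    )

    # 3. Endpoint: ties the config to a deployed, invokable endpoint
    sm_client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    return endpoint_name


# Usage (requires AWS credentials; creates billable resources):
# import boto3
# sm_client = boto3.client("sagemaker", region_name=AWS_REGION)
```

Passing the client in as an argument keeps the function easy to dry-run with a stub object before touching a real account.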

The script takes the Docker image you previously built and pushed, along with the three model artifacts listed earlier, and launches a SageMaker endpoint based on the model name, ``ieee-fraud-detection``.

Outputs of this command include the launched endpoint name based on the value for ``--model-name``.

The endpoint name will look like `ieee-fraud-detection-Endpoint-<timestamp>`. 

**You will need the endpoint name in the next notebook!**

The endpoint name can also be found in the [Amazon SageMaker AI web console](https://us-east-1.console.aws.amazon.com/sagemaker/home?region=us-east-1#/endpoints) under the ``Inference -> Endpoints`` menu.
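If you prefer to look the name up from code rather than the console, the SageMaker `list_endpoints` API can filter by name and sort by creation time. The sketch below returns the most recently created endpoint whose name contains the model name; `latest_endpoint_name` is a hypothetical helper, and it assumes the endpoint was created in the region the client points at.

```python
def latest_endpoint_name(sm_client, name_contains="ieee-fraud-detection"):
    """Return the newest endpoint name containing `name_contains`, or None if there is none."""
    response = sm_client.list_endpoints(
        NameContains=name_contains,
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    endpoints = response.get("Endpoints", [])
    return endpoints[0]["EndpointName"] if endpoints else None


# Usage (requires AWS credentials):
# import boto3
# sm_client = boto3.client("sagemaker", region_name=AWS_REGION)
# print(latest_endpoint_name(sm_client))
```

If several deployments share the same model name, this returns only the newest one, so double-check it against the name printed by the deploy command.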

In [None]:
# Print Python command before executing
print(command)

For a detailed explanation of the arguments of the `launch_realtime_endpoint.py` script, please refer to [the GraphStorm documentation](https://graphstorm.readthedocs.io/en/latest/cli/model-training-inference/real-time-inference.html#deploy-a-sagemaker-real-time-inference-endpoint).

In [None]:
# Execute deploy command
!{command}

Once the deployment process is done, **ensure you have made a note of the endpoint name**, and proceed to the next and final notebook, `4-Sample-graph-and-invoke-endpoint.ipynb`. 

There you will see how to sample your graph from NeptuneDB in real time, pack the response into a JSON payload, and use it to get a prediction from the SageMaker endpoint you just launched!