# Invoke an existing SageMaker endpoint

This notebook lets you run inference on an *existing* SageMaker model endpoint built with an Arcee model listed on the [AWS Marketplace](https://aws.amazon.com/marketplace/seller-profile?id=seller-r7b33ivdczgs6).

This notebook works regardless of the deployment method you used to create the endpoint: AWS CloudFormation, AWS CLI, one of the AWS SDKs, etc.

## Usage instructions
* Run this notebook in the AWS account and in the AWS region where you created the endpoint.
* Make sure that your AWS credentials have the right permissions to invoke the endpoint.
* Make sure that the endpoint is in service before running the inference.
* Replace `YOUR_ENDPOINT_NAME` in the cell below with the name of the SageMaker endpoint you created. Use the endpoint name shown in the [SageMaker console](https://console.aws.amazon.com/sagemaker/home#/endpoints), not the endpoint URL.

In [None]:
import json
import pprint

import boto3
from IPython.display import Markdown, display

In [None]:
endpoint_name = "YOUR_ENDPOINT_NAME"
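The usage instructions ask that the endpoint be in service before running inference. A minimal sketch of that check follows. The helper name `endpoint_is_in_service` is hypothetical, and it assumes your credentials allow `sagemaker:DescribeEndpoint` in addition to invoking the endpoint.

```python
def endpoint_is_in_service(sm_client, endpoint_name):
    """Return True if the endpoint reports the InService status."""
    # describe_endpoint belongs to the SageMaker control-plane client
    # (boto3.client("sagemaker")), not to the runtime client used for inference.
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = description["EndpointStatus"]
    print(f"Endpoint {endpoint_name} is {status}")
    return status == "InService"
```

In this notebook, it could be called as `endpoint_is_in_service(boto3.client("sagemaker"), endpoint_name)`.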

### A. Define a test payload

In [None]:
model_sample_input = {
    "model": "tgi",
    "messages": [
        {"role": "system", "content": "You are a friendly and helpful AI assistant."},
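        {
            "role": "user",
            # Adjacent string literals are concatenated, which avoids embedding
            # the indentation of a backslash-continued line in the prompt.
            "content": (
                "Suggest 5 names for a new neighborhood pet food store. "
                "Names should be short, fun, easy to remember, and respectful of pets. "
                "Explain why customers would like them."
            ),
        },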
    ],
    "max_tokens": 1024,
}
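If you plan to try several prompts, a small helper can build payloads with the same shape as the one above. This is only a sketch: `build_chat_payload` is a hypothetical name, and it omits the `"model"` key, as the second example later in this notebook does.

```python
def build_chat_payload(user_prompt, system_prompt=None, max_tokens=1024):
    """Build a chat payload with an optional system message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return {"messages": messages, "max_tokens": max_tokens}
```

For example, `build_chat_payload("Suggest 5 names for a pet food store.", system_prompt="You are a friendly and helpful AI assistant.")` returns a dictionary that can be serialized with `json.dumps`.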

### B. Perform real-time inference

In [None]:
# "sagemaker-runtime" is the documented boto3 service name for invoking endpoints.
runtime_sm_client = boto3.client("sagemaker-runtime")

In [None]:
response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(model_sample_input),
)

output = json.loads(response["Body"].read().decode("utf8"))
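The invoke-and-decode steps above are repeated for every request, so they can be wrapped in a function. The sketch below uses the same `invoke_endpoint` arguments as the cell above; the name `invoke_chat` is hypothetical.

```python
import json


def invoke_chat(runtime_client, endpoint_name, payload):
    """Send a JSON payload to the endpoint and return the decoded JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream of bytes: read it fully, then parse it.
    return json.loads(response["Body"].read().decode("utf-8"))
```

In this notebook, `invoke_chat(runtime_sm_client, endpoint_name, model_sample_input)` would return the same dictionary as `output`.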

### C. Visualize output

The endpoint returns JSON in the OpenAI chat format. We can print the raw output as is.

In [None]:
pprint.pprint(output)

We can also print the generated output with Markdown formatting.

In [None]:
display(Markdown(output["choices"][0]["message"]["content"]))
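Indexing `output["choices"][0]` directly raises an unhelpful error if the response has no choices. The sketch below extracts the text more defensively. It also reports whether generation stopped because of `max_tokens`, assuming the endpoint populates `finish_reason` the way the OpenAI format does; that field is an assumption here, and `summarize_reply` is a hypothetical name.

```python
def summarize_reply(output):
    """Return (content, truncated) for the first choice of a chat response."""
    choices = output.get("choices") or []
    if not choices:
        raise ValueError(f"No choices in response: {output}")
    first = choices[0]
    content = first["message"]["content"]
    # Assumption: "finish_reason" is present and equals "length" when the
    # reply was cut off by max_tokens. If the field is missing, report False.
    truncated = first.get("finish_reason") == "length"
    return content, truncated
```

In this notebook, `content, truncated = summarize_reply(output)` could be followed by `display(Markdown(content))`.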

Here is another example. Please feel free to tweak it and add your own!

In [None]:
model_sample_input = {
    "messages": [
        {
            "role": "system",
            "content": "As a friendly technical assistant engineer, answer the question in detail.",
        },
        {"role": "user", "content": "Why are transformers better models than LSTM?"},
    ],
    "max_tokens": 1024,
}

response = runtime_sm_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps(model_sample_input),
)

output = json.loads(response["Body"].read().decode("utf8"))
display(Markdown(output["choices"][0]["message"]["content"]))

This notebook doesn't delete the endpoint for you. A real-time endpoint is billed for as long as it is running, so don't forget to delete it when you're done.
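If you created the endpoint with the AWS CLI or an SDK, a cleanup along the following lines may work. This is a sketch under assumptions: `delete_endpoint_resources` is a hypothetical name, your credentials need the matching SageMaker delete and describe permissions, and the endpoint configuration and models must not be shared with other endpoints. If you deployed with AWS CloudFormation, deleting the stack is usually the better option.

```python
def delete_endpoint_resources(sm_client, endpoint_name, delete_config_and_models=False):
    """Delete the endpoint and, optionally, its endpoint configuration and models."""
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]

    # Deleting the endpoint releases the instances that serve it.
    sm_client.delete_endpoint(EndpointName=endpoint_name)

    if delete_config_and_models:
        config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
        sm_client.delete_endpoint_config(EndpointConfigName=config_name)
        for variant in config.get("ProductionVariants", []):
            model_name = variant.get("ModelName")
            if model_name:
                sm_client.delete_model(ModelName=model_name)
```

In this notebook, it could be called as `delete_endpoint_resources(boto3.client("sagemaker"), endpoint_name)`.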

We'd be happy to hear from you, learn more about your use case, and help you build your next AI-driven solution. Please reach out to julien@arcee.ai.