# Running Strands Agents with a SageMaker Endpoint using the Mistral LLM

## Purpose

We will use the Strands Agents SDK to query a SageMaker AI inference endpoint. The model is `Mistral-Small-24B-Instruct-2501` from the SageMaker JumpStart model hub. We create a Strands agent and use it to invoke a previously created inference component.

Inference Components are a feature of SageMaker AI announced at re:Invent 2023. They allow models to be deployed and scaled independently of their hosting infrastructure, which makes more efficient use of the hardware that hosts GPU-accelerated models. The setup notebook listed under Dependencies deploys the Mistral model to an Inference Component on the endpoint; this notebook only invokes it.

## Prerequisites

To use SageMaker AI endpoints in these examples, you first need to deploy a managed endpoint. This example uses an already-deployed endpoint running the Mistral LLM on SageMaker AI. Below, you will create a Strands Agent that invokes the Mistral LLM and use it to answer a physics question.

## Dependencies

<div class="alert alert-block alert-info">
⚠️ <b>Important:</b> (1) Make sure you've run the <code>0-setup/1-required-dependencies-strands.ipynb</code> notebook before proceeding. If you haven't, close this notebook, run that notebook first, then come back here.
</div>

<div class="alert alert-block alert-info">
⚠️ <b>Important:</b> (2) To use <b>Amazon SageMaker AI</b> for running the Inference Endpoint, make sure you've run the <code>0-setup/2-setup-mistral-sagemaker-endpoint.ipynb</code> notebook before proceeding. If you haven't, close this notebook, run that notebook first, then come back here.
</div>



## Preparation

### Run this cell to make sure the Strands Agent libraries are installed

In [None]:
%pip show strands-agents strands-agents-tools

### If Strands Agents libraries do not show above, then install them by running this cell

In [None]:
# Uncomment line below to run pip install
# %pip install 'strands-agents[sagemaker]' strands-agents-tools

### Restore names of Endpoint, Endpoint Config, and Inference Component

The previously run setup notebook should have saved these variables with `%store`. The cell below restores them into this notebook.

In [None]:
%store -r MISTRAL_ENDPOINT_NAME
print(f"Endpoint name: {MISTRAL_ENDPOINT_NAME}")

%store -r MISTRAL_ENDPOINT_CONFIG_NAME
print(f"Endpoint Config Name: {MISTRAL_ENDPOINT_CONFIG_NAME}")

%store -r MISTRAL_INFERENCE_COMPONENT_NAME
print(f"Inference Component Name: {MISTRAL_INFERENCE_COMPONENT_NAME}")

In [None]:
import boto3
import json
from sagemaker import get_execution_role

# Get the IAM execution role and create a boto3 session in the endpoint's region
iam_role = get_execution_role()
boto_session = boto3.Session(region_name='us-west-2')
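
Before invoking the model, it can help to confirm that the inference component is ready. The sketch below is not part of the original notebook: the helper names are hypothetical, and it assumes a client exposing boto3's SageMaker `describe_inference_component` call, whose response carries an `InferenceComponentStatus` field.

```python
# Sketch: check that the inference component is ready before invoking it.
# `sagemaker_client` is assumed to be a boto3 SageMaker client, for example
# boto_session.client("sagemaker"). Helper names are hypothetical.


def inference_component_status(sagemaker_client, component_name):
    """Return the component's status string, such as 'InService'."""
    response = sagemaker_client.describe_inference_component(
        InferenceComponentName=component_name
    )
    return response["InferenceComponentStatus"]


def is_in_service(sagemaker_client, component_name):
    """Return True when the inference component reports 'InService'."""
    return inference_component_status(sagemaker_client, component_name) == "InService"


# Notebook usage (needs AWS credentials and the variables restored above):
# sm = boto_session.client("sagemaker")
# print(inference_component_status(sm, MISTRAL_INFERENCE_COMPONENT_NAME))
```

If the status is anything other than `InService`, the agent call later in this notebook is likely to fail, so re-running the setup notebook is a reasonable first step.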


## Create Strands Agent and SageMaker AI Model

First, we create an instance of **SageMakerAIModel** based on the previously deployed Mistral LLM endpoint.
Next, we create a Strands Agent that wraps that model and lets us submit queries.

More info: [Strands SageMaker docs](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/sagemaker/)

In [None]:
import strands
from strands import Agent
from strands.models.sagemaker import SageMakerAIModel
import logging
import sys

logging.getLogger("strands").setLevel(logging.INFO)
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)]
)

# Point the model at the deployed endpoint and its inference component
model = SageMakerAIModel(
    endpoint_config={
        'endpoint_name': MISTRAL_ENDPOINT_NAME,
        'region_name': 'us-west-2',
        'inference_component_name': MISTRAL_INFERENCE_COMPONENT_NAME,
    },
    # Generation settings sent with each request
    payload_config={
        'max_tokens': 4000,
        'temperature': 0.1,
        'top_p': 0.9,
        'stream': False
    },
    boto_session=boto_session
)

In [None]:
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of explaining physics concepts."},
    {"role": "user", "content": "Explain the basics of Einstein's Special Theory of Relativity. Also explain how it was proven via actual measurements."}
]

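# Raw request body with the same generation settings as payload_config.
# The Agent cell below does not use it; it only reads `messages`.
payload = {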
    "messages": messages,
    "max_tokens": 4000,
    "temperature": 0.1,
    "top_p": 0.9,
}
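
The `payload` dictionary has the shape of a request you could send to the endpoint directly, without Strands. The following is a minimal sketch of that path, not the notebook's own method. The helper names are hypothetical, and it assumes the endpoint accepts and returns OpenAI-style chat-completion JSON (a `choices[0].message.content` field in the response), which may differ for your deployment.

```python
import json


def build_invoke_kwargs(endpoint_name, inference_component_name, payload):
    """Build keyword arguments for the SageMaker runtime invoke_endpoint call."""
    return {
        "EndpointName": endpoint_name,
        "InferenceComponentName": inference_component_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }


def parse_chat_response(raw_body):
    """Extract the assistant text from a chat-completion response body.

    Assumption: the body is OpenAI-style JSON. Adjust the lookup if your
    endpoint returns a different schema.
    """
    data = json.loads(raw_body)
    return data["choices"][0]["message"]["content"]


# Notebook usage (needs AWS credentials and a running endpoint):
# runtime = boto_session.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     **build_invoke_kwargs(
#         MISTRAL_ENDPOINT_NAME, MISTRAL_INFERENCE_COMPONENT_NAME, payload
#     )
# )
# print(parse_chat_response(response["Body"].read()))
```

Calling the endpoint this way is useful for checking that the deployment responds before adding the agent layer on top.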


In [None]:
agent = Agent(
    model=model,
    system_prompt=messages[0]["content"]
)

result = agent(messages[1]["content"])
print(result)
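
Printing the result shows the answer, but you may want the text as a plain string for further processing. The sketch below assumes the agent result exposes a `message` dictionary whose `content` is a list of blocks, with text blocks stored under a `"text"` key. The helper name `message_text` is hypothetical and not part of the Strands SDK.

```python
def message_text(message):
    """Join the text blocks of a message dict, skipping non-text blocks."""
    blocks = message.get("content", [])
    return "\n".join(block["text"] for block in blocks if "text" in block)


# Notebook usage (assumes `result` from the cell above has a `.message` dict):
# answer = message_text(result.message)
# print(len(answer), "characters")
```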