# Running Strands Agents with a SageMaker Endpoint using a Mistral LLM

## Purpose

This notebook uses the Strands Agents SDK to query a SageMaker AI inference endpoint. The LLM is `Mistral-Small-24B-Instruct-2501` from the SageMaker JumpStart model hub. We create a Strands agent and use it to invoke a previously created inference component.

Inference Components are a feature of SageMaker AI announced at re:Invent 2023. They allow models to be deployed and scaled independently of their hosting infrastructure, which makes more efficient use of the hardware that hosts GPU-accelerated models. In this notebook, the Mistral model has already been deployed to an Inference Component by the setup notebook; the code below only invokes it.

## Prerequisites

To use SageMaker AI endpoints in these examples, you first need to deploy a managed endpoint. This example uses an already-deployed endpoint running the Mistral LLM on SageMaker AI. Below, you will create a Strands Agent, use it to invoke the Mistral LLM, and ask the agent to reason about math.

## Dependencies

<div class="alert alert-block alert-info">
⚠️ <b>Important:</b> (1) Make sure you've run the <code>0-setup/1-required-dependencies-strands.ipynb</code> notebook before proceeding. If you haven't, close this notebook, run that notebook first, then come back here.
</div>

<div class="alert alert-block alert-info">
⚠️ <b>Important:</b> (2) To use <b>Amazon SageMaker AI</b> for running the Inference Endpoint, make sure you've run the <code>0-setup/2-setup-mistral-sagemaker-endpoint.ipynb</code> notebook before proceeding. If you haven't, close this notebook, run that notebook first, then come back here.
</div>



## Preparation

### Run this cell to make sure the Strands Agent libraries are installed

In [1]:
%pip show strands-agents strands-agents-tools

Name: strands-agents
Version: 1.4.0
Summary: A model-driven approach to building AI agents in just a few lines of code
Home-page: https://github.com/strands-agents/sdk-python
Author: 
Author-email: AWS <opensource@amazon.com>
License: Apache-2.0
Location: /opt/conda/lib/python3.12/site-packages
Requires: boto3, botocore, docstring-parser, mcp, opentelemetry-api, opentelemetry-instrumentation-threading, opentelemetry-sdk, pydantic, typing-extensions, watchdog
Required-by: strands-agents-builder, strands-agents-tools
---
Name: strands-agents-tools
Version: 0.2.12
Summary: A collection of specialized tools for Strands Agents
Home-page: https://github.com/strands-agents/tools
Author: 
Author-email: AWS <opensource@amazon.com>
License: Apache-2.0
Location: /opt/conda/lib/python3.12/site-packages
Requires: aiohttp, aws-requests-auth, botocore, dill, markdownify, pillow, prompt-toolkit, pyjwt, requests, rich, slack-bolt, strands-agents, sympy, tenacity, typing-extensions, watchdog
Required-by:

### If the Strands Agents libraries are not listed above, install them by running this cell

In [None]:
# Uncomment line below to run pip install
# %pip install 'strands-agents[sagemaker]' strands-agents-tools

### Restore names of Endpoint, Endpoint Config, and Inference Component

The setup notebook should have saved these variables with IPython's `%store` magic. The `%store -r` commands below restore them into this notebook's namespace.

In [2]:
%store -r MISTRAL_ENDPOINT_NAME
print(f"Endpoint name: {MISTRAL_ENDPOINT_NAME}")

%store -r MISTRAL_ENDPOINT_CONFIG_NAME
print(f"Endpoint Config Name: {MISTRAL_ENDPOINT_CONFIG_NAME}")

%store -r MISTRAL_INFERENCE_COMPONENT_NAME
print(f"Inference Component Name: {MISTRAL_INFERENCE_COMPONENT_NAME}")

Endpoint name: strands-endpoint-001
Endpoint Config Name: strands-endpoint-config
Inference Component Name: mistral-24b-instruct-2501-ic
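
If the setup notebook has not been run in this environment, `%store -r` cannot restore a name and later cells fail with a `NameError`. A small fail-fast check can report every missing name at once. The sketch below is one possible approach; the helper name `require_restored` is hypothetical and not part of IPython or Strands.

```python
def require_restored(namespace, names):
    """Return the requested variables, or raise one error listing all that are missing or empty."""
    missing = [name for name in names if not namespace.get(name)]
    if missing:
        raise RuntimeError(
            "Missing stored variables: " + ", ".join(missing)
            + ". Run 0-setup/2-setup-mistral-sagemaker-endpoint.ipynb first."
        )
    return {name: namespace[name] for name in names}


# Demonstration with a plain dict standing in for the notebook namespace.
sample_namespace = {"MISTRAL_ENDPOINT_NAME": "strands-endpoint-001"}
restored = require_restored(sample_namespace, ["MISTRAL_ENDPOINT_NAME"])
print(restored)
```

In the notebook itself, the call would be `require_restored(globals(), [...])` with the three `MISTRAL_*` names, placed right after the `%store -r` lines.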


In [3]:
import boto3
import json
from sagemaker import get_execution_role

# Set up the execution role and the boto3 session
iam_role = get_execution_role()
boto_session = boto3.Session(region_name='us-west-2')


sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /home/sagemaker-user/.config/sagemaker/config.yaml


## Create Strands Agent and SageMaker AI Model

First, we create an instance of **SageMakerAIModel** based on the previously deployed Mistral LLM endpoint.
Next, we create a Strands Agent that wraps that model and lets us submit queries.

More info: [see Strands SageMaker Docs](https://strandsagents.com/latest/documentation/docs/user-guide/concepts/model-providers/sagemaker/)

In [5]:
import strands
from strands import Agent
from strands.models.sagemaker import SageMakerAIModel
import logging
import sys

logging.getLogger("strands").setLevel(logging.INFO)
logging.basicConfig(
    format="%(levelname)s | %(name)s | %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)]
)

model = SageMakerAIModel(
    endpoint_config={
        'endpoint_name': MISTRAL_ENDPOINT_NAME,
        'region_name': 'us-west-2',
        'inference_component_name': MISTRAL_INFERENCE_COMPONENT_NAME,
    },
    payload_config={
        'max_tokens': 4000,
        'temperature': 0.1,
        'top_p': 0.9,
        'stream': False
    },
    boto_session=boto_session
)
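
The INFO log printed later in this notebook shows the request body sent to the endpoint: a chat-completions style document in which the `payload_config` values appear as top-level keys next to `messages`, and the user text is wrapped in a list of content blocks. The sketch below only reproduces that observed shape for illustration; `build_chat_payload` is a hypothetical helper, not the SDK's internal code.

```python
import json


def build_chat_payload(system_prompt, user_text, payload_config):
    """Assemble a request body shaped like the one shown in the INFO log."""
    body = {
        "messages": [
            {"role": "system", "content": system_prompt},
            # The logged payload wraps user text in a list of typed content blocks.
            {"role": "user", "content": [{"text": user_text, "type": "text"}]},
        ]
    }
    # Generation settings sit at the top level, alongside "messages".
    body.update(payload_config)
    return body


example_body = build_chat_payload(
    "You are a helpful assistant.",
    "What is 2 + 2?",
    {"max_tokens": 4000, "temperature": 0.1, "top_p": 0.9, "stream": False},
)
print(json.dumps(example_body, indent=2))
```

Comparing this output with the logged payload is a quick way to confirm that settings such as `temperature` or `stream` reach the endpoint as intended.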

In [7]:
messages = [
    {"role": "system", "content": "You are a helpful assistant capable of explaining physics concepts."},
    {"role": "user", "content": "Explain the basics of Einstein's Special Theory of Relativity. Also explain how it was proven via actual measurements."}
]

payload = {
    "messages": messages,
    "max_tokens": 4000,
    "temperature": 0.1,
    "top_p": 0.9,
}
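
The `payload` dictionary above has the shape of a raw chat-completions request, so it can also be sent to the endpoint without going through the agent. The following is a minimal sketch, assuming the endpoint accepts this JSON body and replies in the `choices[0].message.content` layout seen in the log output of this notebook. `invoke_chat` and `FakeRuntimeClient` are hypothetical names; the fake client exists only so the example runs without AWS credentials.

```python
import io
import json


def invoke_chat(runtime_client, endpoint_name, inference_component_name, payload):
    """Send a chat payload to one inference component and return the reply text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        InferenceComponentName=inference_component_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a stream of JSON bytes.
    result = json.loads(response["Body"].read())
    return result["choices"][0]["message"]["content"]


class FakeRuntimeClient:
    """Stand-in for a SageMaker runtime client (assumption: same call shape)."""

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs
        reply = {"choices": [{"index": 0, "message": {"role": "assistant", "content": "stub reply"}}]}
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


fake_client = FakeRuntimeClient()
demo_payload = {"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}
answer = invoke_chat(fake_client, "my-endpoint", "my-component", demo_payload)
print(answer)
```

Against the real endpoint, the client would come from `boto_session.client("sagemaker-runtime")`, and the call would pass `MISTRAL_ENDPOINT_NAME`, `MISTRAL_INFERENCE_COMPONENT_NAME`, and `payload`.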


In [8]:
agent = Agent(
    model=model,
    system_prompt=messages[0]["content"]
)

result = agent(messages[1]["content"])
print(result)

INFO | strands.models.sagemaker | payload=<{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant capable of explaining physics concepts."
    },
    {
      "role": "user",
      "content": [
        {
          "text": "Explain the basics of Einstein's Special Theory of Relativity. Also explain how it was proven via actual measurements.",
          "type": "text"
        }
      ]
    }
  ],
  "max_tokens": 4000,
  "temperature": 0.1,
  "top_p": 0.9,
  "stream": false
}>
INFO | strands.models.sagemaker | response=<{
  "id": "chatcmpl-4f2db84c9278430e87541bf9d88f4499",
  "object": "chat.completion",
  "created": 1761247493,
  "model": "lmi",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "Einstein's Special Theory of Relativity, published in 1905, is a fundamental theory in physics that describes the relationship between space and time. Here are the basi