# Deploy OpenAI gpt-oss model on SageMaker AI

In this notebook, we deploy the OpenAI gpt-oss model on SageMaker AI using several options.

For more details on gpt-oss, see this introductory [blog post](https://simonwillison.net/2025/Aug/5/gpt-oss/).

## Step 1: Setup

Install and import the dependencies.

In [None]:
%pip install sagemaker --upgrade --quiet --no-warn-conflicts

In [None]:
import json
import sagemaker
import boto3

role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
bucket = sess.default_bucket()  # bucket to house artifacts
region = sess.boto_region_name  # region name of the current SageMaker session

sm_client = boto3.client("sagemaker")  # client to interact with SageMaker
smr_client = boto3.client("sagemaker-runtime")  # client to interact with SageMaker endpoints

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")
print(f"sagemaker version: {sagemaker.__version__}")

## Option 1. Deploy gpt-oss-120b from JumpStart 

We will deploy the model to an inference-component-based endpoint.

In [None]:
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

accept_eula = False  # Change to True to agree to the terms and conditions and accept the EULA
model_id, model_version = "openai-reasoning-gpt-oss-120b", "1.0.0"

model_name = endpoint_name = sagemaker.utils.name_from_base("gpt-oss-120b")
inference_component_name = f"ic-{model_name}"

jumpstart_model = JumpStartModel(
    model_id=model_id,
    model_version=model_version,
    name=model_name
)

jumpstart_model.deploy(
    accept_eula=accept_eula,
    instance_type="ml.p5en.48xlarge",
    initial_instance_count=1,
    container_startup_health_check_timeout=900,
    endpoint_name=endpoint_name,
    endpoint_type=sagemaker.enums.EndpointType.INFERENCE_COMPONENT_BASED,
    inference_component_name=inference_component_name,
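    # request 8 accelerators, 10240 MB of memory, and a single copy of the model
    resources=ResourceRequirements(requests={"num_accelerators": 8, "memory": 1024 * 10, "copies": 1}),
)

# Predictor that routes requests to the inference component on the endpoint
llm = sagemaker.Predictor(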
    endpoint_name=endpoint_name,
    sagemaker_session=sess,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
    component_name=inference_component_name
)
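The `Predictor` above wraps the SageMaker runtime API. As a rough sketch, the same request could also be sent through the `smr_client` created in Step 1 by passing the inference component name to `invoke_endpoint`. The helper name `build_invoke_kwargs` and the endpoint names in the demo are placeholders introduced here for illustration, not part of the original notebook.

```python
import json


def build_invoke_kwargs(endpoint_name, inference_component_name, messages):
    """Assemble keyword arguments for the sagemaker-runtime invoke_endpoint call.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    return {
        "EndpointName": endpoint_name,
        "InferenceComponentName": inference_component_name,
        "ContentType": "application/json",
        "Accept": "application/json",
        "Body": json.dumps({"messages": messages}),
    }


# Placeholder names; in the notebook these would be endpoint_name and
# inference_component_name from the deployment cell.
invoke_kwargs = build_invoke_kwargs(
    "my-endpoint",
    "ic-my-endpoint",
    [{"role": "user", "content": "Hello"}],
)
print(invoke_kwargs["InferenceComponentName"])

# With a live endpoint (not executed here):
# response = smr_client.invoke_endpoint(**invoke_kwargs)
# result = json.loads(response["Body"].read())
```

Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent to a billed endpoint.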

## Inference Examples

In [7]:
payload = {
    "messages": [
        {"role": "user", "content": "Name popular places to visit in London?"}
    ],
}
res = llm.predict(payload)
print("-----\n" + res["choices"][0]["message"]["content"] + "\n-----\n")
print(res["usage"])

-----
analysisThe user asks: "Name popular places to visit in London?" Simple answer. Provide list. Possibly brief description. Should be concise.assistantfinalHere are some of the most popular attractions and neighborhoods that visitors love to explore in London:

| Attraction / Area | What to See / Do | Why It’s Popular |
|-------------------|------------------|------------------|
| **The British Museum** | World‑class art and antiquities (Rosetta Stone, Egyptian mummies) | Free entry and a treasure‑trove of global history |
| **The Tower of London** | Crown Jewels, historic royal palace, Yeoman Warders | Dark, dramatic history + iconic medieval walls |
| **Buckingham Palace** | Changing of the Guard, State Rooms (summer) | Symbol of the British monarchy |
| **Westminster Abbey & Houses of Parliament** | Gothic architecture, Big Ben, Westminster Hall | Political and religious heart of the nation |
| **The London Eye** | 30‑minute rotation with panoramic city views | Modern skyline ex

In [8]:
payload = {
    "messages": [
        {"role": "user", "content": "What is bigger 9.11 or 9.8?"}
    ],
}
res = llm.predict(payload)
print("-----\n" + res["choices"][0]["message"]["content"] + "\n-----\n")
print(res["usage"])

-----
analysisThe user asks: "What is bigger 9.11 or 9.8?" Basically asking which number is larger. 9.8 is larger than 9.11. Provide answer. Possibly explain.assistantfinal9.8 is the larger number. It’s greater than 9.11.
-----

{'prompt_tokens': 94, 'total_tokens': 164, 'completion_tokens': 70, 'prompt_tokens_details': None}
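
In both outputs above, the returned `content` string starts with `analysis`, followed by the model's reasoning, then `assistantfinal`, then the answer. A minimal sketch for separating the two parts is shown below. It assumes this marker layout, which is inferred only from the outputs in this notebook and is not a documented contract; `split_reasoning` is a hypothetical helper, and plain string matching like this is fragile.

```python
def split_reasoning(content):
    """Split raw content into (reasoning, final_answer).

    Assumes the 'analysis<reasoning>assistantfinal<answer>' layout observed
    in the outputs above. Returns (None, content) if the markers are absent.
    """
    start_marker = "analysis"
    final_marker = "assistantfinal"
    if content.startswith(start_marker) and final_marker in content:
        reasoning, answer = content[len(start_marker):].split(final_marker, 1)
        return reasoning.strip(), answer.strip()
    return None, content.strip()


# Shortened sample in the same shape as the notebook output
sample = "analysisThe user asks which number is larger.assistantfinal9.8 is the larger number."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

In the notebook, the same helper could be applied to `res["choices"][0]["message"]["content"]` to print only the final answer.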


## Cleanup

In [9]:
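# Delete resources in dependency order: inference component, endpoint, endpoint config, model
sess.delete_inference_component(inference_component_name, wait=True)  # wait so the endpoint can be deleted afterwards
sess.delete_endpoint(endpoint_name)
sess.delete_endpoint_config(endpoint_name)
sess.delete_model(model_name)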
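
If one of the delete calls above fails (for example, because a resource was already removed), the remaining calls never run and billed resources may be left behind. One possible way to guard against that is sketched below. `run_cleanup` is a hypothetical helper, not part of the SageMaker SDK, and the demo uses stand-in functions rather than real AWS calls.

```python
def run_cleanup(steps):
    """Run (label, callable) cleanup steps in order.

    Failures are collected instead of raised so that later steps still run.
    """
    failures = []
    for label, step in steps:
        try:
            step()
        except Exception as exc:  # deliberately broad: keep cleaning up
            failures.append((label, str(exc)))
    return failures


# Stand-in steps for demonstration only
def _succeeds():
    pass


def _fails():
    raise RuntimeError("not found")


demo_failures = run_cleanup(
    [("inference component", _succeeds), ("endpoint", _fails), ("model", _succeeds)]
)
print(demo_failures)

# With the notebook objects (not executed here):
# run_cleanup([
#     ("inference component", lambda: sess.delete_inference_component(inference_component_name)),
#     ("endpoint", lambda: sess.delete_endpoint(endpoint_name)),
#     ("endpoint config", lambda: sess.delete_endpoint_config(endpoint_name)),
#     ("model", lambda: sess.delete_model(model_name)),
# ])
```

Any entries in the returned list point to resources that may need to be checked and removed manually.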