# Deploy OpenAI gpt-oss model on SageMaker AI

In this notebook, we deploy the OpenAI gpt-oss model on SageMaker AI using two options: SageMaker JumpStart and model weights stored in Amazon S3.

For more details, see Simon Willison's [blog post](https://simonwillison.net/2025/Aug/5/gpt-oss/) introducing the OpenAI gpt-oss models.

## Step 1: Setup

Install and import the dependencies.

In [None]:
%pip install sagemaker --upgrade --quiet --no-warn-conflicts

In [None]:
import json
import sagemaker
import boto3

role = sagemaker.get_execution_role()  # execution role for the endpoint
sess = sagemaker.session.Session()  # sagemaker session for interacting with different AWS APIs
bucket = sess.default_bucket()  # bucket to house artifacts
region = sess.boto_region_name  # region name of the current SageMaker Studio environment

sm_client = boto3.client("sagemaker")  # client to interact with SageMaker
smr_client = boto3.client("sagemaker-runtime")  # client to interact with SageMaker endpoints

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")
print(f"sagemaker version: {sagemaker.__version__}")

## Option 1. Deploy gpt-oss-120b from JumpStart 

We will deploy the model to an endpoint that uses inference components.

In [None]:
from sagemaker.jumpstart.model import JumpStartModel
from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

accept_eula = False  # Change to True to agree to the terms and conditions and accept the EULA
model_id, model_version = "openai-reasoning-gpt-oss-120b", "1.0.0"

model_name = endpoint_name = sagemaker.utils.name_from_base("gpt-oss-120b")
inference_component_name = f"ic-{model_name}"

jumpstart_model = JumpStartModel(
    model_id=model_id,
    model_version=model_version,
    name=model_name
)

jumpstart_model.deploy(
    accept_eula=accept_eula,
    instance_type="ml.p5en.48xlarge",
    initial_instance_count=1,
    container_startup_health_check_timeout=900,
    endpoint_name=endpoint_name,
    endpoint_type=sagemaker.enums.EndpointType.INFERENCE_COMPONENT_BASED,
    inference_component_name=inference_component_name,
    resources=ResourceRequirements(requests={"num_accelerators": 8, "memory": 1024*10, "copies": 1,}),
)

llm = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sess,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
    component_name=inference_component_name
)

## Option 2. Deploy gpt-oss model from S3

If you need to deploy these models from S3 (for example, after fine-tuning), you can use the code below.

Set `model_s3_path` to the S3 prefix that contains your model weights.
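
Because the model is loaded with `S3DataType` set to `S3Prefix`, the URI should point at a key prefix (ending in `/`) rather than a single object. The helper below is a minimal sketch for normalizing a user-supplied URI before passing it to `sagemaker.Model`. The name `normalize_s3_prefix` and the example bucket are hypothetical and not part of the SageMaker SDK.

```python
def normalize_s3_prefix(uri: str) -> str:
    """Return an s3:// prefix URI that ends with exactly one trailing slash."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"expected an s3:// URI, got: {uri!r}")
    # Split "bucket/key/parts" into the bucket name and the key prefix
    bucket, _, key = uri[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    key = key.strip("/")  # drop leading/trailing slashes so we can add exactly one
    return f"{scheme}{bucket}/{key}/" if key else f"{scheme}{bucket}/"


# Hypothetical example value; replace with your own bucket and prefix
print(normalize_s3_prefix("s3://my-bucket/models/gpt-oss"))
```

You could then assign the result to `model_s3_path` in the next cell.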

In [None]:
inference_image = f"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.0.0.dev1-lmi0.0.0-cu128"
instance_type = "ml.p5en.48xlarge"

model_s3_path = "s3://<BUCKET>/<PREFIX>/"

lmi_env = {
    "OPTION_MODEL_ID": "/opt/ml/model",
    #"TIKTOKEN_ENCODINGS_BASE": "/opt/ml/model",
    "OPTION_TENSOR_PARALLEL_SIZE": "8",
}

model_name = sagemaker.utils.name_from_base("model-lmi")
endpoint_name = model_name
inference_component_name = f"ic-{model_name}"

In [None]:
from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

lmi_model = sagemaker.Model(
    image_uri=inference_image,
    env=lmi_env,
    role=role,
    name=model_name,
    model_data={
        'S3DataSource': {
            'S3Uri': model_s3_path,
            'S3DataType': 'S3Prefix',
            'CompressionType': 'None'
        }
    },
)

lmi_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=600,
    endpoint_name=endpoint_name,
    endpoint_type=sagemaker.enums.EndpointType.INFERENCE_COMPONENT_BASED,
    inference_component_name=inference_component_name,
    resources=ResourceRequirements(requests={"num_accelerators": 8, "memory": 1024*10, "copies": 1,}),
)

llm = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sess,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
    component_name=inference_component_name
)
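
As an alternative to `Predictor`, the `smr_client` created in Step 1 can call the endpoint through the boto3 `invoke_endpoint` API, which accepts an `InferenceComponentName` argument. The sketch below only builds the request arguments so it can run without a live endpoint. The helper name `build_invoke_kwargs` and the example names are hypothetical.

```python
import json


def build_invoke_kwargs(endpoint_name: str, inference_component_name: str, payload: dict) -> dict:
    """Build keyword arguments for sagemaker-runtime invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "InferenceComponentName": inference_component_name,
        "ContentType": "application/json",
        "Accept": "application/json",
        "Body": json.dumps(payload).encode("utf-8"),  # request body as UTF-8 JSON bytes
    }


# Hypothetical names; in the notebook, pass endpoint_name and inference_component_name
kwargs = build_invoke_kwargs(
    "my-endpoint",
    "ic-my-endpoint",
    {"messages": [{"role": "user", "content": "Hello"}]},
)
print(sorted(kwargs))

# With a deployed endpoint, the call would look like this:
# response = smr_client.invoke_endpoint(**kwargs)
# res = json.loads(response["Body"].read())
```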

## Inference Examples

In [36]:
payload={
    "messages": [
        {"role": "user", "content": "Name popular places to visit in London?"}
    ],
}
res = llm.predict(payload)
print("-----\n" + res["choices"][0]["message"]["content"] + "\n-----\n")
print(res["usage"])

-----
analysisUser asks: "Name popular places to visit in London?" Likely expecting a list of popular tourist attractions. Provide many and brief notes. Could categorize: historical, iconic. Provide maybe 10-15. Provide short description. No need to be exhaustive. I'll output nicely formatted list.assistantfinalHere’s a quick “London Must‑Visit” list that covers the city’s most iconic sights, historic landmarks, cultural hubs, and hidden gems. Feel free to tweak the list based on your interests (history, art, shopping, nature, etc.)!

| # | Attraction | Why It’s Popular | Quick Tip |
|---|------------|------------------|-----------|
| 1 | **The British Museum** | Houses the Rosetta Stone, Egyptian mummies, and a vast collection of world artefacts. | Free entry; early morning or mid‑week slots avoid crowds. |
| 2 | **Buckingham Palace** & **Changing of the Guard** | The official residence of the monarch; the guard ceremony is a quintessential London spectacle. | Check the guard change s

In [37]:
payload={
    "messages": [
        {"role": "user", "content": "What is bigger 9.11 or 9.8?"}
    ],
}
res = llm.predict(payload)
print("-----\n" + res["choices"][0]["message"]["content"] + "\n-----\n")
print(res["usage"])

-----
analysisThe user asks: "What is bigger 9.11 or 9.8?" They likely refer to numeric values. 9.8 > 9.11? Let's compare: 9.8 equals 9.80 as decimal. 9.11 is 9.11. 9.8 > 9.11 because 9.8 is 9.80 which is greater than 9.11. Indeed, 9.80 > 9.11. So answer: 9.8 is bigger. Provide explanation. Also maybe mention that 9.11 is 9.11. 9.8 > 9.11. Provide maybe some context if they refer to something else. Probably ask to pick bigger numeric value. Provide simple answer. Also can mention that 9.8 > 9.11. Show decimal places one by one. 9.8 = 9.80 > 9.11. So answer: 9.8.assistantfinal**9.8 is bigger.**

- 9.8 is the same as 9.80.
- 9.80 > 9.11 because 0.80 > 0.11.

So among the two numbers, 9.8 is the larger value.
-----

{'prompt_tokens': 80, 'total_tokens': 354, 'completion_tokens': 274, 'prompt_tokens_details': None}
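
In the outputs above, the returned `content` begins with the model's reasoning (prefixed by `analysis`), and the final answer follows the `assistantfinal` marker. The helper below is a minimal sketch for separating the two, assuming the response uses exactly these markers as seen in this notebook's outputs; other container versions may format the response differently. The name `split_reasoning` is hypothetical.

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Split raw message content into (reasoning, final_answer).

    Assumes the 'analysis' prefix and 'assistantfinal' marker observed in the
    outputs above. If they are absent, the whole text is treated as the answer.
    """
    prefix, marker = "analysis", "assistantfinal"
    if content.startswith(prefix) and marker in content:
        reasoning, _, final = content[len(prefix):].partition(marker)
        return reasoning.strip(), final.strip()
    return "", content.strip()


sample = "analysisCompare 9.80 and 9.11.assistantfinal**9.8 is bigger.**"
reasoning, answer = split_reasoning(sample)
print(answer)
```

With a live endpoint, you could pass `res["choices"][0]["message"]["content"]` to the helper instead of `sample`.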


## Cleanup

In [38]:
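# Delete resources in dependency order: inference component, endpoint, endpoint config, model
sess.delete_inference_component(inference_component_name, wait=True)  # wait until the component is gone
sess.delete_endpoint(endpoint_name)
sess.delete_endpoint_config(endpoint_name)
sess.delete_model(model_name)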