# OpenAI OSS Model Getting Started Guide on Amazon Bedrock

This notebook introduces gpt-oss-120b and gpt-oss-20b on Amazon Bedrock, including how to use the familiar OpenAI SDK interface with Amazon Bedrock. We'll cover how to make API requests, explore available parameters and payload structures, and examine use cases for both model variants.
## Model Variants

### GPT-OSS-120b

Parameters: 120 billion

Use Cases: Complex reasoning tasks, agentic use cases

### GPT-OSS-20b

Parameters: 20 billion

Use Cases: Faster inference, cost-effective deployments, simpler tasks

## Core Capabilities

Both OpenAI model variants share the following characteristics:

**Input/Output:** Text-in, text-out 

**Context Window:** 128,000 tokens  

**Model Type:** Advanced reasoning models

**Region:** us-west-2

**Bedrock Guardrails:** ✅ Supported

**Converse API:** ✅ Supported

**OpenAI Chat Completions API:** ✅ Supported

**Web Search:** ❌ Not available at this time

All benchmarking data can be viewed [here](https://openai.com/index/introducing-gpt-oss/).


## What You'll Learn in this getting started guide

- Options to use Amazon Bedrock for GPT-OSS inference, including:    
    - Using the OpenAI SDK with Amazon Bedrock
    - Using Amazon Bedrock's InvokeModel API
    - Using Amazon Bedrock's Converse API
- Understanding request parameters and response structures
- Choosing between the larger (gpt-oss-120b) and smaller (gpt-oss-20b) model for your use case
- Implementing tool calling capabilities

## Model Access on Amazon Bedrock

Ensure you have the correct IAM permissions to access OpenAI's models on Amazon Bedrock. To request model access, follow these steps:

- Go to Amazon Bedrock --> model access
- Click Modify model access
- Scroll to OpenAI
- Click the checkbox next to the models you would like access to
- Click next & accept any EULAs
- Click submit

## IAM Permissions

To use Bedrock models, your AWS credentials need the following permissions:


In [None]:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
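
The policy above allows invocation on every resource. As a hedged sketch, the same statement can be scoped to just the two GPT-OSS models. The helper name `build_invoke_policy` is hypothetical, and the resource strings assume the standard foundation-model ARN pattern `arn:aws:bedrock:<region>::foundation-model/<model-id>`; verify the ARNs in your own account before relying on them.

```python
import json

# Model IDs and region as listed in this guide.
MODEL_IDS = ["openai.gpt-oss-120b-1:0", "openai.gpt-oss-20b-1:0"]
REGION = "us-west-2"


def build_invoke_policy(model_ids, region):
    """Return an IAM policy dict that allows invoking only the given models."""
    # Assumed ARN pattern for Bedrock foundation models (empty account field).
    resources = [
        f"arn:aws:bedrock:{region}::foundation-model/{model_id}"
        for model_id in model_ids
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }


policy = build_invoke_policy(MODEL_IDS, REGION)
print(json.dumps(policy, indent=2))
```

Scoping `Resource` this way keeps the credentials used in this notebook from invoking other models in the account.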


## Step 1: Environment Configuration

First, we need to install the required packages and tell the OpenAI SDK to talk to Bedrock instead of OpenAI's servers.

### Required Imports:
- `os` → For environment variables
- `boto3` → For native Bedrock API interactions  
- `json` → For JSON serialization/deserialization
- `datetime` → For timestamp tracking and performance measurements
- `openai` → For OpenAI SDK compatibility with Bedrock
- `IPython.display` → For enhanced output formatting and streaming demonstrations
- `getpass` → For prompting for the API key without echoing it to the screen

### Environment Variables:
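We read the Bedrock API key from one environment variable and set two others to redirect the OpenAI SDK:
- `AWS_BEARER_TOKEN_BEDROCK` → Your Bedrock API key (reused if already set; otherwise you are prompted for it)
- `OPENAI_API_KEY` → Set to the Bedrock API key so the OpenAI SDK sends it with each request
- `OPENAI_BASE_URL` → Bedrock's OpenAI-compatible endpoint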

In [None]:
%pip install boto3 openai ipython

In [None]:
import os
import boto3
import json
import getpass
from openai import OpenAI
from datetime import datetime
from IPython.display import clear_output, display, display_markdown, Markdown

### Model IDs

- **openai.gpt-oss-120b-1:0**
- **openai.gpt-oss-20b-1:0**

In [None]:
# Model Configuration - Change this to your desired model
MODEL_ID = "openai.gpt-oss-120b-1:0"  

print(f"✅ Using model: {MODEL_ID}")

In [None]:
# If you already have AWS_BEARER_TOKEN_BEDROCK in your OS env, this will reuse it.
bedrock_key = os.getenv("AWS_BEARER_TOKEN_BEDROCK")
if not bedrock_key:
    bedrock_key = getpass.getpass("Paste your Bedrock API key: ").strip()


os.environ["OPENAI_API_KEY"] = bedrock_key
os.environ["OPENAI_BASE_URL"] = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

print("✅ Env set for Bedrock OpenAI endpoint")
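
Setting environment variables is one way to point the SDK at Bedrock. A minimal alternative sketch passes the key and endpoint to the `OpenAI` constructor directly, which avoids changing process-wide state. The helper names below are hypothetical, and only the us-west-2 URL appears in this guide; the URL pattern for any other region is an assumption.

```python
import os


def bedrock_openai_base_url(region="us-west-2"):
    """Build the OpenAI-compatible Bedrock endpoint URL for a region.

    Only the us-west-2 URL is taken from this guide; other regions are an assumption.
    """
    return f"https://bedrock-runtime.{region}.amazonaws.com/openai/v1"


def make_bedrock_openai_client(api_key=None, region="us-west-2"):
    """Create an OpenAI SDK client that targets Bedrock without touching os.environ."""
    # Imported lazily so the URL helper above works even without the SDK installed.
    from openai import OpenAI

    key = api_key or os.getenv("AWS_BEARER_TOKEN_BEDROCK")
    if not key:
        raise ValueError("No Bedrock API key provided or found in AWS_BEARER_TOKEN_BEDROCK")
    return OpenAI(api_key=key, base_url=bedrock_openai_base_url(region))


print(bedrock_openai_base_url())
```

Calling `make_bedrock_openai_client()` then returns a client that behaves like the one created from environment variables.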


## Step 2: Inference with Amazon Bedrock

### Option 1: OpenAI SDK

#### Import and Initialize OpenAI Client

Now we use the **exact same OpenAI SDK** you're familiar with. The client will automatically read the environment variables we just set.

**Key Point**: This is the same OpenAI library, but now it's talking to Amazon Bedrock.

In [None]:
client = OpenAI()  
bedrock_client = boto3.client("bedrock-runtime", region_name="us-west-2")

print("✅ OpenAI client (to Bedrock) ready")
print("✅ Bedrock client region:", bedrock_client.meta.region_name)

#### Make API Calls 

The API call structure is identical to OpenAI:
- Same `messages` format with `role` and `content`
- Same `model` parameter (but uses Bedrock model IDs)  

In [None]:
response = client.chat.completions.create(
    model="openai.gpt-oss-120b-1:0",                 
    messages=[
        {"role": "system", "content": "You are a concise, highly logical assistant."},
        {"role": "user",   "content": "What is the largest city in the southern hemisphere?"}
    ],
    temperature=0.2,                       
)

print("Response:")
print(response.choices[0].message.content)
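
The call above waits for the full response. As a sketch, the same request can be streamed by passing `stream=True` and reading the text deltas from each chunk. This assumes the Bedrock endpoint supports streaming for these models; the helper names are hypothetical, and the demonstration at the bottom uses stand-in objects rather than a live call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks, on_text=None):
    """Concatenate the text deltas from an iterable of chat completion chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:              # some chunks may carry no choices
            continue
        delta_text = chunk.choices[0].delta.content
        if delta_text:                     # role-only or final chunks have no text
            parts.append(delta_text)
            if on_text is not None:
                on_text(delta_text)
    return "".join(parts)


def stream_chat(client, model_id, messages, temperature=0.2):
    """Request a streamed completion and print text as it arrives."""
    stream = client.chat.completions.create(
        model=model_id, messages=messages, temperature=temperature, stream=True
    )
    return collect_stream_text(stream, on_text=lambda t: print(t, end="", flush=True))


# Offline demonstration with stand-in chunks shaped like the SDK's objects.
def _fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo_text = collect_stream_text([_fake_chunk("Hello, "), _fake_chunk(None), _fake_chunk("world")])
print(demo_text)
```

With a live client, `stream_chat(client, MODEL_ID, messages)` prints the text incrementally and returns the full string.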


#### What's Happening Behind the Scenes?

When you use the OpenAI SDK with Bedrock, your requests are automatically translated to Bedrock's native `InvokeModel` API.

#### Request Translation
- **OpenAI SDK Request** → **Bedrock InvokeModel** 
- The request body structure remains the same
- But there are some key differences in how parameters are handled:

| Parameter | OpenAI SDK | Bedrock InvokeModel |
|-----------|------------|-------------------|
| **Model ID** | In request body | Part of the URL path |
| **Request Body** | Full chat completions format | Same format, but `model` and `stream` are optional |
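
To make the table concrete, here is a minimal sketch that maps a Chat Completions request dict to the keyword arguments `invoke_model` expects. The function name `to_invoke_model_kwargs` is hypothetical, and dropping `stream` is a choice made for this sketch, since the table describes it as optional.

```python
import json


def to_invoke_model_kwargs(chat_request):
    """Map a Chat Completions request dict to keyword arguments for invoke_model."""
    body = dict(chat_request)          # shallow copy so the caller's dict is untouched
    model_id = body.pop("model")       # the model ID moves out of the body
    body.pop("stream", None)           # optional for InvokeModel; dropped in this sketch
    return {
        "modelId": model_id,
        "body": json.dumps(body),
        "contentType": "application/json",
        "accept": "application/json",
    }


chat_request = {
    "model": "openai.gpt-oss-120b-1:0",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
}
kwargs = to_invoke_model_kwargs(chat_request)
print(kwargs["modelId"])
print(kwargs["body"])
```

The resulting dict can be unpacked into a `bedrock-runtime` client call, for example `bedrock_runtime.invoke_model(**kwargs)`.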

### Option 2: Amazon Bedrock's InvokeModel API

The Bedrock InvokeModel API is the foundational interface for interacting directly with any model hosted on Amazon Bedrock. It provides low-level, flexible access to model inference, allowing you to send input data and receive generated responses in a consistent way across all supported models.

**Key Benefits:**
- Direct Access: Interact with any Bedrock model using a unified API endpoint.
- Fine-Grained Control: Customize inference parameters and payloads for each request.
- Privacy: Amazon Bedrock does not store your input or output data—requests are used only for inference.

#### Setup client

First, we set up the Amazon Bedrock client.

In [None]:
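region = None  # Set to a region name to override the environment

# Fall back to the environment, then to us-west-2 (where these models are available)
target_region = (
    region
    or os.environ.get("AWS_REGION")
    or os.environ.get("AWS_DEFAULT_REGION")
    or "us-west-2"
)

bedrock_runtime = boto3.client("bedrock-runtime", region_name=target_region)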

#### Inference with InvokeModel API

Then we use the InvokeModel API to perform model inference. 

In [None]:
def invoke_model(body, model_id, accept, content_type):
    """
    Invokes an Amazon Bedrock model to run inference
    using the input provided in the request body.

    Args:
        body (dict): The invocation body to send to Bedrock
        model_id (str): The ID of the model to query
        accept (str): MIME type expected for the response body
        content_type (str): MIME type of the request body
    Returns:
        The raw InvokeModel response; its "body" entry is a stream to read and parse.
    """

    try:
        response = bedrock_runtime.invoke_model(
            body=json.dumps(body),
            modelId=model_id,
            accept=accept,
            contentType=content_type
        )

        return response

    except Exception:
        print(f"Couldn't invoke {model_id}")
        raise  # re-raise with the original traceback

messages = [
    {"role": "system", "content": "You are a concise, highly logical assistant."},
    {"role": "user",   "content": "What is the largest city in the northern hemisphere?"}
]
  
# OpenAI OSS models on Bedrock expect OpenAI Chat Completions format
body = {
    "messages": messages,        
    "temperature": 0.2,
}

modelId = "openai.gpt-oss-120b-1:0"
accept = "application/json"
contentType = "application/json"

response = invoke_model(body, modelId, accept, contentType)
response_body = json.loads(response.get("body").read())

# OpenAI format response
print(response_body["choices"][0]["message"]["content"])
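
The IAM policy also grants `bedrock:InvokeModelWithResponseStream`, the streaming counterpart of InvokeModel. The sketch below shows one way to read that stream with boto3. It assumes each chunk's bytes decode to an OpenAI-style chat completion chunk (`choices[].delta.content`), which this guide does not confirm; the helper names are hypothetical and the demonstration uses hand-built events rather than a live call.

```python
import json


def iter_stream_text(event_stream):
    """Yield text deltas from an InvokeModelWithResponseStream event stream.

    Assumes each chunk's bytes decode to an OpenAI-style chat completion chunk.
    """
    for event in event_stream:
        chunk = event.get("chunk")
        if not chunk:                       # skip events that carry no chunk
            continue
        payload = json.loads(chunk["bytes"])
        for choice in payload.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def invoke_model_stream(bedrock_runtime, model_id, body):
    """Start a streaming invocation and return a generator of text deltas."""
    response = bedrock_runtime.invoke_model_with_response_stream(
        body=json.dumps(body),
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
    )
    return iter_stream_text(response["body"])


# Offline demonstration with hand-built events in the assumed shape.
fake_events = [
    {"chunk": {"bytes": json.dumps({"choices": [{"delta": {"content": "Hel"}}]}).encode()}},
    {"chunk": {"bytes": json.dumps({"choices": [{"delta": {}}]}).encode()}},
    {"chunk": {"bytes": json.dumps({"choices": [{"delta": {"content": "lo"}}]}).encode()}},
]
streamed = "".join(iter_stream_text(fake_events))
print(streamed)
```

With a live client, iterate over `invoke_model_stream(bedrock_runtime, modelId, body)` and print each piece as it arrives.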

### Option 3: Amazon Bedrock's Converse API

The Bedrock Converse API provides a consistent interface for working with all Bedrock models that support messages. This means you can write your code once and use it across different models without changes.

**Key Benefits:**
- Universal Interface: Same API structure works with Claude, Llama, Titan, and other models
- Model-Specific Parameters: Pass unique parameters when needed for specific models
- Privacy: Amazon Bedrock doesn't store any content you provide - data is only used for response generation
- Advanced Features: Built-in support for guardrails, tools/function calling, and prompt management 

In [None]:
response = bedrock_client.converse(
    modelId=MODEL_ID,
    messages=[
        {
            "role": "user",
            "content": [{"text": "Where is our solar system located within the Milky Way?"}]
        }
    ],
    system=[{"text": "You are a concise, highly logical assistant."}],
    inferenceConfig={
        "temperature": 0.2
    }
)

content = response.get("output", {}).get("message", {}).get("content", [])
print("".join(block.get("text", "") for block in content).strip() or "(no text)")
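
The Converse API's built-in tool support uses a `toolSpec` structure. As a minimal sketch, the helpers below build a `toolConfig` and pull `toolUse` blocks out of a response. The `get_weather` tool and both helper names are hypothetical, and the response shown is hand-built for illustration, not real model output.

```python
def make_tool_config(name, description, input_schema):
    """Wrap a JSON Schema in the Converse API's toolSpec structure."""
    return {
        "tools": [
            {
                "toolSpec": {
                    "name": name,
                    "description": description,
                    "inputSchema": {"json": input_schema},
                }
            }
        ]
    }


def extract_tool_uses(response):
    """Return the toolUse blocks from a Converse response, if any."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    return [block["toolUse"] for block in content if "toolUse" in block]


# Hypothetical weather tool, used only for illustration.
tool_config = make_tool_config(
    name="get_weather",
    description="Look up the current weather for a city.",
    input_schema={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

# Hand-built response in the Converse response shape (not real model output).
fake_response = {
    "stopReason": "tool_use",
    "output": {"message": {"role": "assistant", "content": [
        {"toolUse": {"toolUseId": "example-id", "name": "get_weather",
                     "input": {"city": "Seattle"}}}
    ]}},
}
tool_uses = extract_tool_uses(fake_response)
print(tool_uses[0]["name"], tool_uses[0]["input"])
```

In a live call, pass `toolConfig=tool_config` to `converse(...)`. When `stopReason` is `tool_use`, run the requested tool and return its output in a `toolResult` block in the next user message.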


## Conclusion

You've successfully explored **three powerful ways** to interact with OpenAI's GPT-OSS models on Amazon Bedrock, including comprehensive tool use capabilities!

### What We've Accomplished

**1. OpenAI SDK Integration**
- Set up the familiar OpenAI SDK to work seamlessly with Amazon Bedrock
- Leveraged existing OpenAI patterns while running on AWS infrastructure
- Demonstrated streaming responses with real-time token generation
- **Implemented function calling** using familiar OpenAI SDK patterns

**2. Direct InvokeModel API Access**
- Implemented low-level Bedrock InvokeModel API calls for maximum control
- Built custom functions for both streaming and non-streaming inference
- Measured performance metrics like time-to-first-token for streaming responses
- Gained fine-grained control over model parameters and request formatting
- **Built custom tool use handling** with manual request/response processing

**3. Bedrock Converse API**
- Explored the unified Converse API that works across all Bedrock models
- Demonstrated consistent message-based interactions regardless of underlying model
- Leveraged built-in support for system prompts and inference configuration
- **Integrated tool calling** using Bedrock's native toolSpec format

### Key Benefits Achieved

✅ **Flexibility**: Three different API approaches for different use cases  
✅ **Performance**: Streaming support for improved user experience  
✅ **Familiarity**: Use existing OpenAI SDK patterns with AWS infrastructure  
✅ **Control**: Direct API access when you need fine-grained customization  
✅ **Consistency**: Universal interface that works across all Bedrock models  
✅ **Privacy**: Amazon Bedrock doesn't store your data; it is used only for inference  
✅ **Practical Comparison**: Side-by-side examples using the same function

### What's Next?

You're now equipped with comprehensive knowledge to choose the right API approach for your specific use case. Whether you need:
- The **simplicity** of the OpenAI SDK
- The **control** of InvokeModel 
- The **consistency** of Converse API

You have all the tools and examples to build powerful AI applications with external function calling on Amazon Bedrock!

## Use Cases


Let's test reasoning with the Chat Completions API:

In [None]:
def run_prompt(prompt, temperature=0.02):
    """Helper function to run prompts with chat completions"""
    response = client.chat.completions.create(
        model=MODEL_ID, # update your model id to either 120b or 20b
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

print("✅ Helper function ready!")

## Reasoning Use Case Examples

Testing complex reasoning and problem-solving capabilities with the Chat Completions API.

### Exponential Growth Problem

Tests compound growth calculations and iterative reasoning

In [None]:
prompt = """Solve and verify.

A factory makes widgets. Day 1 output is 500. Each day after, output increases by 8%.
How many days until daily output first exceeds 1,000?

Return:
- Short reasoning outline (≤4 lines)
- Final answer (integer days)
- One-line self-check (plug back)."""

print(run_prompt(prompt))
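
To check the model's answer independently, a short reference calculation helps. This sketch treats day 1 as the 500-widget starting day and compounds 8% per day; the function name is hypothetical.

```python
def first_day_exceeding(start, daily_growth, target):
    """Return (day number, output) for the first day whose output exceeds target.

    Day 1 corresponds to the starting output.
    """
    day, output = 1, float(start)
    while output <= target:
        output *= 1 + daily_growth
        day += 1
    return day, output


day, output = first_day_exceeding(500, 0.08, 1000)
print(f"Day {day}: {output:.2f} widgets")
```

Output first exceeds 1,000 on day 11, which is 10 days after day 1. The prompt's wording allows either reading, so a model answer of 10 or 11 can be correct depending on how it counts.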

### Kinematics Problem

Tests multi-phase motion analysis and distance calculations

In [None]:
prompt = """A car accelerates uniformly from rest to 27 m/s in 9 s, then coasts at constant speed for 60 s, then brakes uniformly to rest in 6 s.
Compute total distance traveled.

Return:
- Piecewise distances
- Sum
- One-line unit/sanity check."""

print(run_prompt(prompt))
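
A reference calculation for this problem, useful for verifying the model's piecewise distances. It relies on the fact that distance under uniform acceleration equals average speed times duration; the helper name is hypothetical.

```python
def uniform_accel_distance(v_start, v_end, duration):
    """Distance covered under constant acceleration: average speed times time."""
    return 0.5 * (v_start + v_end) * duration


accelerate = uniform_accel_distance(0, 27, 9)   # 121.5 m
coast = 27 * 60                                 # 1620 m at constant speed
brake = uniform_accel_distance(27, 0, 6)        # 81.0 m
total = accelerate + coast + brake

print(f"accelerate={accelerate} m, coast={coast} m, brake={brake} m, total={total} m")
```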

### Scientific Problem Solving

In [None]:
science_prompt = """A ball is thrown vertically upward from ground level with an initial velocity of 20 m/s.
Calculate:
1. Maximum height reached
2. Time to reach maximum height  
3. Total time in the air
4. Velocity when it returns to ground level

Use g = 9.8 m/s². Show your physics reasoning and formulas."""

print("Scientific Problem Solving:")
print("=" * 50)
print(run_prompt(science_prompt))
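
For comparison with the model's output, the four quantities follow directly from constant-acceleration kinematics. This sketch ignores air resistance, as the prompt implies.

```python
v0 = 20.0   # initial upward velocity, m/s
g = 9.8     # gravitational acceleration, m/s^2

max_height = v0**2 / (2 * g)   # from v^2 = v0^2 - 2*g*h with v = 0 at the top
time_up = v0 / g               # from v = v0 - g*t with v = 0 at the top
total_time = 2 * time_up       # ascent and descent take equal time
return_velocity = -v0          # same speed as launch, directed downward

print(f"Maximum height: {max_height:.2f} m")
print(f"Time to maximum height: {time_up:.2f} s")
print(f"Total time in the air: {total_time:.2f} s")
print(f"Velocity at ground level: {return_velocity:.1f} m/s")
```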

### System Design 

In [None]:
# System Design Example
# Named design_prompt because it is sent as a user message, not a system-role prompt
design_prompt = """Design a URL shortener service like bit.ly for 100M URLs per day.
Address:
1. Database schema design
2. Encoding/decoding algorithm choice
3. Scalability considerations (caching, load balancing)
4. One potential bottleneck and solution

Keep response concise but show your reasoning."""

print("System Design Example:")
print("=" * 50)
print(run_prompt(design_prompt))
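
When reviewing the model's design, it helps to have a baseline for the encoding step and the load figures. The sketch below shows base62 encoding of a numeric ID, one common choice for URL shorteners, along with back-of-envelope numbers for 100M URLs per day. It is an illustrative assumption, not the answer the model is expected to give.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 symbols


def encode_base62(number):
    """Encode a non-negative integer ID as a base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(code):
    """Decode a base62 string back to its integer ID."""
    number = 0
    for char in code:
        number = number * 62 + ALPHABET.index(char)
    return number


writes_per_second = 100_000_000 / 86_400   # average write rate for 100M URLs per day
codes_with_7_chars = 62 ** 7               # distinct codes available with 7 characters

print(f"Average writes per second: {writes_per_second:.0f}")
print(f"7-character code space: {codes_with_7_chars:,}")
print(encode_base62(123456789), decode_base62(encode_base62(123456789)))
```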