# re:Invent 2025 - AIM311: Optimize Open Weight Models for Low-Latency, Cost-Effective AI Apps

## Lab 1b: API Integration Options
**Duration**: 15 minutes  
**Focus**: Understanding and using different APIs to call Bedrock models

### What You'll Learn

In this lab, you'll explore three different ways to call models on Amazon Bedrock:

1. **Invoke API** - Low-level control for production systems
2. **Converse API** - Bedrock-native multi-turn conversations
3. **ChatCompletions API** - OpenAI-compatible for easy migration

Each API has different strengths and use cases. By the end of this lab, you'll understand when to use each approach.

**💭 Think about:** If you're building a new application from scratch, which API would you choose? Consider your team's experience level, whether you're migrating from OpenAI, and whether you need Bedrock-specific features.

---

## API Decision Table

Amazon Bedrock offers three different APIs for calling models. Here's a comparison to help you choose the right one for your use case:

| Factor | Invoke API | Converse API | ChatCompletions |
|--------|-----------|--------------|------------------|
| **Learning Curve** | High - Manual JSON handling | Medium - Bedrock-native patterns | Low - Familiar OpenAI patterns |
| **Model Support** | All Bedrock models | All Bedrock models | Limited to compatible models |
| **Control Level** | Maximum - Direct payload access | High - Structured interface | Medium - Abstracted interface |
| **Built-in Features** | None - Manual implementation | Native tool calling, guardrails, streaming | Standard OpenAI features |
| **Conversation Management** | Manual - Build each payload yourself | Caller maintains a structured `messages` list | Caller maintains an OpenAI-style `messages` list |
| **Type Safety** | No - JSON strings | Yes - Structured objects | Yes - SDK types |
| **Debugging** | Easy - Full visibility | Good - Structured errors | Good - SDK error handling |
| **Migration Effort** | N/A | Medium - Learn Bedrock patterns | Low - Drop-in OpenAI replacement |
| **Ecosystem Compatibility** | Custom integration needed | Large ecosystem based on Bedrock-specific implementations | Very large ecosystem based on OpenAI-specific implementations |
| **Performance** | Fastest - Minimal overhead | Fast - Optimized for Bedrock | Fast - SDK optimized |


### Quick Selection Guide

| API type | Use Cases |
|-----|-----------|
| Invoke API | Maximum control over request/response payloads<br>Building custom integrations<br>Fastest performance with minimal overhead<br>Access to model-specific parameters |
| Converse API | Building Bedrock-native applications<br>Multi-turn conversations or tool calling<br>Using Bedrock guardrails<br>Structured, type-safe interfaces |
| ChatCompletions API | Migrating from OpenAI<br>Using existing OpenAI-compatible libraries<br>Rapid prototyping with familiar patterns<br>Using LangChain, LlamaIndex, or similar frameworks |

---

### 1. Invoke API - Basic Model Invocation with Low-Level Control

**✅ Use for:** Simple request-response patterns, maximum control

```python
response = bedrock_runtime.invoke_model(
   body=json.dumps({
       "messages": [{"role": "user", "content": prompt}],
       "max_tokens": 200,
       "temperature": 0.7
   }),
   modelId=model_id,
   accept="application/json",
   contentType="application/json"
)
```
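
The call above returns the model output as a streaming `body` that has to be read and JSON-decoded, and the decoded schema depends on the model. The sketch below shows one way to handle that step offline. The helper names `build_invoke_body` and `parse_invoke_body` are hypothetical, and the OpenAI-style `choices` layout is an assumption that will not hold for every model family.

```python
import io
import json


def build_invoke_body(prompt, max_tokens=200, temperature=0.7):
    """Serialize a chat-style payload like the one in the example above."""
    return json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    })


def parse_invoke_body(body_stream):
    """Read and decode the streaming body returned by invoke_model.

    ASSUMPTION: the model replies with an OpenAI-style "choices" list.
    Other model families use different schemas, so adjust the lookup.
    """
    payload = json.loads(body_stream.read())
    choices = payload.get("choices", [])
    if not choices:
        return None
    return choices[0].get("message", {}).get("content")


# Offline demonstration with a stand-in for response["body"]
fake_body = io.BytesIO(json.dumps(
    {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
).encode("utf-8"))
invoke_text = parse_invoke_body(fake_body)
```

With a real call, you would pass `response["body"]` to `parse_invoke_body` instead of the in-memory stand-in.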

### 2. Converse API - Bedrock-Native Multi-Turn + Tool Calling

**✅ Use for:** Multi-turn conversations, tool calling, streaming, Bedrock-native apps

```python
response = bedrock_runtime.converse(
   modelId=model_id,
   messages=[{"role": "user", "content": [{"text": prompt}]}],
   toolConfig={"tools": tools},
   inferenceConfig={"maxTokens": 200, "temperature": 0.7}
)
```
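
The snippet above passes a `tools` list without defining it. As a rough sketch, each entry wraps a JSON Schema in a `toolSpec` object, and a tool request comes back as a `toolUse` content block. The `get_weather` tool, the helper names, and the sample response below are all made up for illustration.

```python
def make_tool(name, description, properties, required):
    """Wrap a JSON Schema in the toolSpec shape that Converse expects."""
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                }
            },
        }
    }


def extract_tool_calls(response):
    """Return the toolUse blocks from a Converse response, if any."""
    content = response["output"]["message"]["content"]
    return [block["toolUse"] for block in content if "toolUse" in block]


# Hypothetical weather tool, for illustration only
tools = [make_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
    ["city"],
)]

# Stand-in for a response in which the model decided to call the tool
sample_response = {
    "stopReason": "tool_use",
    "output": {"message": {"role": "assistant", "content": [
        {"text": "Let me check the weather."},
        {"toolUse": {"toolUseId": "tool-1", "name": "get_weather",
                     "input": {"city": "Seattle"}}},
    ]}},
}
tool_calls = extract_tool_calls(sample_response)
```

To continue the conversation, append the assistant message to `messages`, then add a `user` message containing a `toolResult` block that carries the same `toolUseId`.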

### 3. ChatCompletions API - OpenAI SDK Compatible

**✅ Use for:** Drop-in OpenAI replacement, existing codebases, rapid prototyping

```python
from openai import OpenAI

client = OpenAI(
   base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1",
   api_key="your-bedrock-api-key"  # a Bedrock API key, not raw AWS credentials
)

response = client.chat.completions.create(
   model=model_id,
   messages=[{"role": "user", "content": prompt}],
   stream=True
)
```
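
Because the example sets `stream=True`, `response` is an iterator of chunks rather than a single completion. A minimal sketch of collecting the streamed text follows. `collect_stream_text` is a hypothetical helper, and the fake chunks only imitate the attribute layout (`choices[0].delta.content`) of real OpenAI SDK chunks so the code can run offline.

```python
from types import SimpleNamespace


def collect_stream_text(stream):
    """Concatenate the text deltas from a Chat Completions stream."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks (for example usage-only ones) carry no choices
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in object with the same attribute shape as a stream chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [_fake_chunk("Hel"), _fake_chunk("lo"), _fake_chunk(None),
               SimpleNamespace(choices=[])]
streamed_text = collect_stream_text(fake_stream)
```

In an interactive app you would usually print each delta as it arrives instead of joining the parts at the end.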

**Note:** The API examples below use models identified in the selection framework from earlier labs. Each example includes brief use-case context to help you understand when to use each API.



---
## 🔬 Part 1: Model-Specific Examples with the Bedrock Converse API

Now that you understand how to select and evaluate models, let's explore hands-on examples for each model family. These examples demonstrate the unique capabilities and use cases for each model.

**What you'll learn:**
- How to use each model family's unique features
- Practical code examples for different use cases
- Best practices for each model type
- How to use each API type

#### Setup client

First, we set up the Amazon Bedrock client.


In [None]:
import boto3
import os

AWS_REGION = "us-west-2"
session = boto3.Session()
credentials = session.get_credentials()
print(f"✅ Credentials: {credentials.access_key[:10]}..." if credentials else "❌ No credentials")

# Remove OpenAI SDK environment variables if they exist
for key in ['OPENAI_API_KEY', 'OPENAI_BASE_URL', 'OPENAI_ORG_ID']:
    if key in os.environ:
        del os.environ[key]
        print(f"✅ Removed {key}")

print("✅ Environment cleaned - ready to use Bedrock with AWS credentials")

bedrock_runtime = boto3.client('bedrock-runtime', region_name=AWS_REGION)
print("✅ Environment setup complete!")

---

## 🦙 Llama 4 Model Family & Amazon Bedrock's Converse API

Meta's Llama 4 family offers multimodal capabilities and ultra-long context windows for advanced use cases.

### Available Models:
- **Llama 4 Maverick**: Multimodal (text + vision) with 1M context
- **Llama 4 Scout**: Ultra-long context (3.5M tokens) for massive documents

### 🖼️ Use Case: Multimodal Analysis & Document Processing

**Best for:** Image analysis, document OCR, visual Q&A, chart interpretation, screenshot understanding

### Llama 4 Models - Technical Overview

| Feature | Llama 4 Maverick | Llama 4 Scout |
|---------|------------------|---------------|
| **Parameters** | 400B total (128 experts, 17B active) | 109B total (16 experts, 17B active) |
| **Context Window** | 1M tokens | 3.5M tokens |
| **Model ID** | `us.meta.llama4-maverick-17b-instruct-v1:0` | `us.meta.llama4-scout-17b-instruct-v1:0` |
| **Type** | Multimodal (text + vision) | Multimodal (text + vision) with ultra-long context |
| **Best For** | Image analysis, visual Q&A | Ultra-long document processing |
| **Tool Calling** | ✅ | ✅ |
| **Streaming** | ✅ | ✅ |
| **Converse API** | ✅ | ✅ |

### Amazon Bedrock's Converse API

In [None]:
import boto3
from PIL import Image

MODEL_ID = 'us.meta.llama4-maverick-17b-instruct-v1:0'

boto_session = boto3.session.Session()
bedrock_client = boto_session.client(
    service_name='bedrock-runtime',
    region_name='us-west-2'
)

### Function to create the messages object with text prompt and image

Every image query needs a messages object, so we wrap its construction in a reusable function that builds the payload from a question and a list of image paths.

In [None]:
def make_multi_images_messages(question, image_paths):
    """Build a Converse messages list with a text prompt and one image block per path."""
    content = [{"text": question}]

    for path in image_paths:
        try:
            # Raw bytes are sent to the model as-is
            with open(path, "rb") as image_file:
                image_bytes = image_file.read()
            # Pillow reports formats such as "PNG" or "JPEG"; Converse expects lowercase
            with Image.open(path) as img:
                image_format = img.format.lower()
        except FileNotFoundError:
            print(f"Image file not found at {path}")
            continue  # skip missing files instead of failing later on an index lookup

        content.append({
            "image": {
                "format": image_format,
                "source": {"bytes": image_bytes}
            }
        })

    messages = [{"role": "user", "content": content}]

    return messages
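
The function above passes Pillow's format name straight through to the Converse API. As a hedged sketch, a small guard can catch formats the API is unlikely to accept before the request is sent. `to_converse_image_format` is a hypothetical helper, and the supported set (`png`, `jpeg`, `gif`, `webp`) reflects the Converse image block documentation as I understand it; check it against the current docs and the model you are using.

```python
# ASSUMPTION: formats accepted by the Converse image block; verify against current docs
SUPPORTED_IMAGE_FORMATS = {"png", "jpeg", "gif", "webp"}


def to_converse_image_format(pil_format):
    """Map a Pillow format name (e.g. "PNG", "JPEG") to a Converse format string."""
    if not pil_format:
        raise ValueError("Image format could not be determined")
    fmt = pil_format.lower()
    if fmt == "jpg":  # tolerate the common alias
        fmt = "jpeg"
    if fmt not in SUPPORTED_IMAGE_FORMATS:
        raise ValueError(f"Unsupported image format for Converse: {fmt}")
    return fmt
```

Calling this helper where the image format is lowercased would surface an unsupported file as a clear `ValueError` rather than a service-side validation error.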

In [None]:
question = "Describe the content in these images"
images_path = ["img/slide-1.png", "img/slide-3.png"]
messages = make_multi_images_messages(question, images_path)
for img in images_path:
    image = Image.open(img)
    image.show()
try:
    # Call the Bedrock Converse API
    response = bedrock_client.converse(
        modelId=MODEL_ID,  # MODEL_ID defined at the beginning
        messages=messages,
        inferenceConfig={
            "maxTokens": 2048,
            "temperature": 0,
            "topP": 0.1
        },
    )

    # Read the response
    print(response['output']['message']['content'][0]['text'])

except Exception as e:
    print(f"An error occurred while calling the Converse API: {str(e)}")
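
The cell above reads only the first content block. A Converse response also reports a stop reason, token usage, and latency, which matter when tuning for cost and speed. The sketch below pulls those fields out of a response-shaped dictionary. `summarize_converse_response` is a hypothetical helper, and the sample values are invented for the offline demonstration.

```python
def summarize_converse_response(response):
    """Collect the text blocks and basic usage metadata from a Converse response."""
    blocks = response["output"]["message"]["content"]
    # Join every text block; non-text blocks (e.g. reasoning or tool use) are skipped
    text = "".join(block["text"] for block in blocks if "text" in block)
    usage = response.get("usage", {})
    return {
        "text": text,
        "stop_reason": response.get("stopReason"),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
        "latency_ms": response.get("metrics", {}).get("latencyMs"),
    }


# Invented sample shaped like a Converse response
sample = {
    "output": {"message": {"role": "assistant", "content": [
        {"reasoningContent": {"reasoningText": {"text": "thinking..."}}},
        {"text": "Two slides about model selection."},
    ]}},
    "stopReason": "end_turn",
    "usage": {"inputTokens": 1200, "outputTokens": 9, "totalTokens": 1209},
    "metrics": {"latencyMs": 850},
}
summary = summarize_converse_response(sample)
```

Filtering for `text` blocks instead of indexing `content[0]` also avoids a `KeyError` when a model returns a non-text block first.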

---

## 🔬 Part 2: 🤖 GPT OSS Model Family & Bedrock OpenAI-Compatible API Integration

The GPT OSS family provides OpenAI-compatible models optimized for agentic workflows and complex reasoning tasks.

### Available Models:
- **GPT-OSS-120B**: High-performance reasoning for complex tasks
- **GPT-OSS-20B**: Cost-effective model for simpler workloads


### 🤖 Use Case: Agentic Workflows & Complex Reasoning

**Example Applications:** Multi-agent systems, research tasks, tool-heavy applications, autonomous decision making

**Why GPT OSS and OpenAI-Compatible API?**
- 100% OpenAI SDK compatible for seamless migration
- Enhanced function calling for tool integration
- Advanced reasoning capabilities
- Native support for LangChain, LangGraph, CrewAI

### GPT OSS Models - Technical Overview

| Feature | GPT-OSS-120B | GPT-OSS-20B |
|---------|--------------|-------------|
| **Parameters** | 120B | 20B |
| **Model ID** | `openai.gpt-oss-120b-1:0` | `openai.gpt-oss-20b-1:0` |
| **Context Window** | 128K tokens | 128K tokens |
| **Best For** | Complex reasoning, agentic workflows | Fast inference, cost-effective deployments |
| **Tool Calling** | ✅ Enhanced | ✅ Enhanced |
| **Streaming** | ✅ | ✅ |
| **Converse API** | ✅ | ✅ |
| **OpenAI Compatible** | ✅ 100% | ✅ 100% |
| **Languages** | Multilingual | Multilingual |

### Amazon Bedrock's Chat Completions API

In [None]:
! pip install -q -U boto3 openai langchain langgraph langchain-openai langchain-core langchain-aws aws-bedrock-token-generator

In [None]:
import os
import json
import boto3

# LangChain imports
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [None]:
# Model Configuration
MODEL_ID = "openai.gpt-oss-120b-1:0"  # or "openai.gpt-oss-20b-1:0" for faster inference
REGION = "us-west-2"
os.environ['AWS_REGION'] = REGION

print(f"✅ Using model: {MODEL_ID}")
print(f"✅ Region: {REGION}")
print("✅ Using AWS credentials from your environment (AWS CLI, env vars, or IAM role)")

### Using LangChain's OpenAI wrapper classes natively with Amazon Bedrock

LangChain provides a comprehensive framework for building LLM applications. We'll explore how to use LangChain's popular OpenAI wrapper classes with Bedrock's OpenAI-compatible endpoint.

**Note:** We'll set the environment variables `OPENAI_BASE_URL` and `OPENAI_API_KEY` to point the OpenAI SDK at Amazon Bedrock's OpenAI-compatible endpoint. The endpoint authenticates with a Bedrock API key rather than raw AWS credentials, so we generate a short-term key from your AWS credentials. This lets us use `ChatOpenAI` with Bedrock models without further code changes.

In [None]:
# Configure OpenAI SDK to use Bedrock's OpenAI-compatible endpoint
# The OpenAI SDK with Bedrock requires a Bedrock API key (not AWS credentials)
# We'll use AWS's official token generator library to create short-term API keys

from aws_bedrock_token_generator import provide_token

# Generate a short-term Bedrock API key (valid for up to 12 hours)
# This automatically uses your AWS credentials and inherits their permissions
try:
    api_key = provide_token()
    print("✅ Bedrock API Key generated successfully")
except Exception as e:
    print(f"❌ Error generating Bedrock API key: {e}")
    print("\n💡 Troubleshooting tips:")
    print("   1. Ensure your AWS credentials are configured (AWS CLI, env vars, or IAM role)")
    print("   2. Verify you have permissions to use Amazon Bedrock")
    print("   3. Check that you're in a supported region (us-west-2, us-east-1, etc.)")
    print("   4. Install the token generator: pip install aws-bedrock-token-generator")
    raise

# Set environment variables for OpenAI SDK
# IMPORTANT: The base_url should be /openai/v1 (NOT /model/{model_id}/v1)
# The model ID is specified in the chat.completions.create() call
base_url = f"https://bedrock-runtime.{REGION}.amazonaws.com/openai/v1"
os.environ['OPENAI_BASE_URL'] = base_url
os.environ['OPENAI_API_KEY'] = api_key

print(f"✅ OpenAI Base URL: {base_url}")

# Initialize LangChain with ChatOpenAI using Bedrock's OpenAI-compatible endpoint
# The environment variables will automatically be picked up by ChatOpenAI
llm = ChatOpenAI(
    model=MODEL_ID,
    temperature=0.1,
    max_tokens=2000
)

print(f"✅ LangChain LLM initialized with model: {MODEL_ID}")
print("✅ Using ChatOpenAI (OpenAI-compatible Bedrock endpoint)")
print(f"✅ API KEY: {api_key[:20]}...")

In [None]:
# Simple LangChain Chain using LCEL (LangChain Expression Language)
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful coding assistant. Provide clear, concise explanations."),
    ("human", "Explain the concept of {topic} in programming.")
])

# Modern approach: Use pipe operator (LCEL) instead of deprecated LLMChain
chain = prompt_template | llm | StrOutputParser()

# Test the chain
# Note: invoke() takes a dictionary as input, not keyword arguments
response = chain.invoke({"topic": "dependency injection"})

print("💻 LangChain Chain Response:")
print("=" * 50)
print(response)
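
To build intuition for what the pipe operator in the chain above does, here is a toy illustration in plain Python. It is not LangChain's implementation: `Step` is a made-up class, and the "model" is a stub that echoes its prompt. It only shows the idea that `a | b` creates a new step feeding the output of `a` into `b`.

```python
class Step:
    """Tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b produces a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)


# Three stages mirroring prompt template -> model -> output parser
format_prompt = Step(lambda inputs: f"Explain the concept of {inputs['topic']} in programming.")
fake_llm = Step(lambda prompt: {"content": f"[model reply to: {prompt}]"})  # stub, no API call
parse_output = Step(lambda message: message["content"])

toy_chain = format_prompt | fake_llm | parse_output
toy_result = toy_chain.invoke({"topic": "dependency injection"})
```

As in the real chain, `invoke` takes a dictionary whose keys match the placeholders in the prompt.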

---
# 🎯 Key Takeaways & Next Steps

### What You've Learned in Lab 1b

You've now explored the three main ways to integrate Bedrock models into your applications:

1. **Invoke API** - Maximum control for production systems
2. **Converse API** - Bedrock-native features like tool calling and guardrails  
3. **ChatCompletions API** - OpenAI-compatible for easy migration


---

## 🎉 Great Work!

You've successfully completed the model selection and API integration foundations. Lab 2 will build on this knowledge to help you optimize and evaluate models for production deployment.

**Continue to → [Lab 2: Model Evaluation & Optimization](../lab2/Lab2a_-_Automatic_model_evaluation.ipynb)**

---