# Azure OpenAI: GPT-4o vs GPT-5 (Simple Demo)

A minimal, Azure-only notebook showing:
- How a baseline GPT-4o chat call looks (chat.completions).
- How a GPT-5 (next-gen) Responses API call looks and why it’s simpler.
- Small toggles: `reasoning_effort` and `verbosity` (when supported).

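Replace the deployment names used in the cells below (such as `gpt-5`, `gpt-5-mini`, `PRIMARY_DEPLOYMENT`, and `BASELINE_DEPLOYMENT`) with your own.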

## Authentication Troubleshooting

If you're getting a 401 authentication error, here are the most common fixes:

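1. **Check your environment variables** - Make sure these are set correctly:
   - `AZURE_OPENAI_ENDPOINT` - Your resource endpoint, used by the `AzureOpenAI` client
   - `V1_AZURE_OPENAI_ENDPOINT` - The v1 endpoint (ending in `/openai/v1`), used by the `OpenAI` client
   - `AZURE_OPENAI_API_KEY` - Your API key from the Azure portal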

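2. **Verify endpoint format** - The endpoint should look like:
   - For the `AzureOpenAI` client (`AZURE_OPENAI_ENDPOINT`): the resource root, e.g. `https://your-resource-name.openai.azure.com/` or `https://your-resource-name.cognitiveservices.azure.com/`. The client appends the `/openai/deployments/...` path itself.
   - For the v1 format (`V1_AZURE_OPENAI_ENDPOINT`, used with the `OpenAI` client): `https://your-resource-name.openai.azure.com/openai/v1`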

3. **Check deployment names** - Make sure `gpt-5` matches your actual deployment name in Azure

4. **Try Azure AD authentication** instead of API key (more reliable for some setups)
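A quick way to rule out the most common causes is to check the variables before creating a client. The sketch below is a minimal, hypothetical helper (`check_azure_openai_env` is not part of any SDK) that only inspects the variable names used in this notebook; it cannot tell whether a key or endpoint is actually valid.

```python
import os
from urllib.parse import urlparse


def check_azure_openai_env(env=None):
    """Return a list of likely problems with the Azure OpenAI settings.

    Uses the variable names from this notebook; `env` defaults to os.environ
    but can be any dict, which makes the check easy to try offline.
    """
    env = os.environ if env is None else env
    problems = []

    endpoint = env.get("AZURE_OPENAI_ENDPOINT", "")
    if not endpoint:
        problems.append("AZURE_OPENAI_ENDPOINT is not set")
    elif urlparse(endpoint).scheme != "https":
        problems.append("AZURE_OPENAI_ENDPOINT should start with https://")

    v1_endpoint = env.get("V1_AZURE_OPENAI_ENDPOINT", "")
    if v1_endpoint and not v1_endpoint.rstrip("/").endswith("/openai/v1"):
        problems.append("V1_AZURE_OPENAI_ENDPOINT should end with /openai/v1")

    if not env.get("AZURE_OPENAI_API_KEY"):
        problems.append("AZURE_OPENAI_API_KEY is not set (fine if you only use Azure AD)")
    return problems


# Prints nothing when every check passes; the key value itself is never printed.
for problem in check_azure_openai_env():
    print("Possible issue:", problem)
```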

In [7]:
# Alternative: Azure AD Authentication (recommended)
import os
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

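# This method uses Azure AD (Microsoft Entra ID) instead of API keys.
# DefaultAzureCredential() is lazy: bad credentials usually surface on the first
# request rather than here, so the API-key fallback below rarely triggers.
try: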
    credential = DefaultAzureCredential()
    token_provider = get_bearer_token_provider(
        credential, "https://cognitiveservices.azure.com/.default"
    )
    
    # Use AzureOpenAI client with proper endpoint format
    client = AzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),  # https://your-resource.openai.azure.com/
        azure_ad_token_provider=token_provider,
        api_version="2024-12-01-preview"  # use an API version that supports your model; check the Azure OpenAI docs
    )
    
    print("✅ Azure AD authentication configured")
    print("Endpoint:", os.getenv("AZURE_OPENAI_ENDPOINT"))
    
except Exception as e:
    print("❌ Azure AD auth failed:", e)
    print("Falling back to API key method...")
    
    # Fallback to API key method
    client = AzureOpenAI(
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2024-12-01-preview"
    )
    print("✅ API key authentication configured")

✅ Azure AD authentication configured
Endpoint: https://aifoundry825233136833-resource.cognitiveservices.azure.com/


In [9]:
# Test the connection with a simple call
def test_connection():
    try:
        response = client.chat.completions.create(
            model="gpt-5",  # Replace with your actual deployment name
            messages=[{"role": "user", "content": "Hello, can you respond with just 'OK'?"}],
            max_completion_tokens=1000
        )
        print("✅ Connection successful!")
        print("Response:", response.choices[0].message.content)
        return True
    except Exception as e:
        print("❌ Connection failed:")
        print("Error:", str(e))
        return False

# Run the test
test_connection()

✅ Connection successful!
Response: OK


True

## Environment Variables Setup

Create a `.env` file in your project root with these variables:

```bash
# Azure OpenAI settings
AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/
AZURE_OPENAI_API_KEY=your-api-key-here
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-5  # or your actual deployment name

# For v1 endpoint format (if using the OpenAI client)
V1_AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com/openai/v1
```

**Common Issues:**
- Make sure your Azure OpenAI resource has GPT-5 deployed
- Verify the deployment name matches exactly (case-sensitive)
- Check that your subscription has access to GPT-5 models
- Ensure your API key is current (a key stops working once it has been regenerated in the Azure portal)
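The notebook reads these values with `os.getenv`, so they must be loaded into the environment first. The usual tool is `load_dotenv()` from the `python-dotenv` package; as a dependency-free sketch, the hypothetical helpers below handle only plain `KEY=VALUE` lines, blank lines, and `#` comments (no quoting, `export` prefixes, or multi-line values).

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines; skips blanks and comments, strips ' # ...' tails."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop inline comments such as "gpt-5  # or your actual deployment name".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def load_env_file(path=".env", override=False):
    """Load a .env file into os.environ (existing variables win unless override=True)."""
    with open(path, encoding="utf-8") as fh:
        values = parse_env_lines(fh)
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Calling `load_env_file()` from the project root before the client cells should make `os.getenv("AZURE_OPENAI_ENDPOINT")` and friends return the file's values.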

In [1]:
import os

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

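token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

# Passing a token-provider callable as `api_key` needs a recent `openai` package;
# older versions may not support it, which is one possible cause of the 401 below.
# The same error also appears when V1_AZURE_OPENAI_ENDPOINT is unset or wrong.
client = OpenAI(
    base_url=os.getenv("V1_AZURE_OPENAI_ENDPOINT"),
    api_key=token_provider,
)

response = client.chat.completions.create(
    model="gpt-5",  # replace with your model deployment name
    messages=[
        {"role": "user", "content": "What steps should I think about when writing my first Python API?"},
    ],
    max_completion_tokens=5000,
)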

print(response.model_dump_json(indent=2))

AuthenticationError: Error code: 401 - {'error': {'code': '401', 'message': 'Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.'}}

## GPT-4o: Classic chat.completions
With GPT-4o, controlling how brief or thorough the assistant should be usually meant writing a long system prompt.

In [80]:
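import time

# Deployment names used by the cells below; replace them with your own.
BASELINE_DEPLOYMENT = "gpt-4o"
PRIMARY_DEPLOYMENT = "gpt-5"

questions = [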
    "What are your hours of operation?",
    # "How do I reset my password?",
    # "Where can I find your pricing information?",
    # "How do I contact support?",
    # "How do I create an account?",
    # "How can I update my billing information?",
    # "What is your refund policy?",
    # "How do I track my order?",
    # "Can I change my subscription plan?",
    # "How do I delete my account?",
    # "Do you offer a free trial?",
    # "How do I integrate with your API?"
]

# Use the first question as the prompt for this example
prompt = questions[0]

started = time.time()
resp_4o = client.chat.completions.create(
    model=BASELINE_DEPLOYMENT,
    messages=[
        {
            "role": "system",
            "content": "For simple questions, give brief answers. For complex technical issues, think step by step through the problem, consider multiple approaches, analyze each option...",  # 200+ words of prompt hacks
        },
        {"role": "user", "content": prompt},
    ],
    temperature=0.7,
)
lat_4o = time.time() - started
text_4o = resp_4o.choices[0].message.content
usage_4o = resp_4o.usage

print('GPT-4o response:')
print(text_4o)
print('Latency (s):', round(lat_4o,3))
print('Tokens   :', getattr(usage_4o, 'total_tokens', None))

GPT-4o response:
I’m available 24/7 to assist you!
Latency (s): 1.656
Tokens   : 59


## GPT-5: chat.completions
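With GPT-5, the `reasoning_effort` parameter controls how much internal reasoning the model does before answering, with no extra system prompt. Reasoning is not free, though: reasoning tokens are billed as output tokens, which is why the run below uses more tokens and takes longer than the GPT-4o call above.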
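To see where the extra tokens go, you can split the usage object into reasoning and visible output tokens. This sketch assumes the SDK reports reasoning tokens under `usage.completion_tokens_details.reasoning_tokens`; `summarize_usage` itself is a hypothetical helper.

```python
def summarize_usage(usage):
    """Split chat.completions usage into reasoning vs visible output tokens.

    Accepts the SDK usage object or a plain dict (e.g. usage.model_dump()).
    Assumes reasoning tokens live in completion_tokens_details.reasoning_tokens;
    missing fields count as 0.
    """
    def _get(obj, name):
        if obj is None:
            return None
        return obj.get(name) if isinstance(obj, dict) else getattr(obj, name, None)

    completion = _get(usage, "completion_tokens") or 0
    reasoning = _get(_get(usage, "completion_tokens_details"), "reasoning_tokens") or 0
    return {
        "prompt_tokens": _get(usage, "prompt_tokens") or 0,
        "reasoning_tokens": reasoning,
        "visible_output_tokens": completion - reasoning,
        "total_tokens": _get(usage, "total_tokens") or 0,
    }

# Once both comparison cells have run:
# print("GPT-4o:", summarize_usage(usage_4o))
# print("GPT-5 :", summarize_usage(usage_5))
```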

In [83]:
started = time.time()
resp_5 = client.chat.completions.create(
    model=PRIMARY_DEPLOYMENT,
    messages=[
        {'role': 'user', 'content': prompt}
    ],
    reasoning_effort="high",
)

lat_5 = time.time() - started
text_5 = resp_5.choices[0].message.content
usage_5 = resp_5.usage

print('GPT-5 response:')
print(text_5)
print('Latency (s):', round(lat_5,3))
print('Tokens   :', getattr(usage_5, 'total_tokens', None))


GPT-5 response:
I’m available 24/7. If you’re asking about a specific business or location, tell me the name and city, and I’ll help you find their hours.
Latency (s): 8.495
Tokens   : 506


Setting `reasoning_effort="minimal"` runs GPT-5 with few or no reasoning tokens to minimize latency and speed up time-to-first-token. Use it for deterministic, lightweight tasks (extraction, formatting, short rewrites, simple classification) where explanations aren't needed. If you don't specify an effort, it defaults to `medium`, so set `minimal` explicitly when you want speed over deliberation.
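One way to see this trade-off is to send the same prompt at each effort level and record latency and token counts. The sketch below is hypothetical (`compare_reasoning_efforts` is not an SDK function); it takes the create function as an argument so it can be tried offline, and it assumes your deployment accepts all four effort values.

```python
import time


def compare_reasoning_efforts(create_fn, model, prompt,
                              efforts=("minimal", "low", "medium", "high")):
    """Call create_fn once per reasoning_effort value and record latency/tokens.

    create_fn is expected to behave like client.chat.completions.create.
    """
    rows = []
    for effort in efforts:
        started = time.perf_counter()
        resp = create_fn(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            reasoning_effort=effort,
        )
        latency = time.perf_counter() - started
        usage = getattr(resp, "usage", None)
        rows.append({
            "effort": effort,
            "latency_s": round(latency, 3),
            "total_tokens": getattr(usage, "total_tokens", None),
        })
    return rows

# Usage (makes four real API calls):
# for row in compare_reasoning_efforts(client.chat.completions.create,
#                                      PRIMARY_DEPLOYMENT, prompt):
#     print(row)
```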

## Verbosity parameter
The cells below demonstrate some of the new GPT-5 parameters and patterns from the OpenAI Cookbook:
- `reasoning_effort`: controls how much chain-of-thought / internal reasoning effort the model should attempt
- `verbosity`: controls how detailed the response should be
- `Responses` API: the newer unified responses API (when available in your SDK) versus classic chat completions
- A quick A/B comparison pattern to compare a baseline model vs your GPT-5 deployment

These examples assume that `client`, `PRIMARY_DEPLOYMENT`, and `BASELINE_DEPLOYMENT` are defined in an earlier cell. Adjust deployment names and parameters to match your environment.
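The same settings are spelled differently in the two APIs (compare `verbosity="low"` in the chat completions cell with `text={"verbosity": ...}` in the Responses cells further down), so a small translation helper can make A/B runs easier. This is a hypothetical sketch that covers only the parameters used in this notebook; it assumes the Responses API takes `reasoning={"effort": ...}` and `max_output_tokens`, so verify any other parameter before passing it through.

```python
def chat_kwargs_to_responses(chat_kwargs):
    """Translate a subset of chat.completions kwargs into responses.create kwargs.

    Handles model, messages, reasoning_effort, verbosity and max_completion_tokens.
    Anything else is passed through unchanged and may not be valid for Responses.
    """
    out = {}
    for key, value in chat_kwargs.items():
        if key == "messages":
            out["input"] = value  # role/content messages are accepted as input
        elif key == "reasoning_effort":
            out.setdefault("reasoning", {})["effort"] = value
        elif key == "verbosity":
            out.setdefault("text", {})["verbosity"] = value
        elif key == "max_completion_tokens":
            out["max_output_tokens"] = value
        else:
            out[key] = value
    return out

# Usage:
# chat = {"model": PRIMARY_DEPLOYMENT,
#         "messages": [{"role": "user", "content": prompt}],
#         "reasoning_effort": "minimal", "verbosity": "low"}
# resp = client.responses.create(**chat_kwargs_to_responses(chat))
```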

In [84]:
prompt = "how do I deploy a container to Azure?"

# Concise answer
short_response = client.chat.completions.create(
    model=PRIMARY_DEPLOYMENT,
    messages=[{"role": "user", "content": prompt}],
    verbosity="low"
)

# Balanced explanation  
medium_response = client.chat.completions.create(
    model=PRIMARY_DEPLOYMENT, 
    messages=[{"role": "user", "content": prompt}],
    verbosity="medium"
)

# Comprehensive guide
detailed_response = client.chat.completions.create(
    model=PRIMARY_DEPLOYMENT,
    messages=[{"role": "user", "content": prompt}],
    verbosity="high"
)



In [85]:
# Extract text and token counts from each response (latency was not measured above)
def extract_text_and_tokens(resp_obj):
    # SDK shapes vary; adjust the extraction below to match your openai client
    try:
        text = resp_obj.choices[0].message.content
    except Exception:
        # Try some other shapes
        text = getattr(resp_obj, 'text', str(resp_obj))
    tokens = None
    try:
        tokens = getattr(resp_obj, 'usage', None)
        if tokens is not None:
            tokens = getattr(tokens, 'total_tokens', tokens)
    except Exception:
        tokens = None
    return text, tokens

for name, resp, latency in [
    ("Short", short_response, None),   # put measured latencies if you kept them
    ("Medium", medium_response, None),
    ("Detailed", detailed_response, None),
]:
    text, token_count = extract_text_and_tokens(resp)
    # If you measured latency values (e.g., lat_short) replace `None` above and print them here
    print(f'--- {name} ---')
    print('Tokens:', token_count)
    print('Text preview:', (text[:200] + '...') if isinstance(text, str) and len(text) > 200 else text)
    print()

--- Short ---
Tokens: 4322
Text preview: Great question. “Deploying a container to Azure” can mean a few different services depending on what you need. Here’s how to choose, plus quick step-by-step guides.

Pick a target service
- Azure Cont...

--- Medium ---
Tokens: 3948
Text preview: Great question. “Deploy a container to Azure” can mean a few different services depending on what you need. Here’s a quick guide and copy-pasteable steps to get you running fast, plus options if you n...

--- Detailed ---
Tokens: 3735
Text preview: Great question. “Deploy a container to Azure” can mean a few different services depending on what you need. Here’s how to pick, followed by quick step-by-step guides.

Pick the right Azure service
- E...



## GPT-5: Responses API - Additional Exercises (Cookbook examples)



## 1. Verbosity Levels
The verbosity parameter lets you hint the model to be more or less expansive in its replies.

Values: "low", "medium", "high"

- low → terse UX, minimal prose.
- medium (default) → balanced detail.
- high → verbose, great for audits, teaching, or hand-offs.

Keep the prompt stable and change the parameter instead of rewriting the prompt.
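The next cell extracts text by walking `response.output` by hand. Recent SDK versions also expose a `response.output_text` convenience property; if you prefer an explicit loop that also works on `model_dump()` dictionaries, a hypothetical helper like this keeps it in one place and skips reasoning items and tool calls.

```python
def _field(obj, name, default=None):
    """Read `name` from an SDK object or a plain dict (e.g. model_dump())."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def collect_output_text(response):
    """Concatenate the text of every output_text part in a Responses API result."""
    parts = []
    for item in _field(response, "output", []) or []:
        if _field(item, "type") != "message":
            continue  # skip reasoning items, tool calls, etc.
        for content in _field(item, "content", []) or []:
            if _field(content, "type") == "output_text":
                parts.append(_field(content, "text", ""))
    return "".join(parts)
```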

In [86]:
import pandas as pd
from IPython.display import display

question = "Write a poem about a boy and his first pet dog."

data = []

for verbosity in ["low", "medium", "high"]:
    response = client.responses.create(
        model="gpt-5-mini",
        input=question,
        text={"verbosity": verbosity}
    )

    # Extract text
    output_text = ""
    for item in response.output:
        if hasattr(item, "content") and item.content is not None:
            for content in item.content:
                if hasattr(content, "text"):
                    output_text += content.text

    usage = response.usage
    data.append({
        "Verbosity": verbosity,
        "Sample Output": output_text,
        "Output Tokens": usage.output_tokens
    })

# Create DataFrame
df = pd.DataFrame(data)

# Display nicely with centered headers
pd.set_option('display.max_colwidth', None)
styled_df = df.style.set_table_styles(
    [
        {'selector': 'th', 'props': [('text-align', 'center')]},  # Center column headers
        {'selector': 'td', 'props': [('text-align', 'left')]}     # Left-align table cells
    ]
)

display(styled_df)


| Verbosity | Sample Output | Output Tokens |
|---|---|---|
| low | He found him in a cardboard box of quiet — a warm sliver of breath, a curl of ear, a clumsy paw that tapped the world like a question. The boy held him like a promise. They learned the names of morning together: puddle-splash, biscuit-mud, the small sun that follows sleep across the floor. A leash became a ribbon of trust, the backyard a kingdom of sudden discoveries. The dog learned to wait at the gate; the boy learned to wait for kindness. At dusk they traded secrets without words — a leaning shoulder, a crooked grin, a sigh that said everything about being unafraid. Years braided into each other the same way: games, storms, the soft hush of homework, and nights when the dog’s breathing slowed and the boy, now taller, kept his hand warm against the steady, faithful heartbeat of home. | 314 |
| medium | He came with a cardboard box of tremors and hope, a speckled shoulder that fit under my palm. For a week I measured the world by his tail— a small metronome that kept time with my heart. He taught me names for ordinary things: the way rain smells on pavement, the exact language of mud, how a stick could be treasure, a puddle a sea. He learned me, too, how to be patient— how to wait for crumbs to fall and for trust to arrive. We traded secrets: I in whispers, he in nudges, his breath a warm tide against my midnight fears. When thunder rolled like an engine on the roof, he climbed the ladder of my lap and held my knees. He chewed the edges of my homework, not out of malice but because he kept the small truth—that play is work too. He buried my socks like relics, guarded our doorway with an honest gravity that made the house a home. Afternoons were a map of sunlit routes—porch, stoop, the willow swing, the corner store where the clerk pretended not to know us. I learned to call him back by voice, by pocketful of biscuits, and sometimes by guilt when I had stayed too long at a friend's. Years braided themselves with his paws; he grew gray at the muzzle, I grew taller, my hand no longer fit his head quite the same. There were first goodbyes in smaller things: his favorite ball, the kitchen rug he always chose for naps, the quiet of our rooms. He never taught me to be brave in stories, only in small truths— to show up, to sit, to lean when other things fell apart. He taught me how to forgive, fast and clean as a tail’s sweep, how love is less about ceremony and more about presence. Sometimes now I close the door of the past and hear a faint jingle, a memory collar catching on the ribs of memory. I still set a place for lessons at the table: a bowl of patience, a leash of care, a pocket full of courage. In the photograph he always looks alive—tongue out, eyes open, and I am the same boy, forever learning to call him back. He was my first cartographer of the world, and in every park I pass, I still follow the path he taught me to trust. | 1005 |
| high | The day they brought him home, the dog fit in one palm — a warm, clumsy bundle of ears and unpracticed wags. The boy, with shoelaces still tied in loops of habit, held his breath as if he could stop time with quiet hands. They sat in the kitchen where sunlight made a slow map across the floor, and the dog sniffed for places to belong. At night the dog slept like a small, steady engine, a heartbeat pressed against the boy’s shins. Outside, the first real thunder startled both into silence, and the boy learned where to put his hands when fear came: on the soft, upturned belly, in the place that trusted him back. In the morning the dog discovered the world anew — mailboxes, puddles, the miraculous authority of squirrels — and the boy’s laugh sounded like permission for everything. Homework spread like a town of paper on the table, and the dog melted into the empty chair beside failures and triumphs. When the boy scraped his knee on a day bright as a promise, the dog licked the salt and left a tiny, glistening peace. They practiced the shape of being brave together: the boy learning to call, the dog learning to stay. They taught each other how to wait for little things — for thrown sticks, for visitors, for dinners — and how to answer the world with wagging insistence. Seasons rolled like slow pawprints across the land. Winter muffled the yard in a soft, forgetful white, and summer threw long, lazy shadows where they napped. The dog grew into his legs and the boy into his forehead, both tall enough to reach the top shelf of small adventures. Neighbors watched a pair that belonged entirely to itself — two silhouettes tipping at the edges of ordinary afternoons. Years folded like pages; collars were loosened, then tightened, and the dog’s muzzle grayed like a sky at dusk. One afternoon the boy—no longer quite a boy—sat on the porch, and the dog’s head found his knee as if it had always been home. Words were fewer now; there was a language in the touch, in the small, contented sigh that meant: I remember, too. When the last walk came, the road felt softer underfoot, each step sacred, each breath a small bright thing held between them. Afterward the house kept a silence like held breath, but the backyard still knew the ways they had loved. The boy, with older hands, dug a little hole and planted a bone of memory wrapped in forget-me-nots, and he laughed once, a sound that carried his grief and his gratitude in equal measure. Some nights he would wake and find the space by his feet empty, and there would be an ache like the absence of music. Time taught him what the dog had always known: how to meet the world with a wag, how to forgive the window’s ghosts, how to be present for the small, bright things. In the quiet that followed, the boy kept a pocket full of mornings — the scent of fur, the scrape of paws on tile, the thunk of a tail at the door — and he learned, simply, how to keep loving anything that loved him back. | 1053 |


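In this run, output tokens grew with verbosity: low (314) → medium (1005) → high (1053). The growth is not linear (most of the jump is between low and medium), and exact counts vary from run to run.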



## Using Verbosity for Coding Use Cases
The verbosity parameter also influences the length and complexity of generated code, as well as the depth of the accompanying explanation. The example below asks for a Python program that sorts an array of 1,000,000 random numbers; the cell runs it at `high` verbosity, and you can call `ask_with_verbosity` with `"low"` or `"medium"` to compare.

In [None]:
prompt = "Output a Python program that sorts an array of 1000000 random numbers"

def ask_with_verbosity(verbosity: str, question: str):
    response = client.responses.create(
        model="gpt-5-mini",
        input=question,
        text={
            "verbosity": verbosity
        },
    )

    # Extract assistant's text output
    output_text = ""
    for item in response.output:
        if hasattr(item, "content") and item.content is not None:
            for content in item.content:
                if hasattr(content, "text"):
                    output_text += content.text

    # Token usage details
    usage = response.usage

    print("--------------------------------")
    print(f"Verbosity: {verbosity}")
    print("Output:")
    print(output_text)
    print("Tokens => input: {} | output: {}".format(
        usage.input_tokens, usage.output_tokens
    ))


# Example usage:
ask_with_verbosity("high", prompt)

--------------------------------
Verbosity: high
Output:
#!/usr/bin/env python3
"""
Generate an array of 1,000,000 random numbers and sort it, timing the operations.

This script:
 - creates a list of 1_000_000 random floats in [0.0, 1.0)
 - sorts the list in-place using Python's Timsort (list.sort())
 - reports timings and a small sample of the sorted list

Notes:
 - Python list of 1,000,000 float objects can use a significant amount of memory
   (tens of MBs). If you have memory constraints or want faster sorting, consider
   using numpy arrays (np.random.random and np.sort) which store data more compactly
   and use optimized C code.
"""

import random
import time
import sys

def main():
    N = 1_000_000  # number of random numbers

    print(f"Generating {N} random floats...")
    t0 = time.perf_counter()
    data = [random.random() for _ in range(N)]
    t1 = time.perf_counter()
    gen_time = t1 - t0
    print(f"Generation took {gen_time:.3f} seconds. (Approx memory used depends

Notes and next steps:
- If `client.responses.create` is not available in your installed SDK, adapt the cell to use `client.chat.completions.create` or upgrade the `openai` package used in this environment.
- Try different `reasoning_effort` and `verbosity` settings to see how the model trade-offs change between concision and depth.
- For tool integrations, the Cookbook shows patterns for registering and invoking external tools; add a small local tool (e.g., calculator) and adapt the request to call it to practice a safe tool flow.
- When running these cells in shared or public repos, avoid committing prompts that contain explicit or actionable harmful content; keep tests in local, secured environments.
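For the calculator exercise suggested above, the tool should never pass model output to `eval()`. A minimal sketch, assuming the model sends a bare arithmetic expression as the custom tool's raw input: `safe_calculate` walks the parsed expression and allows only numeric literals and basic operators. The names `safe_calculate` and `calculator` are hypothetical, and the sketch does not guard against very large exponents.

```python
import ast
import operator

# Only these operators are allowed; names, calls and attributes are rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str):
    """Evaluate a plain arithmetic expression without using eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {ast.dump(node)}")

    return _eval(ast.parse(expression, mode="eval"))


# Hypothetical custom tool definition to pass in `tools=[...]`.
CALCULATOR_TOOL = {
    "type": "custom",
    "name": "calculator",
    "description": "Evaluates one arithmetic expression, e.g. 3.14159 * 3 ** 2",
}
```

When the model calls the tool, `str(safe_calculate(tool_call.input))` would be the result to send back.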

## 2. Free-Form Function Calling
GPT-5 can now send raw text payloads, anything from Python scripts to SQL queries, to a custom tool (`"type": "custom"`) without wrapping the data in JSON. Unlike classic structured function calls, this gives you more flexibility when interacting with external runtimes such as:

- code_exec with sandboxes (Python, C++, Java, …)
- SQL databases
- Shell environments
- Configuration generators

> Note that custom tool type does NOT support parallel tool calling.
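The difference is easiest to see side by side. In this sketch (the tool name `run_sql` and the helper `tool_payload` are hypothetical), a function tool declares a JSON schema and its call arrives as a JSON string in `arguments`, while a custom tool has no schema and its call carries the raw text in `input`. The item shapes assume Responses API output items as returned by `model_dump()`.

```python
import json

# Classic function tool: the model must send JSON matching `parameters`.
function_tool = {
    "type": "function",
    "name": "run_sql",  # hypothetical tool
    "description": "Runs a read-only SQL query",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
        "additionalProperties": False,
    },
}

# Free-form custom tool: no schema; the raw SQL text arrives as-is.
custom_tool = {
    "type": "custom",
    "name": "run_sql",  # hypothetical tool
    "description": "Runs a read-only SQL query. Send only the SQL text.",
}


def tool_payload(item):
    """Return the SQL text from either kind of tool-call output item (dict form)."""
    if item["type"] == "function_call":
        # Function calls carry a JSON string that has to be decoded first.
        return json.loads(item["arguments"])["query"]
    if item["type"] == "custom_tool_call":
        # Custom tool calls carry the raw text directly.
        return item["input"]
    raise ValueError(f"Not a tool call: {item['type']}")
```

You would pass either `function_tool` or `custom_tool` in `tools=[...]`, not both, since they share a name.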

### Quick Start Example - Compute the Area of a Circle
The cell below asks the model to compute the area of a circle by writing Python and sending it through the free-form `code_exec` tool.

In [None]:
response = client.responses.create(
    model="gpt-5-mini",
    input="Please use the code_exec tool to calculate the area of a circle with radius equal to the number of 'r's in strawberry",
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "code_exec",
            "description": "Executes arbitrary python code",
        }
    ],
    # "required" forces the model to call a tool instead of answering directly.
    tool_choice="required",
)
print(response.output)

The model emits a tool call containing raw Python. You execute that code server‑side, capture the printed result, and send it back in a follow‑up responses.create call.
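A minimal sketch of that round trip, assuming a local Python interpreter: the hypothetical `run_python_tool_call` runs the generated code in a child process and wraps the captured stdout as a follow-up input item. The `custom_tool_call_output` item type is an assumption based on recent SDK type definitions (the mini-benchmark cell below sends `function_call_output`, which is what that run used), and a subprocess is not a security sandbox.

```python
import subprocess
import sys


def run_python_tool_call(tool_call, timeout_s=10):
    """Run the raw Python from a custom tool call and build the follow-up item.

    tool_call may be the SDK item (with .input / .call_id) or a dict.
    WARNING: this executes model-generated code on your machine. A child
    process is NOT a sandbox; use a container or managed sandbox for real use.
    """
    if isinstance(tool_call, dict):
        code, call_id = tool_call["input"], tool_call["call_id"]
    else:
        code, call_id = tool_call.input, tool_call.call_id

    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        # Send stderr back too when the code fails, so the model can react.
        output = proc.stdout if proc.returncode == 0 else proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        output = f"Execution timed out after {timeout_s} s"

    return {
        "type": "custom_tool_call_output",  # assumed item type for custom tools
        "call_id": call_id,
        "output": output.strip(),
    }

# Usage with the response from the previous cell:
# tool_call = next(item for item in response.output if item.type == "custom_tool_call")
# follow_up = client.responses.create(
#     model="gpt-5-mini",
#     previous_response_id=response.id,
#     input=[run_python_tool_call(tool_call)],
#     tools=[{"type": "custom", "name": "code_exec",
#             "description": "Executes arbitrary python code"}],
# )
```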

### Mini-Benchmark - Sorting an Array in Three Languages
To illustrate the use of free form tool calling, we will ask GPT‑5 to:

Generate Python, C++, and Java code that sorts a fixed array 10 times.
Print only the time (in ms) taken for each iteration in the code.
Call all three functions, and then stop

In [92]:
from typing import List, Optional

MODEL_NAME = "gpt-5"

# Tools that will be passed to every model invocation. They are defined once so
# that the configuration lives in a single place.
TOOLS = [
    {
        "type": "custom",
        "name": "code_exec_python",
        "description": "Executes python code",
    },
    {
        "type": "custom",
        "name": "code_exec_cpp",
        "description": "Executes c++ code",
    },
    {
        "type": "custom",
        "name": "code_exec_java",
        "description": "Executes java code",
    },
]

def create_response(
    input_messages: List[dict],
    previous_response_id: Optional[str] = None,
):
    """Wrapper around ``client.responses.create``.

    Parameters
    ----------
    input_messages: List[dict]
        Input items for this turn. When ``previous_response_id`` is set, the
        server already holds the earlier turns, so only new items are needed.
    previous_response_id: str | None
        Pass the ``response.id`` from the *previous* call so the model can keep
        the thread of the conversation.  Omit on the very first request.
    """
    kwargs = {
        "model": MODEL_NAME,
        "input": input_messages,
        "text": {"format": {"type": "text"}},
        "tools": TOOLS,
    }
    if previous_response_id:
        kwargs["previous_response_id"] = previous_response_id

    return client.responses.create(**kwargs)

# Recursive driver: keep calling the model until it stops requesting tools.
def run_conversation(
    input_messages: List[dict],
    previous_response_id: Optional[str] = None,
):
    response = create_response(input_messages, previous_response_id)

    # ``response.output`` can hold reasoning items, messages and tool calls, so
    # find the custom tool call by type rather than by its position in the list.
    tool_call = next(
        (item for item in response.output if item.type == "custom_tool_call"),
        None,
    )

    if tool_call is None:
        # Base case: no further tool call.
        return response

    print("--- tool name ---")
    print(tool_call.name)
    print("--- tool call argument (generated code) ---")
    print(tool_call.input)

    # Synthetic *tool result* so the model can continue the thread.
    tool_output = {
        "type": "function_call_output",
        "call_id": tool_call.call_id,
        "output": "done",  # <-- replace with the result of the tool call
    }

    # previous_response_id already carries the earlier turns on the server,
    # so only the new tool output is sent with the next request.
    return run_conversation([tool_output], previous_response_id=response.id)


prompt = """
Write code to sort the array of numbers in three languages: C++, Python and Java (10 times each) using code_exec functions.

ALWAYS CALL THESE THREE FUNCTIONS EXACTLY ONCE: code_exec_python, code_exec_cpp and code_exec_java tools to sort the array in each language. Stop once you've called these three functions in each language once.

Print only the time it takes to sort the array in milliseconds. 

[448, 986, 255, 884, 632, 623, 246, 439, 936, 925, 644, 159, 777, 986, 706, 723, 534, 862, 195, 686, 846, 880, 970, 276, 613, 736, 329, 622, 870, 284, 945, 708, 267, 327, 678, 807, 687, 890, 907, 645, 364, 333, 385, 262, 730, 603, 945, 358, 923, 930, 761, 504, 870, 561, 517, 928, 994, 949, 233, 137, 670, 555, 149, 870, 997, 809, 180, 498, 914, 508, 411, 378, 394, 368, 766, 486, 757, 319, 338, 159, 585, 934, 654, 194, 542, 188, 934, 163, 889, 736, 792, 737, 667, 772, 198, 971, 459, 402, 989, 949]
"""

# Initial developer message.
messages = [
    {
        "role": "developer",
        "content": prompt,
    }
]

run_conversation(messages)


--- tool name ---
code_exec_python
--- tool call argument (generated code) ---
arr = [448, 986, 255, 884, 632, 623, 246, 439, 936, 925, 644, 159, 777, 986, 706, 723, 534, 862, 195, 686, 846, 880, 970, 276, 613, 736, 329, 622, 870, 284, 945, 708, 267, 327, 678, 807, 687, 890, 907, 645, 364, 333, 385, 262, 730, 603, 945, 358, 923, 930, 761, 504, 870, 561, 517, 928, 994, 949, 233, 137, 670, 555, 149, 870, 997, 809, 180, 498, 914, 508, 411, 378, 394, 368, 766, 486, 757, 319, 338, 159, 585, 934, 654, 194, 542, 188, 934, 163, 889, 736, 792, 737, 667, 772, 198, 971, 459, 402, 989, 949]

import time

start = time.perf_counter()
for _ in range(10):
    a = arr[:]  # copy
    a.sort()
elapsed_ms = int((time.perf_counter() - start) * 1000)
print(elapsed_ms)
--- tool name ---
code_exec_cpp
--- tool call argument (generated code) ---
#include <algorithm>
#include <vector>
#include <chrono>
#include <iostream>
using namespace std;

int main() {
    vector<int> arr = {448, 986, 255, 884, 632, 623, 24

The model emitted code for the same algorithm in Python, C++ and Java (the captured output above is cut off partway through the C++ call). After each call, a tool result (here the placeholder `"done"`) is sent back to the model, which lets it keep going until each function has been called exactly once.
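To return real results instead of the `"done"` placeholder, and to confirm the "exactly once" instruction was followed, the tool calls can be routed through a dispatch table. A hypothetical sketch, assuming tool calls in `model_dump()` dictionary form and handler functions you supply (for example, one that runs Python locally, or wrappers around C++ and Java compilers):

```python
def dispatch_tool_calls(tool_calls, handlers):
    """Route custom tool calls to local handlers and count calls per tool.

    tool_calls: iterable of dicts with "name", "input" and "call_id"
                (e.g. item.model_dump() for each custom_tool_call item).
    handlers:   dict mapping tool name -> function(raw_input: str) -> str.
    Returns (output_items, counts).
    """
    outputs, counts = [], {}
    for call in tool_calls:
        name = call["name"]
        counts[name] = counts.get(name, 0) + 1
        handler = handlers.get(name)
        result = handler(call["input"]) if handler else f"No handler for {name}"
        outputs.append({
            # Same item type the benchmark cell above sends back.
            "type": "function_call_output",
            "call_id": call["call_id"],
            "output": result,
        })
    return outputs, counts


def called_exactly_once(counts, required):
    """True if every required tool was called once and no other tool was called."""
    return set(counts) == set(required) and all(counts[name] == 1 for name in required)


REQUIRED_TOOLS = ["code_exec_python", "code_exec_cpp", "code_exec_java"]
```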



## 3. Context‑Free Grammar (CFG)
### Overview
A context‑free grammar is a collection of production rules that define which strings belong to a language. Each rule rewrites a non‑terminal symbol into a sequence of terminals (literal tokens) and/or other non‑terminals, independent of surrounding context—hence context‑free. CFGs can capture the syntax of most programming languages and, in OpenAI custom tools, serve as contracts that force the model to emit only strings that the grammar accepts.

### Grammar Fundamentals

**Supported Grammar Syntax**

- Lark - https://lark-parser.readthedocs.io/en/stable/
- Regex - https://docs.rs/regex/latest/regex/#syntax

The OpenAI documentation notes that LLGuidance is used under the hood to constrain model sampling: https://github.com/guidance-ai/llguidance.

**Unsupported Lark Features**

- Lookaround in regexes (`(?=...)`, `(?!...)`, etc.)
- Lazy modifiers (`*?`, `+?`, `??`) in regexes
- Terminal priorities, templates, `%declare`s, `%import` (except `%import common`)

**Terminals vs Rules & Greedy Lexing**

| Concept | Take-away |
|---|---|
| Terminals (UPPER) | Matched first by the lexer — longest match wins. |
| Rules (lower) | Combine terminals; cannot influence how text is tokenised. |
| Greedy lexer | Never try to “shape” free text across multiple terminals — you’ll lose control. |
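To see why greedy lexing matters, here is a toy longest-match lexer written with Python's `re` module. It is a simplified model of the behavior in the table, not Lark or LLGuidance: at each position the terminal with the longest match wins, so a broad free-text terminal swallows the words a narrower terminal was meant to catch.

```python
import re


def longest_match_lex(text, terminals):
    """Toy lexer: at each position, emit the terminal with the longest match.

    terminals: list of (name, regex). Ties go to the earlier terminal.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in terminals]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"No terminal matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


terminals = [
    ("SUBJECT", r"the hero|a dragon"),
    ("FREE_TEXT", r"[A-Za-z, ]+"),
]
# FREE_TEXT matches all 14 characters, SUBJECT only 8, so SUBJECT never appears.
print(longest_match_lex("the hero saved", terminals))
```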

**Correct vs Incorrect Pattern Design**

**✅ One bounded terminal handles free‑text between anchors**

`start: SENTENCE`

`SENTENCE: /[A-Za-z, ]*(the hero|a dragon)[A-Za-z, ]*(fought|saved)[A-Za-z, ]*(a treasure|the kingdom)[A-Za-z, ]*\./`

**❌ Don’t split free‑text across multiple terminals/rules**

`start: sentence`

`sentence: /[A-Za-z, ]+/ subject /[A-Za-z, ]+/ verb /[A-Za-z, ]+/ object /[A-Za-z, ]+/`
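The correct pattern can be sanity-checked locally with Python's `re` module, whose syntax agrees with the Rust regex syntax for this pattern. The sketch writes the `SENTENCE` terminal with `*` repetitions and an escaped final period, which is what lets it accept filler words around the anchors:

```python
import re

# The SENTENCE terminal from the "correct" example, as a Python regex.
SENTENCE = re.compile(
    r"[A-Za-z, ]*(the hero|a dragon)[A-Za-z, ]*(fought|saved)"
    r"[A-Za-z, ]*(a treasure|the kingdom)[A-Za-z, ]*\."
)

for candidate in [
    "Long ago, the hero bravely saved the kingdom.",  # filler around every anchor
    "The hero saved the kingdom.",  # capital T: no lowercase "the hero" anchor
    "a dragon fought.",             # missing object
]:
    print(bool(SENTENCE.fullmatch(candidate)), candidate)
```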

## Example - SQL Dialect: MS SQL vs PostgreSQL
The following example shows how to build multi-dialect SQL tools with CFGs. It demonstrates:

- Two isolated grammar definitions (`mssql_grammar`, `postgres_grammar`) encoding TOP vs LIMIT semantics.
- How to prompt, invoke, and inspect tool calls in a single script.
- A side-by-side inspection of the assistant's responses.

First, define the Lark grammars for the two SQL dialects.

In [73]:
import textwrap

# ----------------- grammars for MS SQL dialect -----------------
mssql_grammar = textwrap.dedent(r"""
            // ---------- Punctuation & operators ----------
            SP: " "
            COMMA: ","
            GT: ">"
            EQ: "="
            SEMI: ";"

            // ---------- Start ----------
            start: "SELECT" SP "TOP" SP NUMBER SP select_list SP "FROM" SP table SP "WHERE" SP amount_filter SP "AND" SP date_filter SP "ORDER" SP "BY" SP sort_cols SEMI

            // ---------- Projections ----------
            select_list: column (COMMA SP column)*
            column: IDENTIFIER

            // ---------- Tables ----------
            table: IDENTIFIER

            // ---------- Filters ----------
            amount_filter: "total_amount" SP GT SP NUMBER
            date_filter: "order_date" SP GT SP DATE

            // ---------- Sorting ----------
            sort_cols: "order_date" SP "DESC"

            // ---------- Terminals ----------
            IDENTIFIER: /[A-Za-z_][A-Za-z0-9_]*/
            NUMBER: /[0-9]+/
            DATE: /'[0-9]{4}-[0-9]{2}-[0-9]{2}'/
    """)

# ----------------- grammars for PostgreSQL dialect -----------------
postgres_grammar = textwrap.dedent(r"""
            // ---------- Punctuation & operators ----------
            SP: " "
            COMMA: ","
            GT: ">"
            EQ: "="
            SEMI: ";"

            // ---------- Start ----------
            start: "SELECT" SP select_list SP "FROM" SP table SP "WHERE" SP amount_filter SP "AND" SP date_filter SP "ORDER" SP "BY" SP sort_cols SP "LIMIT" SP NUMBER SEMI

            // ---------- Projections ----------
            select_list: column (COMMA SP column)*
            column: IDENTIFIER

            // ---------- Tables ----------
            table: IDENTIFIER

            // ---------- Filters ----------
            amount_filter: "total_amount" SP GT SP NUMBER
            date_filter: "order_date" SP GT SP DATE

            // ---------- Sorting ----------
            sort_cols: "order_date" SP "DESC"

            // ---------- Terminals ----------
            IDENTIFIER: /[A-Za-z_][A-Za-z0-9_]*/
            NUMBER: /[0-9]+/
            DATE: /'[0-9]{4}-[0-9]{2}-[0-9]{2}'/
    """)

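Neither grammar above is recursive (the only repetition is `(COMMA SP column)*`), so the set of strings each one accepts is regular and can be mirrored with a Python regular expression. The sketch below is a hand translation, not something generated from the Lark source, so treat the Lark grammar as the source of truth. It lets you check candidate queries locally with only the standard library; the names `MSSQL_PATTERN`, `POSTGRES_PATTERN`, and `accepts` are hypothetical.

```python
import re

# Terminals copied from the Lark grammars above.
IDENTIFIER = r"[A-Za-z_][A-Za-z0-9_]*"
NUMBER = r"[0-9]+"
DATE = r"'[0-9]{4}-[0-9]{2}-[0-9]{2}'"

# select_list: column (COMMA SP column)*
SELECT_LIST = rf"{IDENTIFIER}(?:, {IDENTIFIER})*"

# Shared middle section: WHERE ... AND ... ORDER BY order_date DESC
FILTERS = (
    rf"WHERE total_amount > {NUMBER} AND order_date > {DATE} "
    r"ORDER BY order_date DESC"
)

# MS SQL: TOP N immediately after SELECT.
MSSQL_PATTERN = rf"SELECT TOP {NUMBER} {SELECT_LIST} FROM {IDENTIFIER} {FILTERS};"
# PostgreSQL: LIMIT N after ORDER BY.
POSTGRES_PATTERN = rf"SELECT {SELECT_LIST} FROM {IDENTIFIER} {FILTERS} LIMIT {NUMBER};"


def accepts(pattern: str, sql: str) -> bool:
    """Return True if the entire string matches the dialect pattern."""
    return re.fullmatch(pattern, sql) is not None


mssql_query = (
    "SELECT TOP 5 customer_id, order_id, order_date, total_amount FROM orders "
    "WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC;"
)
pg_query = (
    "SELECT customer_id, order_id, order_date, total_amount FROM orders "
    "WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC LIMIT 5;"
)

print(accepts(MSSQL_PATTERN, mssql_query), accepts(POSTGRES_PATTERN, mssql_query))  # → True False
print(accepts(POSTGRES_PATTERN, pg_query), accepts(MSSQL_PATTERN, pg_query))  # → True False
```

Each sample query fits exactly one dialect, which is the property the two grammars are meant to enforce.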
## Generate a specific SQL dialect
Let's define the prompt and call the model with the `mssql_grammar` tool to produce an MS SQL Server query.

In [74]:
sql_prompt_mssql = (
    "Call the mssql_grammar to generate a query for Microsoft SQL Server that retrieves the "
    "five most recent orders per customer, showing customer_id, order_id, order_date, and total_amount, "
    "where total_amount > 500 and order_date is after '2025-01-01'. "
)

response_mssql = client.responses.create(
    model="gpt-5",
    input=sql_prompt_mssql,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "mssql_grammar",
            "description": "Executes read-only Microsoft SQL Server queries limited to SELECT statements with TOP and basic WHERE/ORDER BY. YOU MUST REASON HEAVILY ABOUT THE QUERY AND MAKE SURE IT OBEYS THE GRAMMAR.",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": mssql_grammar
            }
        },
    ],
    parallel_tool_calls=False
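)

# Find the custom tool call instead of assuming it is output[1]
# (output[0] is usually the reasoning item).
tool_call = next(item for item in response_mssql.output if item.type == "custom_tool_call")
print("--- MS SQL Query ---")
print(tool_call.input)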

--- MS SQL Query ---
SELECT TOP 5 customer_id, order_id, order_date, total_amount FROM orders WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC;


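The generated query correctly uses the `SELECT TOP` construct. Note that the grammar cannot express a per-customer limit (that needs something like a window function), so despite the prompt's wording, the query returns the five most recent qualifying orders overall.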
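Every `responses.create` call in this notebook passes a tool with the same nested shape: a `custom` tool whose `format` carries the grammar `syntax` and `definition`. A small builder keeps the dialect-specific parts (name, description, grammar) in one place. The sketch below is a hypothetical helper, not part of the OpenAI SDK; it only reproduces the dict shape used here and, as an assumption, restricts `syntax` to the two values this notebook uses.

```python
from typing import Any


def make_grammar_tool(name: str, description: str, syntax: str, definition: str) -> dict[str, Any]:
    """Build one custom-tool entry for responses.create(tools=[...]).

    Hypothetical helper: mirrors the dict literals used in this notebook.
    """
    if syntax not in ("lark", "regex"):
        raise ValueError(f"unsupported grammar syntax: {syntax!r}")
    return {
        "type": "custom",
        "name": name,
        "description": description,
        "format": {
            "type": "grammar",
            "syntax": syntax,
            "definition": definition,
        },
    }


# Example: the timestamp tool used later in this notebook.
ts_tool = make_grammar_tool(
    name="timestamp_grammar",
    description="Saves a timestamp in date + time in 24-hr format.",
    syntax="regex",
    definition=r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (?:[01]\d|2[0-3]):[0-5]\d$",
)
print(ts_tool["format"]["syntax"])  # → regex
```

With it, the MS SQL tool becomes `make_grammar_tool("mssql_grammar", <description>, "lark", mssql_grammar)`, and switching dialects only changes the arguments.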

In [None]:
sql_prompt_pg = (
    "Call the postgres_grammar to generate a query for PostgreSQL that retrieves the "
    "five most recent orders per customer, showing customer_id, order_id, order_date, and total_amount, "
    "where total_amount > 500 and order_date is after '2025-01-01'. "
)

response_pg = client.responses.create(
    model="gpt-5",
    input=sql_prompt_pg,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "postgres_grammar",
            "description": "Executes read-only PostgreSQL queries limited to SELECT statements with LIMIT and basic WHERE/ORDER BY. YOU MUST REASON HEAVILY ABOUT THE QUERY AND MAKE SURE IT OBEYS THE GRAMMAR.",
            "format": {
                "type": "grammar",
                "syntax": "lark",
                "definition": postgres_grammar
            }
        },
    ],
    parallel_tool_calls=False,
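)

# Find the custom tool call instead of assuming it is output[1]
# (output[0] is usually the reasoning item).
tool_call = next(item for item in response_pg.output if item.type == "custom_tool_call")
print("--- PG SQL Query ---")
print(tool_call.input)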

| Dialect | Generated Query | Key Difference |
|---|---|---|
| MS SQL Server | `SELECT TOP 5 customer_id, order_id, order_date, total_amount FROM orders WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC;` | Uses `TOP N` clause placed immediately after `SELECT` (before column list). |
| PostgreSQL | `SELECT customer_id, order_id, order_date, total_amount FROM orders WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC LIMIT 5;` | Uses `LIMIT N` appended after `ORDER BY`. |
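To see what the row limit actually returns, the sketch below runs the PostgreSQL-dialect query against an in-memory SQLite database. SQLite is swapped in for PostgreSQL only because it ships with Python and accepts the same `LIMIT` syntax; the `orders` table and its rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample rows: (customer_id, order_id, order_date, total_amount).
orders = [
    (1, 1, "2025-02-01", 600),
    (1, 2, "2025-02-02", 600),
    (1, 3, "2025-02-03", 600),
    (1, 4, "2025-02-04", 600),
    (1, 5, "2025-02-05", 600),
    (1, 6, "2025-02-06", 600),
    (2, 7, "2025-01-15", 900),   # qualifies, but is older than customer 1's orders
    (3, 8, "2024-12-31", 1000),  # excluded by the date filter
    (2, 9, "2025-03-01", 100),   # excluded by the amount filter
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, order_id INTEGER, "
    "order_date TEXT, total_amount INTEGER)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", orders)

# The PostgreSQL-dialect query from the table above.
pg_query = (
    "SELECT customer_id, order_id, order_date, total_amount FROM orders "
    "WHERE total_amount > 500 AND order_date > '2025-01-01' ORDER BY order_date DESC LIMIT 5;"
)
rows = conn.execute(pg_query).fetchall()
conn.close()

# ISO-formatted date strings sort chronologically, so ORDER BY works on TEXT.
for row in rows:
    print(row)
```

All five rows belong to customer 1, and customer 2's qualifying order is crowded out: the limit applies to the whole result set, not to each customer.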

## Example - Regex CFG Syntax
The following example uses the `regex` grammar syntax instead of Lark to constrain the free-form tool call to a specific timestamp pattern (`YYYY-MM-DD HH:MM`, 24-hour clock).


In [75]:

timestamp_grammar_definition = r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (?:[01]\d|2[0-3]):[0-5]\d$"

timestamp_prompt = (
    "Call the timestamp_grammar to save a timestamp for August 7th 2025 at 10AM."
)

response_ts = client.responses.create(
    model="gpt-5",
    input=timestamp_prompt,
    text={"format": {"type": "text"}},
    tools=[
        {
            "type": "custom",
            "name": "timestamp_grammar",
            "description": "Saves a timestamp in date + time in 24-hr format.",
            "format": {
                "type": "grammar",
                "syntax": "regex",
                "definition": timestamp_grammar_definition
            }
        },
    ],
    parallel_tool_calls=False
)

# Find the custom tool call instead of assuming it is output[1]
tool_call = next(item for item in response_ts.output if item.type == "custom_tool_call")
print("--- Timestamp ---")
print(tool_call.input)

--- Timestamp ---
2025-08-07 10:00
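The regex grammar guarantees the shape of the string, not that it names a real moment: `2025-02-31 10:00` fits the pattern. The sketch below re-checks the pattern locally with Python's `re` module (whose behavior matches the Rust `regex` crate for this particular pattern) and adds a `datetime.strptime` parse as a semantic check. Both helper names are hypothetical.

```python
import re
from datetime import datetime

# Same pattern passed to the API as timestamp_grammar_definition.
TIMESTAMP_PATTERN = r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]) (?:[01]\d|2[0-3]):[0-5]\d$"


def matches_grammar(value: str) -> bool:
    """Syntactic check: does the whole string fit the regex grammar?"""
    return re.fullmatch(TIMESTAMP_PATTERN, value) is not None


def is_real_timestamp(value: str) -> bool:
    """Semantic check: is it also a valid calendar date and time?"""
    if not matches_grammar(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d %H:%M")
    except ValueError:
        return False
    return True


for candidate in ["2025-08-07 10:00", "2025-08-07 24:00", "2025-02-31 10:00"]:
    print(candidate, matches_grammar(candidate), is_real_timestamp(candidate))
# → 2025-08-07 10:00 True True
# → 2025-08-07 24:00 False False
# → 2025-02-31 10:00 True False
```

Pairing the grammar with a cheap post-check like this catches outputs that are syntactically valid but semantically wrong.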


## Best Practices

Lark grammars can be tricky to perfect. While simple grammars perform most reliably, complex grammars often require iteration on the grammar definition itself, the prompt, and the tool description to ensure that the model does not go out of distribution.

- **Keep terminals bounded** – use `/[^.\n]{0,10}\./` rather than `/.*\./`. Limit matches both by content (negated character class) and by length (`{M,N}` quantifier).
- **Prefer explicit char‑classes over `.` wildcards.**
- **Thread whitespace explicitly**, e.g. by defining `SP: " "` and using it between tokens, instead of a global `%ignore`.
- **Describe your tool**: tell the model exactly what the CFG accepts and instruct it to reason heavily about compliance.
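The bounded-terminal advice is easiest to see with a plain regex. The sketch below uses Python's `re` module as a stand-in for a single terminal's regex (it does not reproduce the grammar engine's lexer): the unbounded pattern runs to the last period, while the bounded one stops at the first period and fails outright when more than ten characters precede it.

```python
import re

text = "First. Second. Third."

# Unbounded: `.*` is greedy, so the match extends to the LAST period.
unbounded = re.match(r".*\.", text)
print(unbounded.group())  # → First. Second. Third.

# Bounded by content: `[^.\n]` cannot cross a period or newline,
# so the match stops at the FIRST period.
bounded = re.match(r"[^.\n]{0,10}\.", text)
print(bounded.group())  # → First.

# Bounded by length: more than 10 characters before the period, so no match.
print(re.match(r"[^.\n]{0,10}\.", "A very long sentence."))  # → None
```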

### Troubleshooting

- **API rejects the grammar because it is too complex** ➜ Simplify rules and terminals, and remove `%ignore` directives.
- **Unexpected tokens** ➜ Confirm that terminals don't overlap, and check whether the greedy lexer is matching a longer terminal than you intended.
- **Model drifts "out‑of‑distribution"** (it produces excessively long or repetitive output that is syntactically valid but semantically wrong):
    - Tighten the grammar.
    - Iterate on the prompt (add few-shot examples) and the tool description (explain the grammar and instruct the model to reason about conforming to it).
    - Experiment with a higher reasoning effort (e.g., bump it from medium to high).

### Resources

- [Lark Docs](https://lark-parser.readthedocs.io/en/stable/)
- [Lark IDE](https://www.lark-parser.org/ide/)
- [LLGuidance Syntax](https://github.com/guidance-ai/llguidance/blob/main/docs/syntax.md)
- [Regex (Rust crate)](https://docs.rs/regex/latest/regex/#syntax)

## 3.6 Takeaways

Context-Free Grammar (CFG) support in GPT-5 lets you strictly constrain model output to a predefined syntax, so only syntactically valid strings are generated. This is especially useful for enforcing programming-language rules or custom formats, and it reduces post-processing and parsing errors. A grammar guarantees form rather than meaning, so pair a precise grammar with a clear tool description to keep the model reliably within your target output structure.