
# Understanding Reasoning Models in Databricks: A Complete Guide

**Last Updated: November 20, 2025**

Artificial intelligence has evolved beyond simple pattern matching. Today's foundation models can "think through" problems, breaking them down step by step before answering. This capability, known as reasoning, is a significant step forward for AI. In this guide, we'll explore how to work with reasoning models through the Databricks Foundation Model API.

## What Are Reasoning Models?

Reasoning models are advanced AI systems that can tackle complex tasks by engaging in a structured thought process before generating responses. Unlike traditional models that produce immediate outputs, reasoning models allocate computational resources to "think" about the problem, considering different approaches and refining their understanding before delivering a final answer.

Some reasoning models even provide transparency by revealing their step-by-step thought process, allowing users to understand how they arrived at their conclusions.

## Two Types of Reasoning Approaches

Databricks categorizes reasoning models into two distinct types:

### 1. Reasoning-Only Models

These models reason internally before every response. Examples include:
- **GPT-5 family**: `databricks-gpt-5-1`, `databricks-gpt-5`, `databricks-gpt-5-mini`, and `databricks-gpt-5-nano`
- **GPT OSS models**: `databricks-gpt-oss-120b` and `databricks-gpt-oss-20b`

These models use a `reasoning_effort` parameter that controls how deeply they think through problems. Higher reasoning effort can produce more thoughtful and accurate responses, though it may increase latency and token usage.

### 2. Hybrid Reasoning Models

Hybrid models offer flexibility by supporting both instant responses and deeper reasoning when needed. This category includes:
- **Claude models**: `databricks-claude-sonnet-4` and `databricks-claude-sonnet-4-5`
- **Gemini 3 models**: `databricks-gemini-3-pro`
- **Gemini 2.5 models**: `databricks-gemini-2-5-pro` and `databricks-gemini-2-5-flash`

Hybrid models allow developers to choose when to invoke reasoning capabilities, making them ideal for applications that need to balance speed with complexity.

## Key Parameters for Controlling Reasoning

Different models use different parameters to control their reasoning behavior:

### For Claude and Gemini 2.5 Models: `thinking.budget_tokens`

This parameter controls how many tokens the model can allocate to internal thought processes. Higher budgets enable the model to tackle more complex tasks, though optimal results typically occur below 32K tokens. Importantly, `budget_tokens` must be less than `max_tokens`.
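The `budget_tokens` < `max_tokens` rule is easy to break while tuning budgets, so it can help to check it before sending a request. The function below is a hypothetical convenience helper and is not part of any SDK. It returns keyword arguments for `client.chat.completions.create`, and it passes the thinking settings through the OpenAI client's `extra_body` argument, following the same pattern as the Claude example.

```python
from typing import Any, Dict


def thinking_kwargs(max_tokens: int, budget_tokens: int) -> Dict[str, Any]:
    """Build request kwargs with thinking enabled (hypothetical helper).

    Enforces the documented rule that budget_tokens must be less than max_tokens.
    """
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens ({budget_tokens}) must be less than max_tokens ({max_tokens})"
        )
    return {
        "max_tokens": max_tokens,
        # extra_body forwards non-OpenAI fields to the serving endpoint as-is
        "extra_body": {
            "thinking": {"type": "enabled", "budget_tokens": budget_tokens}
        },
    }


kwargs = thinking_kwargs(max_tokens=20480, budget_tokens=10240)
print(kwargs)
```

In a request, the result is unpacked directly, e.g. `client.chat.completions.create(model="databricks-claude-sonnet-4-5", messages=messages, **kwargs)`.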

### For GPT-5 and GPT OSS Models: `reasoning_effort`

This parameter accepts different values depending on the model:
- **GPT-5.1** (`databricks-gpt-5-1`): defaults to "none", which can be overridden
- **Other GPT-5 models**: default to "minimal"
- **GPT OSS models**: accept "low", "medium" (default), or "high"

### For Gemini 3 Models: `reasoning_effort`

Accepts "low" (default), "medium", or "high" values to control the depth of reasoning.
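You can capture the defaults above in a small lookup, so your code always knows which `reasoning_effort` a request will use. The helper below is hypothetical. It matches endpoint names by prefix and encodes only the defaults and accepted values stated in this section. It does not validate values for the GPT-5 family, because that full set is not listed here.

```python
from typing import Optional

# Documented defaults, matched by endpoint-name prefix. Order matters:
# "databricks-gpt-5-1" must be checked before the broader "databricks-gpt-5".
EFFORT_DEFAULTS = [
    ("databricks-gpt-5-1", "none"),
    ("databricks-gpt-5", "minimal"),
    ("databricks-gpt-oss", "medium"),
    ("databricks-gemini-3", "low"),
]

# Accepted values, only for the families whose full set is listed above
EFFORT_CHOICES = {
    "databricks-gpt-oss": {"low", "medium", "high"},
    "databricks-gemini-3": {"low", "medium", "high"},
}


def resolve_reasoning_effort(model: str, effort: Optional[str] = None) -> str:
    """Return the reasoning_effort that a request to `model` will use."""
    for prefix, default in EFFORT_DEFAULTS:
        if model.startswith(prefix):
            if effort is None:
                return default
            allowed = EFFORT_CHOICES.get(prefix)
            if allowed is not None and effort not in allowed:
                raise ValueError(f"{model} accepts {sorted(allowed)}, got {effort!r}")
            return effort
    # Claude and Gemini 2.5 use thinking.budget_tokens instead
    raise ValueError(f"{model} is not controlled by reasoning_effort")


print(resolve_reasoning_effort("databricks-gpt-oss-120b"))
```

In a request, this becomes `reasoning_effort=resolve_reasoning_effort(model, "high")`.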

## Practical Implementation Examples

Let's look at how to query these models using Python and the OpenAI client library:

### Querying a Claude Model with Reasoning

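```python
import os

from openai import OpenAI

# Authenticate with a Databricks token. The base URL is your workspace's
# serving-endpoints URL, e.g. https://<workspace-host>/serving-endpoints
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url=os.environ["DATABRICKS_BASE_URL"],
)

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[
        {"role": "user", "content": "Why is the sky blue?"}
    ],
    max_tokens=20480,
    extra_body={
        # Enable thinking; budget_tokens must be less than max_tokens
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

# With thinking enabled, content is a list of blocks:
# [0] a reasoning block whose "summary" holds the reasoning text,
# [1] a text block holding the final answer
msg = response.choices[0].message
reasoning = msg.content[0]["summary"][0]["text"]
answer = msg.content[1]["text"]

print("Reasoning:", reasoning)
print("Answer:", answer)
```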

With thinking enabled, the response includes Claude's reasoning (in the reasoning block's `summary` field) alongside the final answer. Note that `budget_tokens` (10,240) is below `max_tokens` (20,480), as required.

## Understanding the Response Structure

When a model exposes its reasoning (for example, Claude with thinking enabled), the message content is a list of content blocks rather than a single string:

1. **Reasoning block**: Contains the model's internal thought process (when exposed)
2. **Text block**: Contains the final answer

This structure allows applications to choose whether to display the reasoning process to users or only show the final answer.
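To choose what to display, an application first needs to separate the two kinds of block. The sketch below is a hypothetical helper. Rather than relying on fixed indexes, it looks up blocks by type, and it also handles the plain-string content returned when no reasoning is exposed. It assumes the block types are named `"reasoning"` and `"text"`, and that reasoning text sits under `summary[i]["text"]`, matching the indexing in the Claude example. The sample content is hand-written, not a captured response.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

Content = Union[str, List[Dict[str, Any]]]


def split_reasoning(content: Content) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) from a chat message's content.

    A plain string means no reasoning was exposed. Otherwise reasoning text is
    gathered from "reasoning" blocks and the answer from "text" blocks.
    """
    if isinstance(content, str):
        return None, content
    reasoning_parts: List[str] = []
    answer_parts: List[str] = []
    for block in content:
        if block.get("type") == "reasoning":
            reasoning_parts.extend(item.get("text", "") for item in block.get("summary", []))
        elif block.get("type") == "text":
            answer_parts.append(block.get("text", ""))
    reasoning = "\n".join(reasoning_parts) if reasoning_parts else None
    return reasoning, "\n".join(answer_parts)


sample = [
    {"type": "reasoning", "summary": [{"type": "summary_text", "text": "Consider Rayleigh scattering."}]},
    {"type": "text", "text": "Shorter (blue) wavelengths scatter more strongly."},
]
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

With a live response, call `split_reasoning(response.choices[0].message.content)`. The UI can then decide whether to render `reasoning` at all.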

## Managing Multi-Turn Conversations

For models that expose reasoning, such as Claude, multi-turn conversations need some care: only the reasoning blocks from the most recent assistant turn are visible to the model and counted as input tokens.

You have two options:

**Option 1: Omit reasoning blocks** - If the model doesn't need to reference its previous thought process, you can exclude reasoning blocks to save tokens.

**Option 2: Include full assistant messages** - If you're building experiences that surface intermediate reasoning or need the model to reflect on its previous thinking, include the complete assistant message with reasoning blocks.
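Both options can be expressed as one small function over plain message dictionaries. One way to get such a dictionary from an SDK message is the OpenAI client's `msg.to_dict()`. This is a hypothetical sketch that assumes the block shapes from the Claude example. For Option 1, it collapses the remaining text blocks into a plain string, which keeps the history in the simplest chat format.

```python
import copy
from typing import Any, Dict, List


def next_turn(
    history: List[Dict[str, Any]],
    assistant_message: Dict[str, Any],
    user_text: str,
    keep_reasoning: bool = False,
) -> List[Dict[str, Any]]:
    """Return a new history with the last assistant reply and the next user message."""
    content = assistant_message["content"]
    if keep_reasoning or isinstance(content, str):
        # Option 2 (or nothing to strip): resend the assistant message as returned
        prior = copy.deepcopy(assistant_message)
    else:
        # Option 1: drop reasoning blocks and keep only the answer text
        text = "\n".join(b["text"] for b in content if b.get("type") == "text")
        prior = {"role": "assistant", "content": text}
    return history + [prior, {"role": "user", "content": user_text}]


history = [{"role": "user", "content": "Why is the sky blue?"}]
reply = {
    "role": "assistant",
    "content": [
        {"type": "reasoning", "summary": [{"type": "summary_text", "text": "Think about scattering."}]},
        {"type": "text", "text": "Rayleigh scattering."},
    ],
}
lean = next_turn(history, reply, "And why are sunsets red?")
full = next_turn(history, reply, "And why are sunsets red?", keep_reasoning=True)
print(lean[1])
```

The original `history` list is never mutated, so you can build both variants from the same conversation and compare their token usage.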

## How Reasoning Models Work Under the Hood

Reasoning models introduce special reasoning tokens alongside standard input and output tokens. These tokens enable the model to:

1. **Break down complex prompts** into manageable components
2. **Consider multiple approaches** to solving the problem
3. **Refine understanding** before committing to an answer
4. **Generate a final response** based on this internal deliberation

Different model families handle these reasoning tokens differently. Some expose them to users for transparency, while others keep them internal and only show the final output.
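To see how many reasoning tokens a request consumed, you can inspect the response's `usage` object. In the OpenAI Python SDK, the `CompletionUsage` model has an optional `completion_tokens_details.reasoning_tokens` field. Whether a given Databricks endpoint populates that field is an assumption, so the hypothetical helper below returns `None` when it is absent. It also assumes, as in OpenAI's usage format, that reasoning tokens are included in `completion_tokens`. The usage object here is a stand-in, not real output.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def usage_report(usage: Any) -> Dict[str, Optional[int]]:
    """Summarize token usage, separating reasoning tokens when they are reported."""
    details = getattr(usage, "completion_tokens_details", None)
    # getattr on None with a default simply returns None
    reasoning = getattr(details, "reasoning_tokens", None)
    completion = usage.completion_tokens
    return {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": completion,
        "reasoning_tokens": reasoning,
        # Assumes reasoning tokens are counted inside completion_tokens
        "answer_tokens": completion - reasoning if reasoning is not None else None,
    }


# Offline stand-in; with a live call, pass response.usage instead
fake_usage = SimpleNamespace(
    prompt_tokens=12,
    completion_tokens=500,
    completion_tokens_details=SimpleNamespace(reasoning_tokens=320),
)
print(usage_report(fake_usage))
```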

## Best Practices for Using Reasoning Models

1. **Match reasoning effort to task complexity**: Simple queries don't need high reasoning effort, while complex problems benefit from it
2. **Monitor token usage**: Reasoning consumes additional tokens, so balance quality needs with cost considerations
3. **Consider latency requirements**: Higher reasoning effort increases response time
4. **Choose the right model type**: Use reasoning-only models for consistently complex tasks, hybrid models for varied workloads
5. **Leverage transparency**: When models expose reasoning, use it to validate outputs and improve prompts
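One way to apply the first and fourth practices is a small routing helper that picks reasoning settings per request. The following parts are hypothetical:

- the helper itself
- the three tiers
- the 4,096-token moderate budget

Other details come from this article. The 10,240/20,480 budget pair and the `{"type": "disabled"}` form are taken from its Claude examples. The helper handles only the families whose accepted `reasoning_effort` values are listed above.

```python
from typing import Any, Dict

# Hypothetical complexity tiers; tune them to your own workload
EFFORT_BY_TIER = {"simple": "low", "moderate": "medium", "complex": "high"}
CLAUDE_BUDGET_BY_TIER = {"simple": None, "moderate": 4096, "complex": 10240}
CLAUDE_MAX_TOKENS = 20480


def reasoning_settings(model: str, tier: str) -> Dict[str, Any]:
    """Return request kwargs that scale reasoning with task complexity."""
    if tier not in EFFORT_BY_TIER:
        raise ValueError(f"unknown tier: {tier!r}")
    if model.startswith("databricks-claude"):
        budget = CLAUDE_BUDGET_BY_TIER[tier]
        if budget is None:
            # Hybrid model: answer directly, without reasoning
            return {"extra_body": {"thinking": {"type": "disabled"}}}
        return {
            "max_tokens": CLAUDE_MAX_TOKENS,
            "extra_body": {"thinking": {"type": "enabled", "budget_tokens": budget}},
        }
    if model.startswith(("databricks-gpt-oss", "databricks-gemini-3")):
        # Both families accept "low", "medium", or "high"
        return {"reasoning_effort": EFFORT_BY_TIER[tier]}
    raise ValueError(f"no routing rule for {model}")


print(reasoning_settings("databricks-claude-sonnet-4-5", "complex"))
```

In a request, the result is unpacked directly: `client.chat.completions.create(model=model, messages=messages, **reasoning_settings(model, tier))`.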

## Conclusion

Reasoning models represent a significant advancement in AI capabilities, offering enhanced problem-solving abilities for complex tasks. Databricks' Foundation Model API provides unified access to various reasoning models, each with unique characteristics and control parameters.

By understanding the differences between reasoning-only and hybrid models, and learning to effectively use parameters like `reasoning_effort` and `budget_tokens`, developers can build more sophisticated AI applications that balance performance, cost, and quality.

Whether you're building analytical tools, decision support systems, or complex reasoning applications, Databricks' reasoning models provide the flexibility and power needed to tackle challenging problems effectively.

---

**Related Resources:**
- [Query Chat Models on Databricks](https://docs.databricks.com/aws/en/machine-learning/model-serving/query-chat-models)
- [Query Embedding Models](https://docs.databricks.com/aws/en/machine-learning/model-serving/query-embedding-models)
- [Query Vision Models](https://docs.databricks.com/aws/en/machine-learning/model-serving/query-vision-models)

All reasoning models are accessed through the chat completions endpoint.
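These models sit behind standard serving endpoints, so you can also send the same request without the OpenAI client. To do so, POST it to the endpoint's `/serving-endpoints/<name>/invocations` URL. The sketch below only builds the request with the standard library and never sends it. The host and token are placeholders. Putting `reasoning_effort` in the JSON body mirrors the parameter described above; it is not a verified guarantee for every endpoint.

```python
import json
import urllib.request
from typing import Any, Dict


def build_invocation_request(
    host: str, token: str, endpoint: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Build, but do not send, a POST request to a serving endpoint."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_invocation_request(
    host="https://<workspace-host>",  # placeholder
    token="<databricks-token>",       # placeholder
    endpoint="databricks-gpt-oss-120b",
    payload={
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "reasoning_effort": "high",
    },
)
# To send it: urllib.request.urlopen(request), then json.load the response
print(request.full_url)
```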

## Claude Model

```python
%pip install -qq "databricks-sdk[openai]>=0.35.0"
```

```python
dbutils.library.restartPython()
```

```python
import os

from openai import OpenAI

# Read the token from the environment (or a Databricks secret) instead of hardcoding it
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://fe-vm-agentic-ai.cloud.databricks.com/serving-endpoints",
)

response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10240
        }
    }
)

msg = response.choices[0].message
reasoning = msg.content[0]["summary"][0]["text"]
answer = msg.content[1]["text"]

print("Reasoning:", reasoning)
print("Answer:", answer)
```

```python
# Inspect the raw message, including the reasoning block
print(msg.to_json())
```

```python
# Same prompt with thinking disabled (reuses the client from the previous cell)
response = client.chat.completions.create(
    model="databricks-claude-3-7-sonnet",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=20480,
    extra_body={
        "thinking": {
            "type": "disabled"
        }
    }
)

# No reasoning block is returned, so the content[0]/content[1] indexing above does not apply
msg = response.choices[0].message
print("Answer:", msg.content)
```

```python
print(msg.to_json())
```

## GPT-5.1 Model

```python
# GPT-5.1 uses reasoning_effort instead of a thinking budget; "high" overrides the default ("none")
response = client.chat.completions.create(
    model="databricks-gpt-5-1",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=20480,
    reasoning_effort="high",
)

msg = response.choices[0].message
```

```python
print(msg.to_json())
```
