Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
268 changes: 268 additions & 0 deletions docs/gemini-thinking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,268 @@
# Google Gemini 3 Reasoning/Thinking

Google Gemini 3 models support a "thinking mode" feature that allows the model to engage in internal reasoning before generating responses. This improves answer quality on complex, multi-step problems.

## What is Gemini Reasoning?

When reasoning is enabled, Gemini models:
- Allocate tokens for internal "thinking" before responding
- Work through problems step-by-step
- Show their reasoning process (unless excluded)
- Produce higher-quality answers on complex tasks

## Quick Start

### Fetch API

```typescript
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove code from docs/gemini-thinking.md
link to specific examples instead

const response = await fetch('https://openrouter.ai/api/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': `Bearer ${OPENROUTER_API_KEY}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'google/gemini-3-pro-preview',
reasoning: {
enabled: true, // Enable thinking mode
max_tokens: 4096, // Token budget for thinking
exclude: false // Show thoughts in response
},
messages: [
{
role: 'user',
content: 'Solve this complex problem step by step...'
}
]
})
});

const data = await response.json();
console.log('Reasoning:', data.choices[0].message.reasoning);
console.log('Reasoning tokens:', data.usage.completion_tokens_details.reasoning_tokens);
console.log('Answer:', data.choices[0].message.content);
```

### AI SDK v5

```typescript
import { createOpenRouter } from '@openrouter/ai-sdk-provider';
import { generateText } from 'ai';

const openrouter = createOpenRouter({
apiKey: process.env.OPENROUTER_API_KEY,
});

const result = await generateText({
model: openrouter('google/gemini-3-pro-preview'),
providerOptions: {
openrouter: {
reasoning: {
enabled: true,
maxTokens: 4096,
exclude: false
}
}
},
messages: [
{
role: 'user',
content: 'Solve this complex problem step by step...'
}
]
});

console.log('Answer:', result.text);
console.log('Reasoning tokens:', result.providerMetadata?.openrouter?.usage?.completionTokensDetails?.reasoningTokens);
```

## Examples

### Fetch Examples
- [basic-reasoning.ts](../typescript/fetch/src/gemini-thinking/basic-reasoning.ts) - Basic reasoning with a multi-step problem

### AI SDK v5 Examples
- [basic-reasoning.ts](../typescript/ai-sdk-v5/src/gemini-thinking/basic-reasoning.ts) - Basic reasoning using Vercel AI SDK v5

## API Reference

### Request Parameters

```typescript
{
model: 'google/gemini-3-pro-preview' | 'google/gemini-2.5-pro' | 'google/gemini-2.5-flash',
reasoning: {
// Option 1: Simple enable (uses default budget)
enabled: true,

// Option 2: Control token budget
max_tokens: 4096, // -1 for dynamic, 0 to disable

// Option 3: Use effort level (maps to budget automatically)
effort: 'low' | 'medium' | 'high',

// Option 4: Hide thoughts from response
exclude: true // false = show thoughts, true = hide thoughts
},
messages: [...]
}
```

### Response Format

```typescript
{
choices: [{
message: {
content: "The final answer",
reasoning: "The model's thinking process...", // Only if exclude: false
reasoning_details: [ // Structured reasoning metadata
{
type: "reasoning.text",
text: "Internal reasoning text",
format: "gemini",
index: 0
},
{
type: "reasoning.encrypted", // Google's thoughtSignature
data: "encrypted_signature_string",
format: "gemini",
index: 0
}
]
}
}],
usage: {
prompt_tokens: 123,
completion_tokens: 456,
total_tokens: 579,
completion_tokens_details: {
reasoning_tokens: 234 // Tokens used for thinking
}
}
}
```

## Supported Models

| Model | Reasoning Support | Max Thinking Tokens | Context Window |
|-------|------------------|---------------------|----------------|
| `google/gemini-3-pro-preview` | MANDATORY (always enabled) | 200,000 | 1,048,576 (1M) |
| `google/gemini-2.5-pro` | MANDATORY (always enabled) | 32,768 | 2,097,152 (2M) |
| `google/gemini-2.5-flash` | OPTIONAL | 24,576 | 1,048,576 (1M) |

**Note:** For Gemini 3 Pro and Gemini 2.5 Pro, reasoning is mandatory and always enabled. The model will use thinking tokens even if not explicitly requested.

## Key Concepts

### Thinking Budget

The `max_tokens` parameter controls how many tokens the model can use for internal reasoning:

- **-1 (dynamic)**: Model determines budget automatically
- **0**: Disable reasoning (only for optional models)
- **Positive number**: Specific token budget (clamped to model limits)

### Effort Levels

Instead of specifying exact token counts, you can use effort levels:

- **low**: Minimal thinking (faster, lower cost)
- **medium**: Balanced thinking (default)
- **high**: Maximum thinking (slower, higher quality)

OpenRouter automatically maps these to appropriate token budgets for each model.

### Excluding Thoughts

Set `exclude: true` to hide the thinking process and only receive the final answer:

```typescript
reasoning: {
enabled: true,
exclude: true // Thoughts used internally but not returned
}
```

This reduces response size and latency while still benefiting from reasoning.

## Important Notes

### Preserving Reasoning Details

**CRITICAL:** When continuing a conversation, you must include `reasoning_details` from previous responses in follow-up requests. Google requires the `thoughtSignature` to be preserved.

```typescript
// First request
const response1 = await fetch(...);

// Follow-up request - must include reasoning_details
const response2 = await fetch(..., {
body: JSON.stringify({
messages: [
{
role: 'assistant',
content: response1.choices[0].message.content,
reasoning_details: response1.choices[0].message.reasoning_details // REQUIRED
},
{
role: 'user',
content: 'Follow-up question'
}
]
})
});
```

### Cost Considerations

Reasoning tokens are billed separately:
- Thinking tokens typically cost less than output tokens
- More thinking = higher cost but better quality
- Use effort levels to balance cost vs quality

### Latency Trade-offs

More thinking tokens = longer response times:
- **Low effort**: Fast responses, good for simple tasks
- **High effort**: Slower responses, better for complex reasoning

### OpenRouter Transformation

OpenRouter automatically transforms Google's native API to OpenAI-compatible format:

| Google Native | OpenRouter Format |
|--------------|------------------|
| `generationConfig.thinkingConfig` | `reasoning` parameter |
| `usageMetadata.thoughtsTokenCount` | `usage.completion_tokens_details.reasoning_tokens` |
| `parts[].thought: true` | `message.reasoning` |
| `thoughtSignature` | `reasoning_details[].data` |

This allows you to use the standard OpenAI format while accessing Google's thinking features.

## Resources

- [OpenRouter Docs - Reasoning Tokens](https://openrouter.ai/docs/use-cases/reasoning-tokens)
- [Google Gemini API Documentation](https://ai.google.dev/docs)
- [Full Examples Repository](https://github.com/openrouter/openrouter-examples)

## Troubleshooting

### No reasoning tokens in response

**Check:**
1. Is the model supported? (Gemini 2.5+)
2. Is `reasoning.enabled` set to `true`?
3. Is the token budget > 0?

### "thought_signature" error in follow-up requests

**Solution:** Include `reasoning_details` from previous responses when continuing conversations.

### High costs

**Solution:** Use lower effort levels or reduce `max_tokens` for thinking budget.

### Slow responses

**Solution:** Use lower effort levels or smaller thinking budgets to reduce latency.
126 changes: 126 additions & 0 deletions typescript/fetch/src/gemini-thinking/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
# Google Gemini 3 Reasoning/Thinking Examples

This directory contains examples of using Google Gemini 3's reasoning/thinking feature via OpenRouter.

## What is Gemini Reasoning?

Gemini 3 models can engage in internal reasoning before generating responses. This "thinking mode" allows the model to:
- Work through complex problems step-by-step
- Show its reasoning process
- Improve answer quality on difficult tasks

## How It Works

1. **Request**: Set `reasoning.enabled: true` (or `reasoning.max_tokens`, or `reasoning.effort`)
2. **Processing**: The model uses "thinking tokens" for internal reasoning
3. **Response**: You receive both the reasoning process and the final answer

## Examples

### `basic-reasoning.ts`

Demonstrates basic usage of Gemini reasoning with a multi-step problem.

**Run:**
```bash
bun run src/gemini-thinking/basic-reasoning.ts
```

**Key Features:**
- Enables reasoning mode
- Shows thinking token usage
- Displays reasoning process
- Returns final answer

## API Parameters

### Request Format

```typescript
{
model: 'google/gemini-3-pro-preview',
reasoning: {
enabled: true, // Enable thinking mode
max_tokens: 4096, // Token budget for thinking
exclude: false // true = hide thoughts, false = show thoughts
},
messages: [...]
}
```

### Alternative: Effort Levels

```typescript
{
model: 'google/gemini-3-pro-preview',
reasoning: {
effort: 'medium' // 'low', 'medium', 'high'
},
messages: [...]
}
```

## Response Format

```typescript
{
choices: [{
message: {
content: "The final answer",
reasoning: "The model's thinking process...",
reasoning_details: [
{
type: "reasoning.text",
text: "Internal reasoning...",
format: "gemini"
},
{
type: "reasoning.encrypted",
data: "encrypted_signature",
format: "gemini"
}
]
}
}],
usage: {
prompt_tokens: 123,
completion_tokens: 456,
completion_tokens_details: {
reasoning_tokens: 234 // Tokens used for thinking
}
}
}
```

## Key Points

### Model Support
- ✅ `google/gemini-3-pro-preview` - Reasoning MANDATORY (always enabled)
- ✅ `google/gemini-2.5-pro` - Reasoning MANDATORY (always enabled)
- ✅ `google/gemini-2.5-flash` - Reasoning OPTIONAL

### Token Budgets
- **Gemini 3 Pro**: Max 200,000 thinking tokens, 1M context window
- **Gemini 2.5 Pro**: Max 32,768 thinking tokens
- **Gemini 2.5 Flash**: Max 24,576 thinking tokens

### Important Notes
- **Preserve reasoning_details**: Must include `reasoning_details` from previous messages in follow-up requests
- **Cost**: Thinking tokens are billed separately (usually at a lower rate)
- **Latency**: More thinking tokens = longer response time
- **Quality**: Higher thinking budgets improve answer quality on complex tasks

## OpenRouter Transformation

OpenRouter automatically transforms Google's native API to OpenAI-compatible format:

| Google Native | OpenRouter (OpenAI-compatible) |
|--------------|-------------------------------|
| `usageMetadata.thoughtsTokenCount` | `usage.completion_tokens_details.reasoning_tokens` |
| `parts[].thought: true` | `message.reasoning` |
| `thoughtSignature` | `reasoning_details[].data` |

## Resources

- [OpenRouter Docs - Reasoning Tokens](https://openrouter.ai/docs/use-cases/reasoning-tokens)
- [Google Gemini API Docs](https://ai.google.dev/docs)
Loading