90 changes: 90 additions & 0 deletions docs/providers/deepinfra.md
@@ -0,0 +1,90 @@
---
sidebar_label: DeepInfra
description: Configure DeepInfra's high-performance AI models in Roo Code. Access Qwen Coder, Llama, and other open-source models with prompt caching and vision capabilities.
keywords:
- deepinfra
- deep infra
- roo code
- api provider
- qwen coder
- llama models
- prompt caching
- vision models
- open source ai
image: /img/social-share.jpg
---

# Using DeepInfra With Roo Code

DeepInfra provides cost-effective access to high-performance open-source models with features like prompt caching, vision support, and specialized coding models. Their infrastructure offers low latency and automatic load balancing across global edge locations.

**Website:** [https://deepinfra.com/](https://deepinfra.com/)

---

## Getting an API Key

1. **Sign Up/Sign In:** Go to [DeepInfra](https://deepinfra.com/). Create an account or sign in.
2. **Navigate to API Keys:** Access the API keys section in your dashboard.
3. **Create a Key:** Generate a new API key. Give it a descriptive name (e.g., "Roo Code").
4. **Copy the Key:** **Important:** Copy the API key immediately. Store it securely.
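Once you have a key, you can sanity-check it outside the editor. Below is a minimal sketch that assumes DeepInfra exposes an OpenAI-compatible endpoint at `https://api.deepinfra.com/v1/openai` with standard Bearer-token authentication; confirm both details against DeepInfra's own documentation:

```python
import os

# Assumed base URL for DeepInfra's OpenAI-compatible API; verify against
# DeepInfra's documentation before relying on it.
DEEPINFRA_BASE_URL = "https://api.deepinfra.com/v1/openai"

def build_headers(api_key: str) -> dict:
    """Build request headers for a Bearer-token authenticated API call."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

# Read the key from the environment rather than hard-coding it in source.
headers = build_headers(os.environ.get("DEEPINFRA_API_KEY", "<your-key>"))
```

Pairing these headers with a GET request to `{DEEPINFRA_BASE_URL}/models` is one way to list the models your key can access.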

---

## Supported Models

Roo Code dynamically fetches available models from DeepInfra's API. The default model is:

* `Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo` (256K context, optimized for coding)

Common models available include:

* **Coding Models:** Qwen Coder series, specialized for programming tasks
* **General Models:** Llama 3.1, Mixtral, and other open-source models
* **Vision Models:** Models with image understanding capabilities
* **Reasoning Models:** Models with advanced reasoning support

Browse the full catalog at [deepinfra.com/models](https://deepinfra.com/models).
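For illustration, a direct call to the default model would use a request body like the sketch below. Field names follow the OpenAI chat-completions schema, which DeepInfra's API is compatible with; this is not Roo Code's internal request format:

```python
import json

# Sketch of a chat-completions request body targeting the default model.
payload = {
    "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct-Turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a binary search function in Python."},
    ],
    "max_tokens": 512,
}

body = json.dumps(payload)  # ready to POST to the chat/completions endpoint
```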

---

## Configuration in Roo Code

1. **Open Roo Code Settings:** Click the gear icon (<Codicon name="gear" />) in the Roo Code panel.
2. **Select Provider:** Choose "DeepInfra" from the "API Provider" dropdown.
3. **Enter API Key:** Paste your DeepInfra API key into the "DeepInfra API Key" field.
4. **Select Model:** Choose your desired model from the "Model" dropdown.
- Models will auto-populate after entering a valid API key
- Click "Refresh Models" to update the list

---

## Advanced Features

### Prompt Caching

DeepInfra supports prompt caching for eligible models, which:
- Reduces costs for repeated contexts
- Improves response times for similar queries
- Automatically manages cache based on task IDs
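Prompt caching generally works by reusing an already-processed prompt prefix, so keeping the early part of each request byte-identical is what makes cache hits possible. A hypothetical sketch of that pattern (the helper name is ours, not part of any API):

```python
# Keep a long, stable system prompt identical across requests so a provider's
# prompt cache can reuse the already-processed prefix; only the suffix varies.
STABLE_SYSTEM_PROMPT = "You are a coding assistant. <long, unchanging instructions>"

def build_messages(user_query: str) -> list[dict]:
    """Compose messages with a byte-identical prefix to maximize cache hits."""
    return [
        {"role": "system", "content": STABLE_SYSTEM_PROMPT},
        {"role": "user", "content": user_query},
    ]

a = build_messages("Explain this stack trace.")
b = build_messages("Refactor this function.")
# The cacheable prefix (the system message) is identical across both requests.
assert a[0] == b[0]
```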

### Vision Support

Models with vision capabilities can:
- Process images alongside text
- Understand visual content for coding tasks
- Analyze screenshots and diagrams

### Custom Base URL

For enterprise deployments, you can configure a custom base URL in the advanced settings.
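A deployment-specific base URL is typically kept out of code and selected at runtime; a sketch of that pattern (both the environment-variable name and the default URL are illustrative assumptions, not Roo Code settings):

```python
import os

# Fall back to the public endpoint when no enterprise override is set.
DEFAULT_BASE_URL = "https://api.deepinfra.com/v1/openai"

def resolve_base_url() -> str:
    """Return the enterprise override if set, else the public endpoint."""
    return os.environ.get("DEEPINFRA_BASE_URL", DEFAULT_BASE_URL)
```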

---

## Tips and Notes

* **Performance:** DeepInfra offers low latency with automatic load balancing across global locations.
* **Cost Efficiency:** Competitive pricing with prompt caching to reduce costs for repeated contexts.
* **Model Variety:** Access to the latest open-source models including specialized coding models.
* **Context Windows:** Models support context windows up to 256K tokens for large codebases.
* **Pricing:** Pay-per-use model with no minimums. Check [deepinfra.com](https://deepinfra.com/) for current pricing.
1 change: 1 addition & 0 deletions docs/update-notes/index.md
@@ -19,6 +19,7 @@ image: /img/social-share.jpg

### Version 3.26

* [3.26.7](/update-notes/v3.26.7) (2025-09-05)
* [3.26.6](/update-notes/v3.26.6) (2025-09-03)
* [3.26.5](/update-notes/v3.26.5) (2025-09-03)
* [3.26.4](/update-notes/v3.26.4) (2025-09-01)
63 changes: 63 additions & 0 deletions docs/update-notes/v3.26.7.mdx
@@ -0,0 +1,63 @@
---
description: Enhanced Kimi K2 models with 256K+ context windows, OpenAI service tiers for flexible pricing, and DeepInfra as a new provider with 100+ models.
keywords:
- roo code 3.26.7
- kimi k2 models
- openai service tiers
- deepinfra provider
- bug fixes
image: /img/social-share.jpg
---

# Roo Code 3.26.7 Release Notes (2025-09-05)

This release brings enhanced Kimi K2 models with massive context windows, OpenAI service tier selection, and DeepInfra as a new provider offering 100+ models.

## Kimi K2-0905: Moonshot's Latest Open Source Model is Live in Roo Code

We've upgraded to the latest Kimi K2-0905 models across multiple providers (thanks CellenLee!) ([#7663](https://github.com/RooCodeInc/Roo-Code/pull/7663), [#7693](https://github.com/RooCodeInc/Roo-Code/pull/7693)).

K2-0905 comes with three major upgrades:
- **256K Context Window**: Context windows of 256K–262K tokens (provider-dependent), roughly double the previous limit, for processing much larger documents and conversations
- **Improved Tool Calling**: Enhanced function calling and tool use capabilities for better agentic workflows
- **Enhanced Front-end Development**: Superior HTML, CSS, and JavaScript generation with modern framework support

Available through Groq, Moonshot, and Fireworks providers. These models excel at handling large codebases, long conversations, and complex multi-file operations.

## OpenAI Service Tiers

We've added support for OpenAI's new Responses API service tiers ([#7646](https://github.com/RooCodeInc/Roo-Code/pull/7646)):

- **Standard Tier**: Default tier with regular pricing
- **Flex Tier**: 50% discount with slightly longer response times for non-urgent tasks
- **Priority Tier**: Faster response times for time-critical operations

Select your preferred tier directly in the UI based on your needs and budget. This gives you more control over costs while maintaining access to OpenAI's powerful models.

> **📚 Documentation**: See [OpenAI Provider Guide](/providers/openai) for detailed tier comparison and pricing.
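For those calling the API directly, the tier corresponds to the `service_tier` request field on supported OpenAI models. A minimal sketch of a flex-tier request body (the model name is illustrative):

```python
import json

# `service_tier` accepts values such as "default", "flex", or "priority" on
# supported OpenAI models; "flex" trades latency for the discounted rate.
payload = {
    "model": "gpt-5",  # illustrative model name
    "input": "Summarize this changelog.",
    "service_tier": "flex",
}

body = json.dumps(payload)
```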

## DeepInfra Provider

DeepInfra is now available as a model provider (thanks Thachnh!) ([#7677](https://github.com/RooCodeInc/Roo-Code/pull/7677)):

- **100+ Models**: Access to a vast selection of open-source and frontier models
- **Competitive Pricing**: Very cost-effective rates compared to other providers
- **Automatic Prompt Caching**: Built-in prompt caching for supported models like Qwen3 Coder
- **Fast Inference**: Optimized infrastructure for quick response times

DeepInfra is an excellent choice for developers looking for variety and value in their AI model selection.

> **📚 Documentation**: See [DeepInfra Provider Setup](/providers/deepinfra) to get started.

## QOL Improvements

* **Shell Security**: Added shell executable allowlist validation with platform-specific fallbacks for improved command execution safety ([#7681](https://github.com/RooCodeInc/Roo-Code/pull/7681))

## Bug Fixes

* **MCP Tool Validation**: Roo now validates MCP tool existence before execution and shows helpful error messages with available tools (thanks R-omk!) ([#7632](https://github.com/RooCodeInc/Roo-Code/pull/7632))
* **OpenAI API Key Errors**: Clear error messages now display when API keys contain invalid characters instead of cryptic ByteString errors (thanks A0nameless0man!) ([#7586](https://github.com/RooCodeInc/Roo-Code/pull/7586))
* **Follow-up Questions**: Fixed countdown timer incorrectly reappearing in task history for already answered follow-up questions (thanks XuyiK!) ([#7686](https://github.com/RooCodeInc/Roo-Code/pull/7686))
* **Moonshot Token Limit**: Resolved an issue where Moonshot models were incorrectly limited to 1024 tokens; configured token limits are now respected (thanks wangxiaolong100, greyishsong!) ([#7673](https://github.com/RooCodeInc/Roo-Code/pull/7673))
* **Zsh Command Safety**: Improved handling of zsh process substitution and glob qualifiers to prevent auto-execution of potentially dangerous commands ([#7658](https://github.com/RooCodeInc/Roo-Code/pull/7658), [#7667](https://github.com/RooCodeInc/Roo-Code/pull/7667))
* **Traditional Chinese Localization**: Fixed typo in zh-TW locale text (thanks PeterDaveHello!) ([#7672](https://github.com/RooCodeInc/Roo-Code/pull/7672))
31 changes: 31 additions & 0 deletions docs/update-notes/v3.26.mdx
@@ -94,8 +94,32 @@ PRs: [#7474](https://github.com/RooCodeInc/Roo-Code/pull/7474), [#7492](https://

> **📚 Documentation**: See [Image Generation - Editing Existing Images](/features/image-generation#editing-existing-images) for transformation examples.

### Kimi K2-0905: Moonshot's Latest Open Source Model is Live in Roo Code

We've upgraded to the latest Kimi K2-0905 models across multiple providers (thanks CellenLee!) ([#7663](https://github.com/RooCodeInc/Roo-Code/pull/7663), [#7693](https://github.com/RooCodeInc/Roo-Code/pull/7693)).

K2-0905 comes with three major upgrades:
- **256K Context Window**: Context windows of 256K–262K tokens (provider-dependent), roughly double the previous limit, for processing much larger documents and conversations
- **Improved Tool Calling**: Enhanced function calling and tool use capabilities for better agentic workflows
- **Enhanced Front-end Development**: Superior HTML, CSS, and JavaScript generation with modern framework support

Available through Groq, Moonshot, and Fireworks providers. These models excel at handling large codebases, long conversations, and complex multi-file operations.

### OpenAI Service Tiers

We've added support for OpenAI's new Responses API service tiers ([#7646](https://github.com/RooCodeInc/Roo-Code/pull/7646)):

- **Standard Tier**: Default tier with regular pricing
- **Flex Tier**: 50% discount with slightly longer response times for non-urgent tasks
- **Priority Tier**: Faster response times for time-critical operations

Select your preferred tier directly in the UI based on your needs and budget. This gives you more control over costs while maintaining access to OpenAI's powerful models.

> **📚 Documentation**: See [OpenAI Provider Guide](/providers/openai) for detailed tier comparison and pricing.

### Provider Updates

* **DeepInfra Provider**: DeepInfra is now available as a model provider with 100+ open-source and frontier models, competitive pricing, and automatic prompt caching for supported models like Qwen3 Coder (thanks Thachnh!) ([#7677](https://github.com/RooCodeInc/Roo-Code/pull/7677))
* **Kimi K2 Turbo Model**: Added support for the high-speed Kimi K2 Turbo model with 60-100 tokens/sec processing and a 131K token context window (thanks wangxiaolong100!) ([#7593](https://github.com/RooCodeInc/Roo-Code/pull/7593))
* **Qwen3 235B Thinking Model**: Added support for Qwen3-235B-A22B-Thinking-2507 model with an impressive 262K context window, enabling processing of extremely long documents and large codebases in a single request through the Chutes provider (thanks mohammad154, apple-techie!) ([#7578](https://github.com/RooCodeInc/Roo-Code/pull/7578))
* **Ollama Turbo Mode**: Added API key support for Turbo mode, enabling faster model execution with datacenter-grade hardware (thanks LivioGama!) ([#7425](https://github.com/RooCodeInc/Roo-Code/pull/7425))
@@ -104,6 +128,7 @@ PRs: [#7474](https://github.com/RooCodeInc/Roo-Code/pull/7474), [#7492](https://

### QOL Improvements

* **Shell Security**: Added shell executable allowlist validation with platform-specific fallbacks for improved command execution safety ([#7681](https://github.com/RooCodeInc/Roo-Code/pull/7681))
* **Settings Scroll Position**: Settings tabs now remember their individual scroll positions when switching between them (thanks DC-Dancao!) ([#7587](https://github.com/RooCodeInc/Roo-Code/pull/7587))
* **MCP Resource Auto-Approval**: MCP resource access requests are now automatically approved when auto-approve is enabled, eliminating manual approval steps and enabling smoother automation workflows (thanks m-ibm!) ([#7606](https://github.com/RooCodeInc/Roo-Code/pull/7606))
* **Message Queue Performance**: Improved message queueing reliability and performance by moving the queue management to the extension host, making the interface more stable ([#7604](https://github.com/RooCodeInc/Roo-Code/pull/7604))
@@ -122,6 +147,12 @@ PRs: [#7474](https://github.com/RooCodeInc/Roo-Code/pull/7474), [#7492](https://

### Bug Fixes

* **MCP Tool Validation**: Roo now validates MCP tool existence before execution and shows helpful error messages with available tools (thanks R-omk!) ([#7632](https://github.com/RooCodeInc/Roo-Code/pull/7632))
* **OpenAI API Key Errors**: Clear error messages now display when API keys contain invalid characters instead of cryptic ByteString errors (thanks A0nameless0man!) ([#7586](https://github.com/RooCodeInc/Roo-Code/pull/7586))
* **Follow-up Questions**: Fixed countdown timer incorrectly reappearing in task history for already answered follow-up questions (thanks XuyiK!) ([#7686](https://github.com/RooCodeInc/Roo-Code/pull/7686))
* **Moonshot Token Limit**: Resolved an issue where Moonshot models were incorrectly limited to 1024 tokens; configured token limits are now respected (thanks wangxiaolong100, greyishsong!) ([#7673](https://github.com/RooCodeInc/Roo-Code/pull/7673))
* **Zsh Command Safety**: Improved handling of zsh process substitution and glob qualifiers to prevent auto-execution of potentially dangerous commands ([#7658](https://github.com/RooCodeInc/Roo-Code/pull/7658), [#7667](https://github.com/RooCodeInc/Roo-Code/pull/7667))
* **Traditional Chinese Localization**: Fixed typo in zh-TW locale text (thanks PeterDaveHello!) ([#7672](https://github.com/RooCodeInc/Roo-Code/pull/7672))
* **Tool Approval Fix**: Fixed an error that occurred when using insert_content and search_and_replace tools on write-protected files - these tools now handle file protection correctly ([#7649](https://github.com/RooCodeInc/Roo-Code/pull/7649))
* **Configurable Embedding Batch Size**: Fixed an issue where users with API providers having stricter batch limits couldn't use code indexing. You can now configure the embedding batch size (1-2048, default: 400) to match your provider's limits (thanks BenLampson!) ([#7464](https://github.com/RooCodeInc/Roo-Code/pull/7464))
* **OpenAI-Native Cache Reporting**: Fixed cache usage statistics and cost calculations when using the OpenAI-Native provider with cached content ([#7602](https://github.com/RooCodeInc/Roo-Code/pull/7602))
2 changes: 2 additions & 0 deletions sidebars.ts
@@ -164,6 +164,7 @@ const sidebars: SidebarsConfig = {
'providers/claude-code',
'providers/bedrock',
'providers/cerebras',
'providers/deepinfra',
'providers/deepseek',
'providers/doubao',
'providers/featherless',
@@ -221,6 +222,7 @@
label: '3.26',
items: [
{ type: 'doc', id: 'update-notes/v3.26', label: '3.26 Combined' },
{ type: 'doc', id: 'update-notes/v3.26.7', label: '3.26.7' },
{ type: 'doc', id: 'update-notes/v3.26.6', label: '3.26.6' },
{ type: 'doc', id: 'update-notes/v3.26.5', label: '3.26.5' },
{ type: 'doc', id: 'update-notes/v3.26.4', label: '3.26.4' },