
Commit bb256c6

Release note updates + Responses API Bridge improvements (#11740)
* docs: track which items need docs
* docs(anthropic.md): add tool_choice="none" to docs
* docs: add docs for new anthropic + perplexity features
* docs: cleanup mistral reasoning docs
* docs: add links to docs
* docs(index.md): update docs
* docs: refactor to add a new 'integrations' tab to docs
* refactor(docs/): create separate tab for integrations (make it easier to highlight new integrations)
* docs: sort sidebar
* docs: update
* feat: working claude code with openai codex mini
* docs: add responses api to docs
* feat(index.md): update docs
* fix: fix linting error
1 parent d15664c commit bb256c6

10 files changed (+149, -21 lines)

docs/my-website/docs/tutorials/claude_responses_api.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
import Image from '@theme/IdealImage';

# Call Responses API models on Claude Code

This tutorial shows how to call Responses API models like `codex-mini` and `o3-pro` from the Claude Code endpoint on LiteLLM.

Pre-requisites:

- [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) installed
- LiteLLM v1.72.6-stable or higher

### 1. Setup config.yaml

```yaml
model_list:
  - model_name: codex-mini
    litellm_params:
      model: codex-mini
      api_key: sk-proj-1234567890
      api_base: https://api.openai.com/v1
```

### 2. Start proxy

```bash
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

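Optionally, verify the proxy is up and serving the model before moving on. A minimal sketch, assuming the proxy from step 2 is listening on `http://0.0.0.0:4000`, that it exposes the OpenAI-compatible `/v1/models` route, and that the placeholder key from the config above is accepted:

```python
import requests

# Assumptions: proxy from step 2 is running locally; the key below is the
# placeholder from the example config (replace with your LiteLLM key).
BASE_URL = "http://0.0.0.0:4000"
API_KEY = "sk-proj-1234567890"

# List the models the proxy is serving via the OpenAI-compatible /v1/models route.
resp = requests.get(
    f"{BASE_URL}/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])  # expect "codex-mini" in this list
```
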
### 3. Test it! (Curl)

```bash
curl -X POST http://0.0.0.0:4000/v1/messages \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codex-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```

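The same request can be sent from Python. A minimal sketch mirroring the curl call above, using the same placeholder base URL and key; `max_tokens` is added here as an assumption, since Anthropic-style `/v1/messages` payloads typically include it:

```python
import requests

BASE_URL = "http://0.0.0.0:4000"
API_KEY = "sk-proj-1234567890"  # placeholder; use your LiteLLM key

payload = {
    "model": "codex-mini",
    "max_tokens": 1024,  # assumption: /v1/messages payloads usually set this
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
}

resp = requests.post(
    f"{BASE_URL}/v1/messages",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```
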
### 4. Test it! (Claude Code)

- Setup environment variables

```bash
export ANTHROPIC_API_BASE="http://0.0.0.0:4000"
export ANTHROPIC_API_KEY="sk-1234" # replace with your LiteLLM key
```

- Start a Claude Code session

```bash
claude --model codex-mini-latest
```

- Send a message

<Image img={require('../../img/release_notes/claude_code_demo.png')} style={{ width: '500px', height: 'auto' }} />

docs/my-website/release_notes/v1.72.6-stable/index.md

Lines changed: 38 additions & 3 deletions
@@ -31,17 +31,52 @@ This version is not out yet.

## TLDR

* **Why Upgrade**
    - Codex-mini on Claude Code: You can now use `codex-mini` (OpenAI's code assistant model) via Claude Code.
    - MCP Permissions Management: Manage permissions for MCP Servers by Keys, Teams, and Organizations (entities) on LiteLLM.
    - UI: Turn auto-refresh on/off on the logs view.
    - Rate Limiting: Support for output-token-only rate limiting.
* **Who Should Read**
    - Teams using the `/v1/messages` API (Claude Code)
    - Teams using **MCP**
    - Teams giving access to self-hosted models and setting rate limits
* **Risk of Upgrade**
    - **Low**
    - No major changes to existing functionality or package updates.

---

## Key Highlights

### MCP Permissions Management

This release brings support for managing permissions for MCP Servers by Keys, Teams, and Organizations (entities) on LiteLLM. When an MCP client attempts to list tools, LiteLLM will only return the tools the entity has permission to access.

This is great for use cases that require access to restricted data (e.g. a Jira MCP) that you don't want everyone to use.

For Proxy Admins, this enables centralized management of all MCP Servers with access control. For developers, it means you'll only see the MCP tools assigned to you.

<Image img={require('../../img/release_notes/mcp_permissions.png')}/>

### Codex-mini on Claude Code

This release brings support for calling `codex-mini` (OpenAI's code assistant model) via Claude Code.

This is done by LiteLLM enabling any Responses API model (including `o3-pro`) to be called via the `/chat/completions` and `/v1/messages` endpoints. This includes:

- Streaming calls
- Non-streaming calls
- Cost tracking on success + failure for Responses API models

Here's how to use it [today](../../docs/tutorials/claude_responses_api), and a minimal `/chat/completions` sketch follows below.

<Image img={require('../../img/release_notes/codex_on_claude_code.jpg')} />

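Because the bridge exposes Responses API models over the standard `/chat/completions` route, any OpenAI-compatible client pointed at the LiteLLM proxy can call them. A minimal sketch, assuming a local proxy with a `codex-mini` entry in its `model_list` and `sk-1234` as a placeholder LiteLLM key:

```python
from openai import OpenAI

# Assumptions: LiteLLM proxy running locally with `codex-mini` configured;
# "sk-1234" is a placeholder LiteLLM virtual key.
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

# Non-streaming call routed through the Responses API bridge.
completion = client.chat.completions.create(
    model="codex-mini",
    messages=[{"role": "user", "content": "Write a one-line docstring for a retry helper."}],
)
print(completion.choices[0].message.content)

# Streaming is supported as well.
stream = client.chat.completions.create(
    model="codex-mini",
    messages=[{"role": "user", "content": "Summarize this release in one sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
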
---

@@ -202,7 +237,7 @@ This version is not out yet.
- Make all commands show server URL - [PR](https://github.com/BerriAI/litellm/pull/10801)
- **Unicorn**
  - Allow setting keep alive timeout - [PR](https://github.com/BerriAI/litellm/pull/11594)
- **Experimental Rate Limiting v2** (enable via `EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True"`)
  - Support specifying rate limit by output_tokens only - [PR](https://github.com/BerriAI/litellm/pull/11646)
  - Decrement parallel requests on call failure - [PR](https://github.com/BerriAI/litellm/pull/11646)
  - In-memory only rate limiting support - [PR](https://github.com/BerriAI/litellm/pull/11646)

docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -507,6 +507,7 @@ const sidebars = {
        "tutorials/tag_management",
        'tutorials/litellm_proxy_aporia',
        "tutorials/gemini_realtime_with_audio",
+       "tutorials/claude_responses_api",
        {
          type: "category",
          label: "LiteLLM Python SDK Tutorials",

litellm/completion_extras/litellm_responses_transformation/handler.py

Lines changed: 12 additions & 8 deletions
@@ -107,6 +107,7 @@ def completion(
            headers=headers,
            litellm_logging_obj=logging_obj,
        )
+
        result = responses(
            **request_data,
        )
@@ -156,14 +157,17 @@ async def acompletion(
        logging_obj = validated_kwargs["logging_obj"]
        custom_llm_provider = validated_kwargs["custom_llm_provider"]

-        request_data = self.transformation_handler.transform_request(
-            model=model,
-            messages=messages,
-            optional_params=optional_params,
-            litellm_params=litellm_params,
-            headers=headers,
-            litellm_logging_obj=logging_obj,
-        )
+        try:
+            request_data = self.transformation_handler.transform_request(
+                model=model,
+                messages=messages,
+                optional_params=optional_params,
+                litellm_params=litellm_params,
+                headers=headers,
+                litellm_logging_obj=logging_obj,
+            )
+        except Exception as e:
+            raise e

        result = await aresponses(
            **request_data,

litellm/completion_extras/litellm_responses_transformation/transformation.py

Lines changed: 31 additions & 10 deletions
@@ -1,6 +1,7 @@
"""
Handler for transforming /chat/completions api requests to litellm.responses requests
"""
+
import json
from typing import (
    TYPE_CHECKING,
@@ -62,7 +63,15 @@ def convert_chat_completion_messages_to_responses_api(
                if isinstance(content, str):
                    instructions = content
                else:
-                    raise ValueError(f"System message must be a string: {content}")
+                    input_items.append(
+                        {
+                            "type": "message",
+                            "role": role,
+                            "content": self._convert_content_to_responses_format(
+                                content, role  # type: ignore
+                            ),
+                        }
+                    )
            elif role == "tool":
                # Convert tool message to function call output format
                input_items.append(
@@ -93,7 +102,9 @@ def convert_chat_completion_messages_to_responses_api(
                    {
                        "type": "message",
                        "role": role,
-                        "content": self._convert_content_to_responses_format(content),
+                        "content": self._convert_content_to_responses_format(
+                            content, cast(str, role)
+                        ),
                    }
                )
@@ -301,6 +312,14 @@ def get_model_response_iterator(
            streaming_response, sync_stream, json_mode
        )

+    def _convert_content_str_to_input_text(
+        self, content: str, role: str
+    ) -> Dict[str, Any]:
+        if role == "user" or role == "system":
+            return {"type": "input_text", "text": content}
+        else:
+            return {"type": "output_text", "text": content}
+
    def _convert_content_to_responses_format(
        self,
        content: Union[
@@ -309,14 +328,15 @@ def _convert_content_to_responses_format(
                Union["OpenAIMessageContentListBlock", "ChatCompletionThinkingBlock"]
            ],
        ],
+        role: str,
    ) -> List[Dict[str, Any]]:
        """Convert chat completion content to responses API format"""
        verbose_logger.debug(
            f"Chat provider: Converting content to responses format - input type: {type(content)}"
        )

        if isinstance(content, str):
-            result = [{"type": "input_text", "text": content}]
+            result = [self._convert_content_str_to_input_text(content, role)]
            verbose_logger.debug(f"Chat provider: String content -> {result}")
            return result
        elif isinstance(content, list):
@@ -326,14 +346,16 @@ def _convert_content_to_responses_format(
                    f"Chat provider: Processing content item {i}: {type(item)} = {item}"
                )
                if isinstance(item, str):
-                    converted = {"type": "input_text", "text": item}
+                    converted = self._convert_content_str_to_input_text(item, role)
                    result.append(converted)
                    verbose_logger.debug(f"Chat provider: -> {converted}")
                elif isinstance(item, dict):
                    # Handle multimodal content
                    original_type = item.get("type")
                    if original_type == "text":
-                        converted = {"type": "input_text", "text": item.get("text", "")}
+                        converted = self._convert_content_str_to_input_text(
+                            item.get("text", ""), role
+                        )
                        result.append(converted)
                        verbose_logger.debug(f"Chat provider: text -> {converted}")
                    elif original_type == "image_url":
@@ -371,18 +393,17 @@ def _convert_content_to_responses_format(
                        )
                    else:
                        # Default to input_text for unknown types
-                        converted = {
-                            "type": "input_text",
-                            "text": str(item.get("text", item)),
-                        }
+                        converted = self._convert_content_str_to_input_text(
+                            str(item.get("text", item)), role
+                        )
                        result.append(converted)
                        verbose_logger.debug(
                            f"Chat provider: unknown({original_type}) -> {converted}"
                        )
            verbose_logger.debug(f"Chat provider: Final converted content: {result}")
            return result
        else:
-            result = [{"type": "input_text", "text": str(content)}]
+            result = [self._convert_content_str_to_input_text(str(content), role)]
            verbose_logger.debug(f"Chat provider: Other content type -> {result}")
            return result
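
The effect of the new `role` parameter above: string content from `user` and `system` messages is tagged as model input (`input_text`), while content from other roles (e.g. prior `assistant` turns) is tagged as model output (`output_text`). A small standalone sketch of that mapping, simplified from the handler (plain functions standing in for the methods; note that in the real code a string `system` message becomes top-level `instructions` rather than an input item):

```python
from typing import Any, Dict, List


def convert_content_str(content: str, role: str) -> Dict[str, Any]:
    # Mirrors _convert_content_str_to_input_text: user/system content is input,
    # anything else (e.g. assistant turns) is output.
    if role in ("user", "system"):
        return {"type": "input_text", "text": content}
    return {"type": "output_text", "text": content}


def convert_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Simplified view of the chat -> Responses API item conversion,
    # covering string-content messages only.
    return [
        {
            "type": "message",
            "role": m["role"],
            "content": [convert_content_str(m["content"], m["role"])],
        }
        for m in messages
        if isinstance(m.get("content"), str)
    ]


example = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]
for item in convert_messages(example):
    print(item)
# The user item carries an "input_text" block; the assistant item carries "output_text".
```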

litellm/proxy/_new_secret_config.yaml

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
model_list:
+  - model_name: codex-mini
+    litellm_params:
+      model: codex-mini-latest
+      api_key: os.environ/OPENAI_API_KEY
  - model_name: "gpt-4o-mini-openai"
    litellm_params:
      model: gpt-4o-mini

litellm/responses/main.py

Lines changed: 1 addition & 0 deletions
@@ -233,6 +233,7 @@ def responses(

    # get llm provider logic
    litellm_params = GenericLiteLLMParams(**kwargs)
+
    ## MOCK RESPONSE LOGIC
    if litellm_params.mock_response and isinstance(
        litellm_params.mock_response, str
