
Commit bb256c6

Release note updates + Responses API Bridge improvements (#11740)
* docs: track which items need docs
* docs(anthropic.md): add tool_choice="none" to docs
* docs: add docs for new anthropic + perplexity features
* docs: cleanup mistral reasoning docs
* docs: add links to docs
* docs(index.md): update docs
* docs: refactor to add a new 'integrations' tab to docs
* refactor(docs/): create separate tab for integrations (make it easier to highlight new integrations)
* docs: sort sidebar
* docs: update
* feat: working claude code with openai codex mini
* docs: add responses api to docs
* feat(index.md): update docs
* fix: fix linting error
1 parent d15664c commit bb256c6

10 files changed (+149, -21 lines)

docs/my-website/docs/tutorials/claude_responses_api.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
import Image from '@theme/IdealImage';

# Call Responses API models on Claude Code

This tutorial shows how to call Responses API models like `codex-mini` and `o3-pro` from the Claude Code endpoint on LiteLLM.

Pre-requisites:

- [Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview) installed
- LiteLLM v1.72.6-stable or higher

### 1. Setup config.yaml

```yaml
model_list:
  - model_name: codex-mini
    litellm_params:
      model: codex-mini
      api_key: sk-proj-1234567890
      api_base: https://api.openai.com/v1
```

### 2. Start proxy

```bash
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

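Optionally, verify the proxy is up and serving the model before moving on. A minimal sketch, assuming the proxy from step 2 is listening on `http://0.0.0.0:4000`, that it exposes the OpenAI-compatible `/v1/models` route, and that the placeholder key from the config above is accepted:

```python
import requests

# Assumptions: proxy from step 2 is running locally; the key below is the
# placeholder from the example config (replace with your LiteLLM key).
BASE_URL = "http://0.0.0.0:4000"
API_KEY = "sk-proj-1234567890"

# List the models the proxy is serving via the OpenAI-compatible /v1/models route.
resp = requests.get(
    f"{BASE_URL}/v1/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])  # expect "codex-mini" in this list
```
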
### 3. Test it! (Curl)

```bash
curl -X POST http://0.0.0.0:4000/v1/messages \
  -H "Authorization: Bearer sk-proj-1234567890" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "codex-mini",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  }'
```

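The same request can be sent from Python. A minimal sketch mirroring the curl call above, using the same placeholder base URL and key; `max_tokens` is added here as an assumption, since Anthropic-style `/v1/messages` payloads typically include it:

```python
import requests

BASE_URL = "http://0.0.0.0:4000"
API_KEY = "sk-proj-1234567890"  # placeholder; use your LiteLLM key

payload = {
    "model": "codex-mini",
    "max_tokens": 1024,  # assumption: /v1/messages payloads usually set this
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
}

resp = requests.post(
    f"{BASE_URL}/v1/messages",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```
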
### 4. Test it! (Claude Code)

- Setup environment variables

```bash
export ANTHROPIC_API_BASE="http://0.0.0.0:4000"
export ANTHROPIC_API_KEY="sk-1234" # replace with your LiteLLM key
```

- Start a Claude Code session

```bash
claude --model codex-mini-latest
```

- Send a message

<Image img={require('../../img/release_notes/claude_code_demo.png')} style={{ width: '500px', height: 'auto' }} />

docs/my-website/release_notes/v1.72.6-stable/index.md

Lines changed: 38 additions & 3 deletions
@@ -31,17 +31,52 @@ This version is not out yet.

## TLDR

* **Why Upgrade**
    - Codex-mini on Claude Code: You can now use `codex-mini` (OpenAI's code assistant model) via Claude Code.
    - MCP Permissions Management: Manage permissions for MCP Servers by Keys, Teams, and Organizations (entities) on LiteLLM.
    - UI: Turn auto-refresh on/off on the logs view.
    - Rate Limiting: Support for output-token-only rate limiting.
* **Who Should Read**
    - Teams using the `/v1/messages` API (Claude Code)
    - Teams using **MCP**
    - Teams giving access to self-hosted models and setting rate limits
* **Risk of Upgrade**
    - **Low**
    - No major changes to existing functionality or package updates.

---

## Key Highlights

### MCP Permissions Management

This release brings support for managing permissions for MCP Servers by Keys, Teams, and Organizations (entities) on LiteLLM. When an MCP client attempts to list tools, LiteLLM will only return the tools the entity has permission to access.

This is great for use cases that require access to restricted data (e.g. a Jira MCP) that you don't want everyone to use.

For Proxy Admins, this enables centralized management of all MCP Servers with access control. For developers, it means you'll only see the MCP tools assigned to you.

<Image img={require('../../img/release_notes/mcp_permissions.png')}/>

### Codex-mini on Claude Code

This release brings support for calling `codex-mini` (OpenAI's code assistant model) via Claude Code.

This is done by LiteLLM enabling any Responses API model (including `o3-pro`) to be called via the `/chat/completions` and `/v1/messages` endpoints. This includes:

- Streaming calls
- Non-streaming calls
- Cost tracking on success + failure for Responses API models

Here's how to use it [today](../../docs/tutorials/claude_responses_api), and a minimal `/chat/completions` sketch follows below.

<Image img={require('../../img/release_notes/codex_on_claude_code.jpg')} />

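Because the bridge exposes Responses API models over the standard `/chat/completions` route, any OpenAI-compatible client pointed at the LiteLLM proxy can call them. A minimal sketch, assuming a local proxy with a `codex-mini` entry in its `model_list` and `sk-1234` as a placeholder LiteLLM key:

```python
from openai import OpenAI

# Assumptions: LiteLLM proxy running locally with `codex-mini` configured;
# "sk-1234" is a placeholder LiteLLM virtual key.
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

# Non-streaming call routed through the Responses API bridge.
completion = client.chat.completions.create(
    model="codex-mini",
    messages=[{"role": "user", "content": "Write a one-line docstring for a retry helper."}],
)
print(completion.choices[0].message.content)

# Streaming is supported as well.
stream = client.chat.completions.create(
    model="codex-mini",
    messages=[{"role": "user", "content": "Summarize this release in one sentence."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
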
---

@@ -202,7 +237,7 @@ This version is not out yet.
- Make all commands show server URL - [PR](https://github.com/BerriAI/litellm/pull/10801)
- **Unicorn**
  - Allow setting keep alive timeout - [PR](https://github.com/BerriAI/litellm/pull/11594)
- **Experimental Rate Limiting v2** (enable via `EXPERIMENTAL_MULTI_INSTANCE_RATE_LIMITING="True"`)
  - Support specifying rate limit by output_tokens only - [PR](https://github.com/BerriAI/litellm/pull/11646)
  - Decrement parallel requests on call failure - [PR](https://github.com/BerriAI/litellm/pull/11646)
  - In-memory only rate limiting support - [PR](https://github.com/BerriAI/litellm/pull/11646)

docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions
@@ -507,6 +507,7 @@ const sidebars = {
        "tutorials/tag_management",
        'tutorials/litellm_proxy_aporia',
        "tutorials/gemini_realtime_with_audio",
+       "tutorials/claude_responses_api",
        {
          type: "category",
          label: "LiteLLM Python SDK Tutorials",

litellm/completion_extras/litellm_responses_transformation/handler.py

Lines changed: 12 additions & 8 deletions
@@ -107,6 +107,7 @@ def completion(
            headers=headers,
            litellm_logging_obj=logging_obj,
        )
+
        result = responses(
            **request_data,
        )
@@ -156,14 +157,17 @@ async def acompletion(
        logging_obj = validated_kwargs["logging_obj"]
        custom_llm_provider = validated_kwargs["custom_llm_provider"]

-        request_data = self.transformation_handler.transform_request(
-            model=model,
-            messages=messages,
-            optional_params=optional_params,
-            litellm_params=litellm_params,
-            headers=headers,
-            litellm_logging_obj=logging_obj,
-        )
+        try:
+            request_data = self.transformation_handler.transform_request(
+                model=model,
+                messages=messages,
+                optional_params=optional_params,
+                litellm_params=litellm_params,
+                headers=headers,
+                litellm_logging_obj=logging_obj,
+            )
+        except Exception as e:
+            raise e

        result = await aresponses(
            **request_data,

litellm/completion_extras/litellm_responses_transformation/transformation.py

Lines changed: 31 additions & 10 deletions
@@ -1,6 +1,7 @@
"""
Handler for transforming /chat/completions api requests to litellm.responses requests
"""
+
import json
from typing import (
    TYPE_CHECKING,
@@ -62,7 +63,15 @@ def convert_chat_completion_messages_to_responses_api(
                if isinstance(content, str):
                    instructions = content
                else:
-                    raise ValueError(f"System message must be a string: {content}")
+                    input_items.append(
+                        {
+                            "type": "message",
+                            "role": role,
+                            "content": self._convert_content_to_responses_format(
+                                content, role  # type: ignore
+                            ),
+                        }
+                    )
            elif role == "tool":
                # Convert tool message to function call output format
                input_items.append(
@@ -93,7 +102,9 @@ def convert_chat_completion_messages_to_responses_api(
                    {
                        "type": "message",
                        "role": role,
-                        "content": self._convert_content_to_responses_format(content),
+                        "content": self._convert_content_to_responses_format(
+                            content, cast(str, role)
+                        ),
                    }
                )
@@ -301,6 +312,14 @@ def get_model_response_iterator(
            streaming_response, sync_stream, json_mode
        )

+    def _convert_content_str_to_input_text(
+        self, content: str, role: str
+    ) -> Dict[str, Any]:
+        if role == "user" or role == "system":
+            return {"type": "input_text", "text": content}
+        else:
+            return {"type": "output_text", "text": content}
+
    def _convert_content_to_responses_format(
        self,
        content: Union[
@@ -309,14 +328,15 @@ def _convert_content_to_responses_format(
                Union["OpenAIMessageContentListBlock", "ChatCompletionThinkingBlock"]
            ],
        ],
+        role: str,
    ) -> List[Dict[str, Any]]:
        """Convert chat completion content to responses API format"""
        verbose_logger.debug(
            f"Chat provider: Converting content to responses format - input type: {type(content)}"
        )

        if isinstance(content, str):
-            result = [{"type": "input_text", "text": content}]
+            result = [self._convert_content_str_to_input_text(content, role)]
            verbose_logger.debug(f"Chat provider: String content -> {result}")
            return result
        elif isinstance(content, list):
@@ -326,14 +346,16 @@ def _convert_content_to_responses_format(
                    f"Chat provider: Processing content item {i}: {type(item)} = {item}"
                )
                if isinstance(item, str):
-                    converted = {"type": "input_text", "text": item}
+                    converted = self._convert_content_str_to_input_text(item, role)
                    result.append(converted)
                    verbose_logger.debug(f"Chat provider: -> {converted}")
                elif isinstance(item, dict):
                    # Handle multimodal content
                    original_type = item.get("type")
                    if original_type == "text":
-                        converted = {"type": "input_text", "text": item.get("text", "")}
+                        converted = self._convert_content_str_to_input_text(
+                            item.get("text", ""), role
+                        )
                        result.append(converted)
                        verbose_logger.debug(f"Chat provider: text -> {converted}")
                    elif original_type == "image_url":
@@ -371,18 +393,17 @@ def _convert_content_to_responses_format(
                        )
                    else:
                        # Default to input_text for unknown types
-                        converted = {
-                            "type": "input_text",
-                            "text": str(item.get("text", item)),
-                        }
+                        converted = self._convert_content_str_to_input_text(
+                            str(item.get("text", item)), role
+                        )
                        result.append(converted)
                        verbose_logger.debug(
                            f"Chat provider: unknown({original_type}) -> {converted}"
                        )
            verbose_logger.debug(f"Chat provider: Final converted content: {result}")
            return result
        else:
-            result = [{"type": "input_text", "text": str(content)}]
+            result = [self._convert_content_str_to_input_text(str(content), role)]
            verbose_logger.debug(f"Chat provider: Other content type -> {result}")
            return result
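
The effect of the new `role` parameter above: string content from `user` and `system` messages is tagged as model input (`input_text`), while content from other roles (e.g. prior `assistant` turns) is tagged as model output (`output_text`). A small standalone sketch of that mapping, simplified from the handler (plain functions standing in for the methods; note that in the real code a string `system` message becomes top-level `instructions` rather than an input item):

```python
from typing import Any, Dict, List


def convert_content_str(content: str, role: str) -> Dict[str, Any]:
    # Mirrors _convert_content_str_to_input_text: user/system content is input,
    # anything else (e.g. assistant turns) is output.
    if role in ("user", "system"):
        return {"type": "input_text", "text": content}
    return {"type": "output_text", "text": content}


def convert_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Simplified view of the chat -> Responses API item conversion,
    # covering string-content messages only.
    return [
        {
            "type": "message",
            "role": m["role"],
            "content": [convert_content_str(m["content"], m["role"])],
        }
        for m in messages
        if isinstance(m.get("content"), str)
    ]


example = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "Paris."},
]
for item in convert_messages(example):
    print(item)
# The user item carries an "input_text" block; the assistant item carries "output_text".
```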

litellm/proxy/_new_secret_config.yaml

Lines changed: 4 additions & 0 deletions
@@ -1,4 +1,8 @@
model_list:
+  - model_name: codex-mini
+    litellm_params:
+      model: codex-mini-latest
+      api_key: os.environ/OPENAI_API_KEY
  - model_name: "gpt-4o-mini-openai"
    litellm_params:
      model: gpt-4o-mini

litellm/responses/main.py

Lines changed: 1 addition & 0 deletions
@@ -233,6 +233,7 @@ def responses(

    # get llm provider logic
    litellm_params = GenericLiteLLMParams(**kwargs)
+
    ## MOCK RESPONSE LOGIC
    if litellm_params.mock_response and isinstance(
        litellm_params.mock_response, str
