[internal copy of #29550] fix: passthrough endpoints duplicate logs #29598
Two bugs caused _PROXY_track_cost_callback to see stream=True + complete_streaming_response=None on every streaming pass-through request, making the dedup guard in dispatch_success_handlers permanently inactive:

1. pass_through_endpoints.py created the Logging object with stream=False for all requests. _is_assembled_stream_success short-circuits on self.stream is not True, so has_dispatched_final_stream_success was never set and any second dispatch went through unchecked. Fix: set logging_obj.stream = True after stream detection.
2. _create_anthropic_response_logging_payload set complete_streaming_response inside the try block after litellm.completion_cost(), so a pricing error caused an early return without setting it on model_call_details. Fix: set complete_streaming_response before the try block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
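The interaction between the two bugs can be sketched with a toy model. This is a minimal illustration under stated assumptions, not litellm's code: `SketchLogging`, `mark_stream`, `record_payload`, `failing_cost`, `run`, and `dispatch_count` are hypothetical names, and only the guard condition, the `has_dispatched_final_stream_success` flag, and the ordering relative to the cost call follow the description above. The stream gate inside `record_payload` reflects the final commit in this PR rather than the first one.

```python
from typing import Any, Callable, Dict


class SketchLogging:
    """Simplified stand-in for the proxy logging object; not litellm's real class."""

    def __init__(self) -> None:
        self.stream = False  # bug 1: pass-through created every object like this
        self.model_call_details: Dict[str, Any] = {}
        self.has_dispatched_final_stream_success = False
        self.dispatch_count = 0  # hypothetical counter, used only for illustration

    def mark_stream(self) -> None:
        # Fix 1: flag the object once streaming has been detected.
        self.stream = True
        self.model_call_details["stream"] = True

    def _is_assembled_stream_success(self) -> bool:
        # The guard short-circuits unless the object is flagged as a stream.
        if self.stream is not True:
            return False
        return self.model_call_details.get("complete_streaming_response") is not None

    def dispatch_success_handlers(self) -> None:
        if self._is_assembled_stream_success():
            if self.has_dispatched_final_stream_success:
                return  # a second final dispatch is dropped
            self.has_dispatched_final_stream_success = True
        self.dispatch_count += 1


def record_payload(
    obj: SketchLogging, response: Any, cost_fn: Callable[[Any], float]
) -> None:
    # Fix 2: store the assembled response before the cost call that may raise.
    if obj.model_call_details.get("stream") is True:
        obj.model_call_details["complete_streaming_response"] = response
    try:
        obj.model_call_details["response_cost"] = cost_fn(response)
    except Exception:
        return  # a pricing error no longer loses the assembled response


def failing_cost(_response: Any) -> float:
    raise ValueError("no pricing entry for this model")


def run(flag_stream: bool) -> int:
    """Dispatch twice and report how many dispatches reached the callbacks."""
    obj = SketchLogging()
    if flag_stream:
        obj.mark_stream()
    record_payload(obj, {"id": "assembled"}, failing_cost)
    obj.dispatch_success_handlers()
    obj.dispatch_success_handlers()
    return obj.dispatch_count
```

In this sketch `run(False)` lets both dispatches through because the guard never activates, while `run(True)` suppresses the second one even though the cost calculation failed.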
Greptile Summary
This PR fixes duplicate success-callback dispatches for streaming Anthropic passthrough endpoints by ensuring
Confidence Score: 4/5
Safe to merge for the streaming dedup fix; the non-streaming Anthropic passthrough path now silently drops sync callbacks (langfuse, s3, etc.) as a side-effect.
litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py — the early
| Filename | Overview |
|---|---|
| litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py | Adds early complete_streaming_response assignment in _create_anthropic_response_logging_payload — applied unconditionally to both streaming and non-streaming calls, which will suppress sync callbacks (langfuse, s3) for non-streaming Anthropic passthrough requests. |
| litellm/proxy/pass_through_endpoints/pass_through_endpoints.py | Sets logging_obj.stream = True in both the explicit-stream branch and the SSE-fallback branch; straightforward and correct fix for the dedup guard precondition. |
| tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py | Adds TestStreamFalseDeduplication class with four focused regression tests covering the dedup guard behaviour pre/post fix; tests are mock-only and well-structured. |
| tests/test_litellm/proxy/pass_through_endpoints/test_pass_through_endpoints.py | Adds two integration-level regression tests that confirm logging_obj.stream and model_call_details["stream"] are set correctly for both explicit-stream and SSE-fallback code paths. |
| tests/pass_through_unit_tests/test_unit_test_anthropic_pass_through.py | Adds model_call_details = {} initialisation to the existing mock so the new _create_anthropic_response_logging_payload write doesn't raise a TypeError; minimal and correct. |
Reviews (6): Last reviewed commit: "fix(pass_through): gate complete_streami..."
…s dict The anthropic passthrough logging payload now records the assembled response on model_call_details before cost calculation, which requires model_call_details to support item assignment. In production it is always a dict; the existing unit test stubbed the logging object with a bare Mock whose attribute is not subscriptable, so the new assignment raised TypeError. Use a real dict to match the production logging object.
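The failure mode described in this commit message can be reproduced with the standard library alone. The sketch below is illustrative and does not use the project's test fixtures; the variable names and the sample payload are assumptions. It relies on the fact that a plain `unittest.mock.Mock` does not configure magic methods such as `__setitem__`, so item assignment on an auto-created attribute raises `TypeError`.

```python
from unittest.mock import Mock

# A bare Mock auto-creates `model_call_details` as another Mock,
# which does not support item assignment.
bare_logging_obj = Mock()
try:
    bare_logging_obj.model_call_details["complete_streaming_response"] = {"id": "resp"}
    bare_error = None
except TypeError as exc:
    bare_error = exc

# Giving the mock a real dict matches the production object's behaviour.
fixed_logging_obj = Mock()
fixed_logging_obj.model_call_details = {}
fixed_logging_obj.model_call_details["complete_streaming_response"] = {"id": "resp"}
```

Using `MagicMock` would also make the assignment succeed, but a real dict keeps the written value readable by later assertions.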
Codecov Report
✅ All modified and coverable lines are covered by tests.
The streaming branch of pass_through_request that marks the logging object as streaming (logging_obj.stream and model_call_details["stream"]) had no unit coverage, so the patch coverage gate flagged it. Add a regression test that drives a streaming pass-through request through pass_through_request and asserts the logging object is flagged as a stream before dispatch.
PR overview
All previously flagged issues have been addressed. No open security concerns remain on this pull request.
Security review
No open security issues remain on this pull request. Fixed/addressed: 1 · PR risk: 0/10
The auto-detected streaming branch of pass_through_request (when a request that was not flagged as streaming returns a text/event-stream response) sets logging_obj.stream and model_call_details["stream"] but had no unit coverage, so the codecov patch gate failed at 60%. Drive a non-streaming pass-through request whose upstream response is SSE through pass_through_request and assert the logging object is flagged as a stream before dispatch.
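The two branches covered by these tests can be pictured roughly as follows. This is a hedged sketch, not the body of pass_through_request: `is_sse_response`, `flag_stream`, and `new_logging_obj` are hypothetical helpers, and the content-type check is an assumption about how an SSE response might be recognised. Only the two attributes being set come from the commit messages above.

```python
from types import SimpleNamespace
from typing import Mapping


def is_sse_response(headers: Mapping[str, str]) -> bool:
    """Return True when the upstream response declares text/event-stream."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            return value.split(";")[0].strip().lower() == "text/event-stream"
    return False


def flag_stream(logging_obj, requested_stream: bool, response_headers) -> None:
    """Mark the logging object as streaming in either detection branch."""
    if requested_stream or is_sse_response(response_headers):
        logging_obj.stream = True
        logging_obj.model_call_details["stream"] = True


def new_logging_obj():
    # Stand-in for a logging object created with stream=False.
    return SimpleNamespace(stream=False, model_call_details={})


explicit = new_logging_obj()
flag_stream(explicit, True, {"content-type": "application/json"})

fallback = new_logging_obj()
flag_stream(fallback, False, {"Content-Type": "text/event-stream; charset=utf-8"})

plain = new_logging_obj()
flag_stream(plain, False, {"content-type": "application/json"})
```

Here `explicit` and `fallback` both end up flagged, while `plain` keeps `stream=False` and never gains a `"stream"` key.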
perform_redaction only scrubs complete_streaming_response when model_call_details["stream"] is True. Setting it unconditionally for non-streaming Anthropic pass-through responses left the assembled response unredacted in model_call_details, which is handed to logging callbacks as kwargs when message logging is disabled. Only record it for actual streaming responses so redaction always applies.
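A toy model makes the redaction gap easier to see. Everything below is a simplified assumption rather than litellm's implementation: `perform_redaction_sketch`, `record_response`, `callback_kwargs`, and the `REDACTED` placeholder are hypothetical, and only the rule "scrub complete_streaming_response when stream is True" is taken from the commit message.

```python
import copy
from typing import Any, Dict

REDACTED = "redacted"  # hypothetical placeholder value


def perform_redaction_sketch(details: Dict[str, Any]) -> Dict[str, Any]:
    """Scrub message content; scrub the assembled response only for streams."""
    scrubbed = copy.deepcopy(details)
    scrubbed["messages"] = REDACTED
    if scrubbed.get("stream") is True and "complete_streaming_response" in scrubbed:
        scrubbed["complete_streaming_response"] = REDACTED
    return scrubbed


def record_response(details: Dict[str, Any], response: Any, gated: bool) -> None:
    """Write the assembled response, optionally gated on the stream flag."""
    if not gated or details.get("stream") is True:
        details["complete_streaming_response"] = response


def callback_kwargs(stream: bool, gated: bool) -> Dict[str, Any]:
    """Build the kwargs a logging callback would receive after redaction."""
    details: Dict[str, Any] = {"messages": "user prompt"}
    if stream:
        details["stream"] = True
    record_response(details, "assistant reply", gated)
    return perform_redaction_sketch(details)


leaky = callback_kwargs(stream=False, gated=False)
safe_non_stream = callback_kwargs(stream=False, gated=True)
safe_stream = callback_kwargs(stream=True, gated=True)
```

In the ungated non-streaming case the assistant reply survives redaction; with the gate, the key is either absent or scrubbed.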
    # Only record complete_streaming_response for actual streaming responses.
    # perform_redaction scrubs this field only when stream is True, so setting
    # it on a non-streaming response would bypass message redaction.
    if logging_obj.model_call_details.get("stream") is True:
        logging_obj.model_call_details["complete_streaming_response"] = (
            litellm_model_response
        )
complete_streaming_response set unconditionally, breaks sync callbacks for non-streaming calls
_create_anthropic_response_logging_payload is called for both streaming and non-streaming Anthropic passthrough responses. Setting complete_streaming_response in model_call_details unconditionally causes success_handler to hit its early-return guard at line 2127 (if "complete_streaming_response" in self.model_call_details: return) even for non-streaming calls. Any user with sync callbacks configured (e.g. success_callback = ["langfuse"] or "s3") would have those callbacks silently skipped for all non-streaming Anthropic passthrough requests, because the executor-submitted success_handler exits before iterating the callback list. The guard at line 1635 (if self.stream is not True: return False) prevents _is_assembled_stream_success from misfiring, but success_handler's early-return at 2127 is unconditional. The assignment should be guarded so it only fires for streaming calls.
This is already addressed on HEAD in a5b0053. The complete_streaming_response write in _create_anthropic_response_logging_payload is no longer unconditional; it is now gated on logging_obj.model_call_details.get("stream") is True (line 120). For non-streaming Anthropic pass-through requests model_call_details["stream"] is never set to True (only the two streaming branches in pass_through_endpoints.py set it), so the key is never written for non-streaming calls, success_handler's early-return at line 2127 is not taken, and sync callbacks such as langfuse and s3 keep firing. The guard scopes the assignment to streaming calls exactly as suggested, so there is no regression for non-streaming requests.
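The effect discussed in this thread can be sketched in a few lines. This is an illustration under assumptions, not the real success_handler: `success_handler_sketch` and `build_details` are hypothetical, the callbacks are no-op stand-ins whose names are labels only, and the early return mirrors the condition quoted in the review comment.

```python
from typing import Any, Callable, Dict, List

Callback = Callable[[Dict[str, Any]], None]


def success_handler_sketch(
    details: Dict[str, Any], callbacks: Dict[str, Callback]
) -> List[str]:
    """Return the names of the sync callbacks that actually ran."""
    # Early return quoted in the review: the key's presence alone is enough.
    if "complete_streaming_response" in details:
        return []
    fired = []
    for name, callback in callbacks.items():
        callback(details)
        fired.append(name)
    return fired


def build_details(stream: bool, gated: bool) -> Dict[str, Any]:
    """Build model_call_details with or without the stream gate on the write."""
    details: Dict[str, Any] = {}
    if stream:
        details["stream"] = True
    if not gated or details.get("stream") is True:
        details["complete_streaming_response"] = {"id": "resp"}
    return details


# No-op stand-ins; the names are labels only, not real integrations.
callbacks: Dict[str, Callback] = {"langfuse": lambda d: None, "s3": lambda d: None}

ungated_non_stream = success_handler_sketch(build_details(False, gated=False), callbacks)
gated_non_stream = success_handler_sketch(build_details(False, gated=True), callbacks)
```

Without the gate, a non-streaming request fires no sync callbacks in this model; with the gate, both run.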
2bbdbfa into litellm_internal_staging
* fix duplicate cost callbacks for anthropic streaming pass-through

  Two bugs caused _PROXY_track_cost_callback to see stream=True + complete_streaming_response=None on every streaming pass-through request, making the dedup guard in dispatch_success_handlers permanently inactive:

  1. pass_through_endpoints.py created the Logging object with stream=False for all requests. _is_assembled_stream_success short-circuits on self.stream is not True, so has_dispatched_final_stream_success was never set and any second dispatch went through unchecked. Fix: set logging_obj.stream = True after stream detection.
  2. _create_anthropic_response_logging_payload set complete_streaming_response inside the try block after litellm.completion_cost(), so a pricing error caused an early return without setting it on model_call_details. Fix: set complete_streaming_response before the try block.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix stream

* add stream to logging obj

* test(pass_through): give mock logging object a real model_call_details dict

  The anthropic passthrough logging payload now records the assembled response on model_call_details before cost calculation, which requires model_call_details to support item assignment. In production it is always a dict; the existing unit test stubbed the logging object with a bare Mock whose attribute is not subscriptable, so the new assignment raised TypeError. Use a real dict to match the production logging object.

* test(pass_through): cover streaming logging-obj stream flag

  The streaming branch of pass_through_request that marks the logging object as streaming (logging_obj.stream and model_call_details["stream"]) had no unit coverage, so the patch coverage gate flagged it. Add a regression test that drives a streaming pass-through request through pass_through_request and asserts the logging object is flagged as a stream before dispatch.

* test(pass_through): cover SSE-response stream flag fallback branch

  The auto-detected streaming branch of pass_through_request (when a request that was not flagged as streaming returns a text/event-stream response) sets logging_obj.stream and model_call_details["stream"] but had no unit coverage, so the codecov patch gate failed at 60%. Drive a non-streaming pass-through request whose upstream response is SSE through pass_through_request and assert the logging object is flagged as a stream before dispatch.

* fix(pass_through): gate complete_streaming_response on stream flag

  perform_redaction only scrubs complete_streaming_response when model_call_details["stream"] is True. Setting it unconditionally for non-streaming Anthropic pass-through responses left the assembled response unredacted in model_call_details, which is handed to logging callbacks as kwargs when message logging is disabled. Only record it for actual streaming responses so redaction always applies.

---------

Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 2bbdbfa)
Automated copy of #29550 into litellm_internal_staging for pr-babysitter.
Original head: mubashir1osmani/litellm:litellm__fix_passthrough_stream_dedup@57311e4df3d2