
[internal copy of #29550] fix: passthrough endpoints duplicate logs #29598

Merged
mateo-berri merged 9 commits into litellm_internal_staging from litellm__fix_passthrough_stream_dedup on Jun 3, 2026

@mateo-berri
Collaborator

Automated copy of #29550 into litellm_internal_staging for pr-babysitter.

Original head: mubashir1osmani/litellm:litellm__fix_passthrough_stream_dedup @ 57311e4df3d2

mubashir1osmani and others added 4 commits June 2, 2026 17:51
Two bugs caused _PROXY_track_cost_callback to see stream=True +
complete_streaming_response=None on every streaming pass-through request,
making the dedup guard in dispatch_success_handlers permanently inactive:

1. pass_through_endpoints.py created the Logging object with stream=False
   for all requests. _is_assembled_stream_success short-circuits on
   self.stream is not True, so has_dispatched_final_stream_success was
   never set and any second dispatch went through unchecked.
   Fix: set logging_obj.stream = True after stream detection.

2. _create_anthropic_response_logging_payload set complete_streaming_response
   inside the try block after litellm.completion_cost(), so a pricing error
   caused an early return without setting it on model_call_details.
   Fix: set complete_streaming_response before the try block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
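To make the two failure modes concrete, here is a hypothetical, heavily simplified model of the dedup guard as the commit message describes it. `SketchLogging` and `count_cost_callbacks` are illustrative names. The guard condition (stream must be True and complete_streaming_response must be present) is inferred from the description above, not copied from litellm's litellm_logging.py.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class SketchLogging:
    """Hypothetical, heavily simplified stand-in for litellm's Logging object."""

    stream: bool = False
    model_call_details: Dict[str, Any] = field(default_factory=dict)
    has_dispatched_final_stream_success: bool = False
    callbacks: List[Callable[[Dict[str, Any]], None]] = field(default_factory=list)

    def _is_assembled_stream_success(self) -> bool:
        # Bug 1: the guard short-circuits for objects not flagged as streaming.
        if self.stream is not True:
            return False
        # Bug 2: without a recorded response the guard also stays inactive.
        return self.model_call_details.get("complete_streaming_response") is not None

    def dispatch_success_handlers(self) -> None:
        if self._is_assembled_stream_success():
            if self.has_dispatched_final_stream_success:
                return  # duplicate final-stream dispatch: drop it
            self.has_dispatched_final_stream_success = True
        for callback in self.callbacks:
            callback(self.model_call_details)


def count_cost_callbacks(stream_flag: bool, record_response: bool = True, dispatches: int = 2) -> int:
    # Dispatch success handlers several times and count how often the
    # (cost-tracking) callback actually ran.
    calls: List[Dict[str, Any]] = []
    logging_obj = SketchLogging(stream=stream_flag, callbacks=[calls.append])
    if record_response:
        logging_obj.model_call_details["complete_streaming_response"] = {"id": "msg_123"}
    for _ in range(dispatches):
        logging_obj.dispatch_success_handlers()
    return len(calls)
```

In this model, stream=False lets every dispatch through. stream=True with no recorded response (the effect of bug 2) also leaves the guard inactive. The second dispatch is dropped only when both preconditions hold.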
@greptile-apps
Contributor

greptile-apps Bot commented Jun 3, 2026

Greptile Summary

This PR fixes duplicate success-callback dispatches for streaming Anthropic passthrough endpoints by ensuring logging_obj.stream is set to True before chunk_processor is invoked, and by recording complete_streaming_response on model_call_details early so the dedup guard in dispatch_success_handlers can fire even when cost calculation raises.

  • pass_through_endpoints.py: sets logging_obj.stream = True in both the explicit-stream=True branch and the SSE-fallback branch, activating _is_assembled_stream_success's dedup guard.
  • anthropic_passthrough_logging_handler.py: assigns complete_streaming_response to model_call_details before entering the try block so the key is present even when pricing fails — but does this unconditionally for all calls, not only streaming ones.
  • Four new regression tests document the pre/post-fix behaviour of the guard; the existing unit test mock is updated to initialise model_call_details as a dict.

Confidence Score: 4/5

Safe to merge for the streaming dedup fix; the non-streaming Anthropic passthrough path now silently drops sync callbacks (langfuse, s3, etc.) as a side-effect.

_create_anthropic_response_logging_payload is called for both streaming and non-streaming Anthropic responses. The unconditional complete_streaming_response write causes success_handler to return early (line 2127 in litellm_logging.py) on every non-streaming call, so any sync callback registered via success_callback = ["langfuse"] or "s3" will stop firing for non-streaming Anthropic passthrough requests. The intended fix is scoped to streaming calls only.

litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py — the early complete_streaming_response write needs a logging_obj.stream is True guard.

Important Files Changed

Filename: Overview
  • litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py: Adds early complete_streaming_response assignment in _create_anthropic_response_logging_payload; applied unconditionally to both streaming and non-streaming calls, which will suppress sync callbacks (langfuse, s3) for non-streaming Anthropic passthrough requests.
  • litellm/proxy/pass_through_endpoints/pass_through_endpoints.py: Sets logging_obj.stream = True in both the explicit-stream branch and the SSE-fallback branch; straightforward and correct fix for the dedup guard precondition.
  • tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py: Adds TestStreamFalseDeduplication class with four focused regression tests covering the dedup guard behaviour pre/post fix; tests are mock-only and well-structured.
  • tests/test_litellm/proxy/pass_through_endpoints/test_pass_through_endpoints.py: Adds two integration-level regression tests that confirm logging_obj.stream and model_call_details["stream"] are set correctly for both explicit-stream and SSE-fallback code paths.
  • tests/pass_through_unit_tests/test_unit_test_anthropic_pass_through.py: Adds model_call_details = {} initialisation to the existing mock so the new _create_anthropic_response_logging_payload write doesn't raise a TypeError; minimal and correct.

Reviews (6): Last reviewed commit: "fix(pass_through): gate complete_streami..."

…s dict

The anthropic passthrough logging payload now records the assembled
response on model_call_details before cost calculation, which requires
model_call_details to support item assignment. In production it is always
a dict; the existing unit test stubbed the logging object with a bare Mock
whose attribute is not subscriptable, so the new assignment raised
TypeError. Use a real dict to match the production logging object.
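The test failure follows from how unittest.mock treats magic methods: a plain Mock does not configure `__setitem__`, so item assignment on an auto-created attribute raises TypeError. The sketch below reproduces that with a hypothetical `assign_response` helper standing in for the new write. It also shows why MagicMock would be a weak substitute: it accepts the write but does not hand back the stored value.

```python
from unittest.mock import MagicMock, Mock

RESPONSE = {"id": "msg_123"}


def assign_response(logging_obj) -> None:
    # Hypothetical stand-in for the new write in the logging handler.
    logging_obj.model_call_details["complete_streaming_response"] = RESPONSE


def bare_mock_raises() -> bool:
    stub = Mock()  # stub.model_call_details is auto-created as another Mock
    try:
        assign_response(stub)
    except TypeError:  # 'Mock' object does not support item assignment
        return True
    return False


def magicmock_loses_value() -> bool:
    stub = MagicMock()  # supports __setitem__, but only records the call
    assign_response(stub)
    stored = stub.model_call_details["complete_streaming_response"]
    return stored is not RESPONSE  # the lookup returns a MagicMock instead


def real_dict_works() -> dict:
    stub = Mock()
    stub.model_call_details = {}  # matches production: a plain dict
    assign_response(stub)
    return stub.model_call_details
```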
@mateo-berri
Collaborator Author

@greptileai

@codecov

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@mateo-berri
Collaborator Author

@greptileai

The streaming branch of pass_through_request that marks the logging object
as streaming (logging_obj.stream and model_call_details["stream"]) had no
unit coverage, so the patch coverage gate flagged it. Add a regression test
that drives a streaming pass-through request through pass_through_request and
asserts the logging object is flagged as a stream before dispatch.
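Here is a minimal sketch of what such a regression test checks. It assumes a `SimpleNamespace` stand-in for the Logging object and a hypothetical `mark_logging_obj_as_stream` helper; in the PR the two assignments happen inline in pass_through_request. Both flags matter. Per the commit messages in this PR, logging_obj.stream is what _is_assembled_stream_success checks, while model_call_details["stream"] is what perform_redaction reads.

```python
from types import SimpleNamespace
from typing import Any


def mark_logging_obj_as_stream(logging_obj: Any) -> None:
    # Hypothetical helper; the PR does these two assignments inline.
    logging_obj.stream = True  # read by the dedup guard
    logging_obj.model_call_details["stream"] = True  # read by redaction and the response gate


def build_logging_obj(request_is_streaming: bool) -> Any:
    # Before the fix, every pass-through Logging object started with stream=False.
    logging_obj = SimpleNamespace(stream=False, model_call_details={})
    if request_is_streaming:
        mark_logging_obj_as_stream(logging_obj)
    return logging_obj


def test_streaming_request_is_flagged_before_dispatch() -> None:
    logging_obj = build_logging_obj(request_is_streaming=True)
    assert logging_obj.stream is True
    assert logging_obj.model_call_details["stream"] is True


def test_non_streaming_request_is_left_alone() -> None:
    logging_obj = build_logging_obj(request_is_streaming=False)
    assert logging_obj.stream is False
    assert "stream" not in logging_obj.model_call_details
```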
@mateo-berri
Collaborator Author

@greptileai

@veria-ai
Contributor

veria-ai Bot commented Jun 3, 2026

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 1 · PR risk: 0/10

The auto-detected streaming branch of pass_through_request (when a request
that was not flagged as streaming returns a text/event-stream response) sets
logging_obj.stream and model_call_details["stream"] but had no unit coverage,
so the codecov patch gate failed at 60%. Drive a non-streaming pass-through
request whose upstream response is SSE through pass_through_request and assert
the logging object is flagged as a stream before dispatch.
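The fallback branch depends on recognising an SSE response from its Content-Type header. The sketch below shows one way to make that check. `is_event_stream` and `should_mark_as_stream` are hypothetical names, and this PR page does not show how pass_through_request actually reads the upstream headers. The sketch compares only the media type, so parameters such as `; charset=utf-8` do not defeat the detection.

```python
from typing import Mapping


def is_event_stream(headers: Mapping[str, str]) -> bool:
    # HTTP header names are case-insensitive; normalise before matching.
    content_type = ""
    for name, value in headers.items():
        if name.lower() == "content-type":
            content_type = value
            break
    # Drop parameters such as "; charset=utf-8" and compare the media type only.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"


def should_mark_as_stream(request_stream: bool, response_headers: Mapping[str, str]) -> bool:
    # Explicit branch: the client asked for a stream.
    # Fallback branch: it did not, but the upstream answered with SSE anyway.
    return request_stream or is_event_stream(response_headers)
```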
@mateo-berri
Collaborator Author

@greptileai

perform_redaction only scrubs complete_streaming_response when
model_call_details["stream"] is True. Setting it unconditionally for
non-streaming Anthropic pass-through responses left the assembled
response unredacted in model_call_details, which is handed to logging
callbacks as kwargs when message logging is disabled. Only record it for
actual streaming responses so redaction always applies.
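The two logging-handler changes combine as follows: record the assembled response before the cost calculation, but only for streaming calls. The hypothetical sketch below shows the resulting ordering. `perform_redaction_sketch`, `completion_cost_sketch`, `build_payload`, and the `<redacted>` marker are illustrative stand-ins, not litellm's functions. The only behaviour taken from the commit message is that redaction scrubs complete_streaming_response only when stream is True.

```python
from typing import Any, Dict, Optional

REDACTED = "<redacted>"  # hypothetical marker, not litellm's actual value


def perform_redaction_sketch(details: Dict[str, Any]) -> Dict[str, Any]:
    # Behaviour described in the commit message: the assembled response is
    # scrubbed only when the call is flagged as streaming.
    if details.get("stream") is True and "complete_streaming_response" in details:
        details["complete_streaming_response"] = REDACTED
    return details


def completion_cost_sketch(response: Dict[str, Any]) -> float:
    # Simulates the pricing error from bug 2.
    raise ValueError("model not found in pricing map")


def build_payload(details: Dict[str, Any], response: Dict[str, Any], gated: bool = True) -> Optional[float]:
    # Record the response before the try block so a pricing error cannot skip it,
    # but (when gated) only for streaming calls, so redaction always covers it.
    if not gated or details.get("stream") is True:
        details["complete_streaming_response"] = response
    try:
        return completion_cost_sketch(response)
    except ValueError:
        return None  # early return; the response is already recorded
```

With the gate in place, streaming calls keep the bug 2 fix: the response survives a pricing error and is then redacted. Non-streaming calls never get an unredacted copy.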
@mateo-berri
Collaborator Author

@greptileai

Comment on lines +117 to +122
# Only record complete_streaming_response for actual streaming responses.
# perform_redaction scrubs this field only when stream is True, so setting
# it on a non-streaming response would bypass message redaction.
if logging_obj.model_call_details.get("stream") is True:
    logging_obj.model_call_details["complete_streaming_response"] = (
        litellm_model_response
    )
Contributor


P1 complete_streaming_response set unconditionally, breaks sync callbacks for non-streaming calls

_create_anthropic_response_logging_payload is called for both streaming and non-streaming Anthropic passthrough responses. Setting complete_streaming_response in model_call_details unconditionally causes success_handler to hit its early-return guard at line 2127 (if "complete_streaming_response" in self.model_call_details: return) even for non-streaming calls. Any user with sync callbacks configured (e.g. success_callback = ["langfuse"] or "s3") would have those callbacks silently skipped for all non-streaming Anthropic passthrough requests, because the executor-submitted success_handler exits before iterating the callback list. The guard at line 1635 (if self.stream is not True: return False) prevents _is_assembled_stream_success from misfiring, but success_handler's early-return at 2127 is unconditional. The assignment should be guarded so it only fires for streaming calls.
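The regression described here can be reproduced with a toy version of the quoted guard. `success_handler_sketch` and `fired_callbacks` are hypothetical. The only logic taken from the review is the early return when complete_streaming_response is present in model_call_details.

```python
from typing import Any, Callable, Dict, List


def success_handler_sketch(details: Dict[str, Any], sync_callbacks: List[Callable[[Dict[str, Any]], None]]) -> None:
    # Shape of the guard quoted in the review: if the key exists at all,
    # return before iterating the callback list.
    if "complete_streaming_response" in details:
        return
    for callback in sync_callbacks:
        callback(details)


def fired_callbacks(is_streaming: bool, gated_write: bool) -> List[str]:
    fired: List[str] = []
    callbacks = [lambda d: fired.append("langfuse"), lambda d: fired.append("s3")]
    details: Dict[str, Any] = {"stream": True} if is_streaming else {}
    # Ungated: always write the key. Gated: write it only for streaming calls.
    if not gated_write or details.get("stream") is True:
        details["complete_streaming_response"] = {"id": "msg_123"}
    success_handler_sketch(details, callbacks)
    return fired
```

With the ungated write, a non-streaming call silently skips langfuse and s3, and the gated write restores them. The streaming case also returns early in this toy. That is presumably expected, since streaming success callbacks run on the assembled-stream path, which this sketch does not model.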

Collaborator Author

This is already addressed on HEAD in a5b0053. The complete_streaming_response write in _create_anthropic_response_logging_payload is no longer unconditional; it is now gated on logging_obj.model_call_details.get("stream") is True (line 120). For non-streaming Anthropic pass-through requests, model_call_details["stream"] is never set to True (only the two streaming branches in pass_through_endpoints.py set it). The key is therefore never written for non-streaming calls, success_handler's early return at line 2127 is not taken, and sync callbacks such as langfuse and s3 keep firing. The guard scopes the assignment to streaming calls exactly as suggested, so there is no regression for non-streaming requests.

@mateo-berri mateo-berri requested a review from yuneng-berri June 3, 2026 19:03
@mateo-berri mateo-berri merged commit 2bbdbfa into litellm_internal_staging Jun 3, 2026
145 of 146 checks passed
@mateo-berri mateo-berri deleted the litellm__fix_passthrough_stream_dedup branch June 3, 2026 19:13
stvnksslr pushed a commit to stvnksslr/litellm that referenced this pull request Jun 5, 2026
* fix duplicate cost callbacks for anthropic streaming pass-through

Two bugs caused _PROXY_track_cost_callback to see stream=True +
complete_streaming_response=None on every streaming pass-through request,
making the dedup guard in dispatch_success_handlers permanently inactive:

1. pass_through_endpoints.py created the Logging object with stream=False
   for all requests. _is_assembled_stream_success short-circuits on
   self.stream is not True, so has_dispatched_final_stream_success was
   never set and any second dispatch went through unchecked.
   Fix: set logging_obj.stream = True after stream detection.

2. _create_anthropic_response_logging_payload set complete_streaming_response
   inside the try block after litellm.completion_cost(), so a pricing error
   caused an early return without setting it on model_call_details.
   Fix: set complete_streaming_response before the try block.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix stream

* add stream to logging obj

* test(pass_through): give mock logging object a real model_call_details dict

The anthropic passthrough logging payload now records the assembled
response on model_call_details before cost calculation, which requires
model_call_details to support item assignment. In production it is always
a dict; the existing unit test stubbed the logging object with a bare Mock
whose attribute is not subscriptable, so the new assignment raised
TypeError. Use a real dict to match the production logging object.

* test(pass_through): cover streaming logging-obj stream flag

The streaming branch of pass_through_request that marks the logging object
as streaming (logging_obj.stream and model_call_details["stream"]) had no
unit coverage, so the patch coverage gate flagged it. Add a regression test
that drives a streaming pass-through request through pass_through_request and
asserts the logging object is flagged as a stream before dispatch.

* test(pass_through): cover SSE-response stream flag fallback branch

The auto-detected streaming branch of pass_through_request (when a request
that was not flagged as streaming returns a text/event-stream response) sets
logging_obj.stream and model_call_details["stream"] but had no unit coverage,
so the codecov patch gate failed at 60%. Drive a non-streaming pass-through
request whose upstream response is SSE through pass_through_request and assert
the logging object is flagged as a stream before dispatch.

* fix(pass_through): gate complete_streaming_response on stream flag

perform_redaction only scrubs complete_streaming_response when
model_call_details["stream"] is True. Setting it unconditionally for
non-streaming Anthropic pass-through responses left the assembled
response unredacted in model_call_details, which is handed to logging
callbacks as kwargs when message logging is disabled. Only record it for
actual streaming responses so redaction always applies.

---------

Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
(cherry picked from commit 2bbdbfa)

3 participants