Handle TransferEncodingError as graceful network disconnect in media input loop #807

Open
livepeer-tessa wants to merge 2 commits into main from fix/transfer-encoding-error-graceful-disconnect
@livepeer-tessa
Contributor

Fixes #805

What

When an orchestrator goes down or is restarted mid-session, aiohttp raises ClientPayloadError (in the logs below, reported with TransferEncodingError: 400) on open trickle connections. Previously:

  1. _media_input_loop caught this in the generic except Exception and logged ERROR - Media input loop failed: ...
  2. The control channel (via JSONLReader in livepeer-python-gateway) also errored, logging ERROR - Control channel subscription error: ...
  3. Teardown then attempted trickle DELETE calls to the now-dead orchestrator, generating multiple ERROR - Trickle DELETE exception logs

This is pure network-level disconnect noise: the session ends regardless, but we were logging it at the wrong level, which made the stack trace look like a bug.

Changes

src/scope/cloud/livepeer_app.py

  • Catch aiohttp.ClientPayloadError in _media_input_loop before the generic handler → log at WARNING instead of ERROR
  • Catch aiohttp.ClientConnectorError during media_output.close() → log at DEBUG (orchestrator already gone)

Companion PR in livepeer-python-gateway: livepeer/livepeer-python-gateway#2

  • channel_reader.py: Same treatment for ChannelReader and JSONLReader — clean return instead of LivepeerGatewayError
  • trickle_publisher.py: Demote ClientConnectorError in _run_delete from ERROR to DEBUG
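
As a rough illustration of "clean return instead of LivepeerGatewayError", the sketch below shows an async generator that ends iteration quietly on a network-level disconnect and still wraps other failures. It is not the companion PR's code: `read_jsonl`, its argument, and the demo source are hypothetical, and both exception classes are local stand-ins so the example runs without aiohttp or the gateway package.

```python
import asyncio
import json
import logging

logger = logging.getLogger("channel_reader_sketch")


class ClientPayloadError(Exception):
    """Stand-in for aiohttp.ClientPayloadError (assumption, for a runnable sketch)."""


class LivepeerGatewayError(Exception):
    """Stand-in for the gateway error type named in the PR (assumption)."""


async def read_jsonl(lines):
    """Yield parsed JSON objects from an async iterator of text lines.

    A network-level disconnect ends iteration cleanly; any other failure
    is still surfaced as LivepeerGatewayError.
    """
    try:
        async for line in lines:
            if line.strip():
                yield json.loads(line)
    except ClientPayloadError as exc:
        logger.warning("Trickle JSONL channel disconnected (network): %s", exc)
        return  # clean end of stream, nothing raised to the subscriber
    except Exception as exc:
        raise LivepeerGatewayError(
            f"Trickle JSONL subscription error: {exc}"
        ) from exc


async def _truncated_source():
    # Hypothetical source: one good message, then the connection is cut.
    yield '{"event": "ping"}\n'
    raise ClientPayloadError("Response payload is not completed")


async def collect():
    return [msg async for msg in read_jsonl(_truncated_source())]


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(asyncio.run(collect()))
```

With this shape, a consumer iterating over the reader simply sees the stream end, so it can run its normal teardown without a separate error path for disconnects.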

Before / After

Before:

ERROR - Media input loop failed: Response payload is not completed: <TransferEncodingError: 400, ...>
ERROR - Control channel subscription error: Trickle JSONL subscription error: ClientPayloadError: ...
ERROR - Trickle DELETE exception url=http://34.169.235.70:8935/...

After:

WARNING - Media input loop: orchestrator disconnected mid-stream: ...
WARNING - Trickle JSONL channel disconnected (network): TransferEncodingError: ...
DEBUG   - Trickle DELETE: orchestrator unreachable (suppressed) url=...

Related: #771 (same pattern, EOFError on clean disconnect)

livepeer-robot added 2 commits April 2, 2026 06:25
- graph_executor.py: deduplicate fan-in stream edges before queue
  construction, preferring pipeline-node sources over source-node
  sources for the same input port; raise a clearer error (with edge
  details) when two pipeline nodes both target the same port
- graph_executor.py: in _validate_edge_ports, include VACE ports for
  VACEEnabledPipeline instances regardless of static config_class.inputs;
  gracefully handle PipelineNotAvailableException (pipeline reloading)
  by logging a warning and skipping port checks for that node
- frame_processor.py: in _setup_graph_from_pipeline_ids, only add the
  last VACEEnabledPipeline to vace_input_video_ids (not all of them),
  preventing fan-in when a preprocessor like yolo_mask is also a
  VACEEnabledPipeline

Fixes #804

Signed-off-by: livepeer-robot <robot@livepeer.org>
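
The fan-in deduplication described in the commit message above could look roughly like the sketch below. The `Edge` dataclass, its field names, the `"pipeline"` / `"source"` kinds, and the `dedupe_fan_in` function are all assumptions made for illustration; the real data structures in graph_executor.py are not shown in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical stream edge; field names are illustrative only."""
    source: str
    source_kind: str  # "pipeline" or "source"
    target: str
    target_port: str


def dedupe_fan_in(edges):
    """Keep one edge per (target, port), preferring pipeline-node sources.

    Raises ValueError with edge details when two different pipeline nodes
    feed the same input port.
    """
    chosen = {}
    for edge in edges:
        key = (edge.target, edge.target_port)
        current = chosen.get(key)
        if current is None:
            chosen[key] = edge
        elif current == edge:
            continue  # exact duplicate, nothing to decide
        elif current.source_kind == "pipeline" and edge.source_kind == "pipeline":
            raise ValueError(
                f"Two pipeline nodes target {edge.target}.{edge.target_port}: "
                f"{current.source} and {edge.source}"
            )
        elif edge.source_kind == "pipeline":
            chosen[key] = edge  # pipeline source wins over a source node
        # Otherwise keep the edge already chosen for this port.
    return list(chosen.values())


if __name__ == "__main__":
    edges = [
        Edge("camera", "source", "vace", "video"),
        Edge("yolo_mask", "pipeline", "vace", "video"),
    ]
    print(dedupe_fan_in(edges))
```

Resolving the conflict before any queues are built keeps each input port fed by a single producer, and the explicit error names both offending nodes instead of failing later during queue construction.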
…input loop

When an orchestrator truncates the trickle connection mid-stream, aiohttp
raises ClientPayloadError (here carrying a TransferEncodingError). Previously this
was caught by the broad 'except Exception' handler and logged at ERROR level,
causing noisy logs and unclean teardown.

- Catch aiohttp.ClientPayloadError before the generic handler; log at WARNING
  and let the finally block run the normal media_output.close() path
- Suppress ClientConnectorError during media_output.close() (logged at DEBUG)
  when the orchestrator is already unreachable at teardown time

The deeper fix (in livepeer-python-gateway channel_reader.py / trickle_publisher.py)
ensures the control channel subscription also terminates cleanly without raising.
See: livepeer/livepeer-python-gateway#2

Fixes: #805
Related: #771
Signed-off-by: livepeer-robot <robot@livepeer.org>
@coderabbitai

coderabbitai bot commented Apr 2, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5530defd-fc5c-4422-a377-0896062cf160


@github-actions
Contributor

github-actions bot commented Apr 2, 2026

🚀 fal.ai Preview Deployment

App ID: daydream/scope-pr-807--preview
WebSocket: wss://fal.run/daydream/scope-pr-807--preview/ws
Commit: 75aa239

Livepeer Runner

App ID: daydream/scope-livepeer-pr-807--preview
WebSocket: wss://fal.run/daydream/scope-livepeer-pr-807--preview/ws
Auth: private

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-807--preview/ws" uv run daydream-scope

Livepeer mode:

SCOPE_CLOUD_MODE=livepeer SCOPE_CLOUD_APP_ID="daydream/scope-livepeer-pr-807--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

❌ E2E Tests failed

Status: failed
fal App: daydream/scope-pr-807--preview
Run: View logs

Test Artifacts

Check the workflow run for screenshots, traces, and failure details.

Development

Successfully merging this pull request may close these issues.

[fal.ai/livepeer-staging] Media input loop fails with TransferEncodingError 400 — 'Not enough data to satisfy transfer length header'

1 participant