perf(pam launch): batch ICE candidate flush and skip unneeded waits #1984
Merged
idimov-keeper merged 1 commit into release on Apr 23, 2026
Conversation
idimov-keeper (Contributor) commented on Apr 23, 2026
- Batch buffered ICE candidates into one HTTP POST
- Skip WebSocket-ready wait + backend_delay in --no-trickle-ice mode
- Always emit PamConnectTiming checkpoints at DEBUG level
Three related tunnel-open improvements measured against the post-PR2
release (~10.3s grand total ``ready_for_prompt``): this brings
``pam launch`` down to ~9.3s in trickle mode (~1s saved end-to-end) and
shaves ~700ms off ``pam launch --no-trickle-ice``. Gateway changes are not
required — batch support has existed on the gateway's Python side since
1.7.0 (commit 60f594b3, released 2025-07-24), and trickle ICE itself
requires gateway >= 1.7.0 so any client that uses the default path is
guaranteed to be talking to a batch-capable gateway.
Also affects ``pam tunnel start`` (secondary beneficiary — same tunnel
setup path). ``pam tunnel start --no-trickle-ice`` is on an
already-optimized path in tunnel_helpers.py and is untouched.
1. Batch buffered ICE candidates into one HTTP POST
--------------------------------------------------
Every trickle-mode offer flushed the local candidate buffer by calling
``_send_ice_candidate_immediately`` in a loop — 7-8 candidates * ~500ms
serial round-trip each = ~3.5s of HTTP time after the offer was acked.
Add ``TunnelSignalHandler._send_ice_candidates_batch(candidates, tube_id)``
that sends all candidates in a single ``icecandidate`` action with
payload ``{"candidates": [c1, c2, ..., cN]}`` — the gateway already
iterates ``for candidate in ice_candidates`` in
``WebRTCSessionAction.add_ice_candidates_to_conversation_tunnel`` and
the per-candidate ``add_ice_candidate`` PyO3 binding is spawn-and-return,
so one batch costs the gateway ~the same as one candidate. Converts the
5 client-side flush sites: 3 in ``tunnel_helpers.py`` (SDP answer in
WS listener, state-change-to-connected, post-offer flush in
``start_rust_tunnel``) and 2 in ``pam_launch/terminal_connection.py``
(streaming offer branch, non-streaming SDP-answer handler).
``_send_ice_candidate_immediately`` is kept for the single-candidate
live path (post-offer candidate that arrives one at a time) at
``tunnel_helpers.py:1727`` — that one is already one HTTP call per
event.
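A minimal sketch of the batch sender, assuming a generic
``_post_signal(action, payload, tube_id)`` helper as a stand-in for the
handler's existing single-POST signaling transport (hypothetical name; the
real call is whatever ``_send_ice_candidate_immediately`` already uses)::

    from typing import Any, Dict, List

    class TunnelSignalHandler:
        def _post_signal(self, action: str, payload: Dict[str, Any], tube_id: str) -> None:
            """Stand-in for the existing single-HTTP-POST signaling call."""
            raise NotImplementedError

        def _send_ice_candidates_batch(self, candidates: List[Dict[str, Any]], tube_id: str) -> None:
            # One POST for the whole buffered set instead of one POST per candidate.
            if not candidates:
                return
            self._post_signal(
                action="icecandidate",
                payload={"candidates": list(candidates)},
                tube_id=tube_id,
            )

Each converted flush site then replaces its per-candidate
``_send_ice_candidate_immediately`` loop with a single
``_send_ice_candidates_batch(buffered, tube_id)`` call.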
Net webrtc-tunnel phase drop: 6189ms -> 2965ms (-3.2s). Most of that
moves to ``cli_session.webrtc_data_plane_connected`` (974ms -> 3355ms)
because the ICE pair selection / data-channel open was previously
hidden behind the serial HTTP loop and is now the exposed critical
path. End-to-end wall-clock saving: ~1s per launch.
2. Skip WebSocket-ready wait + backend_delay in --no-trickle-ice mode
--------------------------------------------------------------------
``_open_terminal_webrtc_tunnel`` was unconditionally blocking for
``tunnel_session.websocket_ready_event.wait()`` + ``WEBSOCKET_BACKEND_DELAY``
(~700ms total) before sending the offer. In non-trickle mode the SDP
answer arrives in the HTTP response body of the offer POST itself and
ICE candidates are carried inside the offer SDP — there is no
streamed conversation on the WebSocket to wait on. Wrap that block in
``if trickle_ice:`` and skip it entirely in the non-trickle branch. The
listener keeps running in the background for async signaling
(disconnect / state changes); the main thread just does not block on
its readiness. Matches the pattern already used by
``tunnel_helpers.py::start_rust_tunnel`` for non-trickle mode.
Saves ~700ms on every ``pam launch --no-trickle-ice``.
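A minimal sketch of the conditional wait, assuming ``trickle_ice`` is the
already-resolved flag and ``WEBSOCKET_BACKEND_DELAY`` is the existing module
constant; the helper name and the placeholder value are illustrative only::

    import time

    WEBSOCKET_BACKEND_DELAY = 0.5  # illustrative value, not the real constant

    def _wait_for_signaling_ready(tunnel_session, trickle_ice: bool) -> None:
        if trickle_ice:
            # Trickle mode streams the SDP answer and remote candidates over the
            # WebSocket, so the offer must not go out before the listener is ready.
            tunnel_session.websocket_ready_event.wait()
            time.sleep(WEBSOCKET_BACKEND_DELAY)
        # Non-trickle mode: the answer arrives in the offer POST response and
        # candidates are embedded in the offer SDP, so there is nothing to block
        # on; the listener keeps running in the background for async signaling.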
3. Always emit PamConnectTiming checkpoints at DEBUG level
---------------------------------------------------------
Commander's ``debug --file=<path>`` installs a file log handler with an
explicit ``record.levelno != logging.INFO`` filter (see
``cli.py::setup_file_logging``) so user-facing ``logging.info(...)``
prints stay out of the debug log. PamConnectTiming previously bumped
its checkpoint / summary records to INFO when ``PAM_CONNECT_TIMING=1``
was set, which meant those records were being silently dropped by the
file-debug filter — timing lines never appeared in the captured log
when ``debug --file`` + ``PAM_CONNECT_TIMING=1`` were used together.
Always emit at DEBUG regardless of the env var. ``connect_timing_log_enabled()``
still gates whether to emit at all; only the chosen level changes.
DEBUG passes the file-debug filter cleanly.
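A minimal sketch of the interaction, assuming ``connect_timing_log_enabled()``
reads the ``PAM_CONNECT_TIMING`` env var and the checkpoint emitter looks
roughly like this (logger and helper names are illustrative)::

    import logging
    import os

    logger = logging.getLogger("pam_connect_timing")  # illustrative logger name

    def connect_timing_log_enabled() -> bool:
        return os.environ.get("PAM_CONNECT_TIMING") == "1"

    def emit_checkpoint(name: str, elapsed_ms: float) -> None:
        if not connect_timing_log_enabled():
            return
        # Always DEBUG: the debug --file handler filters out records with
        # levelno == logging.INFO, so INFO-level checkpoints never reached
        # the captured log.
        logger.debug("PamConnectTiming %s: %.0f ms", name, elapsed_ms)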
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sk-keeper pushed a commit that referenced this pull request on Apr 24, 2026