
perf(pam launch): batch ICE candidate flush and skip unneeded waits#1984

Merged
idimov-keeper merged 1 commit into release from
pam-launch-batch-ice-candidate-flush
Apr 23, 2026
Conversation

@idimov-keeper
Contributor

  1. Batch buffered ICE candidates into one HTTP POST
  2. Skip WebSocket-ready wait + backend_delay in --no-trickle-ice mode
  3. Always emit PamConnectTiming checkpoints at DEBUG level

Three related tunnel-open improvements, measured against the post-PR2
release (~10.3s grand total ``ready_for_prompt``): ``pam launch`` drops
to ~9.3s in trickle mode (~1s saved end-to-end), and
``pam launch --no-trickle-ice`` shaves off ~700ms. Gateway changes are not
required — batch support has existed on the gateway's Python side since
1.7.0 (commit 60f594b3, released 2025-07-24), and trickle ICE itself
requires gateway >= 1.7.0 so any client that uses the default path is
guaranteed to be talking to a batch-capable gateway.

Also affects ``pam tunnel start`` (secondary beneficiary — same tunnel
setup path). ``pam tunnel start --no-trickle-ice`` is on an already-
optimized path in tunnel_helpers.py and is untouched.

1. Batch buffered ICE candidates into one HTTP POST
--------------------------------------------------
Every trickle-mode offer flushed the local candidate buffer by calling
``_send_ice_candidate_immediately`` in a loop — 7-8 candidates * ~500ms
serial round-trip each = ~3.5s of HTTP time after the offer was acked.
Add ``TunnelSignalHandler._send_ice_candidates_batch(candidates, tube_id)``
that sends all candidates in a single ``icecandidate`` action with
payload ``{"candidates": [c1, c2, ..., cN]}`` — the gateway already
iterates ``for candidate in ice_candidates`` in
``WebRTCSessionAction.add_ice_candidates_to_conversation_tunnel`` and
the per-candidate ``add_ice_candidate`` PyO3 binding is spawn-and-return,
so one batch costs the gateway ~the same as one candidate. Converts the
5 client-side flush sites: 3 in ``tunnel_helpers.py`` (SDP answer in
WS listener, state-change-to-connected, post-offer flush in
``start_rust_tunnel``) and 2 in ``pam_launch/terminal_connection.py``
(streaming offer branch, non-streaming SDP-answer handler).

``_send_ice_candidate_immediately`` is kept for the single-candidate
live path (post-offer candidate that arrives one at a time) at
``tunnel_helpers.py:1727`` — that one is already one HTTP call per
event.

Net webrtc-tunnel phase drop: 6189ms -> 2965ms (-3.2s). Most of that
moves to ``cli_session.webrtc_data_plane_connected`` (974ms -> 3355ms)
because the ICE pair selection / data-channel open was previously
hidden behind the serial HTTP loop and is now the exposed critical
path. End-to-end wall-clock saving: ~1s per launch.

2. Skip WebSocket-ready wait + backend_delay in --no-trickle-ice mode
--------------------------------------------------------------------
``_open_terminal_webrtc_tunnel`` was unconditionally blocking for
``tunnel_session.websocket_ready_event.wait()`` + ``WEBSOCKET_BACKEND_DELAY``
(~700ms total) before sending the offer. In non-trickle mode the SDP
answer arrives in the HTTP response body of the offer POST itself and
ICE candidates are carried inside the offer SDP — there is no
streamed conversation on the WebSocket to wait on. Wrap that block in
``if trickle_ice:`` and skip entirely in the non-trickle branch. The
listener keeps running in the background for async signaling
(disconnect / state changes); the main thread just does not block on
its readiness. Matches the pattern already used by
``tunnel_helpers.py::start_rust_tunnel`` for non-trickle mode.
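The gating reduces to a single conditional; this sketch assumes a
simplified ``open_terminal_webrtc_tunnel(...)`` signature and an
illustrative delay constant, with the sleep injectable for testing —
the real function takes the session object and sends the offer inline.

```python
import threading
import time

WEBSOCKET_BACKEND_DELAY = 0.7  # seconds; illustrative value only

def open_terminal_webrtc_tunnel(websocket_ready_event: threading.Event,
                                trickle_ice: bool,
                                send_offer,
                                sleep=time.sleep) -> None:
    """Sketch of the gate added around the WebSocket-ready wait.

    Trickle mode: the SDP answer and remote candidates stream back over
    the signaling WebSocket, so the offer must wait for the listener.
    --no-trickle-ice: the answer arrives in the offer POST's own response
    body, so the wait (+ backend delay) is pure dead time and is skipped.
    """
    if trickle_ice:
        websocket_ready_event.wait()    # listener must be up first
        sleep(WEBSOCKET_BACKEND_DELAY)  # let the backend attach the conversation
    send_offer()
```

The listener thread is unaffected either way; only the main thread's
blocking behavior changes.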

Saves ~700ms on every ``pam launch --no-trickle-ice``.

3. Always emit PamConnectTiming checkpoints at DEBUG level
---------------------------------------------------------
Commander's ``debug --file=<path>`` installs a file log handler with an
explicit ``record.levelno != logging.INFO`` filter (see
``cli.py::setup_file_logging``) so user-facing ``logging.info(...)``
prints stay out of the debug log. PamConnectTiming previously bumped
its checkpoint / summary records to INFO when ``PAM_CONNECT_TIMING=1``
was set, which meant those records were being silently dropped by the
file-debug filter — timing lines never appeared in the captured log
when ``debug --file`` + ``PAM_CONNECT_TIMING=1`` were used together.

Always emit at DEBUG regardless of the env var. ``connect_timing_log_enabled()``
still gates whether to emit at all; only the chosen level changes.
DEBUG passes the file-debug filter cleanly.
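The interaction can be shown in a few lines. The filter mirrors the
``levelno != logging.INFO`` check attributed to
``cli.py::setup_file_logging``; the emitter name and ``enabled``
parameter are hypothetical stand-ins for PamConnectTiming and
``connect_timing_log_enabled()``.

```python
import logging

def file_debug_filter(record: logging.LogRecord) -> bool:
    # Mirrors the debug --file handler's filter: drop user-facing INFO
    # prints, keep everything else (including DEBUG).
    return record.levelno != logging.INFO

def emit_timing_checkpoint(logger: logging.Logger, name: str,
                           elapsed_ms: float, enabled: bool = True) -> None:
    # Gating on whether to emit is unchanged; the level is now always
    # DEBUG, so checkpoints survive the filter above.
    if not enabled:
        return
    logger.debug("PamConnectTiming %s %.0fms", name, elapsed_ms)
```

Had the checkpoint been bumped to INFO (the old ``PAM_CONNECT_TIMING=1``
behavior), the filter would have discarded it.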

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@idimov-keeper idimov-keeper merged commit c9c034e into release Apr 23, 2026
4 checks passed
@idimov-keeper idimov-keeper deleted the pam-launch-batch-ice-candidate-flush branch April 23, 2026 01:33
sk-keeper pushed a commit that referenced this pull request Apr 24, 2026
…1984)

