
perf(pam launch): batch ICE candidate flush and skip unneeded waits#1984

Merged
idimov-keeper merged 1 commit into release from
pam-launch-batch-ice-candidate-flush
Apr 23, 2026
Conversation

@idimov-keeper
Contributor

  1. Batch buffered ICE candidates into one HTTP POST
  2. Skip WebSocket-ready wait + backend_delay in --no-trickle-ice mode
  3. Always emit PamConnectTiming checkpoints at DEBUG level

Three related tunnel-open improvements, measured against the post-PR2
release (~10.3s grand total ``ready_for_prompt``): ``pam launch`` drops
to ~9.3s in trickle mode (~1s saved end-to-end), and
``pam launch --no-trickle-ice`` shaves off ~700ms. Gateway changes are not
required — batch support has existed on the gateway's Python side since
1.7.0 (commit 60f594b3, released 2025-07-24), and trickle ICE itself
requires gateway >= 1.7.0 so any client that uses the default path is
guaranteed to be talking to a batch-capable gateway.

Also affects ``pam tunnel start`` (secondary beneficiary — same tunnel
setup path). ``pam tunnel start --no-trickle-ice`` is on an already-
optimized path in tunnel_helpers.py and is untouched.

1. Batch buffered ICE candidates into one HTTP POST
--------------------------------------------------
Every trickle-mode offer flushed the local candidate buffer by calling
``_send_ice_candidate_immediately`` in a loop — 7-8 candidates * ~500ms
serial round-trip each = ~3.5s of HTTP time after the offer was acked.
Add ``TunnelSignalHandler._send_ice_candidates_batch(candidates, tube_id)``
that sends all candidates in a single ``icecandidate`` action with
payload ``{"candidates": [c1, c2, ..., cN]}`` — the gateway already
iterates ``for candidate in ice_candidates`` in
``WebRTCSessionAction.add_ice_candidates_to_conversation_tunnel`` and
the per-candidate ``add_ice_candidate`` PyO3 binding is spawn-and-return,
so one batch costs the gateway ~the same as one candidate. Converts the
5 client-side flush sites: 3 in ``tunnel_helpers.py`` (SDP answer in
WS listener, state-change-to-connected, post-offer flush in
``start_rust_tunnel``) and 2 in ``pam_launch/terminal_connection.py``
(streaming offer branch, non-streaming SDP-answer handler).

``_send_ice_candidate_immediately`` is kept for the single-candidate
live path (post-offer candidate that arrives one at a time) at
``tunnel_helpers.py:1727`` — that one is already one HTTP call per
event.

Net webrtc-tunnel phase drop: 6189ms -> 2965ms (-3.2s). Most of that
moves to ``cli_session.webrtc_data_plane_connected`` (974ms -> 3355ms)
because the ICE pair selection / data-channel open was previously
hidden behind the serial HTTP loop and is now the exposed critical
path. End-to-end wall-clock saving: ~1s per launch.

2. Skip WebSocket-ready wait + backend_delay in --no-trickle-ice mode
--------------------------------------------------------------------
``_open_terminal_webrtc_tunnel`` was unconditionally blocking for
``tunnel_session.websocket_ready_event.wait()`` + ``WEBSOCKET_BACKEND_DELAY``
(~700ms total) before sending the offer. In non-trickle mode the SDP
answer arrives in the HTTP response body of the offer POST itself and
ICE candidates are carried inside the offer SDP — there is no
streamed conversation on the WebSocket to wait on. Wrap that block in
``if trickle_ice:`` and skip entirely in the non-trickle branch. The
listener keeps running in the background for async signaling
(disconnect / state changes); the main thread just does not block on
its readiness. Matches the pattern already used by
``tunnel_helpers.py::start_rust_tunnel`` for non-trickle mode.
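The gating reduces to a single conditional; this sketch assumes a
simplified ``open_terminal_webrtc_tunnel(...)`` signature and an
illustrative delay constant, with the sleep injectable for testing —
the real function takes the session object and sends the offer inline.

```python
import threading
import time

WEBSOCKET_BACKEND_DELAY = 0.7  # seconds; illustrative value only

def open_terminal_webrtc_tunnel(websocket_ready_event: threading.Event,
                                trickle_ice: bool,
                                send_offer,
                                sleep=time.sleep) -> None:
    """Sketch of the gate added around the WebSocket-ready wait.

    Trickle mode: the SDP answer and remote candidates stream back over
    the signaling WebSocket, so the offer must wait for the listener.
    --no-trickle-ice: the answer arrives in the offer POST's own response
    body, so the wait (+ backend delay) is pure dead time and is skipped.
    """
    if trickle_ice:
        websocket_ready_event.wait()    # listener must be up first
        sleep(WEBSOCKET_BACKEND_DELAY)  # let the backend attach the conversation
    send_offer()
```

The listener thread is unaffected either way; only the main thread's
blocking behavior changes.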

Saves ~700ms on every ``pam launch --no-trickle-ice``.

3. Always emit PamConnectTiming checkpoints at DEBUG level
---------------------------------------------------------
Commander's ``debug --file=<path>`` installs a file log handler with an
explicit ``record.levelno != logging.INFO`` filter (see
``cli.py::setup_file_logging``) so user-facing ``logging.info(...)``
prints stay out of the debug log. PamConnectTiming previously bumped
its checkpoint / summary records to INFO when ``PAM_CONNECT_TIMING=1``
was set, which meant those records were being silently dropped by the
file-debug filter — timing lines never appeared in the captured log
when ``debug --file`` + ``PAM_CONNECT_TIMING=1`` were used together.

Always emit at DEBUG regardless of the env var. ``connect_timing_log_enabled()``
still gates whether to emit at all; only the chosen level changes.
DEBUG passes the file-debug filter cleanly.
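The interaction can be shown in a few lines. The filter mirrors the
``levelno != logging.INFO`` check attributed to
``cli.py::setup_file_logging``; the emitter name and ``enabled``
parameter are hypothetical stand-ins for PamConnectTiming and
``connect_timing_log_enabled()``.

```python
import logging

def file_debug_filter(record: logging.LogRecord) -> bool:
    # Mirrors the debug --file handler's filter: drop user-facing INFO
    # prints, keep everything else (including DEBUG).
    return record.levelno != logging.INFO

def emit_timing_checkpoint(logger: logging.Logger, name: str,
                           elapsed_ms: float, enabled: bool = True) -> None:
    # Gating on whether to emit is unchanged; the level is now always
    # DEBUG, so checkpoints survive the filter above.
    if not enabled:
        return
    logger.debug("PamConnectTiming %s %.0fms", name, elapsed_ms)
```

Had the checkpoint been bumped to INFO (the old ``PAM_CONNECT_TIMING=1``
behavior), the filter would have discarded it.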

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@idimov-keeper idimov-keeper merged commit c9c034e into release Apr 23, 2026
4 checks passed
@idimov-keeper idimov-keeper deleted the pam-launch-batch-ice-candidate-flush branch April 23, 2026 01:33
sk-keeper pushed a commit that referenced this pull request Apr 24, 2026
…1984)

