Skip to content

Fix headless PTY background launches on macOS#26

Merged
aannoo merged 5 commits intoaannoo:mainfrom
mmkzer0:fix/headless-pty-eof
Apr 23, 2026
Merged

Fix headless PTY background launches on macOS#26
aannoo merged 5 commits intoaannoo:mainfrom
mmkzer0:fix/headless-pty-eof

Conversation

@mmkzer0
Copy link
Copy Markdown
Contributor

@mmkzer0 mmkzer0 commented Apr 22, 2026

Fix headless PTY background launches on macOS

...oh man, this was an odyssey.

Summary

Fixes the macOS --headless --go regressions on the PTY-backed launch path.

Validated end result:

  • headless Codex works again
  • headless OpenCode works again
  • bare headless Claude without a prompt/task now fails fast with a clear error
  • prompt/task-based headless Claude still works

Motivation

Headless launches on the PTY path had regressed in several overlapping ways, which made this investigation a bit deceptive at first:

  • background PTY launches could drop instance identity before hcom pty even started
  • non-TTY stdin handling on macOS could prematurely tear down the PTY loop
  • invalid/non-useful stdin could stay in the poll set and cause hot wakeups
  • the non-blocking inject listener could keep waking spuriously on macOS after WouldBlock
  • bare hcom claude --headless without a prompt/task looked like “headless PTY is broken” but is actually a separate unsupported detached-Claude path

User-facing symptoms included:

  • Codex/OpenCode ending up in launch_failed
  • missing delivery.start / notify.registered
  • PTY spin / high CPU
  • Claude launching and exiting immediately in unsupported bare headless mode

What changed

Launcher / background PTY env

  • Propagate HCOM_INSTANCE_NAME into the background PTY runner env
  • Apply tool_extra_env(tool) on background PTY launches too

PTY loop hardening

  • Do not treat stdin EOF as terminal disconnect when stdin is not a TTY
  • Fully remove stdin from the poll set when it should no longer be polled
  • Drop stdin polling on POLLNVAL
  • Ignore stdin POLLHUP for non-TTY headless PTY sessions
  • Add one-iteration listener backoff after inject-listener WouldBlock

PTY child identity

  • Mark PTY child processes with HCOM_LAUNCHED=1

Claude safety guard

  • Reject bare hcom claude --headless without a prompt/task
  • Keep prompt/task-based headless Claude behavior intact

Notes / implementation details

  • The main headless Codex/OpenCode fix was restoring HCOM_INSTANCE_NAME on the background PTY path so delivery can initialize for the correct instance.
  • The PTY poll loop changes are specifically aimed at non-interactive stdin on macOS (/dev/null, closed pipes, invalid fds).
  • The inject listener still stays non-blocking; the fix is not to change socket mode, but to back off for one iteration after WouldBlock.
  • The HCOM_LAUNCHED=1 PTY-child env fix ensures downstream hooks/tool integrations still recognize PTY-managed child sessions as hcom-launched.
  • Bare headless Claude is intentionally not “fixed” into a PTY-backed live session model in this PR. The safe change here is explicit early validation.

Validation

Automated

  • cargo test
  • Added regression coverage for:
    • background runner env propagation
    • Claude headless validation
    • PTY child launch env
    • inject listener accept helper behavior / timing resilience

Manual

  • ./target/release/hcom codex --headless --go
    • launches successfully
    • transitions to listening
    • round-trip message delivery works
  • ./target/release/hcom opencode --headless --go
    • launches successfully
    • transitions to listening
    • round-trip message delivery works
  • ./target/release/hcom claude --headless --go
    • now fails fast with a clear prompt/task requirement
  • ./target/release/hcom claude --headless --go 'Ping!'
    • launches successfully
    • later message delivery round-trip works

Active-binary PTY re-check

  • Rebuilt release binary
  • Launched fresh PTY-backed headless Codex from ./target/release/hcom
  • Observed normal delivery.start / notify.registered
  • Observed no new recent proxy.poll_spin_trace / POLLNVAL log entries
  • Focused process check showed normal CPU for PTY wrapper and child

Final pre-PR hardening

  • Updated PTY stdin handling so non-TTY POLLHUP disables stdin polling instead of breaking the PTY loop
  • Made the inject accept regression test resilient to non-blocking localhost timing
  • Reran focused PTY tests and full cargo test successfully

Deferred follow-up

Not included in this PR:

  • true PTY-backed headless Claude as a live background session model
  • deeper investigation of Claude initial-prompt swallowing on detached launches
  • any broader resume/fork cleanup around tool marker vars beyond the validated fixes here
    (those are deferred follow-up items rather than blockers for restoring stable headless Codex/OpenCode PTY launches)

Commit stack

  • fix(pty): don't treat stdin EOF as terminal disconnect when stdin is not a TTY
  • fix(pty): mark PTY child processes as hcom-launched
  • fix(pty): fully remove stdin from poll set when not a TTY
  • fix(headless): restore stable PTY background launches
  • fix(pty): tighten headless stdin and inject test handling

mmkzer0 added 5 commits April 22, 2026 19:24
…not a TTY

When running hcom in headless/background mode, the launcher spawns the
PTY process with stdin=/dev/null. On /dev/null, poll() immediately
returns POLLIN (always readable), and the subsequent read returns Ok(0)
(EOF). The existing code treated any stdin EOF as 'terminal gone' and
broke the poll loop, which then killed the child agent with SIGTERM and
marked it as launch_failed.

However, the poll-timeout path already had an isatty() guard for this
exact scenario — but when stdin=/dev/null, poll() never times out
because /dev/null is always readable, so the isatty check was bypassed.

Fix: guard the Ok(0) => break path with an isatty check. Only break on
stdin EOF if stdin is actually a TTY. If stdin is /dev/null or a pipe,
ignore EOF and continue polling.

Fixes: background/headless agent launches being killed immediately.
Set HCOM_LAUNCHED=1 in hcom pty child environments so downstream hooks and tool integrations correctly recognize PTY-managed sessions as hcom launches.
The previous fix used PollFlags::empty() for stdin when poll_stdin=false,
but on macOS poll() may still return immediately for a readable fd even
with events=0 (e.g. /dev/null in headless mode). This caused a tight
busy-wait loop consuming 99.9% CPU.

Fix: dynamically build poll_fds and completely omit stdin when we're no
longer polling it. Adjust inject_listener and client indices accordingly.

Also wraps stdin handling in 'if poll_stdin' to avoid accidentally
checking inject listener revents when stdin is absent.
- propagate HCOM_INSTANCE_NAME into background PTY runner env
- apply tool-specific PTY env on background launches
- back off inject listener after WouldBlock on macOS
- drop stdin from PTY polling on POLLNVAL
- reject bare Claude headless launches without a prompt/task
- add regression tests for launcher env and PTY helpers
- ignore stdin POLLHUP for non-TTY headless PTY sessions
- make inject accept regression test resilient to non-blocking timing
Copilot AI review requested due to automatic review settings April 22, 2026 19:34
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Restores stable macOS headless/background launches on the PTY-backed path by fixing env propagation, hardening the PTY poll loop against non-interactive stdin edge cases, and adding an explicit guard for unsupported “bare” headless Claude launches.

Changes:

  • Propagate instance/tool-specific environment into background PTY runner launches and mark PTY-spawned children as hcom-launched.
  • Harden the PTY event loop for macOS headless stdin/poll quirks and add inject-listener backoff behavior.
  • Fail fast for hcom claude --headless when no prompt/task is provided, with regression tests.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/pty/mod.rs Adds child env injection support and updates poll loop behavior for stdin + inject listener handling.
src/pty/inject.rs Changes accept() to return whether a connection was accepted; adds focused tests.
src/main.rs Supplies PTY child env (HCOM_LAUNCHED=1) and adds a regression test.
src/launcher.rs Ensures background PTY runner env includes HCOM_INSTANCE_NAME and tool_extra_env; adds tests.
src/commands/launch.rs Adds validation to reject bare headless Claude without prompt/task; adds tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/pty/mod.rs
Comment on lines +741 to +749
// Only include stdin in poll set while we're actively polling it.
// When stdin is a non-TTY (e.g. /dev/null in headless mode), we stop
// polling it to avoid busy-waiting — but we must fully remove it from
// the poll set, not just pass empty events, because some platforms
// (macOS) may still return immediately for a readable fd even with
// events=0.
if poll_stdin {
poll_fds.push(PollFd::new(stdin_borrowed, PollFlags::POLLIN));
}
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

poll_stdin can become false specifically when poll reports POLLNVAL (invalid/closed stdin fd). In that case, the next loop iteration still constructs stdin_borrowed via unsafe { BorrowedFd::borrow_raw(stdin_raw) } (a few lines above this hunk) even though stdin may be closed, which is undefined behavior. Suggestion: only create/borrow stdin fd inside the if poll_stdin { ... } block (or switch the isatty checks to libc::isatty(stdin_raw) so no BorrowedFd is needed for potentially-invalid fds).

Copilot uses AI. Check for mistakes.
Comment thread src/pty/mod.rs
Comment on lines +751 to +761
// Include the inject listener unless we're in backoff (macOS spurious POLLIN).
// Reset backoff here so it applies for exactly one iteration.
let include_listener = !listener_backoff;
listener_backoff = false;
let inject_listener_idx: Option<usize> = if include_listener {
let idx = poll_fds.len();
poll_fds.push(PollFd::new(inject_listener_fd, PollFlags::POLLIN));
Some(idx)
} else {
None
};
Copy link

Copilot AI Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one-iteration listener_backoff strategy can introduce large accept latency: when the listener is excluded, poll() can block for up to poll_timeout (10s) if the PTY is idle, so new inject connections (including delivery thread injections) may not be accepted until the timeout expires. Consider bounding this by using a much shorter poll timeout during backoff, or keeping the listener in the poll set and applying a small sleep/yield when accept() returns WouldBlock.

Copilot uses AI. Check for mistakes.
@aannoo aannoo merged commit 3716c1c into aannoo:main Apr 23, 2026
13 of 14 checks passed
@aannoo
Copy link
Copy Markdown
Owner

aannoo commented Apr 23, 2026

nice

@mmkzer0 mmkzer0 deleted the fix/headless-pty-eof branch April 23, 2026 08:57
aannoo added a commit that referenced this pull request Apr 24, 2026
…istener backoff (#27)

* fix(status): unify hook detection with verify_* and silence expected dev_root warnings

hcom status used its own naive detection (substring match on settings.json,
file-existence for the opencode plugin) while hcom hooks used the canonical
verify_* functions. After hcom hooks remove, status kept reporting Claude/Gemini
as installed because permission strings like "Bash(hcom ...)" still contained
"hcom". Route all four tool checks through the same verify_* functions so status
and hooks never disagree.

router.read_dev_root_from_kv logged WARN on every expected "not configured"
state (fresh HCOM_DIR where the db doesn't exist yet, unset dev_root key
returning QueryReturnedNoRows). These fired on every hcom invocation and every
hook dispatch. Treat them silently; keep WARN for real errors (permission
denied, corruption, etc.).

* fix(launch): normalize claude + --headless + prompt to print-headless

Before: `hcom claude --headless --hcom-prompt 'task'` (or a positional
task) passed validation but launched as a detached plain `claude` — the
spec never saw `-p`/`--print`, so `add_background_defaults` didn't fire
and the child never got `--output-format stream-json --verbose`.

Thread initial_prompt into prepare_launch_execution. When
tool == "claude" && background && a prompt is present, inject `-p` into
the merged args before calling add_background_defaults so the child
always runs through claude's print mode.

Claude-specific — --headless semantics for codex/gemini/opencode
unchanged.

Addresses finding 1 of the PR #26 post-merge review.

* fix(pty): cap poll timeout during inject-listener backoff

The macOS kqueue spurious-POLLIN workaround drops the inject listener
from the poll set for one iteration after WouldBlock. The poll timeout
stayed at 10s (or 5s with debug) for that iteration, so an inject
connection arriving while the listener was backed off could wait the
full timeout before the listener re-entered the poll set.

Cap the poll timeout at 100ms whenever the listener is excluded so the
next iteration (with the listener re-included) runs quickly.

Addresses finding 2 of the PR #26 post-merge review.

* fix(scripts): pass -p explicitly to bundled claude --headless launches

The launcher now normalizes claude + --headless + prompt to inject -p,
but adding it explicitly in the bundled scripts makes intent clear and
is belt-and-suspenders. Guard the non-claude cases (fatcow/debate can
launch any tool) so -p only appears when the tool is claude.

* fix(relay): enforce claude headless validation on remote launch path

The local CLI short-circuits into dispatch_remote before local
validate/prepare fires when --device is set, so the remote handler has
to enforce the Claude invariant itself. Before this commit, bare remote
`hcom claude --headless --device X` would prepare as background=true
pty=false with no -p and fall straight through launcher::launch,
preserving the unsupported detached plain-claude path on the remote.

Call validate_claude_headless_launch after prepare_remote_launch so
remote and local paths reject the same inputs. Bare claude --headless
returns the same prompt/task error; claude --headless --hcom-prompt is
normalized through the -p injection in prepare_launch_execution.

Addresses @mafo's blocker in the PR #27 review thread.

---------

Co-authored-by: aannoo <aannoo@users.noreply.github.com>
aannoo added a commit that referenced this pull request Apr 24, 2026
PR #26 landed `--headless` for claude via the detached print-mode path
(-p --output-format stream-json --verbose, one-shot, exits after the
prompt), and deferred a true PTY-backed live headless session. This
commit adds that path.

User-facing
- New `--pty` launch flag. For claude, combines with `--headless` to
  launch the interactive TUI in a PTY wrapper inside a detached runner
  — same shape other tools already use, giving claude a live background
  session that accepts hcom inject. `--pty` alone (no --headless) keeps
  today's foreground claude-pty behavior.
- `hcom claude --headless --pty` with no prompt is now a valid launch;
  the session sits idle waiting for hcom messages.
- `--pty` added to `SHARED_LAUNCH_FLAGS` so it shows up in help for
  both fresh launches and resume/fork.

Plumbing
- `HcomLaunchFlags.pty` parsed in `extract_launch_flags`; forwarded
  into `prepare_launch_execution` and the dispatch-remote payload.
- `prepare_launch_execution` now skips the -p injection + background
  defaults when `--pty` is set — the PTY wrapper hosts the TUI, and
  forcing print mode there would end the session on first reply.
- `validate_claude_headless_launch` gets a `use_pty` parameter. Bare
  `claude --headless --pty` is allowed; the prompt-required invariant
  still applies to the NativePrint path.
- `RemoteLaunchRequest.pty` carries the flag across the relay.

Launcher
- Drop the `bail!("Claude PTY does not support headless/background
  mode")` guard in `launch()`.
- Route `LaunchTool::ClaudePty` through `launch_pty_or_background` the
  way gemini/codex/opencode do, so background mode spawns the PTY
  wrapper in a detached runner.
- Introduce `LaunchBackend` enum (`InteractiveVisible` / `HeadlessPty`
  / `NativePrint`). Resolved once in `launch()` from the already-
  prepared (tool, background, pty) triple. Dispatch branches on the
  enum where it clarifies intent (claude's NativePrint-vs-visible
  split); the other tools' paths already encode the split via
  LaunchTool and use `launch_pty_or_background` directly.

Tests
- Unit: `--pty --headless` claude does not inject -p and keeps use_pty
  true; interactive claude-pty unchanged.
- Unit: validator allows no-prompt `--pty --headless` and still rejects
  no-prompt non-PTY headless.
- Unit: `LaunchBackend::resolve` matrix for each (tool, background,
  pty) combination.

Live
- `env -u HCOM_INSTANCE_NAME hcom claude --headless --pty
  --hcom-prompt 'respond via hcom send then stop'` → round-trip reply.
- `env -u HCOM_INSTANCE_NAME hcom claude --headless --pty` (no prompt)
  → reaches listening, replies to `hcom send`.
aannoo added a commit that referenced this pull request Apr 24, 2026
…#28)

* feat: PTY-headless claude via --pty, plus LaunchBackend enum

PR #26 landed `--headless` for claude via the detached print-mode path
(-p --output-format stream-json --verbose, one-shot, exits after the
prompt), and deferred a true PTY-backed live headless session. This
commit adds that path.

User-facing
- New `--pty` launch flag. For claude, combines with `--headless` to
  launch the interactive TUI in a PTY wrapper inside a detached runner
  — same shape other tools already use, giving claude a live background
  session that accepts hcom inject. `--pty` alone (no --headless) keeps
  today's foreground claude-pty behavior.
- `hcom claude --headless --pty` with no prompt is now a valid launch;
  the session sits idle waiting for hcom messages.
- `--pty` added to `SHARED_LAUNCH_FLAGS` so it shows up in help for
  both fresh launches and resume/fork.

Plumbing
- `HcomLaunchFlags.pty` parsed in `extract_launch_flags`; forwarded
  into `prepare_launch_execution` and the dispatch-remote payload.
- `prepare_launch_execution` now skips the -p injection + background
  defaults when `--pty` is set — the PTY wrapper hosts the TUI, and
  forcing print mode there would end the session on first reply.
- `validate_claude_headless_launch` gets a `use_pty` parameter. Bare
  `claude --headless --pty` is allowed; the prompt-required invariant
  still applies to the NativePrint path.
- `RemoteLaunchRequest.pty` carries the flag across the relay.

Launcher
- Drop the `bail!("Claude PTY does not support headless/background
  mode")` guard in `launch()`.
- Route `LaunchTool::ClaudePty` through `launch_pty_or_background` the
  way gemini/codex/opencode do, so background mode spawns the PTY
  wrapper in a detached runner.
- Introduce `LaunchBackend` enum (`InteractiveVisible` / `HeadlessPty`
  / `NativePrint`). Resolved once in `launch()` from the already-
  prepared (tool, background, pty) triple. Dispatch branches on the
  enum where it clarifies intent (claude's NativePrint-vs-visible
  split); the other tools' paths already encode the split via
  LaunchTool and use `launch_pty_or_background` directly.

Tests
- Unit: `--pty --headless` claude does not inject -p and keeps use_pty
  true; interactive claude-pty unchanged.
- Unit: validator allows no-prompt `--pty --headless` and still rejects
  no-prompt non-PTY headless.
- Unit: `LaunchBackend::resolve` matrix for each (tool, background,
  pty) combination.

Live
- `env -u HCOM_INSTANCE_NAME hcom claude --headless --pty
  --hcom-prompt 'respond via hcom send then stop'` → round-trip reply.
- `env -u HCOM_INSTANCE_NAME hcom claude --headless --pty` (no prompt)
  → reaches listening, replies to `hcom send`.

* fix(launch): reject --pty + -p/--print conflict

`hcom claude --headless --pty -p 'task'` (or bare `--pty -p`) routed
through ClaudePty and spawned the PTY wrapper, but -p is claude's
one-shot print mode — it answers and exits, tearing down the live
session the moment the first reply lands. Silent misbehavior.

Reject the combination explicitly in validate_claude_headless_launch
rather than stripping -p, so the user notices: --pty means live TUI
session, -p means print mode. Shared validator runs on both local and
remote paths, so the remote path inherits the check.

Addresses @mafo's edge-case finding on PR #28.

---------

Co-authored-by: aannoo <aannoo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants