fix(win): prevent UI hang — spawn git without leaking handles into the child #799

Closed
Flipper1994 wants to merge 2 commits into DeusData:main from Flipper1994:fix/windows-ui-popen-handle-inheritance

Conversation

Flipper1994 commented Jul 3, 2026

Contributor

What does this PR do?

Fixes a Windows-only deadlock that makes the web UI hang forever. With --ui=true,
the MCP tool list_projects never returns and, because the UI HTTP server is
single-threaded, the whole server stops responding — the UI page loads but stays
blank. Closes #798.

Root cause

list_projects shells out to git per project for git context
(git_capture -> cbm_popen, src/git/git_context.c). On Windows cbm_popen was
the CRT _popen, which does CreateProcess(bInheritHandles = TRUE) and leaks
every inheritable handle into the git child — including the Winsock/AFD handles
that exist only in UI mode. git-for-Windows (MSYS2/Cygwin) classifies each inherited
handle via NtQueryObject on startup, which deadlocks on an inherited
socket/AFD handle. git never runs, so the server blocks forever in fgets on git's
stdout pipe. Confirmed with gdb: request thread in fgets/git_capture, the
git.exe child in ntdll!ZwQueryObject.

The plain MCP-stdio server and the CLI are unaffected (no socket handles to inherit),
which is why only the UI hangs.
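
To make the blocking point concrete, here is a minimal sketch of the capture pattern described above. It is not the code from src/git/git_context.c: capture_first_line and the sketch_popen/sketch_pclose macros are hypothetical names, and the real git_capture may read more than one line.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L   /* declares popen/pclose on POSIX */
#endif
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#define sketch_popen  _popen      /* CRT _popen: child inherits every inheritable handle */
#define sketch_pclose _pclose
#else
#define sketch_popen  popen
#define sketch_pclose pclose
#endif

/* Hypothetical stand-in for git_capture: run cmd and copy the first line of
 * its stdout into out, with the trailing newline removed.
 * Returns 0 on success, -1 on any failure. */
static int capture_first_line(const char *cmd, char *out, size_t out_size) {
    FILE *pipe;
    char *line;
    int status;

    if (out_size == 0) return -1;
    out[0] = '\0';
    pipe = sketch_popen(cmd, "r");
    if (pipe == NULL) return -1;

    /* Blocks until the child writes a line or closes its stdout. A child
     * that is stuck before it starts running does neither, so the caller
     * never gets past this call. */
    line = fgets(out, (int)out_size, pipe);

    status = sketch_pclose(pipe);   /* waits for the child to exit */
    if (line == NULL || status == -1) return -1;

    out[strcspn(out, "\r\n")] = '\0';
    return 0;
}
```

On a single-threaded server, one request stuck in that fgets is enough to stall every other request, which matches the blank-page symptom.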

The fix

Reimplement cbm_popen on Windows (src/foundation/compat_fs.c) to spawn via
CreateProcess + STARTUPINFOEX with an explicit
PROC_THREAD_ATTRIBUTE_HANDLE_LIST containing only the stdout write-end and a NUL
handle for the child's stdin/stderr. The git child now inherits only the pipe —
no sockets, no MCP stdin pipe, no Winsock handles — so there is no foreign handle for
NtQueryObject to deadlock on. cbm_pclose reaps the process via a small
FILE*->HANDLE table. The POSIX path is unchanged (popen already opens its pipe
O_CLOEXEC). This is centralized in cbm_popen, so it also covers the watcher
(src/watcher/watcher.c) and git-history pass, which shell out to git the same way.
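
For readers unfamiliar with the handle-list attribute, the spawn described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in compat_fs.c: child_pipe, child_open and child_close are hypothetical names, the struct stands in for the PR's FILE*->HANDLE table, and the Windows branch passes the command line straight to CreateProcessA rather than going through cmd.exe. The non-Windows branch simply calls popen.

```c
#ifdef _WIN32
#define _WIN32_WINNT 0x0600   /* STARTUPINFOEX requires Vista or later */
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#elif !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of the stream with the process that feeds it. */
typedef struct {
    FILE *out;      /* read end of the child's stdout */
    void *process;  /* Windows process handle, NULL on POSIX */
} child_pipe;

#ifdef _WIN32
/* Spawn cmdline so the child inherits only the pipe write end and NUL. */
static int child_open(child_pipe *cp, const char *cmdline) {
    SECURITY_ATTRIBUTES sa = { sizeof sa, NULL, TRUE };  /* inheritable handles */
    HANDLE rd = NULL, wr = NULL, nul, allowed[2];
    LPPROC_THREAD_ATTRIBUTE_LIST attrs;
    STARTUPINFOEXA si;
    PROCESS_INFORMATION pi;
    SIZE_T attr_size = 0;
    BOOL ok = FALSE;
    char *cmd_copy;
    int fd;

    cp->out = NULL;
    cp->process = NULL;
    if (!CreatePipe(&rd, &wr, &sa, 0)) return -1;
    SetHandleInformation(rd, HANDLE_FLAG_INHERIT, 0);   /* read end stays private */
    nul = CreateFileA("NUL", GENERIC_READ | GENERIC_WRITE,
                      FILE_SHARE_READ | FILE_SHARE_WRITE, &sa, OPEN_EXISTING, 0, NULL);
    allowed[0] = wr;
    allowed[1] = nul;
    InitializeProcThreadAttributeList(NULL, 1, 0, &attr_size);  /* size query */
    attrs = (LPPROC_THREAD_ATTRIBUTE_LIST)malloc(attr_size);
    cmd_copy = _strdup(cmdline);   /* CreateProcess takes a writable buffer */
    if (nul != INVALID_HANDLE_VALUE && attrs != NULL && cmd_copy != NULL &&
        InitializeProcThreadAttributeList(attrs, 1, 0, &attr_size)) {
        /* Only the handles named here cross into the child, even though
         * bInheritHandles is TRUE in the CreateProcessA call. */
        if (UpdateProcThreadAttribute(attrs, 0, PROC_THREAD_ATTRIBUTE_HANDLE_LIST,
                                      allowed, sizeof allowed, NULL, NULL)) {
            ZeroMemory(&si, sizeof si);
            si.StartupInfo.cb = sizeof si;
            si.StartupInfo.dwFlags = STARTF_USESTDHANDLES;
            si.StartupInfo.hStdInput = nul;
            si.StartupInfo.hStdOutput = wr;
            si.StartupInfo.hStdError = nul;
            si.lpAttributeList = attrs;
            ok = CreateProcessA(NULL, cmd_copy, NULL, NULL, TRUE,
                                EXTENDED_STARTUPINFO_PRESENT | CREATE_NO_WINDOW,
                                NULL, NULL, &si.StartupInfo, &pi);
        }
        DeleteProcThreadAttributeList(attrs);
    }
    free(attrs);
    free(cmd_copy);
    CloseHandle(wr);                          /* the child holds its own copy */
    if (nul != INVALID_HANDLE_VALUE) CloseHandle(nul);
    if (!ok) { CloseHandle(rd); return -1; }
    CloseHandle(pi.hThread);
    fd = _open_osfhandle((intptr_t)rd, _O_RDONLY);   /* on success fd owns rd */
    if (fd < 0) CloseHandle(rd);
    else if ((cp->out = _fdopen(fd, "r")) == NULL) _close(fd);
    if (cp->out == NULL) { CloseHandle(pi.hProcess); return -1; }
    cp->process = pi.hProcess;
    return 0;
}

/* Close the stream, reap the child, and return its exit code. */
static int child_close(child_pipe *cp) {
    DWORD code = (DWORD)-1;
    fclose(cp->out);
    WaitForSingleObject((HANDLE)cp->process, INFINITE);
    GetExitCodeProcess((HANDLE)cp->process, &code);
    CloseHandle((HANDLE)cp->process);
    return (int)code;
}
#else
static int child_open(child_pipe *cp, const char *cmdline) {
    cp->process = NULL;
    cp->out = popen(cmdline, "r");            /* POSIX path: plain popen */
    return cp->out != NULL ? 0 : -1;
}

static int child_close(child_pipe *cp) { return pclose(cp->out); }
#endif
```

A caller would run child_open(&cp, "git --version"), read cp.out with fgets, then call child_close(&cp). Every handle in the list must itself be inheritable, and bInheritHandles must still be TRUE; the attribute narrows inheritance to the listed handles rather than switching it off.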

Validation (Windows, production cbm-with-ui binary)

| | before | after |
|---|---|---|
| POST /rpc list_projects (UI) | hang / timeout | 200 in ~1 s, 6× stable |
| leftover git/cmd processes | persist forever | none |
| web UI in browser | blank page | renders, stays responsive |

Checklist

  • Every commit is signed off (git commit -s)
  • Tests pass locally — note: this repo's test.sh builds with ASan/UBSan, which
    are unavailable on the MSYS2/MinGW toolchain used to reproduce this; the fix is
    #ifdef _WIN32-only and does not touch the POSIX path.
  • Lint passes
  • Regression test — added in the httpd suite (tests/test_httpd.c,
    tests/test_main.c). The git NtQueryObject deadlock is
    environment-sensitive (it does not reproduce on every git-for-Windows/MSYS
    build, nor under Linux CI), so the load-bearing guard tests the fix's
    contract directly and deterministically: cbm_popen_isolates_inheritable_handle
    creates an inheritable marker handle and spawns a re-exec of the test runner
    via cbm_popen (hidden __handle_probe mode), asserting the child cannot
    read the marker (verified RED without the fix, leaked==1; GREEN with it).
    Two companion end-to-end liveness invariants (git-context resolve and
    POST /rpc list_projects under a watchdog while the UI server holds live
    sockets) catch #798 on affected git builds and any future hang regression.
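
The child side of such a probe might look like the sketch below. It is a simplification under assumptions, not the __handle_probe code in tests/test_main.c: marker_to_arg and probe_handle_leaked are hypothetical names, and the probe only asks whether the handle value is open in the child, whereas the PR's test asserts that the child cannot read the marker. Handle values can be reused, so a stricter probe would read the marker and compare its contents. The non-Windows branch uses fcntl as the closest POSIX analogue.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parent side: format the marker's numeric handle value so it can be passed
 * to the re-executed child on its command line. */
static void marker_to_arg(intptr_t marker, char *buf, size_t size) {
    snprintf(buf, size, "%lld", (long long)marker);
}

/* Child side: returns 1 if the value names an open handle in this process
 * (the marker leaked), 0 if it does not, and -1 if the argument is not a
 * number. */
static int probe_handle_leaked(const char *handle_arg) {
    char *end = NULL;
    unsigned long long value = strtoull(handle_arg, &end, 0);

    if (end == handle_arg || *end != '\0') return -1;
#ifdef _WIN32
    {
        DWORD flags = 0;
        return GetHandleInformation((HANDLE)(uintptr_t)value, &flags) ? 1 : 0;
    }
#else
    return fcntl((int)value, F_GETFD) != -1 ? 1 : 0;
#endif
}
```

The parent would create an inheritable marker, pass marker_to_arg's output to the child through cbm_popen, and fail the test if the child reports 1.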

On Windows with the UI enabled (--ui=true), list_projects hangs forever and
wedges the single-threaded HTTP server, so the web UI never loads.

Root cause: list_projects resolves git context per project via git_capture ->
cbm_popen (git_context.c). On Windows cbm_popen was _popen, which spawns the
child with CreateProcess(bInheritHandles=TRUE) and leaks every inheritable
handle into the git child -- including the Winsock/AFD handles that exist only
in UI mode. git-for-Windows (MSYS2/Cygwin) classifies each inherited handle via
NtQueryObject on startup, which deadlocks on an inherited socket/AFD handle, so
git never runs and the server blocks forever in fgets on git's stdout pipe.
Confirmed with gdb (request thread in fgets/git_capture, git.exe child in
ntdll!ZwQueryObject). The plain MCP-stdio server and the CLI are unaffected --
no socket handles to inherit -- which is why only the UI hangs.

Fix: reimplement cbm_popen on Windows with CreateProcess + STARTUPINFOEX and an
explicit PROC_THREAD_ATTRIBUTE_HANDLE_LIST containing only the stdout write-end
and a NUL handle for stdin/stderr. The git child now inherits only the pipe, so
there is no foreign handle for NtQueryObject to deadlock on. cbm_pclose reaps
via a small FILE*->HANDLE table. The POSIX path is unchanged (popen already
sets O_CLOEXEC). Centralized in cbm_popen, so it also covers watcher.c and the
git-history pass.

Signed-off-by: Flipper <jacobphilipp@ymail.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flipper1994 requested a review from DeusData as a code owner July 3, 2026 14:07
DeusData added the labels bug, windows, ux/behavior, stability/performance, priority/high Jul 4, 2026
DeusData commented Jul 4, 2026

Owner

Thanks for the Windows UI hang fix for #798. Triage: high-priority Windows/UI stability.

Review will focus on the custom Windows cbm_popen implementation: it must close off handle inheritance for git-for-Windows without leaking handles, hanging the HTTP server, or changing POSIX behavior. A regression test that proves list_projects returns under UI mode on Windows is the important guard.

…le leak

Guards the cbm_popen handle-inheritance fix. The git NtQueryObject deadlock
is environment-sensitive (does not reproduce on every git-for-Windows/MSYS
build), so a git-based test cannot be relied on to go RED. Instead
cbm_popen_isolates_inheritable_handle tests the fix's contract directly:
it creates an inheritable marker handle and spawns a re-exec of the test
runner via cbm_popen (hidden __handle_probe mode), asserting the child
cannot read the marker. Verified RED without the fix, GREEN with it.

Two companion end-to-end liveness invariants (git-context resolve and
POST /rpc list_projects under a watchdog while the UI server holds live
sockets) catch DeusData#798 on affected git builds and any future hang regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Flipper <jacobphilipp@ymail.com>
@Flipper1994

Contributor Author

@DeusData Added the regression guard you asked for (0c2daca, test-only).

The tricky part: the git NtQueryObject deadlock is environment-sensitive. It does not reproduce on every git-for-Windows/MSYS build (nor under Linux CI), so a git-based test can't be relied on to go RED. The load-bearing guard therefore tests the fix's contract directly:

  • cbm_popen_isolates_inheritable_handle (tests/test_httpd.c + a hidden __handle_probe re-exec mode in tests/test_main.c): creates an inheritable marker handle, spawns a re-exec of the test runner through cbm_popen, and asserts the child cannot read the marker — i.e. no foreign handle crosses into the child. Verified RED without the fix (leaked==1), GREEN with it. This is the exact leak that caused the deadlock, tested deterministically on any Windows.
  • Two companion end-to-end liveness invariants: cbm_git_context_resolve (the per-project path list_projects runs) and POST /rpc list_projects, both under a watchdog while the UI server holds live sockets. They enforce "must-not-hang / must-respond under UI mode" and catch #798 on affected git builds; same pattern as the environment-sensitive guards already in test_git_context.c.

Checklist status:

  • Handle inheritance closed for git-for-Windows — STARTUPINFOEX + PROC_THREAD_ATTRIBUTE_HANDLE_LIST = {stdout pipe, NUL}; every error path closes its handles.
  • No server hang / POSIX unchanged — Windows path is #ifdef _WIN32-only; POSIX popen (already O_CLOEXEC) untouched.
  • Regression test — done, as above; the checklist box is now ticked.

All three tests run in the existing httpd suite (no new build wiring), so they execute on the current test-windows CI leg. No production code changed in this follow-up.

DeusData commented Jul 4, 2026

Owner

Thank you — this was a genuinely hard Windows bug found and fixed correctly: leaked inheritable handles keeping the MSYS2 git child hung on the AFD socket walk, matching #798's stacks exactly. We carried it over the line as d2a5975 (PR #846) with you credited as co-author, plus three hardening deltas from review: the isolated spawn no longer silently falls back to _popen (that would reintroduce the deadlock) and logs on failure, cmd.exe is resolved via %COMSPEC%/System32 (no search-path planting), and the command line is UTF-16 (CreateProcessW) so non-ASCII repo paths survive. The full UI-mode hang reproduction (listening socket + single-threaded server + MSYS2 handle walk) is flagged as a follow-up. Closes #798. Closing in favor of the distill — thanks again!

DeusData closed this Jul 4, 2026

Labels

bug, priority/high, stability/performance, ux/behavior, windows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows: web UI hangs — list_projects deadlocks the HTTP server (git via _popen handle inheritance)

2 participants