Skip to content

fix poll() / getline() buffered IO desync#109

Open
trollkotze wants to merge 2 commits intoDeusData:mainfrom
trollkotze:fix/poll-getline-buffered-io-desync
Open

fix poll() / getline() buffered IO desync#109
trollkotze wants to merge 2 commits intoDeusData:mainfrom
trollkotze:fix/poll-getline-buffered-io-desync

Conversation

@trollkotze
Copy link

@trollkotze trollkotze commented Mar 22, 2026

fixes #78 (at least for me, on Linux Mint 21.2, glibc version 2.35)
(perhaps also relevant to #77, which I referenced here by accident first, see further down in discussion for that)

PR description generated by AI (Claude Opus 4.6), with some human commentary at the bottom.

Problem

Some MCP clients (OpenCode specifically) that send notifications/initialized and tools/list in rapid succession after receiving the initialize response (as the @modelcontextprotocol/sdk does by default) hit a 60-second stall. The server never responds to tools/list until the STORE_IDLE_TIMEOUT_S poll timeout expires, causing the client to report "Failed to get tools".

Root cause

cbm_mcp_server_run() mixes poll() on the raw fd with getline() on the buffered FILE*. When getline() reads the initialize request, libc may read ahead and pull subsequent messages into its internal userspace buffer. On the next iteration, poll() checks the kernel pipe buffer — which is now empty — and blocks for 60 seconds, unaware that getline() already has data ready to return.

Fix

Before calling poll(), check whether the FILE* has buffered data available. If it does, skip poll() and go straight to getline().

Confirmed with a bidirectional wire-logging proxy: tools/list response now arrives in <1ms instead of 60s. Tested with OpenCode (which uses the MCP SDK's StdioClientTransport). Claude Code and other JSONL clients are unaffected.

Human commentary (from me)

Related issue: #78, probably #77 as well
See issue comment on #78 for more detail on the initial error investigation and fix. (later amended here for mac)

@cristicalin
Copy link

@trollkotze I tried your fix on a mac where I have issue #77 but the fix is linux specific and breaks the build on macOS:

src/mcp/mcp.c:2392:33: error: no member named '_IO_read_ptr' in 'struct __sFILE'
 2392 |         int has_buffered = (in->_IO_read_ptr < in->_IO_read_end);  // glibc-specific
      |                             ~~  ^
src/mcp/mcp.c:2392:52: error: no member named '_IO_read_end' in 'struct __sFILE'
 2392 |         int has_buffered = (in->_IO_read_ptr < in->_IO_read_end);  // glibc-specific
      |                                                ~~  ^
2 errors generated.
src/pipeline/pass_githistory.c:171:38: error: missing field 'interhunk_lines' initializer [-Werror,-Wmissing-field-initializers]
  171 |         git_diff_options diff_opts = GIT_DIFF_OPTIONS_INIT;
      |                                      ^
/opt/homebrew/Cellar/libgit2/1.9.2_1/include/git2/diff.h:480:95: note: expanded from macro 'GIT_DIFF_OPTIONS_INIT'
  480 |         {GIT_DIFF_OPTIONS_VERSION, 0, GIT_SUBMODULE_IGNORE_UNSPECIFIED, {NULL,0}, NULL, NULL, NULL, 3}
      |                                                                                                      ^
1 error generated.
make: *** [build/c/codebase-memory-mcp] Error 1

@trollkotze
Copy link
Author

trollkotze commented Mar 22, 2026

@cristicalin

@trollkotze I tried your fix on a mac where I have issue #77 but the fix is linux specific and breaks the build on macOS:

...

Thanks.
I just pushed a potential fix that I can't test myself on MacOS right now. Generated by Claude Opus again, without own understanding. Sounds a bit sketchy, but like it could make sense. Maybe you can test and report back if it works for you.

But perhaps the suggestion at the end would be better. But that's beyond my paygrade.

the alternative is to replace the whole poll() + getline() approach with read() + manual line buffering

Here more complete explanation from the AI:

Actually, the simplest correct fix that works everywhere:

src/mcp/mcp.c:2392

#ifdef __GLIBC__
        int has_buffered = (in->_IO_read_ptr < in->_IO_read_end);
#else
        /* macOS / BSD: use __srget-style check.
         * fp->_r is the count of unread bytes in the buffer. */
        int has_buffered = (in->_r > 0);
#endif
        if (!has_buffered) {
            struct pollfd pfd = {.fd = fd, .events = POLLIN};
            int pr = poll(&pfd, 1, STORE_IDLE_TIMEOUT_S * 1000);
            if (pr < 0) break;
            if (pr == 0) {
                cbm_mcp_server_evict_idle(srv, STORE_IDLE_TIMEOUT_S);
                continue;
            }
        }

On macOS/BSD, FILE (aka struct __sFILE) has _r — the number of unread bytes remaining in the buffer. This is stable across macOS, FreeBSD, OpenBSD, and NetBSD. It's used by macOS's own libc internally.
For Windows (MSVC), the existing code already uses WaitForSingleObject in a separate #ifdef _WIN32 branch, so this doesn't apply there.
If the maintainer wants something that doesn't touch FILE internals at all, the alternative is to replace the whole poll() + getline() approach with read() + manual line buffering — but that's a bigger change. The #ifdef GLIBC / #else approach is a two-line fix that covers the two platforms they actually build on.

@cristicalin
Copy link

the issue according to claude itself is in the shutdown logic, I tested this and it works now with claude code 2.1.81 on top of the original fix you pushed:

diff --git a/src/main.c b/src/main.c
index 79618fa..d67bf69 100644
--- a/src/main.c
+++ b/src/main.c
@@ -280,17 +280,21 @@ int main(int argc, char **argv) {
         g_http_server = NULL;
     }

+    /* Join auto-index / update threads BEFORE freeing watcher or store,
+     * since the auto-index thread may call cbm_watcher_watch() on
+     * srv->watcher after pipeline completion. */
+    cbm_mcp_server_free(g_server);
+    g_server = NULL;
+
     if (watcher_started) {
         cbm_watcher_stop(g_watcher);
         cbm_thread_join(&watcher_tid);
     }
     cbm_watcher_free(g_watcher);
+    g_watcher = NULL;
+
     cbm_store_close(watch_store);
-    cbm_mcp_server_free(g_server);
     cbm_config_close(runtime_config);

-    g_watcher = NULL;
-    g_server = NULL;
-
     return rc;
 }
diff --git a/src/mcp/mcp.c b/src/mcp/mcp.c
index 3e79140..ee89eb6 100644
--- a/src/mcp/mcp.c
+++ b/src/mcp/mcp.c
@@ -2389,7 +2389,13 @@ int cbm_mcp_server_run(cbm_mcp_server_t *srv, FILE *in, FILE *out) {
             continue;
         }
 #else
-        int has_buffered = (in->_IO_read_ptr < in->_IO_read_end);  // glibc-specific
+#if defined(__GLIBC__)
+        int has_buffered = (in->_IO_read_ptr < in->_IO_read_end);
+#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
+        int has_buffered = (in->_r > 0);
+#else
+        int has_buffered = 0;  /* conservative: always poll */
+#endif
         if (!has_buffered) {
             struct pollfd pfd = {.fd = fd, .events = POLLIN};
             int pr = poll(&pfd, 1, STORE_IDLE_TIMEOUT_S * 1000);
diff --git a/src/pipeline/pass_githistory.c b/src/pipeline/pass_githistory.c
index 7a03504..5e2e94f 100644
--- a/src/pipeline/pass_githistory.c
+++ b/src/pipeline/pass_githistory.c
@@ -168,7 +168,10 @@ static int parse_git_log(const char *repo_path, commit_t **out, int *out_count)

         /* Diff parent_tree <E2><86><92> tree to find changed files */
         git_diff *diff = NULL;
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
         git_diff_options diff_opts = GIT_DIFF_OPTIONS_INIT;
+#pragma GCC diagnostic pop
         if (git_diff_tree_to_tree(&diff, repo, parent_tree, tree, &diff_opts) == 0) {
             commit_t current = {0};

@trollkotze
Copy link
Author

@cristicalin Okay. Thanks for the feedback. I can't follow up on that, because I don't understand it.

If you feel confident that that is all good, feel free to make your own PR with that fix on top of mine. I don't take ownership of anything. It's all AI generated anyway.

@cristicalin
Copy link

same here, the fix is generated by claude, let's let @DeusData decide if they want to incorporate the fix or not

@trollkotze
Copy link
Author

@cristicalin Oh, I just noticed, I referenced the wrong issue here. I meant it was related to #78 (problem with OpenCode), not #77 (problem with Claude Code). But if it's related to both, and maybe helps with both, then fine, lol. (Gonna fix the references in earlier messages)

I'll leave this for more knowledgeable people to sort out now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenCode compatibility issues: local MCP handshake times out and ingest_traces schema is rejected

2 participants