fix poll() / getline() buffered IO desync#109
fix poll() / getline() buffered IO desync#109trollkotze wants to merge 2 commits intoDeusData:mainfrom
Conversation
|
@trollkotze I tried your fix on a mac where I have issue #77 but the fix is linux specific and breaks the build on macOS: |
Thanks. But perhaps the suggestion at the end would be better. But that's beyond my paygrade.
Here more complete explanation from the AI:
|
|
the issue according to claude itself is in the shutdown logic, I tested this and it works now with claude code 2.1.81 on top of the original fix you pushed: diff --git a/src/main.c b/src/main.c
index 79618fa..d67bf69 100644
--- a/src/main.c
+++ b/src/main.c
@@ -280,17 +280,21 @@ int main(int argc, char **argv) {
g_http_server = NULL;
}
+ /* Join auto-index / update threads BEFORE freeing watcher or store,
+ * since the auto-index thread may call cbm_watcher_watch() on
+ * srv->watcher after pipeline completion. */
+ cbm_mcp_server_free(g_server);
+ g_server = NULL;
+
if (watcher_started) {
cbm_watcher_stop(g_watcher);
cbm_thread_join(&watcher_tid);
}
cbm_watcher_free(g_watcher);
+ g_watcher = NULL;
+
cbm_store_close(watch_store);
- cbm_mcp_server_free(g_server);
cbm_config_close(runtime_config);
- g_watcher = NULL;
- g_server = NULL;
-
return rc;
}
diff --git a/src/mcp/mcp.c b/src/mcp/mcp.c
index 3e79140..ee89eb6 100644
--- a/src/mcp/mcp.c
+++ b/src/mcp/mcp.c
@@ -2389,7 +2389,13 @@ int cbm_mcp_server_run(cbm_mcp_server_t *srv, FILE *in, FILE *out) {
continue;
}
#else
- int has_buffered = (in->_IO_read_ptr < in->_IO_read_end); // glibc-specific
+#if defined(__GLIBC__)
+ int has_buffered = (in->_IO_read_ptr < in->_IO_read_end);
+#elif defined(__APPLE__) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
+ int has_buffered = (in->_r > 0);
+#else
+ int has_buffered = 0; /* conservative: always poll */
+#endif
if (!has_buffered) {
struct pollfd pfd = {.fd = fd, .events = POLLIN};
int pr = poll(&pfd, 1, STORE_IDLE_TIMEOUT_S * 1000);
diff --git a/src/pipeline/pass_githistory.c b/src/pipeline/pass_githistory.c
index 7a03504..5e2e94f 100644
--- a/src/pipeline/pass_githistory.c
+++ b/src/pipeline/pass_githistory.c
@@ -168,7 +168,10 @@ static int parse_git_log(const char *repo_path, commit_t **out, int *out_count)
/* Diff parent_tree <E2><86><92> tree to find changed files */
git_diff *diff = NULL;
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wmissing-field-initializers"
git_diff_options diff_opts = GIT_DIFF_OPTIONS_INIT;
+#pragma GCC diagnostic pop
if (git_diff_tree_to_tree(&diff, repo, parent_tree, tree, &diff_opts) == 0) {
commit_t current = {0}; |
|
@cristicalin Okay. Thanks for the feedback. I can't follow up on that, because I don't understand it. If you feel confident that that is all good, feel free to make your own PR with that fix on top of mine. I don't take ownership of anything. It's all AI generated anyway. |
|
same here, the fix is generated by claude, let's let @DeusData decide if they want to incorporate the fix or not |
|
@cristicalin Oh, I just noticed, I referenced the wrong issue here. I meant it was related to #78 (problem with OpenCode), not #77 (problem with Claude Code). But if it's related to both, and maybe helps with both, then fine, lol. (Gonna fix the references in earlier messages) I'll leave this for more knowledgeable people to sort out now. |
fixes #78 (at least for me, on Linux Mint 21.2, glibc version 2.35)
(perhaps also relevant to #77, which I referenced here by accident first, see further down in discussion for that)
PR description generated by AI (Claude Opus 4.6), with some human commentary at the bottom.
Problem
Some MCP clients (OpenCode specifically) that send notifications/initialized and tools/list in rapid succession after receiving the initialize response (as the @modelcontextprotocol/sdk does by default) hit a 60-second stall. The server never responds to tools/list until the STORE_IDLE_TIMEOUT_S poll timeout expires, causing the client to report "Failed to get tools".
Root cause
cbm_mcp_server_run() mixes poll() on the raw fd with getline() on the buffered FILE*. When getline() reads the initialize request, libc may read ahead and pull subsequent messages into its internal userspace buffer. On the next iteration, poll() checks the kernel pipe buffer — which is now empty — and blocks for 60 seconds, unaware that getline() already has data ready to return.
Fix
Before calling poll(), check whether the FILE* has buffered data available. If it does, skip poll() and go straight to getline().
Confirmed with a bidirectional wire-logging proxy: tools/list response now arrives in <1ms instead of 60s. Tested with OpenCode (which uses the MCP SDK's StdioClientTransport). Claude Code and other JSONL clients are unaffected.
Human commentary (from me)
Related issue: #78, probably #77 as well
See issue comment on #78 for more detail on the initial error investigation and fix. (later amended here for mac)