fix: preserve BrainBar MCP initialize handshake under backpressure #247

Merged
EtanHey merged 1 commit into main from fix/brainbar-mcp-handshake-reliability
Apr 17, 2026

Conversation

EtanHey (Owner) commented Apr 17, 2026

Summary

  • keep BrainBar socket writes non-blocking so a briefly backpressured client can still complete the MCP initialize handshake
  • add a socket integration reproducer for the dropped-handshake case and preserve buffered Content-Length frames between test reads
  • fix the stale EntityCard.Relation parity fixture so the BrainBar test target compiles on main
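
For readers unfamiliar with the framing mentioned in the second bullet, the following is a minimal Swift sketch of popping one Content-Length framed body off a buffer while leaving trailing bytes in place for the next read. The name `popFrame` is hypothetical and is not the PR's `decodeBufferedMCPMessage`; handling of malformed headers is deliberately left out.

```swift
import Foundation

/// Pops one Content-Length framed body from the front of `buffer`.
/// Returns nil when the header or the body is still incomplete and
/// leaves `buffer` untouched so later reads can append to it.
/// (Illustrative helper, not the function used in the PR.)
func popFrame(from buffer: inout Data) -> Data? {
    // The header block ends at the first blank line.
    guard let headerEnd = buffer.range(of: Data("\r\n\r\n".utf8)) else { return nil }
    let header = String(decoding: buffer[buffer.startIndex..<headerEnd.lowerBound], as: UTF8.self)

    // Find the Content-Length header (case-insensitive name match).
    var declaredLength: Int?
    for line in header.components(separatedBy: "\r\n") {
        let parts = line.split(separator: ":", maxSplits: 1)
        if parts.count == 2, parts[0].lowercased() == "content-length" {
            declaredLength = Int(parts[1].trimmingCharacters(in: .whitespaces))
        }
    }
    guard let length = declaredLength, length >= 0 else { return nil }

    // Wait until the whole body has arrived.
    let bodyStart = headerEnd.upperBound
    guard buffer.endIndex - bodyStart >= length else { return nil }

    // Copy the body out and keep only the unread tail in the buffer.
    let body = Data(buffer[bodyStart..<(bodyStart + length)])
    buffer = Data(buffer[(bodyStart + length)...])
    return body
}

// Usage: one complete frame followed by the start of a second header.
var incoming = Data("Content-Length: 2\r\n\r\n{}Content-Le".utf8)
if let body = popFrame(from: &incoming) {
    print(String(decoding: body, as: UTF8.self))
}
```

Rebuilding `buffer` with `Data(...)` resets its indices to start at zero, which avoids the slice-index surprises that `Data` subranges can cause on later reads.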

Test plan

  • swift test --package-path brain-bar --filter 'SocketIntegrationTests/testInitializeHandshakeSurvivesBriefBackpressureBurst|SocketIntegrationTests/testServerDisconnectsStalledClient|SocketIntegrationTests/testStdioAdapterBridgesInitializeAndSubscribe'
  • swift test --package-path brain-bar --filter 'SocketIntegrationTests|MCPFramingTests|MCPRouterTests|TextFormatterParityTests' (socket/MCP framing coverage passed; unrelated baseline failures remain in MCPRouterTests and TextFormatterParityTests on current main expectations)
  • bash brain-bar/build-app.sh (build and bundle completed; LaunchAgent bootstrap failed locally with Bootstrap failed: 5: Input/output error)

Note

Fix BrainBar MCP initialize handshake to survive backpressure by replacing synchronous writes with async chunked flush

  • Replaces the synchronous write loop in BrainBarServer.sendResponse with an async, non-blocking queue: outbound data is enqueued via enqueueWrite and flushed in 4 KB chunks by flushPendingWrites.
  • On EAGAIN/EWOULDBLOCK, the server schedules a 2 ms retry instead of busy-waiting; clients are only disconnected after a 250 ms stall with no write progress.
  • Database markDelivered calls are deferred until the full payload has been written to the socket via an onDelivered callback, preventing premature delivery acknowledgment.
  • Adds a socket integration test (testInitializeHandshakeSurvivesBriefBackpressureBurst) that constrains the client receive buffer to trigger backpressure and verifies the initialize response and subsequent requests succeed.
  • Behavioral Change: responses are no longer delivered synchronously from the caller's perspective; delivery callbacks fire asynchronously after the socket write completes.
📊 Macroscope summarized ddbd400. 3 files reviewed, 1 issue evaluated, 0 issues filtered, 1 comment posted
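
The flush behavior summarized in the note above can be sketched roughly as follows. This is an illustration only, not the code in BrainBarServer.swift: the `PendingWrite` field names, the 4 KB chunk size, and the 250 ms stall limit come from this PR, while `WriteOutcome`, `FlushResult`, and the injected `write` closure are assumptions made so the logic can run without a real socket. Scheduling the 2 ms retry is left to the caller.

```swift
import Foundation

/// Result of one non-blocking write attempt (hypothetical type for this sketch).
enum WriteOutcome {
    case wrote(Int)   // bytes the socket accepted
    case wouldBlock   // EAGAIN / EWOULDBLOCK
    case failed       // any other error
}

/// Field names follow the PendingWrite initializer quoted in the review.
struct PendingWrite {
    var data: [UInt8]
    var totalWritten: Int
    var lastProgressAt: UInt64          // nanoseconds
    var onDelivered: (() -> Void)?
}

/// What the caller should do next (hypothetical type for this sketch).
enum FlushResult { case drained, retryLater, disconnect }

let chunkSize = 4096                        // 4 KB chunks, per the PR note
let stallLimitNanos: UInt64 = 250_000_000   // 250 ms without progress

func flushPendingWrites(
    _ queue: inout [PendingWrite],
    now: UInt64,
    write: (ArraySlice<UInt8>) -> WriteOutcome
) -> FlushResult {
    while let head = queue.first {
        // Fully written: pop the entry, then fire its delivery callback.
        if head.totalWritten >= head.data.count {
            queue.removeFirst()
            head.onDelivered?()
            continue
        }
        let end = min(head.totalWritten + chunkSize, head.data.count)
        switch write(head.data[head.totalWritten..<end]) {
        case .wrote(let n) where n > 0:
            queue[0].totalWritten += n
            queue[0].lastProgressAt = now
        case .wrote, .wouldBlock:
            // No progress: retry later unless the client has stalled too long.
            let stalledFor = now > head.lastProgressAt ? now - head.lastProgressAt : 0
            return stalledFor >= stallLimitNanos ? .disconnect : .retryLater
        case .failed:
            return .disconnect
        }
    }
    return .drained
}

// Usage with a fake socket that accepts every chunk.
var queue = [PendingWrite(data: Array("hello".utf8), totalWritten: 0, lastProgressAt: 0,
                          onDelivered: { print("delivered") })]
_ = flushPendingWrites(&queue, now: 1_000) { chunk in .wrote(chunk.count) }
```

Because `onDelivered` only runs once `totalWritten` reaches the payload size, a caller that marks database rows as delivered in that callback cannot acknowledge a response that is still sitting in the queue.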

@greptile-apps greptile-apps Bot left a comment

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

coderabbitai Bot commented Apr 17, 2026

Warning

Rate limit exceeded

@EtanHey has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 46 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8e164f4f-f2e3-4c85-8d9a-9e6810831e20

📥 Commits

Reviewing files that changed from the base of the PR and between 05c878c and ddbd400.

📒 Files selected for processing (3)
  • brain-bar/Sources/BrainBar/BrainBarServer.swift
  • brain-bar/Tests/BrainBarTests/SocketIntegrationTests.swift
  • brain-bar/Tests/BrainBarTests/TextFormatterParityTests.swift
EtanHey (Owner, Author) commented Apr 17, 2026

Follow-up handshake fix is pushed on ddbd400. @codex review @cursor @BugBot re-review @coderabbitai review

coderabbitai Bot commented Apr 17, 2026

@EtanHey Sure! I'll kick off a fresh review of the PR with the latest changes on commit ddbd4000.


🧠 Learnings used
Learnt from: CR
Repo: EtanHey/brainlayer PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-14T02:20:54.656Z
Learning: Request codex review, cursor review, and bugbot review for BrainLayer PRs

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 87
File: brain-bar/Sources/BrainBar/BrainBarServer.swift:118-129
Timestamp: 2026-03-18T00:12:15.607Z
Learning: In `brain-bar/Sources/BrainBar/MCPRouter.swift` (Swift, BrainBar daemon), the socket-before-DB startup pattern means the Unix socket binds immediately (~1ms) while the database may take several seconds to open on cold start (8GB file). Any tool handler that accesses `database` MUST throw an explicit error (e.g., `ToolError.noDatabase`) when `database` is nil — never return empty or default results (e.g., `guard let db else { return "[]" }` is forbidden). The false-success pattern hides startup timing issues from MCP clients. Flag any `guard let db = database else { return ... }` patterns that silently return defaults instead of throwing.

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:11.749Z
Learning: The socket path `/tmp/brainbar.sock` is intentional for the BrainBar Swift daemon (brain-bar/) and must NOT be changed to `/tmp/brainlayer.sock`. BrainBar is a new daemon that coexists with the existing Python `brainlayer-mcp` (which uses `/tmp/brainlayer.sock`) during the migration period. The different paths avoid conflicts and allow A/B testing. Once BrainBar is proven stable, the Python server will be retired and `.mcp.json` will point to `/tmp/brainbar.sock` via socat.

Learnt from: EtanHey
Repo: EtanHey/brainlayer PR: 0
File: :0-0
Timestamp: 2026-03-17T01:04:22.497Z
Learning: In BrainLayer, the BrainBar daemon uses the socket path `/tmp/brainbar.sock` (NOT `/tmp/brainlayer.sock`). BrainBar is a new native Swift daemon designed to coexist with the existing Python `brainlayer-mcp` server during the migration period. Different socket paths avoid conflicts and enable A/B testing. Once BrainBar is proven stable, the Python server will be retired.
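
The PR 87 learning above (throw when `database` is nil instead of returning a default) can be illustrated with a small sketch. Only `ToolError.noDatabase` and the `database` property name come from that learning; the `SearchStore` protocol, `ToolRouter`, and `handleSearch` are hypothetical stand-ins and not the real MCPRouter API.

```swift
import Foundation

enum ToolError: Error, Equatable {
    case noDatabase
}

/// Hypothetical stand-in for the real database type.
protocol SearchStore {
    func search(_ query: String) -> [String]
}

/// Hypothetical router; the real one lives in MCPRouter.swift.
struct ToolRouter {
    var database: SearchStore?

    func handleSearch(_ query: String) throws -> [String] {
        // Throw explicitly while the database is still opening.
        // Returning "[]" here would look like a successful empty result.
        guard let db = database else {
            throw ToolError.noDatabase
        }
        return db.search(query)
    }
}

// Usage: a cold-start router surfaces the error to the caller.
let coldRouter = ToolRouter(database: nil)
do {
    _ = try coldRouter.handleSearch("handshake")
} catch {
    print("tool call failed: \(error)")
}
```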
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Comment on lines 718 to 746
 private func readMCPMessage(fd: Int32, timeout: TimeInterval = 5.0) throws -> [String: Any] {
-    var buffer = Data()
+    var buffer = bufferedMessagesByFD[fd] ?? Data()
     var readBuf = [UInt8](repeating: 0, count: 65536)
     let deadline = Date().addingTimeInterval(timeout)
 
     while Date() < deadline {
+        if let message = try decodeBufferedMCPMessage(fd: fd, buffer: &buffer) {
+            return message
+        }
+
         let n = read(fd, &readBuf, readBuf.count)
         if n > 0 {
             buffer.append(contentsOf: readBuf[0..<n])
-            // Try to parse Content-Length framed response
-            if let headerEnd = buffer.range(of: Data("\r\n\r\n".utf8)) {
-                let headerStr = String(data: buffer[buffer.startIndex..<headerEnd.lowerBound], encoding: .utf8) ?? ""
-                if let clLine = headerStr.split(separator: "\r\n").first(where: { $0.hasPrefix("Content-Length:") }) {
-                    let cl = Int(clLine.split(separator: ":")[1].trimmingCharacters(in: .whitespaces)) ?? 0
-                    let bodyStart = headerEnd.upperBound
-                    if buffer.count >= bodyStart + cl {
-                        let bodyData = buffer[bodyStart..<(bodyStart + cl)]
-                        return try JSONSerialization.jsonObject(with: bodyData) as? [String: Any] ?? [:]
-                    }
-                }
+            if let message = try decodeBufferedMCPMessage(fd: fd, buffer: &buffer) {
+                return message
             }
         } else if n == 0 {
+            bufferedMessagesByFD.removeValue(forKey: fd)
             break // EOF
         } else if errno != EAGAIN && errno != EINTR && errno != EWOULDBLOCK {
+            bufferedMessagesByFD.removeValue(forKey: fd)
             break
         }
         Thread.sleep(forTimeInterval: 0.01)
     }
 
+    bufferedMessagesByFD[fd] = buffer
     throw NSError(domain: "test", code: 4, userInfo: [NSLocalizedDescriptionKey: "Timeout reading response"])
 }
🟢 Low BrainBarTests/SocketIntegrationTests.swift:718

After the readMCPMessage function breaks from the while loop on EOF (line 736) or error (line 739), it removes the fd's buffer from bufferedMessagesByFD. However, line 744 then unconditionally re-adds buffer to the dictionary, restoring the stale data that was just cleaned up. If a subsequent test reuses the same fd number, this stale data will be prepended to its first read, potentially causing message parsing failures or cross-test data contamination. Consider moving the save operation so it only executes on timeout, not on EOF/error.

         } else if n == 0 {
             bufferedMessagesByFD.removeValue(forKey: fd)
             break // EOF
         } else if errno != EAGAIN && errno != EINTR && errno != EWOULDBLOCK {
             bufferedMessagesByFD.removeValue(forKey: fd)
             break
         }
         Thread.sleep(forTimeInterval: 0.01)
     }
 
-    bufferedMessagesByFD[fd] = buffer
-    throw NSError(domain: "test", code: 4, userInfo: [NSLocalizedDescriptionKey: "Timeout reading response"])
+    throw NSError(domain: "test", code: 4, userInfo: [NSLocalizedDescriptionKey: "Timeout reading response"])
 }
 
 private func decodeBufferedMCPMessage
Evidence trail:
brain-bar/Tests/BrainBarTests/SocketIntegrationTests.swift lines 710-755 at REVIEWED_COMMIT. Line 736 removes buffer on EOF (`bufferedMessagesByFD.removeValue(forKey: fd)`), line 739 removes it on error with same call, both followed by `break`. Line 744 (`bufferedMessagesByFD[fd] = buffer`) executes after any loop exit, unconditionally re-adding the buffer that was just cleaned up.
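
One way to read the "save only on timeout" suggestion is sketched below. It is a simplified model rather than a patch for SocketIntegrationTests.swift: frames are fixed-length instead of Content-Length framed, the socket is replaced by an injected `poll` closure, and `FrameReader`, `ReadStep`, `ReadTimeout`, and `ConnectionClosed` are hypothetical names.

```swift
import Foundation

/// One simulated read result (hypothetical type for this sketch).
enum ReadStep {
    case bytes([UInt8])
    case wouldBlock
    case eof
    case hardError
}

struct ReadTimeout: Error {}
struct ConnectionClosed: Error {}

final class FrameReader {
    /// Leftover bytes per file descriptor, kept between reads.
    private(set) var bufferedByFD: [Int32: [UInt8]] = [:]

    func readFrame(fd: Int32, frameLength: Int, maxPolls: Int,
                   poll: () -> ReadStep) throws -> [UInt8] {
        var buffer = bufferedByFD[fd] ?? []
        var polls = 0
        while true {
            if buffer.count >= frameLength {
                // Success: return one frame and keep the tail for the next read.
                let frame = Array(buffer.prefix(frameLength))
                buffer.removeFirst(frameLength)
                bufferedByFD[fd] = buffer
                return frame
            }
            guard polls < maxPolls else { break }
            polls += 1
            switch poll() {
            case .bytes(let chunk):
                buffer.append(contentsOf: chunk)
            case .wouldBlock:
                break
            case .eof, .hardError:
                // Connection is gone: drop state and exit without re-saving it.
                bufferedByFD.removeValue(forKey: fd)
                throw ConnectionClosed()
            }
        }
        // Only the timeout path reaches this line, so partial data is kept
        // for a later retry on the same, still-open descriptor.
        bufferedByFD[fd] = buffer
        throw ReadTimeout()
    }
}
```

Throwing directly from the EOF and error branches, instead of breaking out of the loop, is what keeps the final save from running on those paths.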

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ddbd4000aa

Comment on lines +363 to +372
+        guard var state = clients[fd] else { return false }
+        state.pendingWrites.append(
+            PendingWrite(
+                data: data,
+                totalWritten: 0,
+                lastProgressAt: DispatchTime.now().uptimeNanoseconds,
+                onDelivered: onDelivered
+            )
+        )
+        clients[fd] = state
P1: Prevent unbounded pending-write growth for slow readers

This change queues every outbound frame in pendingWrites but never caps queue length/bytes, and the server continues processing incoming requests while writes are backpressured. In a slow-reader scenario (or a client that periodically drains just enough to avoid the 250ms stall timeout), enqueueWrite keeps appending responses indefinitely, so memory can grow without bound and eventually destabilize/kill BrainBar. Please add per-client queue limits (or stop reading/handling new requests once a write backlog threshold is crossed) so backpressured clients cannot cause unbounded buffering.
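
A per-client cap of the kind requested here might look like the sketch below. It is not taken from the PR: `BoundedWriteQueue`, `maxPendingBytes`, and the 1 MiB limit in the usage lines are hypothetical, and what to do when `enqueue` returns false (pause reads or disconnect the client) is left to the caller.

```swift
import Foundation

/// Hypothetical per-client outbound queue with a byte cap.
struct BoundedWriteQueue {
    let maxPendingBytes: Int
    private(set) var frames: [[UInt8]] = []
    private(set) var pendingBytes = 0

    init(maxPendingBytes: Int) {
        self.maxPendingBytes = maxPendingBytes
    }

    /// Returns false when accepting `frame` would push the backlog past the cap.
    mutating func enqueue(_ frame: [UInt8]) -> Bool {
        guard pendingBytes + frame.count <= maxPendingBytes else { return false }
        frames.append(frame)
        pendingBytes += frame.count
        return true
    }

    /// Removes the oldest frame once it has been fully written.
    mutating func dequeue() -> [UInt8]? {
        guard !frames.isEmpty else { return nil }
        let frame = frames.removeFirst()
        pendingBytes -= frame.count
        return frame
    }
}

// Usage: refuse to buffer more than 1 MiB for a single slow client.
var backlog = BoundedWriteQueue(maxPendingBytes: 1 << 20)
if !backlog.enqueue(Array("response".utf8)) {
    // Assumed policy: stop reading new requests from this client or disconnect it.
}
```

Counting bytes rather than frames bounds memory even when individual responses vary widely in size.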

EtanHey merged commit 58e0948 into main on Apr 17, 2026 (7 checks passed)
EtanHey deleted the fix/brainbar-mcp-handshake-reliability branch on April 17, 2026 08:11
EtanHey added a commit that referenced this pull request on Apr 17, 2026