fix: two tests flaky under parallel CI load (S27 + trace snapshot)#880
Conversation
Both pass locally but fail consistently in CI's heavy parallel run (9474 tests / 378 files) — the repo's "no flaky tests under resource contention" case. Neither is caused by any feature change; they fail identically on unrelated PRs (#854/#858/#863), blocking all of them. - `real-tool-simulation` S27: the progressive-suggestion dedup state is a module-global Set. The test's `beforeEach` reset used a dynamic `await import`, which under parallel CI can resolve to a different module instance than the tool's static import — so the real Set is never reset and accumulates `sql_analyze` from S25/S26 → S27 sees no suggestion. Fix: import `PostConnectSuggestions` statically (same instance the tools use); reset in S27 too. - `tracing-adversarial-snapshot` "shows 'running' status": waited a fixed 50ms for a debounced async snapshot write, too short under CI load → read a stale snapshot. Fix: poll the on-disk status until expected (timeout 4s) instead of a fixed sleep. Closes #879 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughReplaces a hardcoded sleep with a polling helper that reads the on-disk trace snapshot until the summary.status matches an expected value, converts a dynamic module import to a static ChangesTest Stability Improvements
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/test/altimate/tracing-adversarial-snapshot.test.ts`:
- Around line 44-58: The pollStatus function uses a non-null assertion on
tracer.getTracePath(), which can return undefined; replace that usage by reading
tracer.getTracePath() into a local variable (e.g., tracePath) before the loop
and add a defensive null/undefined check that throws a clear Error if no trace
path is available, then use tracePath (without !) inside fs.readFile; update the
function signature/body references to refer to tracePath to avoid the crash when
a FileExporter is absent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ca31c6a6-459e-4f94-a39a-e29140ff60fd
📒 Files selected for processing (2)
packages/opencode/test/altimate/tracing-adversarial-snapshot.test.tspackages/opencode/test/session/real-tool-simulation.test.ts
| async function pollStatus(tracer: { getTracePath(): string | undefined }, expected: string, timeoutMs = 4000) { | ||
| const start = Date.now() | ||
| let last = "<none>" | ||
| while (Date.now() - start < timeoutMs) { | ||
| try { | ||
| const snap = JSON.parse(await fs.readFile(tracer.getTracePath()!, "utf-8")) as TraceFile | ||
| last = snap.summary.status | ||
| if (last === expected) return snap | ||
| } catch { | ||
| /* file mid-write or not yet created — keep polling */ | ||
| } | ||
| await new Promise((r) => setTimeout(r, 25)) | ||
| } | ||
| throw new Error(`timed out after ${timeoutMs}ms waiting for status '${expected}' (last seen '${last}')`) | ||
| } |
There was a problem hiding this comment.
Add defensive null check for getTracePath() return value.
Line 49 uses a non-null assertion (!) on tracer.getTracePath(), but the type signature correctly indicates it can return undefined (e.g., when using HttpExporter only, as shown in the test at lines 460-464). If pollStatus is called with a tracer lacking a FileExporter, it will crash with an unclear error message.
🛡️ Proposed fix: add explicit null check
async function pollStatus(tracer: { getTracePath(): string | undefined }, expected: string, timeoutMs = 4000) {
+ const tracePath = tracer.getTracePath()
+ if (!tracePath) {
+ throw new Error("getTracePath() returned undefined - ensure tracer has a FileExporter configured")
+ }
const start = Date.now()
let last = "<none>"
while (Date.now() - start < timeoutMs) {
try {
- const snap = JSON.parse(await fs.readFile(tracer.getTracePath()!, "utf-8")) as TraceFile
+ const snap = JSON.parse(await fs.readFile(tracePath, "utf-8")) as TraceFile
last = snap.summary.status
if (last === expected) return snap
} catch {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async function pollStatus(tracer: { getTracePath(): string | undefined }, expected: string, timeoutMs = 4000) { | |
| const start = Date.now() | |
| let last = "<none>" | |
| while (Date.now() - start < timeoutMs) { | |
| try { | |
| const snap = JSON.parse(await fs.readFile(tracer.getTracePath()!, "utf-8")) as TraceFile | |
| last = snap.summary.status | |
| if (last === expected) return snap | |
| } catch { | |
| /* file mid-write or not yet created — keep polling */ | |
| } | |
| await new Promise((r) => setTimeout(r, 25)) | |
| } | |
| throw new Error(`timed out after ${timeoutMs}ms waiting for status '${expected}' (last seen '${last}')`) | |
| } | |
| async function pollStatus(tracer: { getTracePath(): string | undefined }, expected: string, timeoutMs = 4000) { | |
| const tracePath = tracer.getTracePath() | |
| if (!tracePath) { | |
| throw new Error("getTracePath() returned undefined - ensure tracer has a FileExporter configured") | |
| } | |
| const start = Date.now() | |
| let last = "<none>" | |
| while (Date.now() - start < timeoutMs) { | |
| try { | |
| const snap = JSON.parse(await fs.readFile(tracePath, "utf-8")) as TraceFile | |
| last = snap.summary.status | |
| if (last === expected) return snap | |
| } catch { | |
| /* file mid-write or not yet created — keep polling */ | |
| } | |
| await new Promise((r) => setTimeout(r, 25)) | |
| } | |
| throw new Error(`timed out after ${timeoutMs}ms waiting for status '${expected}' (last seen '${last}')`) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/test/altimate/tracing-adversarial-snapshot.test.ts` around
lines 44 - 58, The pollStatus function uses a non-null assertion on
tracer.getTracePath(), which can return undefined; replace that usage by reading
tracer.getTracePath() into a local variable (e.g., tracePath) before the loop
and add a defensive null/undefined check that throws a clear Error if no trace
path is available, then use tracePath (without !) inside fs.readFile; update the
function signature/body references to refer to tracePath to avoid the crash when
a FileExporter is absent.
The "TypeScript" job runs all 9500+ tests in one parallel bun process. Under CPU contention a few slower tests (real fs/spawn/git-bootstrap) get starved and exceed the 30s per-test timeout NON-deterministically — different tests each run (observed: 32s and 51s timeouts). This blocks every PR with failures unrelated to the diff. 90s gives ~3x headroom over the worst observed, removing the flakiness without masking genuinely-hung tests. Part of #879. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
dev-punia-altimate
left a comment
There was a problem hiding this comment.
Multi-Persona Review — Verdict: ship
The PR safely implements metadata passthrough for opportunity detail endpoints with robust error handling, comprehensive test coverage, and alignment with Pydantic best practices. No critical or high-severity issues were found; all findings are low or medium and do not block deployment.
15/15 agents completed · 260s · 5 findings (0 critical, 0 high, 2 medium)
Medium
- [code-reviewer] json.loads() on None raises TypeError, but the code handles it with a try/except block that catches ValueError and TypeError — this is correct, but the comment implies it's only for malformed JSON, while it also handles None. →
app/service/opportunities.py:1606- 💡 Update comment to clarify that None and malformed JSON both degrade to None: 'Handle None or malformed JSON by degrading to None rather than 500-ing.'
- [tech-lead] JSON parsing and metadata assignment logic is implemented directly in the service layer without encapsulation in a dedicated utility or helper function, reducing reusability and testability. →
app/service/opportunities.py:1605- 💡 Extract the JSON parsing and metadata assignment into a private helper function in the service module (e.g., _parse_opportunity_metadata) to improve reuse, testability, and separation of concerns.
Low
- [tech-lead] The variable name 'raw' in get_opportunity_detail is ambiguous and does not clearly indicate it refers to the ClickHouse row data. →
app/service/opportunities.py:1603- 💡 Rename 'raw' to 'ch_row' or 'clickhouse_row' to improve clarity and align with project naming conventions for data sources.
- [cto] The OpportunityMetadata model uses ConfigDict(extra="allow") to forward unknown metadata keys, which is a sound pattern for extensibility. However, the frontend is only consuming agent_instructions, so future metadata keys may accumulate without clear ownership or validation.
- 💡 Consider adding a lightweight metadata schema registry or annotation system to document expected keys and their purposes, even if they're not enforced — this prevents unbounded metadata growth and improves maintainability.
- [cross-repo-impact] New field 'metadata' of type OpportunityMetadata added to OpportunityDetailResponse — frontend must handle this new nested object to render agent_instructions →
app/schemas/opportunities.py:260- 💡 Check altimate-frontend PR #2744 to ensure it properly consumes and renders metadata.agent_instructions; confirm no breaking changes if metadata is null
Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·
What does this PR do?
Fixes two tests that pass locally but fail consistently in CI's parallel run (9474 tests / 378 files) — the repo's "no flaky tests under resource contention" case. They fail identically on three unrelated PRs (#854, #858, #863), blocking all of them; neither is caused by any feature change.
real-tool-simulationS27 (sql_analyze parse error): the progressive-suggestion dedup state is a module-globalSet. ThebeforeEachreset used a dynamicawait import, which under parallel CI can resolve to a different module instance than the tool's static import — so the real Set is never reset and accumulatessql_analyzefrom S25/S26, leaving S27 with no suggestion. Fix: importPostConnectSuggestionsstatically (same instance the tools use); reset in S27 too.tracing-adversarial-snapshot(shows 'running' status): waited a fixed50msfor a debounced async snapshot write — too short under CI load → reads a stale snapshot. Fix: poll the on-disk status until expected (4s timeout) instead of a fixed sleep.Type of change
Issue for this PR
Closes #879
How did you verify your code works?
tsgotypecheck clean; marker check clean (test-only, no upstream-shared files).Checklist
Summary by cubic
Fixes two parallel-CI flaky tests by resetting
PostConnectSuggestionsvia static import (and in S27) and by polling trace snapshot status (4s timeout) instead of a 50ms sleep. Also raises Bun test timeout from 30s to 90s to prevent load-induced timeouts. Closes #879.Written for commit 79df91a. Summary will update on new commits.
Summary by CodeRabbit