Skip to content

test(synapse): SYN-12 Performance Benchmarks + E2E Testing#136

Merged
Pedrovaleriolopez merged 5 commits intomainfrom
test/syn-12-performance-e2e
Feb 12, 2026
Merged

test(synapse): SYN-12 Performance Benchmarks + E2E Testing#136
Pedrovaleriolopez merged 5 commits intomainfrom
test/syn-12-performance-e2e

Conversation

@Pedrovaleriolopez
Copy link
Contributor

@Pedrovaleriolopez Pedrovaleriolopez commented Feb 12, 2026

Summary

  • Add 53 E2E tests across 6 test suites validating the complete SYNAPSE context engine pipeline (hook → engine → 8 layers → XML output)
  • Add standalone pipeline benchmark with cold/warm modes, p50/p95/p99 percentiles, isolated formatter timing, and per-layer metrics
  • Fix hook async bug: engine.process() was not awaited in synapse-engine.js, causing result.xml to be undefined
  • Add SYNAPSE benchmark CI job (Node 18+20 matrix) to GitHub Actions workflow

E2E Test Coverage

Suite Tests Coverage
full-pipeline.e2e.test.js 12 Complete pipeline flow, XML sections, DEVMODE, consistency
bracket-scenarios.e2e.test.js 9 All 4 brackets (FRESH/MODERATE/DEPLETED/CRITICAL) + transitions
agent-scenarios.e2e.test.js 8 @dev, @qa, @devops, @architect, unknown, null, switch
hook-integration.e2e.test.js 9 stdin/stdout protocol, error scenarios, timeout
devmode-scenarios.e2e.test.js 7 DEVMODE on/off, per-call override, metrics structure
regression-guards.e2e.test.js 8 Performance hard limits, test count guard, metrics guard
Total 53 96.7% synapse statement coverage

Performance Results

All targets met with comfortable margin:

  • Pipeline p95: well under 100ms hard limit (target <70ms)
  • Individual layers p95: <20ms each (L0/L7 <10ms)
  • Startup p95: <10ms
  • Session I/O p95: <15ms
  • Formatter (isolated): <5ms

QA Gate

PASS (100/100) — All 9 ACs met. Zero regressions (5775 tests pass).

Test plan

  • All 53 E2E tests pass (npx jest tests/synapse/e2e/ --ci)
  • All 5775 existing tests pass (npm test) — zero regressions
  • Pipeline benchmark runs successfully (node tests/synapse/benchmarks/pipeline-benchmark.js)
  • Synapse coverage >85% (actual: 96.7%)
  • Hook integration tests validate async fix produces real XML
  • CI workflow validates correctly with synapse-benchmark job

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a pipeline benchmarking tool with detailed per-layer and percentile reports.
  • Tests

    • Added extensive E2E suites for agents, context brackets, devmode, full pipeline, hook integration, and regression/performance guards.
    • New benchmark-driven performance validation (warm/cold modes, percentiles, targets).
  • Bug Fixes

    • Improved async handling to ensure processing completes before returning results.
    • Added graceful exit guards to avoid terminating test workers and timer cleanup to prevent test leaks.
  • Chores

    • CI updated to run E2E tests and benchmarks on pull requests.

…-12]

Add comprehensive E2E test suite (53 tests across 6 files) and performance
benchmark infrastructure for the SYNAPSE context engine pipeline.

- Pipeline benchmark with cold/warm modes, p50/p95/p99 percentiles, isolated
  formatter timing, and per-layer metrics (100+ iterations)
- E2E tests: full pipeline, bracket scenarios (FRESH/MODERATE/DEPLETED/CRITICAL),
  agent scenarios (@dev/@qa/@devops/@architect), hook integration (stdin/stdout
  protocol), DEVMODE on/off with per-call override, regression guards
- Fix hook async bug: add await to engine.process() in synapse-engine.js
- Add SYNAPSE benchmark CI job (Node 18+20 matrix, E2E + benchmark)
- All performance targets met: pipeline p95 <100ms, layers <20ms, startup <10ms
- 96.7% statement coverage on synapse modules (target 85%)
- 5775 existing tests pass with 0 regressions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions github-actions bot added type: docs Documentation improvements core area: agents Agent system related area: workflows Workflow system related squad squad: etl squad: creator mcp docker-mcp type: test Test coverage and quality labels Feb 12, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

Walkthrough

Awaited SynapseEngine.process in the hook and added safeExit to avoid hard process.exit during tests. Added a CI benchmark job and a pipeline benchmark script. Introduced multiple Synapse E2E test suites and regression guards. Minor timer cleanup added in an integration test.

Changes

Cohort / File(s) Summary
Hook Async & Exit Guard
.claude/hooks/synapse-engine.js
Changed engine.process(prompt, session) to be awaited; added safeExit(code) and replaced direct process.exit(...)/timeout exits to avoid terminating Jest worker processes.
CI Workflow
.github/workflows/ci.yml
Added synapse-benchmark job running E2E tests and pipeline benchmark on Node 18/20 with caching, npm ci, allow-failure steps, and a 10-minute timeout.
Benchmark Tooling
tests/synapse/benchmarks/pipeline-benchmark.js
New benchmark script exporting runBenchmark, calcStats, percentile, and TARGETS; supports warm/cold modes, iterations, JSON output, and p50/p95/p99 reporting for pipeline, per-layer, startup, session I/O, and formatter.
E2E Test Suites
tests/synapse/e2e/agent-scenarios.e2e.test.js, tests/synapse/e2e/bracket-scenarios.e2e.test.js, tests/synapse/e2e/devmode-scenarios.e2e.test.js, tests/synapse/e2e/full-pipeline.e2e.test.js, tests/synapse/e2e/hook-integration.e2e.test.js, tests/synapse/e2e/regression-guards.e2e.test.js
Added multiple end-to-end suites covering agent activation/metrics, context bracket transitions, DEVMODE behavior and metrics, full-pipeline XML/metrics/stability, hook stdin/stdout/exit contract, and performance regression guards with percentile thresholds and metrics validation.
Integration Test Timer Cleanup
tests/integration/pipeline-memory-integration.test.js
Wrapped mocked memory loader timeout into a slowTimer variable and added cleanup to clear the timer after tests to prevent Jest worker leaks.
Test Env Handling
tests/synapse/hook-entry.test.js
Temporarily clears/restores JEST_WORKER_ID within run() tests so safeExit() can call process.exit() during those test cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the primary changes: adding comprehensive E2E tests and performance benchmarking for the SYNAPSE engine, which aligns with the changeset's 53 E2E tests and new benchmark infrastructure.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch test/syn-12-performance-e2e

No actionable comments were generated in the recent review. 🎉


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 12, 2026

📊 Coverage Report

Coverage report not available

📈 Full coverage report available in Codecov


Generated by PR Automation (Story 6.1)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tests/synapse/e2e/hook-integration.e2e.test.js`:
- Around line 100-115: The test 'hook output additionalContext is a string
conforming to expected format' is missing the pre-parse guards; before calling
JSON.parse(stdout) assert the hook ran successfully by checking the runHookSync
return values (e.g., verify exitCode === 0 and that stdout is truthy) — use the
same pattern as Test 1: call runHookSync(input), assert exitCode is 0 and stdout
is present, then safely parse stdout into result and continue extracting
result.hookSpecificOutput.additionalContext for the remaining expectations.

In `@tests/synapse/e2e/regression-guards.e2e.test.js`:
- Around line 77-83: The test currently measures startup only once
(startupDurations populated when i === 0) so computing p95 on startupDurations
is meaningless; fix by measuring startup across ITERATIONS instead: inside the
loop where ITERATIONS is used, instantiate a fresh SynapseEngine each iteration
(new SynapseEngine(SYNAPSE_PATH, { manifest, devmode: false })), record the
elapsed time into startupDurations, and then properly dispose/stop the engine
before the next iteration to avoid resource leakage; ensure you remove the i ===
0 guard and handle warm/cold mode semantics consistently (or alternatively, if
you want to keep a single warm startup sample, rename the test and variable from
p95 to single-sample/startupDuration to reflect that behavior).
- Around line 120-128: The test "pipeline p95 should be within target (<70ms) or
warn" is inconsistent: it warns at >=70ms but asserts p95 < 100ms; either
enforce the 70ms target or make the name/assertion match the 100ms hard limit.
Fix by one of: (A) change the assertion for p95 (computed from pipelineDurations
using percentile) to expect(p95).toBeLessThan(70) so the test fails if the
target is missed (keep the console.warn as informational), or (B) if the intent
is only to warn at 70ms and enforce the 100ms hard limit, rename the test to
indicate the hard limit (e.g., mention 100ms) or remove the misleading "(<70ms)"
from the test title; alternatively, if you want no assertion at 70ms, replace
the hard assertion with only the console.warn and no expect. Ensure you update
the test title and the expect call referencing p95 accordingly.
🧹 Nitpick comments (9)
.github/workflows/ci.yml (1)

428-434: Both E2E and benchmark steps silently swallow failures — consider at least surfacing results.

With continue-on-error: true on both steps, any test failure or benchmark regression will be invisible in the PR check status. This is acceptable for benchmarks, but the E2E tests (line 429) are correctness checks, not just informational. If a test fails, the PR author may not notice.

Consider removing continue-on-error: true from the E2E test step, or at minimum adding a summary step that logs warnings:

Suggested approach
       - name: Run SYNAPSE E2E tests
         run: npx jest tests/synapse/e2e/ --ci --verbose
-        continue-on-error: true
 
       - name: Run SYNAPSE pipeline benchmark
         run: node tests/synapse/benchmarks/pipeline-benchmark.js --iterations=50 --json
         continue-on-error: true

If the intent is to keep both informational because .synapse/ may not exist in CI, that's reasonable — but worth a comment explaining why.

tests/synapse/e2e/full-pipeline.e2e.test.js (1)

20-21: Unconditional require calls execute even when .synapse/ is missing.

Lines 20-21 require SynapseEngine and parseManifest outside the describeIfSynapse guard. If the .aios-core/ modules are always present in the repo, this is fine. But if they could be absent (e.g., in a partial checkout), the test file would throw at load time rather than gracefully skipping.

Compare with bracket-scenarios.e2e.test.js which checks fs.existsSync(ENGINE_PATH) and defers the require into beforeAll. Consider moving these requires inside the describeIfSynapse block for consistency.

tests/synapse/e2e/bracket-scenarios.e2e.test.js (1)

137-148: Tautological assertion — the if guard makes the expect inside it always pass.

Lines 145-147 check if (result.xml.includes('[MEMORY HINTS]')) and then assert expect(result.xml).toMatch(/\[MEMORY HINTS\]/). The assertion will trivially pass since the condition already confirmed the match. If the intent is to validate the format of memory hints content, consider asserting on the content within the section instead:

Suggested improvement
       if (result.xml.includes('[MEMORY HINTS]')) {
-        expect(result.xml).toMatch(/\[MEMORY HINTS\]/);
+        // Validate memory hints have actual content after the header
+        expect(result.xml).toMatch(/\[MEMORY HINTS\][\s\S]+\w+/);
       }
tests/synapse/benchmarks/pipeline-benchmark.js (3)

158-175: Engine instance management in warm mode is unnecessarily convoluted.

The logic across lines 158-175 creates engine as null for warm-mode iterations after the first, then uses engineToUse = engine || cachedEngine with a further fallback (engineToUse || engine) on line 186. This works but is hard to follow.

Clearer approach
-  let cachedEngine = null;
+  let cachedEngine = options.cold
+    ? null
+    : new SynapseEngine(SYNAPSE_PATH, { manifest, devmode: false });
+
+  // Measure startup for the initial engine creation
+  if (!options.cold) {
+    const s0 = performance.now();
+    cachedEngine = new SynapseEngine(SYNAPSE_PATH, { manifest, devmode: false });
+    startupDurations.push(performance.now() - s0);
+  }

   for (let i = 0; i < iterations; i++) {
-    // Startup measurement
-    const startupStart = performance.now();
-    const engine = options.cold
-      ? new SynapseEngine(SYNAPSE_PATH, { manifest, devmode: false })
-      : (i === 0 ? new SynapseEngine(SYNAPSE_PATH, { manifest, devmode: false }) : null);
-    const startupEnd = performance.now();
-
-    if (options.cold || i === 0) {
-      startupDurations.push(startupEnd - startupStart);
-    }
-
-    const engineToUse = options.cold ? engine : (engine || cachedEngine);
-    if (i === 0 && !options.cold) {
-      cachedEngine = engine;
-    }
+    let engineToUse;
+    if (options.cold) {
+      const s0 = performance.now();
+      engineToUse = new SynapseEngine(SYNAPSE_PATH, { manifest, devmode: false });
+      startupDurations.push(performance.now() - s0);
+    } else {
+      engineToUse = cachedEngine;
+    }

258-261: Formatter report uses L0 layer targets instead of a dedicated formatter target.

Lines 260-261 compare the formatter's p95 against TARGETS.layerL0 (5ms target / 10ms hard limit). The formatter is not a "layer" — it's the output serialization step. Consider adding an explicit formatter entry to TARGETS for clarity and correctness, especially if formatter performance expectations differ from L0.

Suggested addition
 const TARGETS = {
   pipeline: { target: 70, hardLimit: 100 },
   layer: { target: 15, hardLimit: 20 },
   layerL0: { target: 5, hardLimit: 10 },
   layerL7: { target: 5, hardLimit: 10 },
   startup: { target: 5, hardLimit: 10 },
   sessionIO: { target: 10, hardLimit: 15 },
+  formatter: { target: 5, hardLimit: 10 },
 };

Then update lines 260-261 to use TARGETS.formatter.


42-46: percentile function is duplicated in regression-guards.e2e.test.js.

The same percentile implementation exists in both this file and regression-guards.e2e.test.js (lines 42-46). Since this benchmark module already exports percentile and calcStats, the regression guards test could import from here instead of re-implementing.

Also applies to: 72-76

tests/synapse/e2e/hook-integration.e2e.test.js (3)

36-54: Consider capturing stderr for test debugging.

The helper is well-structured with proper error handling. One small observation: stderr is piped (Line 44) but never captured or returned. When tests fail in CI, having stderr available can save debugging time.

💡 Optional: capture stderr for diagnostics
   } catch (err) {
     // execSync throws on non-zero exit OR timeout
     return {
       stdout: (err.stdout || '').toString(),
+      stderr: (err.stderr || '').toString(),
       exitCode: err.status != null ? err.status : 1,
     };
   }

183-204: Test silently passes when result.xml is empty — consider logging or failing.

The guard on Line 200 (if (result.xml.length > 0)) means if the engine ever stops producing XML (a regression), this test still passes. That undermines its purpose of verifying CONSTITUTION content. Consider either:

  • Asserting result.xml.length > 0 so the test catches the regression, or
  • At minimum, logging when the branch is skipped so CI output makes it visible.
♻️ Proposed: assert non-empty XML
     expect(typeof result.xml).toBe('string');
-
-    if (result.xml.length > 0) {
-      expect(result.xml).toContain('<synapse-rules>');
-      expect(result.xml).toMatch(/CONSTITUTION/i);
-    }
+    expect(result.xml.length).toBeGreaterThan(0);
+    expect(result.xml).toContain('<synapse-rules>');
+    expect(result.xml).toMatch(/CONSTITUTION/i);

221-233: Re-stringify comparison is fragile and assumes minified JSON output.

JSON.stringify(result) produces minified JSON. If the hook ever outputs pretty-printed JSON (e.g., JSON.stringify(obj, null, 2)), this assertion will fail even though the output is perfectly valid single-object JSON. Also, same as Test 2, there's no guard before JSON.parse(stdout).

A more resilient trailing-data check:

♻️ Proposed: more robust trailing-data detection
     const input = buildInput();
     const { stdout } = runHookSync(input);

+    expect(stdout).toBeTruthy();
+
     // Parse should succeed without leftover characters
     const result = JSON.parse(stdout);
     expect(typeof result).toBe('object');
     expect(result).not.toBeNull();

-    // Re-stringify and compare length to detect trailing data
-    const reparsed = JSON.stringify(result);
-    expect(stdout.trim()).toBe(reparsed);
+    // Verify no trailing data after the JSON object
+    // Trim and re-parse to confirm single object
+    const trimmed = stdout.trim();
+    expect(() => JSON.parse(trimmed)).not.toThrow();
+    // Ensure nothing meaningful follows the JSON by checking
+    // that trimmed content re-parses to the same structure
+    expect(JSON.parse(trimmed)).toEqual(result);

Note: If you specifically want to detect {"a":1}{"b":2} (concatenated objects), JSON.parse will silently ignore the second object in both approaches. A character-level scan or a streaming JSON parser would be needed for that, but it's likely overkill for this test.

Comment on lines +77 to +83
for (let i = 0; i < ITERATIONS; i++) {
// Startup measurement (first iteration only for warm mode)
if (i === 0) {
const s0 = performance.now();
engine = new SynapseEngine(SYNAPSE_PATH, { manifest, devmode: false });
startupDurations.push(performance.now() - s0);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Startup p95 is computed from a single sample — statistically meaningless.

In the measurement loop (lines 77-83), startup is only measured once (if (i === 0)), yielding exactly one data point in startupDurations. The test on line 158-161 then computes "p95" of a single-element array, which is just that value. This doesn't provide any percentile-based confidence.

Either measure startup across multiple iterations (which requires creating a new engine per iteration — effectively cold mode), or rename the test to reflect what it actually checks:

Option A: Honest naming
-  test('startup p95 < 10ms (hard limit)', () => {
+  test('startup (warm, single measurement) < 10ms', () => {
     const sorted = [...startupDurations].sort((a, b) => a - b);
-    const p95 = percentile(sorted, 95);
-    expect(p95).toBeLessThan(10);
+    expect(sorted[0]).toBeLessThan(10);
   });

Also applies to: 155-162

🤖 Prompt for AI Agents
In `@tests/synapse/e2e/regression-guards.e2e.test.js` around lines 77 - 83, The
test currently measures startup only once (startupDurations populated when i ===
0) so computing p95 on startupDurations is meaningless; fix by measuring startup
across ITERATIONS instead: inside the loop where ITERATIONS is used, instantiate
a fresh SynapseEngine each iteration (new SynapseEngine(SYNAPSE_PATH, {
manifest, devmode: false })), record the elapsed time into startupDurations, and
then properly dispose/stop the engine before the next iteration to avoid
resource leakage; ensure you remove the i === 0 guard and handle warm/cold mode
semantics consistently (or alternatively, if you want to keep a single warm
startup sample, rename the test and variable from p95 to
single-sample/startupDuration to reflect that behavior).

…-12]

- Add exitCode/stdout guards before JSON.parse in hook test 2
- Measure startup across 50 iterations instead of 1 for statistical significance
- Enforce 70ms target assertion instead of 100ms hard limit in target test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@tests/synapse/e2e/hook-integration.e2e.test.js`:
- Around line 224-236: Add the same exit-code and stdout guards used in Tests 1
and 2 before calling JSON.parse in the "hook output is a single well-formed JSON
object" test: assert that the result of runHookSync(input).exitCode === 0 and
that stdout is truthy (or non-empty) before parsing, so JSON.parse(stdout) only
runs when the hook succeeded; update references around runHookSync, stdout and
buildInput to mirror the guard pattern used in the earlier tests.

In `@tests/synapse/e2e/regression-guards.e2e.test.js`:
- Around line 178-184: The test name "total synapse E2E tests >= 30 (coverage
guard)" doesn't match the assertion (it checks testFiles.length >= 5), so either
rename the test to reflect you're asserting file count (change the string passed
to the test(...) call) or change the assertion to count individual tests across
files: read each file from testFiles, sum occurrences of test(...) and it(...)
(or parse AST) into a totalTests variable and assert
expect(totalTests).toBeGreaterThanOrEqual(30); update references to testFiles
and e2eDir accordingly and keep the comment consistent with the chosen behavior.
🧹 Nitpick comments (3)
tests/synapse/e2e/hook-integration.e2e.test.js (1)

36-54: runHookSync helper is well structured.

Good use of try/catch to normalize execSync behavior into a predictable { stdout, exitCode } shape. The err.status != null check correctly covers both null and undefined.

One minor observation: stderr is piped (line 44) but never surfaced. If a test unexpectedly fails, having stderr available in the return value could speed up debugging.

💡 Optionally expose stderr for debugging
 function runHookSync(stdinData, opts = {}) {
   const timeout = opts.timeout || 10000;
   try {
     const stdout = execSync(`node "${HOOK_PATH}"`, {
       input: stdinData,
       encoding: 'utf8',
       timeout,
       windowsHide: true,
       stdio: ['pipe', 'pipe', 'pipe'],
     });
-    return { stdout: stdout || '', exitCode: 0 };
+    return { stdout: stdout || '', stderr: '', exitCode: 0 };
   } catch (err) {
     return {
       stdout: (err.stdout || '').toString(),
+      stderr: (err.stderr || '').toString(),
       exitCode: err.status != null ? err.status : 1,
     };
   }
 }
tests/synapse/e2e/regression-guards.e2e.test.js (2)

26-34: Unconditional require() calls will crash before describe.skip can activate.

Lines 26–34 import engine, domain-loader, and session-manager unconditionally. If any of those files are missing (e.g., a partial checkout or a CI matrix variant without .aios-core), the test runner crashes with MODULE_NOT_FOUND before describeIfSynapse on line 48 can skip the suite. The defensive gate at line 24 becomes moot.

Consider deferring these imports into beforeAll (where you already have the conditional describeIfSynapse wrapper), or wrapping them in a try/catch at module level.

💡 Move requires inside the describe block
-const { SynapseEngine } = require(
-  path.join(PROJECT_ROOT, '.aios-core', 'core', 'synapse', 'engine.js')
-);
-const { parseManifest } = require(
-  path.join(PROJECT_ROOT, '.aios-core', 'core', 'synapse', 'domain', 'domain-loader.js')
-);
-const { loadSession } = require(
-  path.join(PROJECT_ROOT, '.aios-core', 'core', 'synapse', 'session', 'session-manager.js')
-);
+let SynapseEngine, parseManifest, loadSession;

 const describeIfSynapse = synapseExists ? describe : describe.skip;

 describeIfSynapse('SYNAPSE E2E: Regression Guards', () => {
   // ...
   beforeAll(async () => {
+    ({ SynapseEngine } = require(
+      path.join(PROJECT_ROOT, '.aios-core', 'core', 'synapse', 'engine.js')
+    ));
+    ({ parseManifest } = require(
+      path.join(PROJECT_ROOT, '.aios-core', 'core', 'synapse', 'domain', 'domain-loader.js')
+    ));
+    ({ loadSession } = require(
+      path.join(PROJECT_ROOT, '.aios-core', 'core', 'synapse', 'session', 'session-manager.js')
+    ));
     manifest = parseManifest(MANIFEST_PATH);

Also applies to: 48-48


135-155: Layer performance tests pass vacuously if layerDurations is empty.

If engine.process never populates result.metrics.per_layer (e.g., due to a regression in metrics reporting), both the "each layer p95" test (line 135) and the "edge layers" test (line 146) will pass with zero assertions — silently hiding a problem.

Consider adding a guard that at least one layer was measured:

💡 Add a non-empty assertion
   test('each layer p95 < 20ms (hard limit)', () => {
+    expect(Object.keys(layerDurations).length).toBeGreaterThan(0);
     for (const [name, durations] of Object.entries(layerDurations)) {
       const sorted = [...durations].sort((a, b) => a - b);
       const p95 = percentile(sorted, 95);
       expect(p95).toBeLessThan(20);
     }
   });

Pedrovaleriolopez and others added 3 commits February 12, 2026 13:10
- Add exitCode/stdout guards before JSON.parse in hook test 9
- Fix test name mismatch: "test files >= 5" instead of "tests >= 30"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-002)

- Add safeExit() guard checking JEST_WORKER_ID before process.exit()
- Track and cleanup abandoned timer in pipeline-memory timeout test
- Prevents CI failures in pipeline-memory-integration and template-engine tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Temporarily clear JEST_WORKER_ID in run() test so safeExit() calls
process.exit() as expected, then restore it in finally block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Pedrovaleriolopez Pedrovaleriolopez merged commit b0f38ab into main Feb 12, 2026
25 checks passed
@Pedrovaleriolopez Pedrovaleriolopez deleted the test/syn-12-performance-e2e branch February 12, 2026 21:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: agents Agent system related area: workflows Workflow system related docker-mcp mcp squad: creator squad: etl squad type: docs Documentation improvements type: test Test coverage and quality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant