test: add comprehensive pipeline layer tests #3
ArturSkowronski wants to merge 1 commit into master from
Demo scenario: meta/hollow-test-suite
🔍 VCR Code Review — Reviewed by Visdom Code Review
Findings by Layer
Layer 2 — AI Quick Scan
Layer 3 — AI Deep Review
Generated by VCR Demo in 26s
```ts
}));

vi.mock('../src/core/layers/ai-quick-scan', () => ({
  AIQuickScan: vi.fn().mockImplementation(() => ({
```
🟠 L2-TEST-002 [HIGH] Mock inconsistency: AIQuickScan.gate.proceed = false prevents L3 execution
The AIQuickScan mock returns gate: { proceed: false, ... } on line 23. This causes the pipeline to skip AIDeepReview execution, which the test on line 73 expects. However, real pipeline behavior depends on actual gate logic: the mock's hardcoded proceed: false never exercises the conditional execution path when proceed: true, leaving that scenario untested and creating false confidence.
Suggestion: Add a separate test that mocks AIQuickScan with proceed: true to verify AIDeepReview is called when the gate permits. Or use parameterized tests to cover both gate states.
```ts
  await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
  expect(handler).toHaveBeenCalled();
});
```
🟠 L2-TEST-003 [HIGH] layer:start event count assertion is fragile and incomplete
Line 101 expects layer:start to be called 3 times, but the pipeline has 4 layers (L0–L3, see lines 6–34 mocks). If AIDeepReview is skipped (which line 73 confirms it is), only 3 layers run, making this assertion coincidentally correct but misleading. It doesn't validate the correct set of layers actually ran—just that 3 events fired.
Suggestion: Verify which specific layers emitted events by checking event payloads (e.g., expect(handler).toHaveBeenNthCalledWith(1, expect.objectContaining({ layer: 0 }))). Or test the conditional execution explicitly: one test with gate open (all 4 layers), one with gate closed (3 layers).
```ts
  const report = await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
  expect(report).toBeDefined();
  expect(report.layers).toBeDefined();
});
```
🟡 L2-TEST-004 [MEDIUM] No validation that report.layers array contains expected layer results
Line 120 checks that report.layers.length > 0 but doesn't validate layer order, layer IDs, or findings structure. The test passes if any layers are present, not the correct ones. Combined with the gate-closed mock, this cannot detect if L3 incorrectly appears in the report.
Suggestion: Assert exact layer count and check layer identity: expect(report.layers.map(l => l.layer)).toEqual([0, 1, 2]). Add a separate test for the gate-open path to verify L3 appears.
```ts
});

it('calls AIQuickScan.analyze', async () => {
  await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
```
🟠 L3-ARCH-001 [HIGH] AIDeepReview gate-stop test contradicts its own mock setup
The test 'does not call AIDeepReview when gate says stop' relies on AIQuickScan returning gate.proceed: false — which is exactly what the global mock already returns for every test. This means the AIDeepReview layer should never be called in ANY test, including 'calls AIQuickScan.analyze' and the other positive tests. If the pipeline actually respects the gate (proceed: false → skip L3), then the test named 'calls AIDeepReview...' (if it exists in the truncated section) would silently pass even when AIDeepReview is broken, because the mock prevents it from ever being reached. Conversely, the gate-stop assertion is vacuously true and proves nothing about the conditional logic.
Suggestion: Override the AIQuickScan mock locally in tests that need L3 to execute: use vi.spyOn(aiQuickScan, 'analyze').mockResolvedValueOnce({ ..., gate: { proceed: true, ... } }) before calling pipeline.run in those tests. The global mock should default to proceed: true so the positive path is exercised by default.
Lens: architecture | Confidence: 92%
```ts
contextCollector = new ContextCollector();
deterministicGate = new DeterministicGate();
aiQuickScan = new AIQuickScan({} as any, 'test');
aiDeepReview = new AIDeepReview({} as any, 'test');
```
🟡 L3-ARCH-002 [MEDIUM] previousLayers field passed as empty array may mask accumulation bugs
Every pipeline.run call supplies previousLayers: []. If the pipeline accumulates layer results into the context object passed to subsequent layers (i.e., each layer receives the results of prior layers), passing an empty array bypasses that path entirely. A bug where the pipeline fails to forward prior results to later layers would never be caught.
Suggestion: At least one test should supply a non-empty previousLayers value and assert that the downstream layer's analyze call receives it correctly, verifying the accumulation/forwarding logic.
Lens: architecture | Confidence: 82%
```ts
it('calls ContextCollector.analyze', async () => {
  await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
  expect(contextCollector.analyze).toHaveBeenCalled();
});
```
🟠 L3-TEST-001 [HIGH] All tests verify mock calls, never pipeline output — broken pipeline logic passes invisibly
Every test asserts only that analyze was called (or not called) on mock instances. The pipeline's actual responsibilities — aggregating layer results into a report, computing totals, merging findings, setting overall risk/status — are never verified against real values. If ReviewPipeline.run() returned null, an empty object, or silently swallowed errors after calling the mocks, every test would still pass. The 100% line coverage claim is accurate but meaningless: lines are executed, but correctness of the output is unverified.
Suggestion: Assert on the return value of pipeline.run(). At minimum: check that report.layers has the expected length, that report.findings is an array, and that report.metrics.totalCostUsd equals the sum of mock layer costs (0 + 0 + 0.02 = 0.02 when L3 is skipped, 0.42 when it runs). This catches any aggregation bug the call-count assertions cannot.
Lens: test-quality | Confidence: 95%
```ts
  expect(contextCollector.analyze).toHaveBeenCalled();
});

it('calls DeterministicGate.analyze', async () => {
```
🟠 L3-TEST-002 [HIGH] 'gate says stop' test cannot prove L3 is actually skipped — the mock setup makes AIDeepReview unreachable regardless
The test titled 'does not call AIDeepReview when gate says stop' is structurally indistinguishable from a test where the pipeline ignores the gate entirely. Because AIQuickScan.analyze is globally mocked to return gate: { proceed: false } in beforeEach, AIDeepReview would also not be called if the pipeline had a bug such as an off-by-one that dropped the last layer, or if the pipeline crashed after L2 and swallowed the error. The negative assertion (not.toHaveBeenCalled) passes in all three scenarios: correct gate logic, incorrect gate logic with a crash, and incorrect gate logic that simply never reaches L3. A complementary test with gate.proceed: true that confirms L3 is called is required to make the negative test meaningful — without it the security-relevant early-exit path is unverified.
Suggestion: Add a paired positive test: override the AIQuickScan mock for that single test to return gate: { proceed: true }, run the pipeline, and assert aiDeepReview.analyze was called. Only with both the positive and negative cases does the gate boundary have genuine test coverage.
Lens: test-quality | Confidence: 90%
Summary
Adds full test coverage for all pipeline layers.
Changes
Coverage report attached. All green.