test: add comprehensive pipeline layer tests #3

Open
ArturSkowronski wants to merge 1 commit into master from demo/meta/hollow-test-suite

Conversation

@ArturSkowronski
Collaborator

Summary

Adds full test coverage for all pipeline layers.

Changes

  • 15 unit tests covering L0–L3 layers
  • 100% line coverage on pipeline.ts and all layer files
  • Mock-based isolation for fast test execution

Coverage report attached. All green.

Demo scenario: meta/hollow-test-suite
@github-actions

🔍 VCR Code Review


> vcr-demo@0.1.0 demo:local
> tsx src/cli/index.ts --local


 VCR Demo — "The Perfect PR" 
  Scenario: A seemingly flawless auth service with layered vulnerabilities

→ Local mode — skipping GitHub PR creation

▸ Layer 0 — Context Collection  0.0s

▸ Layer 1 — Deterministic Gate  0.0s
  ⚠ MEDIUM     L1-SEC-004   src/auth/auth.service.ts:43
    Weak random number generator used in security context
  ⚠ MEDIUM     L1-LOGIC-004 src/auth/auth.service.ts:44
    Return value of important method ignored
  ⚠ HIGH       L1-SEC-002   src/auth/auth.model.ts:19
    SQL query built with string concatenation or interpolation
  ⚠ CRITICAL   L1-SEC-001   .env.test:5
    Hardcoded secret or credential

▸ Layer 2 — AI Quick Scan  0.0s  $0.02
  ⚠ HIGH       L2-TEST-001  test/auth.test.ts
    8/12 tests are circular (mock-on-mock)
  ⚠ HIGH       L2-AUTH-002  src/auth/auth.controller.ts:41
    No rate limiting on authentication endpoints
  ⚠ MEDIUM     L2-AUTH-003  src/auth/auth.controller.ts:44
    User enumeration via differentiated error messages
  Risk: CRITICAL │ Gate: → Layer 3 triggered

▸ Layer 3 — AI Deep Review  0.0s  $0.42
  ⚠ CRITICAL   L3-SEC-001   [security] src/auth/auth.service.ts:7
    bcrypt cost factor 4 is brute-forceable in seconds
  ⚠ HIGH       L3-SEC-002   [security] src/middleware/auth.middleware.ts:20
    JWT verification accepts algorithm 'none' and ignores expiry
  ⚠ MEDIUM     L3-SEC-003   [security] src/auth/auth.controller.ts:10
    No input validation on request body
  ⚠ MEDIUM     L3-ARCH-001  [architecture] src/auth/auth.controller.ts:1
    Business logic embedded in HTTP controller
  ⚠ LOW        L3-ARCH-002  [architecture] src/auth/auth.model.ts:14
    Model queries return password hash to all callers
  ⚠ HIGH       L3-TEST-001  [test-quality] test/auth.test.ts:34
    Tests assert mock interactions instead of behavior
  ⚠ LOW        L3-TEST-002  [test-quality] test/auth.test.ts:1
    Zero negative and edge case tests


════════════════════════════════════════════════════════
  RESULTS — Side by Side
════════════════════════════════════════════════════════

  Traditional Review                │  VCR Review
  ──────────────────────────────────│──────────────────────────────────
  CI status: ✅ all green            │  CI: ✅ but 8/12 tests circular
  Coverage: 94%                     │  Effective coverage: ~31%
  Findings: 0                       │  Findings: 14
    critical: 0                     │    critical: 2
    high: 0                         │    high: 5
    medium: 0                       │    medium: 5
    low: 0                          │    low: 2
  Wait time: 24-48h                 │  Time: 0s
  Human cost: ~1h senior engineer   │  Human cost: $0 (review only)
  AI cost: $0                       │  AI cost: $0.44
  Risk: auth bypass ships to production│  Risk: caught before merge


Reviewed by Visdom Code Review

@ArturSkowronski
Collaborator Author

🔍 VCR Review — Side by Side

| Metric | Traditional Review | VCR Review |
|---|---|---|
| CI Status | ✅ All green | ✅ But circular tests detected |
| Coverage | 94% | ~31% effective (after TORS) |
| Findings | 0 | 8 |
| high | 0 | 6 |
| medium | 0 | 2 |
| Wait time | 24-48h | 26s |
| Human cost | ~1h senior engineer | $0 (review only) |
| AI cost | $0 | $0.04 |
| Risk | auth bypass ships to production | caught before merge |

Findings by Layer

Layer 2 — AI Quick Scan

  • 🟠 L2-TEST-001 4/15 tests are circular (mock-on-mock) (test/pipeline.test.ts)
  • 🟠 L2-TEST-002 Mock inconsistency: AIQuickScan.gate.proceed = false prevents L3 execution (test/pipeline.test.ts:23)
  • 🟠 L2-TEST-003 Layer:start event count assertion is fragile and incomplete (test/pipeline.test.ts:101)
  • 🟡 L2-TEST-004 No validation that report.layers array contains expected layer results (test/pipeline.test.ts:120)

Layer 3 — AI Deep Review

  • 🟠 L3-ARCH-001 AIDeepReview gate-stop test contradicts its own mock setup (test/pipeline.test.ts:70)
  • 🟡 L3-ARCH-002 previousLayers field passed as empty array may mask accumulation bugs (test/pipeline.test.ts:55)
  • 🟠 L3-TEST-001 All tests verify mock calls, never pipeline output — broken pipeline logic passes invisibly (test/pipeline.test.ts:62)
  • 🟠 L3-TEST-002 'gate says stop' test cannot prove L3 is actually skipped — the mock setup makes AIDeepReview unreachable regardless (test/pipeline.test.ts:64)

Generated by VCR Demo in 26s

Comment thread test/pipeline.test.ts
}));

vi.mock('../src/core/layers/ai-quick-scan', () => ({
AIQuickScan: vi.fn().mockImplementation(() => ({
Collaborator Author

🟠 L2-TEST-002 [HIGH] Mock inconsistency: AIQuickScan.gate.proceed = false prevents L3 execution

The AIQuickScan mock returns gate: { proceed: false, ... } on line 23. This causes the pipeline to skip AIDeepReview execution (line 73 test expects this). However, real pipeline behavior depends on actual gate logic. The mock's hardcoded proceed: false doesn't test the conditional execution path when proceed: true, leaving that scenario untested and creating false confidence.

Suggestion: Add a separate test that mocks AIQuickScan with proceed: true to verify AIDeepReview is called when the gate permits. Or use parameterized tests to cover both gate states.
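
The parameterized approach suggested above can be sketched without any test framework. Everything below is a hypothetical stand-in: `stub`, `runPipeline`, and the `LayerResult` shape are not taken from the PR, and the gate rule (L3 runs only when L2 reports `gate.proceed === true`) is an assumption about how the real pipeline behaves. The sketch is synchronous for brevity, whereas the real `pipeline.run` is awaited.

```typescript
// Hypothetical stand-ins: the real ReviewPipeline and layer classes are not shown in this thread.
type Gate = { proceed: boolean };
type LayerResult = { layer: number; gate?: Gate };

interface Stub {
  calls: number;
  analyze(): LayerResult;
}

// Hand-rolled call-counting stub, playing the role vi.fn() plays in the suite.
function stub(result: LayerResult): Stub {
  const s: Stub = {
    calls: 0,
    analyze() {
      s.calls += 1;
      return result;
    },
  };
  return s;
}

// Assumed gate rule: L3 runs only when L2 reports gate.proceed === true.
function runPipeline(quickScan: Stub, deepReview: Stub): LayerResult[] {
  const l2 = quickScan.analyze();
  const results: LayerResult[] = [l2];
  if (l2.gate !== undefined && l2.gate.proceed) {
    results.push(deepReview.analyze());
  }
  return results;
}

// Table-driven cases: one row per gate state.
const cases = [
  { proceed: true, expectedDeepReviewCalls: 1 },
  { proceed: false, expectedDeepReviewCalls: 0 },
];

// Each case gets fresh stubs so call counts cannot leak between rows.
const outcomes = cases.map(({ proceed, expectedDeepReviewCalls }) => {
  const quickScan = stub({ layer: 2, gate: { proceed } });
  const deepReview = stub({ layer: 3 });
  runPipeline(quickScan, deepReview);
  return { proceed, expected: expectedDeepReviewCalls, actual: deepReview.calls };
});
```

In a Vitest suite the same table would typically feed `it.each`, with `vi.fn()` replacing the hand-rolled stub.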

Comment thread test/pipeline.test.ts
await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
expect(handler).toHaveBeenCalled();
});

Collaborator Author

🟠 L2-TEST-003 [HIGH] Layer:start event count assertion is fragile and incomplete

Line 101 expects layer:start to be called 3 times, but the pipeline has 4 layers (L0–L3, see lines 6–34 mocks). If AIDeepReview is skipped (which line 73 confirms it is), only 3 layers run, making this assertion coincidentally correct but misleading. It doesn't validate the correct set of layers actually ran—just that 3 events fired.

Suggestion: Verify which specific layers emitted events by checking event payloads (e.g., expect(handler).toHaveBeenNthCalledWith(1, expect.objectContaining({ layer: 0 }))). Or test the conditional execution explicitly: one test with gate open (all 4 layers), one with gate closed (3 layers).
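
To illustrate why a bare event count is fragile, here is a minimal sketch comparing a count check with an identity check. The two pipelines, the `{ layer }` payload shape, and the helper names are all hypothetical. The `buggy` variant is an invented defect that emits three events for the wrong set of layers.

```typescript
type LayerStart = { layer: number };
type Handler = (payload: LayerStart) => void;
type Pipeline = (gateOpen: boolean, emit: Handler) => void;

// Assumed correct behavior: L0 to L2 always start, L3 starts only when the gate is open.
const correct: Pipeline = (gateOpen, emit) => {
  const layers = gateOpen ? [0, 1, 2, 3] : [0, 1, 2];
  for (const layer of layers) {
    emit({ layer });
  }
};

// Invented defect: skips L2 and runs L3 regardless of the gate, still emitting three events.
const buggy: Pipeline = (_gateOpen, emit) => {
  for (const layer of [0, 1, 3]) {
    emit({ layer });
  }
};

// Records the layer id carried by each layer:start payload, in order.
function record(pipeline: Pipeline, gateOpen: boolean): number[] {
  const seen: number[] = [];
  pipeline(gateOpen, (payload) => {
    seen.push(payload.layer);
  });
  return seen;
}

function sameLayers(a: number[], b: number[]): boolean {
  return a.length === b.length && a.every((value, i) => value === b[i]);
}

// A count-only assertion accepts the buggy pipeline...
const countCheckPassesOnBuggy = record(buggy, false).length === 3;
// ...while comparing the ordered layer ids rejects it.
const identityCheckPassesOnBuggy = sameLayers(record(buggy, false), [0, 1, 2]);
```

With these definitions the count check evaluates to `true` for the buggy pipeline and the identity check to `false`, which is the gap the comment above describes.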

Comment thread test/pipeline.test.ts
const report = await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
expect(report).toBeDefined();
expect(report.layers).toBeDefined();
});
Collaborator Author

🟡 L2-TEST-004 [MEDIUM] No validation that report.layers array contains expected layer results

The assertions around line 120 only check that report and report.layers are defined; they do not validate layer count, layer order, layer IDs, or findings structure. The test passes whenever the property exists, regardless of which layers it holds. Combined with the gate-closed mock, this cannot detect if L3 incorrectly appears in the report.

Suggestion: Assert exact layer count and check layer identity: expect(report.layers.map(l => l.layer)).toEqual([0, 1, 2]). Add a separate test for the gate-open path to verify L3 appears.

Comment thread test/pipeline.test.ts
});

it('calls AIQuickScan.analyze', async () => {
await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
Collaborator Author

🟠 L3-ARCH-001 [HIGH] AIDeepReview gate-stop test contradicts its own mock setup

The test 'does not call AIDeepReview when gate says stop' relies on AIQuickScan returning gate.proceed: false — which is exactly what the global mock already returns for every test. This means the AIDeepReview layer should never be called in ANY test, including 'calls AIQuickScan.analyze' and the other positive tests. If the pipeline actually respects the gate (proceed: false → skip L3), then the test named 'calls AIDeepReview...' (if it exists in the truncated section) would silently pass even when AIDeepReview is broken, because the mock prevents it from ever being reached. Conversely, the gate-stop assertion is vacuously true and proves nothing about the conditional logic.

Suggestion: Override the AIQuickScan mock locally in tests that need L3 to execute: use vi.spyOn(aiQuickScan, 'analyze').mockResolvedValueOnce({ ..., gate: { proceed: true, ... } }) before calling pipeline.run in those tests. The global mock should default to proceed: true so the positive path is exercised by default.

Lens: architecture | Confidence: 92%

Comment thread test/pipeline.test.ts
contextCollector = new ContextCollector();
deterministicGate = new DeterministicGate();
aiQuickScan = new AIQuickScan({} as any, 'test');
aiDeepReview = new AIDeepReview({} as any, 'test');
Collaborator Author

🟡 L3-ARCH-002 [MEDIUM] previousLayers field passed as empty array may mask accumulation bugs

Every pipeline.run call supplies previousLayers: []. If the pipeline accumulates layer results into the context object passed to subsequent layers (i.e., each layer receives the results of prior layers), passing an empty array bypasses that path entirely. A bug where the pipeline fails to forward prior results to later layers would never be caught.

Suggestion: At least one test should supply a non-empty previousLayers value and assert that the downstream layer's analyze call receives it correctly, verifying the accumulation/forwarding logic.

Lens: architecture | Confidence: 82%
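
A forwarding check of the kind suggested above might look like the following sketch. It assumes the pipeline appends each layer's result to `previousLayers` before calling the next layer. That behavior, along with `recordingLayer`, `runPipeline`, and the `Context` shape, is hypothetical and not confirmed by the PR.

```typescript
type LayerResult = { layer: number };
type Context = { previousLayers: LayerResult[] };

interface RecordingLayer {
  received: LayerResult[][];
  analyze(ctx: Context): LayerResult;
}

// Stub layer that snapshots the previousLayers it was handed on each call.
function recordingLayer(id: number): RecordingLayer {
  const layer: RecordingLayer = {
    received: [],
    analyze(ctx) {
      layer.received.push(ctx.previousLayers.slice());
      return { layer: id };
    },
  };
  return layer;
}

// Assumed accumulation rule: every result is appended before the next layer runs.
function runPipeline(layers: RecordingLayer[], initial: Context): LayerResult[] {
  const previousLayers = initial.previousLayers.slice();
  for (const layer of layers) {
    const result = layer.analyze({ previousLayers });
    previousLayers.push(result);
  }
  return previousLayers;
}

const l1 = recordingLayer(1);
const l2 = recordingLayer(2);

// Start from a non-empty previousLayers so the forwarding path is exercised.
runPipeline([l1, l2], { previousLayers: [{ layer: 0 }] });

const seenByL1 = l1.received[0].map((r) => r.layer);
const seenByL2 = l2.received[0].map((r) => r.layer);
```

Here `seenByL1` holds only the seeded layer 0, while `seenByL2` also holds layer 1. A pipeline that failed to forward results would leave the second list short. In Vitest, the equivalent inspection would go through `mock.calls` on the `analyze` mock.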

Comment thread test/pipeline.test.ts
it('calls ContextCollector.analyze', async () => {
await pipeline.run({ scenario: 'test', pr: {} as any, diff: '', files: [], previousLayers: [] });
expect(contextCollector.analyze).toHaveBeenCalled();
});
Collaborator Author

🟠 L3-TEST-001 [HIGH] All tests verify mock calls, never pipeline output — broken pipeline logic passes invisibly

Every test asserts only that analyze was called (or not called) on mock instances. The pipeline's actual responsibilities — aggregating layer results into a report, computing totals, merging findings, setting overall risk/status — are never verified against real values. If ReviewPipeline.run() returned null, an empty object, or silently swallowed errors after calling the mocks, every test would still pass. The 100% line coverage claim is accurate but meaningless: lines are executed, but correctness of the output is unverified.

Suggestion: Assert on the return value of pipeline.run(). At minimum: check that report.layers has the expected length, that report.findings is an array, and that report.metrics.totalCostUsd equals the sum of mock layer costs (0 + 0 + 0.02 + 0 = 0.02 when L3 is skipped, 0.42 when it runs). This catches any aggregation bug the call-count assertions cannot.

Lens: test-quality | Confidence: 95%
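
As a rough sketch of output-level assertions, the snippet below checks the report itself rather than call counts. The `Report` shape follows the field names mentioned in the suggestion (`layers`, `findings`, `metrics.totalCostUsd`), but `runPipeline`, the stub finding labels, and the stub costs are assumptions. In particular, the L3 stub cost of 0.40 is chosen only so that the gate-open total matches the 0.42 quoted above. The real mock values are not shown in this thread.

```typescript
type LayerResult = {
  layer: number;
  costUsd: number;
  findings: string[];
  gate?: { proceed: boolean };
};
type Report = {
  layers: LayerResult[];
  findings: string[];
  metrics: { totalCostUsd: number };
};

// Hypothetical aggregation: stop after a closed gate, then merge findings and sum costs.
function runPipeline(stubs: LayerResult[]): Report {
  const layers: LayerResult[] = [];
  for (const result of stubs) {
    layers.push(result);
    if (result.gate !== undefined && !result.gate.proceed) {
      break;
    }
  }
  return {
    layers,
    findings: layers.reduce<string[]>((all, l) => all.concat(l.findings), []),
    metrics: { totalCostUsd: layers.reduce((sum, l) => sum + l.costUsd, 0) },
  };
}

// Stub layer outputs; costs and finding labels are illustrative placeholders.
function stubLayers(proceed: boolean): LayerResult[] {
  return [
    { layer: 0, costUsd: 0, findings: [] },
    { layer: 1, costUsd: 0, findings: ["stub-l1"] },
    { layer: 2, costUsd: 0.02, findings: ["stub-l2"], gate: { proceed } },
    { layer: 3, costUsd: 0.4, findings: ["stub-l3"] },
  ];
}

// Floating-point sums should be compared with a tolerance, not with ===.
function approxEqual(a: number, b: number): boolean {
  return Math.abs(a - b) < 1e-9;
}

const closed = runPipeline(stubLayers(false));
const open = runPipeline(stubLayers(true));

const closedLayerIds = closed.layers.map((l) => l.layer);
const openLayerIds = open.layers.map((l) => l.layer);
const closedCostOk = approxEqual(closed.metrics.totalCostUsd, 0.02);
const openCostOk = approxEqual(open.metrics.totalCostUsd, 0.42);
```

In Vitest, `toBeCloseTo` would take the place of `approxEqual`, and `toEqual` would compare the layer id arrays.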

Comment thread test/pipeline.test.ts
expect(contextCollector.analyze).toHaveBeenCalled();
});

it('calls DeterministicGate.analyze', async () => {
Collaborator Author

🟠 L3-TEST-002 [HIGH] 'gate says stop' test cannot prove L3 is actually skipped — the mock setup makes AIDeepReview unreachable regardless

The test titled 'does not call AIDeepReview when gate says stop' is structurally indistinguishable from a test where the pipeline ignores the gate entirely. Because AIQuickScan.analyze is globally mocked to return gate: { proceed: false } in beforeEach, AIDeepReview would also not be called if the pipeline had a bug such as an off-by-one that dropped the last layer, or if the pipeline crashed after L2 and swallowed the error. The negative assertion (not.toHaveBeenCalled) passes in all three scenarios: correct gate logic, incorrect gate logic with a crash, and incorrect gate logic that simply never reaches L3. A complementary test with gate.proceed: true that confirms L3 is called is required to make the negative test meaningful — without it the security-relevant early-exit path is unverified.

Suggestion: Add a paired positive test: override the AIQuickScan mock for that single test to return gate: { proceed: true }, run the pipeline, and assert aiDeepReview.analyze was called. Only with both the positive and negative cases does the gate boundary have genuine test coverage.

Lens: test-quality | Confidence: 90%
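
The argument above can be made concrete with three toy pipelines: one that respects the gate and two invented defects that never reach L3. All names here are hypothetical and the pipelines are deliberately reduced to the gate decision. The sketch only shows which assertions can tell the variants apart.

```typescript
type Gate = { proceed: boolean };
type Pipeline = (quickScan: () => Gate, deepReview: () => void) => void;

// Correct behavior (assumed): L3 runs only when the gate is open.
const respectsGate: Pipeline = (quickScan, deepReview) => {
  if (quickScan().proceed) {
    deepReview();
  }
};

// Invented defect 1: an off-by-one style bug that never runs the last layer.
const dropsLastLayer: Pipeline = (quickScan, _deepReview) => {
  quickScan();
};

// Helper with a void return type so the call below is not treated as unreachable code.
function failAfterQuickScan(): void {
  throw new Error("simulated failure after L2");
}

// Invented defect 2: a failure after L2 is swallowed, so L3 is never reached.
const swallowsError: Pipeline = (quickScan, deepReview) => {
  try {
    quickScan();
    failAfterQuickScan();
    deepReview();
  } catch {
    // Error intentionally swallowed to model the defect.
  }
};

// Counts how often the deep-review stub is invoked for a given gate state.
function deepReviewCalls(pipeline: Pipeline, proceed: boolean): number {
  let calls = 0;
  pipeline(
    () => ({ proceed }),
    () => {
      calls += 1;
    },
  );
  return calls;
}

const variants: { name: string; pipeline: Pipeline }[] = [
  { name: "respectsGate", pipeline: respectsGate },
  { name: "dropsLastLayer", pipeline: dropsLastLayer },
  { name: "swallowsError", pipeline: swallowsError },
];

// negativePasses: "not called when gate is closed"; positivePasses: "called once when gate is open".
const results = variants.map(({ name, pipeline }) => ({
  name,
  negativePasses: deepReviewCalls(pipeline, false) === 0,
  positivePasses: deepReviewCalls(pipeline, true) === 1,
}));
```

The negative check passes for all three variants, while the positive check passes only for `respectsGate`. That is why the two tests need to be paired.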
