
harness: add targeted tests for dedup sensitivity, multi-module section splitting, and security boundary routing #36

@SmartBrandStrategies


Context

Running the full scenario harness and analyzing the results surfaced the following test gaps, each a high-value addition for improving classifier quality.

Proposed additional test scenarios

1. Dedup sensitivity — near-duplicate variants

The current dedup test (`edge-duplicate-injection`) injects identical text. We need tests for:

  • Rephrased duplicates: "Never commit secrets" vs "Do not commit secrets to the repository" — Jaccard similarity may fall below 0.8, causing false negatives
  • Partial duplicates: a new session adds 3 rules, 2 of which already exist in ADF — only 1 should migrate
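Here is a minimal sketch of why the rephrased pair slips through. It assumes word-level tokens and a 0.8 threshold. The harness's actual tokenizer is not shown here, and the names `tokenize`, `jaccard`, `rulesToMigrate`, and `DEDUP_THRESHOLD` are hypothetical. Under these assumptions the two phrasings share 2 of 8 distinct words, so their similarity is 0.25 and the rephrased rule would migrate as if it were new. The partial-duplicate case should migrate only the one genuinely new rule.

```typescript
// Sketch only: word-level Jaccard dedup, assuming a 0.8 threshold.
// All names here are hypothetical; the harness's real tokenizer may differ.
const DEDUP_THRESHOLD = 0.8;

// Lowercase, split on non-alphanumerics, drop empty tokens.
function tokenize(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0),
  );
}

// |A ∩ B| / |A ∪ B|; two empty sets count as identical.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Keep only incoming rules that are not near-duplicates of an existing rule.
function rulesToMigrate(incoming: string[], existing: string[]): string[] {
  const existingTokens = existing.map(tokenize);
  return incoming.filter((rule) => {
    const tokens = tokenize(rule);
    return !existingTokens.some((e) => jaccard(tokens, e) >= DEDUP_THRESHOLD);
  });
}

// "Never commit secrets" vs the rephrased rule: 2 shared words / 8 total = 0.25
const rephrasedSimilarity = jaccard(
  tokenize("Never commit secrets"),
  tokenize("Do not commit secrets to the repository"),
);
```

A test built on this would assert that the rephrased rule is *not* migrated. Under plain word-level Jaccard it would be migrated, which is the false negative the scenario is meant to expose.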

2. Multi-module section splitting — heading dominates all items

In the current classifier, once a heading routes to a module, ALL items in that section go to that module; keywords in individual items that point to other modules are ignored.

Example failure:
```
## Database

- D1 bound as `DB` in wrangler.toml → backend.adf (heading wins)
- Run migrations with `wrangler d1 migrate` → backend.adf (should be infra.adf!)
```

A test that verifies cross-keyword items within a section would expose this and track when it's fixed.
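One possible direction for the fix is sketched below under stated assumptions. Each item's keywords are scored per module, and another module overrides the heading only when it scores strictly more hits. The keyword lists and the `routeItem` / `keywordHits` helpers are hypothetical; only the module names come from the example above. A test for this scenario would assert that the migration item lands in `infra.adf`.

```typescript
// Sketch only: module names come from the issue; keyword lists and the
// scoring rule are hypothetical, not the classifier's real triggers.
const MODULE_KEYWORDS: Record<string, string[]> = {
  "backend.adf": ["database", "d1", "binding"],
  "infra.adf": ["migrate", "migrations", "deploy"],
};

// Lowercase word tokens; "wrangler.toml" becomes ["wrangler", "toml"].
function wordTokens(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Number of distinct module keywords that appear in the item.
function keywordHits(item: string, module: string): number {
  const words = wordTokens(item);
  return (MODULE_KEYWORDS[module] ?? []).filter((k) => words.indexOf(k) !== -1).length;
}

// Start from the heading's module; switch only when another module
// scores strictly more keyword hits for this item.
function routeItem(item: string, headingModule: string): string {
  let best = headingModule;
  let bestHits = keywordHits(item, headingModule);
  for (const module of Object.keys(MODULE_KEYWORDS)) {
    const hits = keywordHits(item, module);
    if (hits > bestHits) {
      best = module;
      bestHits = hits;
    }
  }
  return best;
}
```

The strict-majority rule keeps the heading as the default, so single incidental keywords (like `d1` in the migration item) do not cause churn.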

3. Security boundary routing — auth in backend vs security modules

Auth-related rules appear in two contexts:

  • Implementation rules (how to write auth code): belong in `backend.adf`
  • Security policy rules (what must be enforced): belong in `security.adf`

The current `## Auth` heading maps everything to `security.adf`. A test with mixed implementation + policy rules under one heading would expose the lack of sub-heading routing.
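The sketch below shows the kind of sub-heading routing this test would exercise. It assumes a convention of `### Implementation` and `### Policy` sub-headings under `## Auth`. That convention, the `HEADING_MODULES` table, and `routeByHeadings` are all hypothetical. Each bullet is routed by the deepest enclosing heading that has a mapping, falling back to `## Auth` → `security.adf`.

```typescript
// Sketch only: heading -> module table and sub-heading convention are hypothetical.
const HEADING_MODULES: Record<string, string> = {
  auth: "security.adf",
  implementation: "backend.adf",
  policy: "security.adf",
};

interface Routed {
  item: string;
  module: string | undefined; // undefined when no enclosing heading is mapped
}

function routeByHeadings(markdown: string): Routed[] {
  const stack: string[] = []; // stack[level - 1] holds the heading at that level
  const out: Routed[] = [];
  for (const line of markdown.split("\n")) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      const level = heading[1].length;
      stack.length = level - 1; // forget headings at this level or deeper
      stack[level - 1] = heading[2].trim().toLowerCase();
      continue;
    }
    const bullet = /^\s*[-*]\s+(.*)$/.exec(line);
    if (!bullet) continue;
    // Walk from the deepest heading outward; the first mapped one wins.
    let module: string | undefined;
    for (let i = stack.length - 1; i >= 0 && module === undefined; i--) {
      const h = stack[i];
      if (h !== undefined) module = HEADING_MODULES[h];
    }
    out.push({ item: bullet[1].trim(), module });
  }
  return out;
}
```

With today's heading-only routing, all three items in a mixed section would land in `security.adf`; the test would expect the implementation item in `backend.adf`.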

4. Trigger prefix collision — short triggers matching unrelated content

The prefix-match fix (removing the trailing `\b`) introduced a risk of over-matching. Examples:

  • Trigger `auth` now matches "authority", "author", "authentic"
  • Trigger `api` now matches "apiary", "apiVersion"

A test with content containing "the author of this library" or "apiary endpoint" should verify these don't accidentally route to security/backend modules.
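The over-matching is easy to reproduce with plain regexes. This sketch assumes triggers compile to case-insensitive patterns with a leading `\b`; the `triggerRegex` helper is hypothetical. Restoring the trailing `\b` also stops `auth` from matching "authentication", which is presumably why it was removed. The test should therefore pin both the intended matches and the unintended ones.

```typescript
// Sketch only: assumes triggers become case-insensitive regexes with a
// leading \b; the classifier's real pattern construction may differ.
function triggerRegex(trigger: string, trailingBoundary: boolean): RegExp {
  // Escape regex metacharacters so the trigger is matched literally.
  const escaped = trigger.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped}${trailingBoundary ? "\\b" : ""}`, "i");
}

const prefixAuth = triggerRegex("auth", false); // current behavior: prefix match
const wholeWordAuth = triggerRegex("auth", true); // previous behavior

// Prefix matching catches the intended "authentication" but also "author".
const authorFalsePositive = prefixAuth.test("the author of this library");
const authenticationHit = prefixAuth.test("Use authentication middleware");
```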

5. Large injection — 20+ items in one session

Current tests top out at ~13 items per session. A stress test with 25+ items would:

  • Test dedup performance (O(n²) Jaccard comparisons)
  • Verify routing accuracy doesn't degrade at scale
  • Surface any ADF write failures for large patch sets
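For sizing, assume the harness does a naive all-pairs dedup over n items. That performs n(n-1)/2 comparisons: 78 at 13 items and 300 at 25. The sketch below builds a synthetic 25-item injection across five headings; the helper names are hypothetical. Real scenarios should use realistic, distinct rules, because these placeholder items share several tokens and would distort dedup behavior.

```typescript
// Hypothetical helpers for a large-injection stress scenario.

// Build markdown with `itemsPerHeading` bullets under each heading.
function buildLargeInjection(itemsPerHeading: number, headings: string[]): string {
  const lines: string[] = [];
  let counter = 0;
  for (const heading of headings) {
    lines.push(`## ${heading}`, "");
    for (let i = 0; i < itemsPerHeading; i++) {
      counter++;
      lines.push(`- Synthetic rule ${counter} for ${heading.toLowerCase()}`);
    }
    lines.push("");
  }
  return lines.join("\n");
}

// An all-pairs comparison visits each unordered pair once: n(n - 1) / 2.
function pairwiseComparisons(n: number): number {
  return (n * (n - 1)) / 2;
}

const largeInjection = buildLargeInjection(5, ["Auth", "Database", "Deploy", "Testing", "Style"]);
```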

6. Empty/minimal injection — just a heading, no items

Edge case: the AI adds `## Auth\n\n` (a heading with no content). This should produce 0 extractions cleanly, without errors.
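Here is a minimal sketch of the behavior this test should pin. It assumes an extractor that attaches bullet items to the most recent heading; `extractItems` and `Extraction` are hypothetical names. A heading with no bullets contributes nothing, and empty input returns an empty list rather than throwing.

```typescript
// Sketch only: a simplified extractor, not the harness's real one.
interface Extraction {
  heading: string;
  item: string;
}

function extractItems(markdown: string): Extraction[] {
  const out: Extraction[] = [];
  let heading: string | null = null;
  for (const line of markdown.split("\n")) {
    const h = /^#{1,6}\s+(.+)$/.exec(line);
    if (h) {
      heading = h[1].trim();
      continue;
    }
    // Only bullets under some heading count as extractions.
    const b = /^\s*[-*]\s+(.+)$/.exec(line);
    if (b && heading !== null) out.push({ heading, item: b[1].trim() });
  }
  return out;
}
```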

Implementation

Add these to `harness/corpus/edge-cases.ts` as additional `Scenario` objects. The trigger prefix collision test (#4) is particularly important to add before the prefix-match change ships in a release.
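The sketch below shows how two of the new entries might look. The `Scenario` interface is an assumption; the real type in `harness/corpus/edge-cases.ts` may use different fields. The ids are also hypothetical and only follow the `edge-` prefix of `edge-duplicate-injection`.

```typescript
// Hypothetical shape; mirror the real Scenario type in harness/corpus/edge-cases.ts.
interface Scenario {
  id: string;
  description: string;
  injection: string; // markdown the AI adds during the session
  expected: {
    extractions?: number; // total items expected, when known
    routes: Record<string, number>; // module file -> expected item count
  };
}

const edgeCaseAdditions: Scenario[] = [
  {
    id: "edge-trigger-prefix-collision",
    description: "Short triggers (auth, api) must not route prose about authors or apiaries",
    injection: [
      "## Notes",
      "",
      "- Credit the author of this library in the README",
      "- The apiary endpoint returns mock data only",
      "",
    ].join("\n"),
    expected: { routes: { "security.adf": 0, "backend.adf": 0 } },
  },
  {
    id: "edge-empty-heading",
    description: "A heading with no items yields no extractions and no errors",
    injection: "## Auth\n\n",
    expected: { extractions: 0, routes: {} },
  },
];
```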
