Skip to content

refactor(skill/verifier): replace regex heuristic with structured toolCalls for coverage check#1552

Merged
hijzy merged 2 commits into
MemTensor:mem-agent-0424from
hijzy:feat/verifier-structured-tool-coverage
Apr 27, 2026
Merged

refactor(skill/verifier): replace regex heuristic with structured toolCalls for coverage check#1552
hijzy merged 2 commits into
MemTensor:mem-agent-0424from
hijzy:feat/verifier-structured-tool-coverage

Conversation

@hijzy
Copy link
Copy Markdown
Collaborator

@hijzy hijzy commented Apr 27, 2026

Summary

  • Replace the regex-based "command token guessing" in verifier.ts with structured tool name comparison using trace.toolCalls ground truth data
  • Add extractToolNames() utility that reads tc.name + first token of string tc.input from evidence traces
  • Crystallize prompt v3: inject EVIDENCE_TOOLS whitelist into the prompt, require LLM to output explicit tools: string[] field constrained to the whitelist
  • Verifier coverage check is now a clean set-containment: draft.tools ⊆ evidenceTools — no regex, no STOPWORDS, no actionBlob substring search
  • Add tools: string[] to SkillCrystallizationDraft, SkillProcedure; packager persists and renders "Tools used" section

Motivation

The old regex third branch [a-z_]{3,} pulled English verbs (install, verify, retry...) into the coverage denominator as false positives, while substring matching missed synonyms and CJK text. This caused systematically low coverage scores despite high resonance, blocking valid skills from passing verification.

Test plan

  • All 41 skill unit tests pass (verifier, crystallize, packager, lifecycle, eligibility, events, evidence, subscriber, integration)
  • Verifier tests rewritten with structured toolCalls evidence and tools draft field
  • Crystallize test verifies tools field parsing from LLM response
  • Zero lint errors

hijzy added 2 commits April 27, 2026 20:31
- Fix box-drawing alignment for emoji display width
- Fix "Terminated" noise and ANSI escape code leaks
- Suppress npm postinstall noise, add step numbering
- Viewer readiness spinner with actual HTTP check
- Handle launchctl KeepAlive conflict gracefully
- Improve interactive picker with emoji and alignment
…lCalls for coverage check

The old verifier used a regex to guess "command-like tokens" from the
draft's natural language text, then searched for them in the evidence
text via substring matching. The third regex branch `[a-z_]{3,}` pulled
in English verbs (install, verify, retry...) as false positives, while
substring search missed synonyms and CJK text — causing systematically
low coverage scores despite high resonance.

Replace the entire coverage pipeline with structured tool name comparison:
- New `extractToolNames()` reads `trace.toolCalls[].name` + first token
  of string `tc.input` (for shell-like tools) as ground truth
- Crystallize prompt v3 injects `EVIDENCE_TOOLS` whitelist and requires
  LLM to output explicit `tools: string[]` field
- Verifier checks `draft.tools ⊆ evidenceTools` via Set comparison
- Delete `collectCommandTokens`, `STOPWORDS`, `actionBlob` regex logic
- Add `tools: string[]` to `SkillCrystallizationDraft` and `SkillProcedure`
- Packager persists tools and renders "Tools used" in invocation guide
@hijzy hijzy merged commit 5143d0c into MemTensor:mem-agent-0424 Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant