refactor(skill/verifier): replace regex heuristic with structured toolCalls for coverage check #1552
Merged
hijzy merged 2 commits on Apr 27, 2026
- Fix box-drawing alignment for emoji display width
- Fix "Terminated" noise and ANSI escape code leaks
- Suppress npm postinstall noise, add step numbering
- Viewer readiness spinner with actual HTTP check
- Handle launchctl KeepAlive conflict gracefully
- Improve interactive picker with emoji and alignment
…lCalls for coverage check
The old verifier used a regex to guess "command-like tokens" from the
draft's natural language text, then searched for them in the evidence
text via substring matching. The third regex branch `[a-z_]{3,}` pulled
in English verbs (install, verify, retry...) as false positives, while
substring search missed synonyms and CJK text — causing systematically
low coverage scores despite high resonance.
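To illustrate the failure mode described above, here is a minimal sketch. It reproduces only the third regex branch quoted in this message; the other branches, the `STOPWORDS` filter, and the real scoring code are not shown here, so `THIRD_BRANCH`, `draftText`, and `evidenceText` are hypothetical stand-ins rather than the original implementation.

```typescript
// Hypothetical reconstruction of the third regex branch only.
// The real verifier had more branches plus a STOPWORDS filter, omitted here.
const THIRD_BRANCH = /[a-z_]{3,}/g;

// Made-up draft prose and evidence text, for illustration only.
const draftText = "install the package, verify the build, then retry with npm";
const evidenceText = "npm i left-pad && npm run build";

// Every lowercase word of 3+ letters becomes a "command-like token".
const tokens = Array.from(new Set(draftText.match(THIRD_BRANCH) ?? []));

// Substring search against the evidence text, as the old pipeline did.
const covered = tokens.filter((t) => evidenceText.includes(t));

// English verbs such as "install" inflate the denominator, so the ratio
// stays low even though the draft matches what actually ran.
const coverage = covered.length / tokens.length;
console.log(tokens, covered, coverage.toFixed(2));
```

In this sketch only `build` and `npm` are found in the evidence, while `install`, `verify`, and `retry` count against the score even though `npm i` is an install.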
Replace the entire coverage pipeline with structured tool name comparison:
- New `extractToolNames()` reads `trace.toolCalls[].name` + first token
of string `tc.input` (for shell-like tools) as ground truth
- Crystallize prompt v3 injects `EVIDENCE_TOOLS` whitelist and requires
LLM to output explicit `tools: string[]` field
- Verifier checks `draft.tools ⊆ evidenceTools` via Set comparison
- Delete `collectCommandTokens`, `STOPWORDS`, `actionBlob` regex logic
- Add `tools: string[]` to `SkillCrystallizationDraft` and `SkillProcedure`
- Packager persists tools and renders "Tools used" in invocation guide
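The extraction and subset check listed above could look roughly like the following sketch. The `ToolCall` and `Trace` interfaces are assumed shapes inferred from `trace.toolCalls[].name` and `tc.input`; the signature of `extractToolNames()` and the `uncoveredTools()` helper are hypothetical and may differ from the code in this PR.

```typescript
// Assumed shapes: only `toolCalls[].name` and `input` are named in this PR text.
interface ToolCall {
  name: string;
  input?: unknown;
}
interface Trace {
  toolCalls?: ToolCall[];
}

// Collect ground-truth tool names: each call's name, plus the first
// whitespace-delimited token of a string input (e.g. "npm" from "npm install").
function extractToolNames(traces: Trace[]): Set<string> {
  const names = new Set<string>();
  for (const trace of traces) {
    for (const tc of trace.toolCalls ?? []) {
      if (tc.name) names.add(tc.name);
      if (typeof tc.input === "string") {
        const first = tc.input.trim().split(/\s+/)[0];
        if (first) names.add(first);
      }
    }
  }
  return names;
}

// Hypothetical helper: the draft passes when this returns an empty array,
// i.e. when draft.tools is a subset of evidenceTools.
function uncoveredTools(draftTools: string[], evidenceTools: Set<string>): string[] {
  return draftTools.filter((t) => !evidenceTools.has(t));
}

// Made-up evidence trace and draft tools, for illustration only.
const evidenceTools = extractToolNames([
  {
    toolCalls: [
      { name: "bash", input: "npm install left-pad" },
      { name: "read_file", input: { path: "a.ts" } },
    ],
  },
]);
const missing = uncoveredTools(["npm", "bash", "docker"], evidenceTools);
console.log(Array.from(evidenceTools), missing);
```

Because membership is an exact `Set` lookup on tool names, prose verbs in the draft never enter the comparison, and the natural language of the draft text (English, CJK, or otherwise) no longer affects the result.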
Summary
- Replace the regex heuristic in `verifier.ts` with structured tool name comparison using `trace.toolCalls` ground truth data
- Add an `extractToolNames()` utility that reads `tc.name` + first token of string `tc.input` from evidence traces
- Inject an `EVIDENCE_TOOLS` whitelist into the prompt, and require the LLM to output an explicit `tools: string[]` field constrained to the whitelist
- Verifier checks `draft.tools ⊆ evidenceTools`: no regex, no STOPWORDS, no `actionBlob` substring search
- Add `tools: string[]` to `SkillCrystallizationDraft`, `SkillProcedure`; packager persists and renders "Tools used" section
Motivation
The old regex third branch `[a-z_]{3,}` pulled English verbs (install, verify, retry...) into the coverage denominator as false positives, while substring matching missed synonyms and CJK text. This caused systematically low coverage scores despite high resonance, blocking valid skills from passing verification.
Test plan
- `toolCalls` evidence and `tools` draft field
- `tools` field parsing from LLM response