
fix: complete label rename, export prompt bugs, docs + 0.1.5 #57

Merged
Treelovah merged 4 commits into main from fix/scoring-label-completeness on Feb 27, 2026

Conversation

@Treelovah
Contributor

PR #49 renamed "Overall Score" to "Strategy Score" in about 70% of the output paths. This finishes the job — terminal analysis, text reports, and markdown reports all say the right thing now. The formula explainer also stopped pretending KSM is a simple multiplication when it's actually a piecewise function with efficacy gating.

PR #51's export prompt had three bugs: writeFileSync with no error handling (permission denied = crash), a guard that made the no-analysis export path unreachable, and Ctrl+C during the prompt surfacing as "Benchmark failed." All fixed.

Docs updated to reflect token efficiency as the third KSM factor. CHANGELOG covers #44 through #55. Version bumped to 0.1.5.

363 tests passing, all output formats verified locally (terminal, markdown, HTML, share card, clipboard).

Closes #54, closes #55

Three export paths still said "Overall Score" or bare "Score" after
the rename that was supposed to disambiguate KSM from the LLM strategy
assessment. The terminal output, text report, and markdown report all
had unrenamed labels — exactly the places where users see both numbers
side-by-side and get confused.

Also fixed the formula explainer that claimed KSM is a simple
multiplication of three terms. It isn't — efficacy acts as a gate
with a cap at 30 when zero and a sliding multiplier below 50. Saying
"x × y × z" when the code does something materially different is
worse than saying nothing.
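The gating described above could be sketched roughly as follows. This is an illustrative reconstruction from the prose only; the function and parameter names (`ksm`, `base`, `efficacy`) and the exact interpolation below 50 are assumptions, not the project's actual code:

```typescript
// Hypothetical sketch of the piecewise KSM gating described above.
// `base` stands in for the product of the other factors; names are illustrative.
function ksm(base: number, efficacy: number): number {
  if (efficacy === 0) {
    // Zero efficacy caps the score at 30 regardless of the other factors.
    return Math.min(base, 30);
  }
  if (efficacy < 50) {
    // Below 50, efficacy acts as a sliding multiplier on the rest of the score.
    return base * (efficacy / 50);
  }
  // At or above 50, efficacy no longer attenuates the score in this sketch.
  return base;
}
```

The point is that none of these branches reduce to a plain "x × y × z", which is why the explainer needed fixing.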

Closes #54

The post-run export (PR #51) shipped with writeFileSync unwrapped,
a guard that made the no-analysis path completely unreachable, and
Ctrl+C surfacing as a benchmark failure. Probably shouldn't ship
interactive features that crash on predictable user behavior.
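The unwrapped-write fix amounts to catching the `writeFileSync` failure and reporting it instead of letting it take the process down. A minimal sketch, assuming a helper like this (the name `tryExport` and the error-string shape are hypothetical, not the actual implementation):

```typescript
import { writeFileSync } from "node:fs";

// Hedged sketch: wrap the export write so a permission or path error
// surfaces as a returned message instead of an uncaught crash.
function tryExport(filePath: string, contents: string): string | null {
  try {
    writeFileSync(filePath, contents);
    return null; // success
  } catch (err) {
    // e.g. EACCES (permission denied) or ENOENT (missing directory)
    return `Export failed: ${(err as Error).message}`;
  }
}
```

The Ctrl+C fix is the same idea one level up: a cancelled prompt should be treated as "user declined to export", not rethrown as a benchmark failure.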

Also updated the test that was still asserting "Overall Score" after
we renamed it in #49. Tests only work if you keep them current.

Closes #55

KSM now has three factors instead of two. The docs should probably
reflect that before someone reads the spec and wonders why their
score doesn't match the formula on the page.

Updated KSM-SCORING.md with the full token efficiency section,
realistic examples showing the cost difference between efficient
and wasteful models, and bumped the spec version to 1.2.

README scoring section now documents all three factors. CHANGELOG
covers everything from #44 through #55. Version bump to 0.1.5.
@Treelovah Treelovah requested a review from pi3-code February 26, 2026 22:06
The hand-calculated efficiency was 0.871 but the actual formula gives
0.867 for 2698 tokens/step. KSM rounds accordingly: 84.1 not 84.5.
Spec documents should have correct math.
@Treelovah Treelovah merged commit 9114a56 into main Feb 27, 2026
4 checks passed
@Treelovah Treelovah deleted the fix/scoring-label-completeness branch February 27, 2026 00:06
