improve skill: better consolidation coverage, richer scripts, workspace layout #26

Merged
joe32140 merged 1 commit into main from update-skill-guidelines on Mar 11, 2026

@joe32140 commented on Mar 9, 2026

Summary

Improvements to the /openaireview skill based on analysis of review quality across multiple methods (skill, progressive Opus/Gemini). The key finding: the consolidation step was over-pruning — 61 raw sub-agent comments collapsed to 18, dropping legitimate singleton findings. This PR fixes that while also improving the underlying scripts.

SKILL.md

Step 4a — Singleton protection:
Read singleton findings (those reported by only one sub-agent) in full before dropping them. These are the findings most likely to be unique insights rather than noise.
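A minimal sketch of how singletons could be separated out for a full read. It assumes each finding carries a list of the sub-agents that reported it. The `sources` field, the `split_singletons` helper and the sample titles are hypothetical and are not taken from the skill's scripts.

```python
from typing import Any, Dict, List, Tuple

Finding = Dict[str, Any]


def split_singletons(findings: List[Finding]) -> Tuple[List[Finding], List[Finding]]:
    """Separate findings raised by exactly one sub-agent from the rest.

    Assumes a hypothetical "sources" list naming the reporting sub-agents.
    """
    singletons: List[Finding] = []
    shared: List[Finding] = []
    for finding in findings:
        distinct_sources = set(finding.get("sources", []))
        if len(distinct_sources) == 1:
            singletons.append(finding)
        else:
            shared.append(finding)
    return singletons, shared


# Made-up sample data, for illustration only.
findings = [
    {"title": "Unjustified trial budget", "sources": ["methods"]},
    {"title": "Missing baseline", "sources": ["methods", "results"]},
]
singletons, shared = split_singletons(findings)
# Singletons are queued for a full read instead of being dropped outright.
for finding in singletons:
    print(f"READ IN FULL: {finding['title']}")
```

Deduplicating the source list with `set` means a sub-agent that repeats itself still counts as one source.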

Step 4b — Smarter merging:
When merging a root-cause cluster, check whether any comment makes a distinct argument (different evidence, different claim, different consequence). If so, keep it separate. The goal is to eliminate true duplicates, not to compress distinct observations that happen to share a design decision.

Step 4c — Calibration floor:

  • Major ceiling: 8 → 10 (papers with many independent validity threats may legitimately exceed the previous ceiling)
  • Added a target range of 15–25 total comments, with 15 as the floor: if fewer than 15 comments survive dedup, revisit for over-merging
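The calibration limits above can be expressed as a small check. This is only a sketch: in the PR the limits are guidance in SKILL.md rather than code, and the `"major"` severity label and the `calibration_warnings` helper are assumptions.

```python
from typing import List

MAJOR_CEILING = 10  # raised from 8 in this PR
TOTAL_FLOOR = 15    # below this, revisit merges for over-pruning


def calibration_warnings(severities: List[str]) -> List[str]:
    """Return warnings for a consolidated comment list.

    `severities` holds one label per surviving comment; the "major"
    label is an assumed convention. An empty result means the counts
    fall within the calibration guidance.
    """
    warnings: List[str] = []
    majors = sum(1 for label in severities if label == "major")
    if majors > MAJOR_CEILING:
        warnings.append(
            f"{majors} major comments exceed the ceiling of {MAJOR_CEILING}"
        )
    if len(severities) < TOTAL_FLOOR:
        warnings.append(
            f"only {len(severities)} comments survive dedup; check for over-merging"
        )
    return warnings
```

Returning warnings rather than raising keeps the check advisory, which matches the "revisit" wording of the guidance.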

consolidate_comments.py

Richer stdout output — now prints the first sentence of each explanation and a source-count column alongside each title. Makes singleton findings visible at a glance without requiring a full-text fetch.
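For illustration, a summary row of this kind might be built as follows. The field names (`title`, `explanation`, `sources`), the row layout and both helpers are assumptions and do not reflect the actual output format of consolidate_comments.py.

```python
import re
from typing import Any, Dict, List


def first_sentence(text: str) -> str:
    """Return the text up to the first '.', '!' or '?' followed by whitespace.

    Naive on purpose: abbreviations such as "e.g." will cut the sentence short.
    """
    stripped = text.strip()
    match = re.search(r"(.+?[.!?])(\s|$)", stripped, flags=re.DOTALL)
    return match.group(1) if match else stripped


def summary_rows(comments: List[Dict[str, Any]]) -> List[str]:
    """Build one line per comment: source count, title, first sentence."""
    rows = []
    for comment in comments:
        source_count = len(set(comment["sources"]))
        sentence = first_sentence(comment["explanation"])
        rows.append(f"[{source_count} src] {comment['title']}: {sentence}")
    return rows
```

A row beginning with `[1 src]` marks a singleton, so it stands out without fetching the full explanation.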

prepare_workspace.py

Prefer reviewer.parsers (BeautifulSoup + Marker) when the package is available; fall back to the stdlib-only ArXiv extractor. Outputs workspace to review_results/<slug>_review/, writes metadata.json.
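A rough sketch of the prefer-then-fall-back pattern and the workspace layout described above. The `has_module` and `make_workspace` helpers and the `metadata.json` fields are hypothetical; the real script's function names and metadata schema are not shown in this PR.

```python
import importlib.util
import json
from pathlib import Path


def has_module(name: str) -> bool:
    """Return True if `name` is importable, without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when the parent package is missing.
        return False


def make_workspace(slug: str, root: Path = Path("review_results")) -> Path:
    """Create <root>/<slug>_review/ and write a metadata.json inside it."""
    workspace = root / f"{slug}_review"
    workspace.mkdir(parents=True, exist_ok=True)
    # Prefer the richer parser when installed; otherwise use the stdlib path.
    parser = "reviewer.parsers" if has_module("reviewer.parsers") else "stdlib"
    metadata = {"slug": slug, "parser": parser}  # assumed fields
    (workspace / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return workspace
```

Probing with `find_spec` keeps the optional dependency out of the import path entirely when it is absent, so the fallback needs no exception handling around the parser import.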

save_viz_json.py / criteria.md / subagent_templates.md

Minor fixes and wording refinements.

Impact

Evaluated on paper 2602.18458 (MechEvalAgent). Before: 18 final comments. After: 26 — recovering 8 findings that sub-agents had found but consolidation dropped, including >80% concealing C3/GT3 outliers, C3 as AND-logic artifact, three-trial GT budget unjustified, and impact statement gaps.

Test plan

  • Singleton findings visible in consolidation stdout without full-text fetch
  • Consolidated output reaches 15–25 comments on a typical paper
  • prepare_workspace.py falls back gracefully when reviewer.parsers is absent

@joe32140 changed the title from "improve skill: merge-by-root-cause dedup, calibration guidance, quality-over-quantity" to "improve skill: merge-by-root-cause dedup, calibration guidance" on Mar 10, 2026
SKILL.md / references:
- Step 4a: read singleton findings in full before dropping (singletons are
  most likely to be unique insights, not noise)
- Step 4b: when merging, check for distinct arguments within a group before
  folding them in — eliminate true duplicates, not distinct observations
- Step 4c: calibration ceiling 8→10 majors; add 15-25 total comment floor to
  catch over-merging; softer 4-type taxonomy guidance
- subagent_templates.md / criteria.md: minor wording refinements

Scripts:
- consolidate_comments.py: richer stdout — first sentence of each explanation
  plus source-count column so singletons are visible without full-text fetch
- prepare_workspace.py: prefer reviewer.parsers (BeautifulSoup/Marker) when
  available; fall back to stdlib-only ArXiv extractor; output to
  review_results/<slug>_review/, write metadata.json
- save_viz_json.py: minor fixes for metadata.json compatibility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@joe32140 force-pushed the update-skill-guidelines branch from 6bb5efe to ded847e on March 11, 2026 02:19
@joe32140 changed the title from "improve skill: merge-by-root-cause dedup, calibration guidance" to "improve skill: better consolidation coverage, richer scripts, workspace layout" on Mar 11, 2026
@joe32140 merged commit 992a1ea into main on Mar 11, 2026