Unit 6 (prompt-design): split criterion 2 (MEDIUM, preserve-by-default) by LuminLynx · Pull Request #164 · LuminLynx/Libella

LuminLynx · 2026-05-21T20:01:57Z

Summary

Unit 6 (prompt-design), fifth of the MEDIUM batch, under preserve-by-default.

Rubric (3 → 4): c1 unchanged · c2 = name the failure mode · c3 (NEW) = explain the mechanism · c4 = regime distinction (was c3).

Preserve-by-default decomposition

Faithful decomposition of the locked Opus values:

Every old-c2=T pair → c2=T and c3=T.
p007 is the lone c2=T / c3=F differential — its own authored label says it names "instructions miss edge cases" without the ambiguous-criteria mechanism. (So unlike Unit 5, this set does test the c2-vs-c3 distinction once.)
p009, p011 name no failure mode → c2=F, c3=F.
c4 (regime) carries old-c3 unchanged.
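
The mapping above can be sketched as a small Python function. This is an illustration only: the name `split_c2`, the tuple layout, and the `NAMES_WITHOUT_MECHANISM` set are assumptions made for this example, not the repository's actual label schema or tooling. The sketch takes no position on what p007's old c2 value was; it only encodes the post-split outcome stated in this PR.

```python
from typing import Tuple

# Hypothetical: pair IDs whose authored label names a failure mode
# without explaining the mechanism. Per this PR, that is only p007.
NAMES_WITHOUT_MECHANISM = {"p007"}


def split_c2(
    pair_id: str, old: Tuple[bool, bool, bool]
) -> Tuple[bool, bool, bool, bool]:
    """Map old (c1, c2, c3) labels to new (c1, c2, c3, c4) labels."""
    old_c1, old_c2, old_c3 = old
    if pair_id in NAMES_WITHOUT_MECHANISM:
        # The lone differential: names the failure mode, no mechanism.
        new_c2, new_c3 = True, False
    else:
        # Preserve-by-default: old c2 is copied into both new criteria.
        new_c2 = new_c3 = old_c2
    # c1 is unchanged; the regime distinction moves from c3 to c4.
    return (old_c1, new_c2, new_c3, old_c3)
```

For example, with made-up input labels, `split_c2("p001", (True, True, True))` yields all four criteria true, while a pair that names no failure mode keeps both c2 and c3 false.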

No realignments, no judgment-flips. p007/p011 labels updated for the 4-criterion shape.

Post-split distribution (21 pairs)

8 × 4-of-4 · 3 × 3-of-4 (p006, p007-differential, p008) · 2 × 2-of-4 (p010, p011) · 1 × 1-of-4 (p009) · 5 on-topic-all-missed · 2 off-topic.
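
As a quick arithmetic check, the buckets listed above do add up to the 21 pairs in the set. The dictionary below is just a restatement of the counts in this PR description; the variable names are illustrative.

```python
# Bucket counts copied from the post-split distribution above.
buckets = {
    "4-of-4": 8,
    "3-of-4": 3,               # p006, p007 (differential), p008
    "2-of-4": 2,               # p010, p011
    "1-of-4": 1,               # p009
    "on-topic-all-missed": 5,
    "off-topic": 2,
}

total = sum(buckets.values())
assert total == 21, f"expected 21 pairs, got {total}"
```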

Note

The known-bad p018 (emoji + structured markdown + slashed percentages — reproducible grader-payload ERROR per UNIT_6_GATE.md) is unaffected by the split; it remains a documented known-bad marker.

Local validation

lint_unit_markdown / ingest_units --check — clean
run_regression_set --check — 21 pairs valid
pytest — 20/20

Test plan

Backend + Android CI green
Live grader gate optional (preserve-by-default — disagreements are documented, not chased). Expect the p018 ERROR and possibly grader-lenient c1 reads.

Opened as draft.

Generated by Claude Code

…-default)

Per docs/RUBRIC_AUDIT.md (MEDIUM): old c2 bundled 'names a concrete failure mode' with 'explains the mechanism.' Splits into name-the-failure-mode c2 and a new c3 (explain the mechanism); renumbers regime distinction to position 4. Rubric grows 3 -> 4.

Preserve-by-default (docs/REGRESSION_GATE.md): faithful decomposition of the locked Opus values, with old-c2=T → c2=T,c3=T. p007 is the lone c2=T/c3=F differential (its authored label says it names 'instructions miss edge cases' without the ambiguous-criteria mechanism); p009/p011 name no failure mode → c2=F,c3=F. No realignments, no judgment-flips. c4 (regime) carries old-c3 unchanged. Updated p007/p011 labels for the 4-criterion shape.

Sonnet gate disagreements are documented calibration gaps, not edits to the gold standard. Known-bad p018 ERROR pair unaffected.

Local lint, schema check, ingest-check, pytest all pass.

LuminLynx marked this pull request as ready for review May 21, 2026 20:18

LuminLynx merged commit 462138b into main May 21, 2026
2 checks passed

LuminLynx merged 1 commit into main from claude/unit-06-rubric-split
