Unit 8 (cost-dynamics): split criterion 2 (MEDIUM, preserve-by-default)#167

Merged: LuminLynx merged 2 commits into main from claude/unit-08-rubric-split on May 21, 2026

@LuminLynx
Owner

Summary

Unit 8 (cost-dynamics), seventh of the MEDIUM batch, faithful preserve-by-default split. Gate-exempt.

Rubric (3 → 4): c1 unchanged · c2 = name the failure mode · c3 (NEW) = explain the mechanism · c4 = regime distinction (was c3).

Decomposition

  • Every old-c2=T pair → c2=T and c3=T.
  • p007 is the lone c2=T / c3=F differential: it names the single-axis failure modes ("caching alone misses the batch discount") without explaining the mechanism. (Its shape differs from earlier units: c1 is also false here, because old c1 failed on the missing annualization anchor, so it is a 2-of-4 pair: F,T,F,T.)
  • p009, p011 name no failure mode → c2=F, c3=F.
  • c4 (regime) carries old-c3 unchanged. No realignments. p007/p011 labels updated.
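
The decomposition above can be sketched as a small transform over expected vectors. This is an illustration only, not the repository's code: `split_c2` and its `override` parameter are hypothetical names, and p007's old vector (F,F,T) is inferred from the bullets rather than stated in them.

```python
from typing import Optional, Tuple

Old = Tuple[bool, bool, bool]        # (c1, old bundled c2, old c3 = regime)
New = Tuple[bool, bool, bool, bool]  # (c1, c2 = names failure mode, c3 = mechanism, c4 = regime)


def split_c2(old: Old, override: Optional[Tuple[bool, bool]] = None) -> New:
    """Preserve-by-default split: the old bundled c2 value is copied to both
    new c2 and new c3 unless a hand-reviewed override is supplied."""
    c1, old_c2, old_c3 = old
    c2, c3 = override if override is not None else (old_c2, old_c2)
    # The regime criterion keeps its value and only moves from position 3 to 4.
    return (c1, c2, c3, old_c3)


# Old-c2=T pair: both halves of the split inherit True.
print(split_c2((True, True, False)))                          # (True, True, True, False)
# p007 (old vector assumed): names the failure mode, no mechanism.
print(split_c2((False, False, True), override=(True, False)))  # (False, True, False, True)
```

Under this sketch, only differential pairs such as p007 need an explicit override; every other pair follows mechanically from its old vector.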

Post-split distribution (21 pairs)

8 × 4-of-4 · 2 × 3-of-4 (p006, p008) · 2 × 2-of-4 (p007-differential, p010) · 2 × 1-of-4 (p009, p011) · 5 on-topic-all-missed · 2 off-topic.
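
As a quick sanity check on the distribution, the buckets can be tallied in a few lines. This is a sketch under assumptions: the `bucket` helper and the dictionary layout are hypothetical, and only the four pairs whose full vectors are spelled out in this PR are bucketed individually.

```python
from collections import Counter


def bucket(vector):
    """Name the bucket for an expected vector, e.g. '3-of-4'."""
    return f"{sum(vector)}-of-{len(vector)}"


# Vectors quoted in this PR for the named partial-credit pairs.
known = {
    "p006": (True, True, True, False),
    "p007": (False, True, False, True),
    "p008": (False, True, True, True),
    "p010": (False, True, True, False),
}
by_bucket = Counter(bucket(v) for v in known.values())

# Bucket sizes as reported in the distribution line.
reported = {
    "4-of-4": 8,
    "3-of-4": 2,
    "2-of-4": 2,
    "1-of-4": 2,
    "on-topic-all-missed": 5,
    "off-topic": 2,
}
total = sum(reported.values())

print(by_bucket["3-of-4"], by_bucket["2-of-4"], total)
```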

Local validation

  • lint_unit_markdown / ingest_units --check — clean
  • run_regression_set --check — 21 pairs valid
  • pytest — 20/20

Ready to merge (gate-exempt). One unit left in the MEDIUM batch: Unit 9.


Generated by Claude Code

…-default)

Per docs/RUBRIC_AUDIT.md (MEDIUM): old c2 bundled 'names a concrete
failure mode' with 'explains the mechanism.' Splits into
name-the-failure-mode c2 and a new c3 (explain the mechanism);
renumbers regime distinction to position 4. Rubric grows 3 -> 4.

Preserve-by-default: faithful decomposition of locked Opus values
(old-c2=T -> c2=T,c3=T). p007 is the lone c2=T/c3=F differential
(names single-axis failures without the mechanism; c1 also false on
the missing annualization anchor); p009/p011 name no failure mode ->
c2=F,c3=F. No realignments. c4 (regime) carries old-c3 unchanged.
Updated p007/p011 labels for the 4-criterion shape.

Gate-exempt (faithful split, zero realignments per
docs/REGRESSION_GATE.md). Local lint, schema check, ingest-check,
pytest all pass.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

label: c1 + c2 met, c3 missing (no per-regime lever mapping)

P2: Update pair label to match four-criterion expectations

After the rubric was split into four criteria, this label contradicts the expected vector for the same pair: c3 is marked true below, while c4 is false. A stale label like this makes calibration and debug workflows unreliable for anyone triaging by label text (including scripts or manual audits), because the descriptive class no longer matches the ground truth in expected.criteria. The affected labels should be renumbered to the new c1–c4 semantics.

Codex on PR #167: after the c2 split, several pair labels still
referenced the old 3-criterion numbering and contradicted their
expected vectors. Fixed:
  * p006: 'c1+c2 met, c3 missing' -> 'c1+c2+c3 met, c4 missing' (T,T,T,F)
  * p008: 'c2+c3 met, c1 missing' -> 'c2+c3+c4 met, c1 missing' (F,T,T,T)
  * p010: 'c2 only' -> 'c2+c3 met, c1/c4 missing' (F,T,T,F)
  * 'All three met' -> 'All four met' (cosmetic, 4 criteria now)
Expected values unchanged; only labels corrected.
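
A check along these lines could catch stale labels automatically. It is a minimal sketch, not part of the repository's lint tooling: `parse_label` and `label_is_consistent` are hypothetical helpers, and they only understand labels of the "cN+cM met, cK missing" form quoted above (not "All four met" or "c2 only").

```python
import re
from typing import Sequence, Set, Tuple


def parse_label(label: str) -> Tuple[Set[int], Set[int]]:
    """Split a label at ' met' and collect the criterion numbers on each side."""
    met_part, _, missing_part = label.partition(" met")
    met = {int(n) for n in re.findall(r"c(\d+)", met_part)}
    missing = {int(n) for n in re.findall(r"c(\d+)", missing_part)}
    return met, missing


def label_is_consistent(label: str, expected: Sequence[bool]) -> bool:
    """True when the label names exactly the met and missing criteria."""
    met, missing = parse_label(label)
    true_idx = {i for i, v in enumerate(expected, start=1) if v}
    false_idx = set(range(1, len(expected) + 1)) - true_idx
    return met == true_idx and missing == false_idx


p006 = (True, True, True, False)
print(label_is_consistent("c1+c2 met, c3 missing", p006))     # False (stale label)
print(label_is_consistent("c1+c2+c3 met, c4 missing", p006))  # True
```

Requiring both the met and missing sets to match means a label written for the old 3-criterion numbering fails even when its "met" list happens to be a subset of the new vector.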
Owner Author

@codex Fixed in 0199ff4. Renumbered the stale labels to 4-criterion semantics: p006 → "c1+c2+c3 met, c4 missing" (T,T,T,F), p008 → "c2+c3+c4 met, c1 missing" (F,T,T,T), p010 → "c2+c3 met, c1/c4 missing" (F,T,T,F), and "All three met" → "All four met". Expected values unchanged.

Good catch on the systematic angle — when each unit was split I updated the differential/renamed pair labels but left ordinary partial-credit labels on old numbering. The same pattern exists in already-merged units (2–7, 10–12) and pending #168 (Unit 9); I'm doing a dedicated label-reconciliation pass for those rather than leaving them stale.


Generated by Claude Code

@chatgpt-codex-connector

Thanks for the update — this looks resolved.

Given your trigger and the PR context, no further Codex action is needed right now:

  • You explicitly state the stale-label issue was fixed in commit 0199ff4.
  • The renumbering you listed is consistent with the 4-criterion semantics (c1..c4) and with preserving expected vectors.
  • You also identified and planned follow-up reconciliation for already-merged/pending units, which addresses the broader systematic risk noted in review.

I did not make additional code changes, create a commit, or open a follow-up PR from this run.

LuminLynx merged commit 29e7a0d into main on May 21, 2026
2 checks passed
LuminLynx pushed a commit that referenced this pull request May 21, 2026
The first draft claimed units 8 and 9 still carried a bundled c2. That was
read off a stale working tree from before a git pull — PRs #167 and #168
had already split both (rubric + regression sets to 4 criteria) and merged.
Remove the false "incomplete sweep" section, fix the numbering note (only
tokenization and the reverted multimodal remain 3-criterion), and correct
the regime-criterion index for units 8/9 from c3 to c4.

The c1 and regime-criterion analysis for 8/9 is unchanged — the split left
their c1 text alone and only renumbered the regime criterion.

https://claude.ai/code/session_019xEvNkByf5ic4kbMZFdKDR