verify: G5 smoke test (DO NOT MERGE)#1
Closed
TheDave94 wants to merge 1 commit into
TheDave94
pushed a commit
that referenced
this pull request
May 24, 2026
G5's bake-gates.yml workflow was missing scikit-image, which scripts/generate_strokes_auto.py imports unconditionally at module load time. The gates don't use skimage directly, but run_gates.py imports generate_strokes_auto for rasterize() and bbox_from_mask(), which triggers the load-time import.
Caught by G5 verification PR #1 (deliberate gate violation; expected G3 failure but workflow exited at G1's module load with ModuleNotFoundError: No module named 'skimage').
Matches ios-build.yml stroke_audit job's dep list (plus scipy which audit_invariants.py needs for distance_transform_edt). Dep audit confirmed scikit-image is the only missing entry; all other top-level imports (pillow, numpy, scipy, fonttools) were already present.
Future cleanup (out of scope): generate_strokes_auto.py should move the skimage import inside the bake-pipeline functions that actually use it, decoupling gate code from bake-code dependencies. Flagged for future refactor.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
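A minimal sketch of the deferred-import cleanup this commit message flags for later. It is not the actual contents of generate_strokes_auto.py: the signature of bbox_from_mask() is assumed, and skeletonize_mask() is a hypothetical stand-in for whichever bake-pipeline functions really use skimage.

```python
def bbox_from_mask(mask):
    """Gate-side helper (assumed signature): bounding box of truthy cells.

    Returns (row_min, col_min, row_max, col_max), or None for an empty mask.
    Needs no third-party packages, so importing this module stays cheap.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for row in mask for c, v in enumerate(row) if v]
    return rows[0], min(cols), rows[-1], max(cols)


def skeletonize_mask(mask):
    """Bake-side helper (hypothetical): the only place skimage is needed."""
    # Deferred imports: these run on the first call, not at module load,
    # so gate code that never calls this function never needs scikit-image.
    import numpy as np
    from skimage.morphology import skeletonize

    return skeletonize(np.asarray(mask, dtype=bool))
```

With this layout a gate runner could import the module and call bbox_from_mask() in an environment without scikit-image; the ModuleNotFoundError would surface only if a bake function were actually invoked.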
d382ac2 to
91f156c
Compare
TheDave94
pushed a commit
that referenced
this pull request
May 24, 2026
classifier — catches smooth long curves wrongly
admitted by max+p95 alone
Caught during G5 verification (PR #1, 2026-05-24): three
strokes in the full 59-letter corpus (Y s0, Y s1, g s1)
classified as STRAIGHT under the existing max+p95 criterion
but had perpendicular deviations of 24-70 px. Visual
rendering confirmed these are correctly-drawn smooth long
curves, not bake artifacts. The classifier hole: smooth
curves at N=100 resample have per-segment angles ~0.03 rad
(below max threshold) and p95 ~0.08 (below p95 threshold),
but accumulate substantial net direction change.
Fix: add a third criterion to is_straight requiring
|signed_cumulative_angle| < π/12 (15°). Empirically derived
from full-corpus diagnostic; sits in the 22.9°-wide gap
between the last well-behaved STRAIGHT stroke (ä s1 at 4.7°)
and the first offender (Y s1 at 27.6°).
Properties:
- N-invariant (zero-mean noise cancels at any N; unlike
unsigned cumulative)
- Preserves all 8 calibration corpus STRAIGHT strokes
(max |signed_cum| in corpus: 0.023 rad / 1.3°)
- Correctly filters Y s0 (27.6°), Y s1 (30.2°), g s1 (116°)
- Threshold of record (2.05 px) unchanged — Y/g were
previously wrongly admitted; they're now vacuous
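The three-part check described above can be sketched as follows. This is an illustration, not the code in scripts/audit_invariants.py: the function names are hypothetical, and the max and p95 limits (0.10 rad each) are placeholder assumptions because the commit message does not state them. Only the π/12 signed-cumulative limit comes from the text.

```python
import math

SIGNED_CUM_LIMIT_RAD = math.pi / 12  # 15 degrees, the limit stated in the commit


def turning_angles(points):
    """Signed turning angle (rad) at each interior vertex of a polyline."""
    angles = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        # atan2(cross, dot) gives the signed angle from segment a to segment b.
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles


def angle_stats(points):
    """Return (max |angle|, p95 |angle|, signed cumulative angle)."""
    angles = turning_angles(points)
    if not angles:
        return 0.0, 0.0, 0.0
    mags = sorted(abs(a) for a in angles)
    p95 = mags[min(len(mags) - 1, int(0.95 * len(mags)))]
    return mags[-1], p95, sum(angles)


def is_straight(points, max_limit=0.10, p95_limit=0.10):
    """Three-part AND; max_limit and p95_limit are placeholder values."""
    max_a, p95_a, signed_cum = angle_stats(points)
    return (
        max_a < max_limit
        and p95_a < p95_limit
        and abs(signed_cum) < SIGNED_CUM_LIMIT_RAD
    )
```

A 100-point arc sweeping 30° has tiny per-segment angles, so it passes the max and p95 checks, but its signed cumulative angle is close to 30° and the third check rejects it. A zigzag with alternating small deviations sums to roughly zero and still classifies as straight, which is the zero-mean-noise cancellation listed under Properties.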
Methodology note: fifth instance of design-prediction-meets-
data in Phase 2b Track B. Predicted classifier was adequate
for full 59-letter corpus; verification PR's deployment to
all letters falsified the prediction; refined criterion
derived from data. The "predict explicitly, verify
empirically, refine when data falsifies" methodology now has
five trail markers.
Files in this commit:
- scripts/audit_invariants.py: G3_STRAIGHTNESS_SIGNED_CUM_RAD
constant (=π/12, N-invariance documented); _stroke_angle_stats
returns (max, p95, signed_cum); gate_g3_per_stroke straightness
check uses three-part AND; result dict carries signed_cum_ref
- scripts/tests/test_gate_g3.py: two new tests (smooth-long-curve
vacuous via signed_cum; truly-straight-with-zero-mean-noise
still passes). 13 G3 tests total (was 11).
- research_data/phase2b_gates/g3_design.md: "Refinement caught
during G5 verification" subsection within G3.1 caveat — full
diagnostic data, N-invariance rationale, 8-corner classification
matrix
- research_data/phase2b_gates/g3_calibration_run.md: "Post-
deployment refinement" section — calibration corpus check
(all 8 strokes preserved), threshold-of-record unchanged
- docs/BAKE_INVARIANTS.md: Threshold 3 criterion expanded to
three-part AND with cross-reference
All 50 tests pass (13 G1 + 9 G2 + 13 G3 + 15 G4). Smoke test:
G3 on Y/g now correctly vacuous; G3 sweep over all 59 letters
on main: 59/59 pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DO NOT MERGE. This branch is a throwaway end-to-end CI verification for .github/workflows/bake-gates.yml (G5).
Modifies cps[15:25] of A s0 (left diagonal) by 0.010 perpendicular to the stroke direction.
Locally-verified expected behavior:
- G1: PASS (Pearson 0.86 ≥ 0.2005)
- G3: FAIL (deviation 2.69 px > 2.05 px threshold)
- G4: PASS (junction kink drift 0.004° ≤ 4.43°)
Workflow should fire on PR-open (path filter matches), all three gates should run, only G3 should fail, exit code should be non-zero, JSON artifacts should upload.
PR will be closed without merging once verification is captured.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
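For illustration, the deliberate violation could be produced by something like the sketch below. It rests on assumptions not confirmed by the PR: control points are taken to be (x, y) pairs, the "stroke direction" is taken as the chord from the first to the last control point, and shift_perpendicular is a hypothetical helper name.

```python
import math


def shift_perpendicular(cps, start, stop, offset):
    """Return a copy of cps with cps[start:stop] moved by `offset` along the
    unit normal of the first-to-last chord (an assumed stroke direction)."""
    (x0, y0), (x1, y1) = cps[0], cps[-1]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("stroke has zero length; direction is undefined")
    # Rotate the unit direction by +90 degrees to get the normal.
    nx, ny = -dy / length, dx / length
    shifted = [(x, y) for x, y in cps]
    for i in range(start, min(stop, len(cps))):
        x, y = cps[i]
        shifted[i] = (x + offset * nx, y + offset * ny)
    return shifted


# Usage mirroring the PR: move control points 15..24 by 0.010.
straight = [(i / 29, 0.0) for i in range(30)]
perturbed = shift_perpendicular(straight, 15, 25, 0.010)
```

Only the slice is touched, so the stroke endpoints and every other control point stay where they were.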
91f156c to
6131f98
Compare
Owner
Author
Verification complete. Workflow confirmed: fires on PR open, runs G1/G3/G4, catches the deliberate A violation in G3 (dev=2.68 px > 2.05 px threshold), exits non-zero, uploads JSON artifacts. Two sidebar findings landed on main during verification: scikit-image dep fix (commit 987b0bc) and G3 classifier refinement with signed-cumulative criterion (commit c4c143b). Closing without merging.
TheDave94
pushed a commit
that referenced
this pull request
May 24, 2026
PR #1 (verify/g5-smoke-test) closed without merging after end-to-end verification of .github/workflows/bake-gates.yml.
Workflow confirmed operational:
- Fires on PR-open when strokes.json or gate code changes
- Runs G1/G3/G4 sequentially via run_gates.py
- Catches deliberate violations (A s0 dev=2.68 px caught by G3)
- Exits non-zero on gate failure (merge-blocker semantics)
- Uploads bake-gate-results JSON artifact
Two sidebar fixes landed on main during verification:
- 987b0bc: scikit-image added to CI deps (was missing)
- c4c143b: G3 classifier extended with signed-cumulative criterion to filter smooth-long-curves (Y, g) the max+p95 alone admitted
g5_verification.md documents the full verification trail including methodology-chapter content: fifth instance of design-prediction-meets-data in Phase 2b Track B (G3 classifier hole surfaced by G5 deployment to all 59 letters, not just the 13-letter calibration corpus).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
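The merge-blocker semantics confirmed above (run every gate, exit non-zero if any fails, emit JSON) might look roughly like this. It is a sketch, not run_gates.py itself: the function names and result layout are hypothetical, while the thresholds and the smoke-test measurements are the values quoted in this PR.

```python
import json

# (comparison, limit) per gate, using the thresholds quoted in the PR text.
THRESHOLDS = {
    "G1": ("min", 0.2005),  # Pearson correlation must be at least this
    "G3": ("max", 2.05),    # perpendicular deviation in px must not exceed this
    "G4": ("max", 4.43),    # junction kink drift in degrees must not exceed this
}


def evaluate(measurements):
    """Compare each gate's measured value against its limit."""
    results = {}
    for gate, (kind, limit) in THRESHOLDS.items():
        value = measurements[gate]
        passed = value >= limit if kind == "min" else value <= limit
        results[gate] = {"value": value, "limit": limit, "passed": passed}
    return results


def exit_code(results):
    """0 only when every gate passed; any failure blocks the merge."""
    return 0 if all(r["passed"] for r in results.values()) else 1


# Locally predicted measurements for the deliberate A s0 violation.
smoke = evaluate({"G1": 0.86, "G3": 2.69, "G4": 0.004})
report = json.dumps(smoke, indent=2)  # what an uploaded artifact might contain
status = exit_code(smoke)
```

Evaluating all gates before computing the exit code, rather than stopping at the first failure, keeps the JSON report complete even when a gate fails.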
DO NOT MERGE. This PR is a deliberate end-to-end verification of .github/workflows/bake-gates.yml (G5).
What this changes
PrimaeNative/Resources/Letters/Regular/A/strokes.json: shifts cps[15:25] of A s0 (left diagonal) by 0.010 perpendicular to the stroke direction.
Expected workflow behavior
Locally-verified prediction:
Workflow should:
bake-gate-results artifact with per-gate JSON
Verification plan
After CI runs: