[Spec] extend validation prompt bank with P21–P62 (42 prompts) by boldfield · Pull Request #125 · boldfield/sigil

boldfield · 2026-05-10T01:21:24Z

Summary

Extends spec/validation-prompts.md with 42 new prompts (P21–P62) covering v1 surface previously untested. Every prompt's reference implementation was compiled and run against current main; oracles match observed output.

Total prompt bank: 62 (was 20).

Coverage groups

P21–P26 — Plan D shipped surfaces: tuples, std.pair, generic effect rows, per-op generics, row-poly fn params, conditional k-call. Closes the gap where Plan D's type-system + handler features had no validation prompts.
P27–P29 — handler features: return arm, multi-arm dispatch with std.state, nested handlers on distinct effects.
P30–P33 — Mem effect surface: MutArray, MutArray in-place sum, StringBuilder, MutByteArray + byte conversion. Entire Mem class was previously un-exercised.
P34–P35 — ByteArray: immutable checksum + UTF-8 validate + alloc roundtrip.
P36–P49 — stdlib usage: list map/filter/fold/sort, option unwrap, result match, string ops, char classifiers, format, raise.catch, state.run_state, choose.all_choices / first_choice, immutable array, persistent map.
P50–P52 — env/random/clock effects: env_var, deterministic xorshift via run_seeded_random, run_frozen_clock.
P53–P57 — numeric + ArithError: float, Int64, Bool operators, ArithError discharge with both div_by_zero and mod_by_zero arms, wrap-on-overflow at i64 boundary.
P58–P60 — patterns: 3-arity tuple destructure, nested constructor patterns, char literal patterns.
P61–P62 — misc: assert builtin no-op path, multi-import composition.

Notable adjustments from the original draft

The draft (~967 lines, in /tmp/sigil-validation/extended-prompts-draft.md) was written before verification. Reality diverged from the draft on several points; final prompts reflect what actually compiles and runs:

P24: dropped division (requires ArithError); switched to a positivity check that exercises the per-op A only.
P28: simplified the run_state composition; final state of incr+incr+decr is 1.
P34: byte_array_alloc only takes uniform fill; switched to string_to_bytes("ABC") to construct a ByteArray with specific bytes. The deferred Result-returning string_from_bytes wrapper doesn't exist in v1.
P35: same — use string_from_bytes_validate + string_from_bytes_alloc primitive pair directly (no Result wrapper).
P37: filter's pred must be pure; wrap n%2==0 in a discharging handler covering BOTH ArithError.div_by_zero and ArithError.mod_by_zero (E0142 requires exhaustive arms).
P40: string equality goes through std.ordering.string_compare returning Ordering — no string_eq builtin.
P41: string_to_int_validate returns Int (error code), not Option[Int].
P51: oracle exit code pinned at 170 (verified deterministic across runs of seed=42).
P52: run_frozen_clock body row is ![Clock] only — IO must happen outside the lambda.
P53: switched float operands to (1.5 + 2.0 * 1.25 = 4.0) to avoid IEEE 754 imprecision in the printed output.
P56: ArithError handler must include both div_by_zero AND mod_by_zero arms per E0142 exhaustiveness.
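
The P53 adjustment works because every operand and intermediate value is a dyadic rational, so no rounding occurs at any step. A minimal check in Python, whose `float` is an IEEE 754 double (that Sigil's float has the same width is an assumption here, not something this PR states):

```python
# 1.5, 2.0, 1.25, 2.5 and 4.0 are all exactly representable in binary
# floating point, so the P53 expression evaluates without rounding.
product = 2.0 * 1.25
total = 1.5 + product

# Contrast: 0.1 and 0.2 have no finite binary expansion, so their sum rounds
# and does not compare equal to 0.3.
inexact = 0.1 + 0.2

print(product, total)   # 2.5 4.0
print(inexact == 0.3)   # False
```

Operands like these keep the printed oracle independent of how the runtime formats trailing digits.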

Verification

The reference implementation for each of P21–P62 lives at /tmp/sigil-validation/verify-extended/Pxx.sigil (workspace, not committed). All 42 compiled cleanly against target/release/sigil (built off current main, post-PR-#124 merge) and produced byte-exact output matching the documented oracle.
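
A byte-exact oracle comparison of this kind can be sketched as follows. This is a hypothetical harness, not the one used for this PR: the function name `matches_oracle` is invented for illustration, and because the sigil CLI invocation is not shown here, a Python one-liner stands in for a compiled prompt binary.

```python
import subprocess
import sys

def matches_oracle(cmd, expected_stdout: bytes, expected_exit: int) -> bool:
    """Run cmd and compare stdout and exit code exactly against the oracle."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.stdout == expected_stdout and result.returncode == expected_exit

# Stand-in for a compiled prompt binary: writes b"ok\n" and exits with code 170.
stand_in = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(b'ok\\n'); sys.exit(170)",
]

print(matches_oracle(stand_in, b"ok\n", 170))  # True
print(matches_oracle(stand_in, b"ok", 170))    # False: missing trailing newline
```

Comparing raw bytes rather than decoded text means a missing newline or a stray carriage return counts as a mismatch.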

Out of scope

Per the original draft's "Coverage check" section, these surfaces remain intentionally uncovered:

std.fs ops, std.process.run, IO.read_line — non-deterministic / depend on filesystem or subprocess state.
First-class Continuation[OpRet, Ret] user surface — outside core multi-shot k(arg) idiom.
Wrapper-fn-frame composition — needs a dedicated 2-3 prompt sub-bank for handler composition correctness; deferred.

🤖 Generated with Claude Code

Adds 42 new validation prompts covering v1 surface previously untested by the P01–P20 bank. Each new prompt's reference implementation was compiled and run against current main; oracles match observed output.

Coverage groups:

- **P21–P26: Plan D shipped surfaces** (tuples, std.pair, generic effect rows, per-op generics, row-poly fn params, conditional k-call). Closes the gap where Plan D's type-system + handler features had no validation prompts.
- **P27–P29: handler features** (return arm, multi-arm dispatch with std.state, nested handlers on distinct effects).
- **P30–P33: Mem effect surface** (MutArray, MutArray in-place sum, StringBuilder, MutByteArray + byte conversion). Entire Mem class was previously un-exercised.
- **P34–P35: ByteArray** (immutable checksum + UTF-8 validate + alloc roundtrip).
- **P36–P49: stdlib usage** (list map/filter/fold/sort, option unwrap, result match, string ops, char classifiers, format, raise.catch, state.run_state, choose.all_choices / first_choice, immutable array, persistent map).
- **P50–P52: env/random/clock effects** (env_var, deterministic xorshift via run_seeded_random, run_frozen_clock).
- **P53–P57: numeric + ArithError** (float, Int64, Bool operators, ArithError discharge with both div_by_zero and mod_by_zero arms, wrap-on-overflow at i64 boundary).
- **P58–P60: patterns** (3-arity tuple destructure, nested constructor patterns, char literal patterns).
- **P61–P62: misc** (assert builtin no-op path, multi-import composition).

Notable adjustments from the original draft (per actual verification):

- P24: removed division (requires ArithError); switched to a positivity check that exercises the per-op A only.
- P28: simplified the run_state composition; final state of incr+incr+decr is 1.
- P34: byte_array_alloc only takes uniform fill; switched to string_to_bytes("ABC") to construct a ByteArray with specific bytes. The deferred Result-returning string_from_bytes wrapper doesn't exist in v1.
- P35: same; use string_from_bytes_validate + alloc primitive pair directly (no Result wrapper).
- P37: filter's pred must be pure; wrap n%2==0 in a discharging handler covering BOTH ArithError.div_by_zero and mod_by_zero (E0142 requires exhaustive arms).
- P40: string equality goes through std.ordering's string_compare returning Ordering; there is no string_eq builtin.
- P41: string_to_int_validate returns Int (error code), not Option[Int].
- P51: oracle exit code pinned at 170 (verified deterministic across runs of seed=42).
- P52: run_frozen_clock body row is ![Clock] only; IO must happen outside the lambda.
- P53: switched float operands to (1.5 + 2.0 * 1.25 = 4.0) to avoid IEEE 754 imprecision in the printed output.
- P56: ArithError handler must include both div_by_zero AND mod_by_zero arms per E0142 exhaustiveness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

boldfield

Review: P21–P62 validation prompt bank

Overall: Well-structured, thorough, and carefully verified. The 42 prompts provide strong incremental coverage over the Plan D type-system surfaces, the Mem effect class, and most stdlib modules that P01–P20 left untouched. Formatting is consistent with established conventions. No must-fix issues.

Issue: P57 contradicts spec's stated Int range

P57 uses let big: Int = 9223372036854775807; (2^63 − 1, i64 max) and documents wrapping to i64 min. This was verified against the compiler and works.

Problem: the spec disagrees. §1 line 563 says:

Range: [-2^62, 2^62) (63-bit tagged Int).

And §12 says:

Int is 63-bit at FFI boundaries (one bit reserved for the heap-vs-immediate tag)

2^63 − 1 is well outside [-2^62, 2^62). The int_abs(i64::MIN) reference in §13.2 muddies this further by assuming full i64 range.

The validation harness runs prompts against a fresh LLM session given only spec/language.md. An LLM reading "Range: [-2^62, 2^62)" could legitimately refuse to emit a literal outside that range, even though the prompt tells it to. This makes P57 fragile against spec-compliant generation.

Options:

Fix the spec — if the compiler actually accepts full i64 range, update §1/§12 to say so, then P57 is clean.
Adjust P57 to use 4611686018427387903 (2^62 − 1, the stated max) and document what stated_max + 1 produces. This tests overflow at the spec-documented boundary rather than the i64 boundary.

Either way, the spec and prompt should agree.
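
The mismatch can be made concrete with a short sketch. Python integers are unbounded, so the i64 wrap is emulated explicitly; the helper names `wrap_i64` and `in_63bit_range` are hypothetical and exist only for this illustration.

```python
I64_MIN, I64_MAX = -(2**63), 2**63 - 1

def wrap_i64(n: int) -> int:
    """Reduce an unbounded Python int to its two's-complement i64 value."""
    return (n - I64_MIN) % 2**64 + I64_MIN

def in_63bit_range(n: int) -> bool:
    """Membership in [-2^62, 2^62), the range quoted from the spec."""
    return -(2**62) <= n < 2**62

literal = 9223372036854775807                 # the P57 literal
print(literal == I64_MAX)                     # True
print(in_63bit_range(literal))                # False: outside the stated range
print(wrap_i64(literal + 1) == I64_MIN)       # True: the wrap P57 documents
print(in_63bit_range(4611686018427387903))    # True: 2^62 - 1, the stated max
```

The P57 literal is a valid i64 but falls outside the 63-bit range the spec states, which is the inconsistency a spec-only reader would encounter.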

Observation: P51 oracle is implementation-coupled

P51 pins exit code to 170 (xorshift64 first draw from seed=42, mod 256). The notes say "verified deterministic across runs," which is correct — but determinism only holds while the PRNG implementation is unchanged. If the xorshift variant or seeding strategy changes, this prompt silently becomes a regression trap rather than a spec-validation tool.

Not a blocker — this is inherent to testing deterministic randomness. But consider adding a note like "oracle depends on the xorshift64 variant documented in §13.2" to make the coupling explicit for future maintainers.
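
To illustrate the coupling, here is a minimal xorshift64 sketch. The shift triple (13, 7, 17) is an assumption (a commonly used one); this review does not say which triple or seeding strategy Sigil's PRNG uses. Under that assumption the first draw from seed 42, reduced mod 256, comes out to 170, which agrees with the pinned oracle, though the agreement alone does not confirm the variant.

```python
MASK64 = (1 << 64) - 1

def xorshift64_step(state: int) -> int:
    """One xorshift64 step with the assumed shift triple (13, 7, 17)."""
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state & MASK64

def first_draw_exit_code(seed: int) -> int:
    """First draw from the seed, reduced to an exit code in [0, 256)."""
    return xorshift64_step(seed) % 256

print(first_draw_exit_code(42))  # 170
```

Any change to the shift constants, or to how the seed is mixed into the initial state, would change this value, which is why the oracle has to be re-pinned along with the implementation.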

Minor: P25 and P44 both exercise `catch`

P25 imports std.raise and uses catch (to test row-polymorphic | e tails). P44 also imports std.raise and uses catch (to test the canonical discharger pattern). They test different aspects, so this isn't redundant, but the two could cross-reference each other in their notes to make the distinction explicit.

Verified correct

All stdlib export names checked against source — every import std.X + builtin reference in P21–P62 matches the actual module exports.
string_to_int_validate returning Int error code (not Option): confirmed.
filter's pure pred (A) -> Bool ![]: confirmed.
catch in std.raise (not std.result): confirmed; P25/P44 import correctly.
ArithError exhaustiveness (both div_by_zero + mod_by_zero arms): P37 and P56 both handle this correctly.
No feature overlap with P01–P20; coverage is strictly incremental.
Formatting conventions (heading style, field order, code blocks) are consistent with P01–P20.

Spec: fix Int range claims (PR #125 review item 1)

Three sites in spec/language.md claimed 63-bit Int (`[-2^62, 2^62)`), but the compiler actually accepts and operates on full signed i64. Verified empirically: `9223372036854775807` (i64 max) parses and prints round-trip clean; `9223372036854775807 + 1` wraps to i64 min; `9223372036854775808` fires E0050 with the message 'integer literal X is out of range for Int (i64)'. The compiler itself names Int as i64.

Updated:

- §1 line 563 (literal range)
- §3.1 line 635 (Int built-in description)
- §12 line 1403 (runtime model tagged-values description)

P57's '9223372036854775807' literal + wrap-on-overflow oracle is now consistent with the corrected spec.

Prompts, minor review notes:

- P51: added explicit note that the oracle is coupled to the xorshift64 variant + seeding strategy in std/random.sigil; if either changes, the oracle must be re-pinned. Makes the implementation-coupling explicit for future maintainers.
- P25/P44: added cross-reference notes distinguishing what each tests (P25 = row-poly | e tail mechanism; P44 = canonical discharger pattern without row-tail variation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

boldfield merged commit 9afd645 into main May 10, 2026
4 checks passed

boldfield deleted the validation-prompts-extension branch May 10, 2026 01:33

boldfield merged 2 commits into main from validation-prompts-extension
