[Spec] extend validation prompt bank with P21–P62 (42 prompts) #125
Adds 42 new validation prompts covering v1 surface previously
untested by the P01–P20 bank. Each new prompt's reference
implementation was compiled and run against current main; oracles
match observed output.
Coverage groups:
- **P21–P26 — Plan D shipped surfaces** (tuples, std.pair,
generic effect rows, per-op generics, row-poly fn params,
conditional k-call). Closes the gap where Plan D's type-system
+ handler features had no validation prompts.
- **P27–P29 — handler features** (return arm, multi-arm dispatch
with std.state, nested handlers on distinct effects).
- **P30–P33 — Mem effect surface** (MutArray, MutArray in-place
sum, StringBuilder, MutByteArray + byte conversion). Entire
Mem class was previously un-exercised.
- **P34–P35 — ByteArray** (immutable checksum + UTF-8 validate
+ alloc roundtrip).
- **P36–P49 — stdlib usage** (list map/filter/fold/sort, option
unwrap, result match, string ops, char classifiers, format,
raise.catch, state.run_state, choose.all_choices /
first_choice, immutable array, persistent map).
- **P50–P52 — env/random/clock effects** (env_var, deterministic
xorshift via run_seeded_random, run_frozen_clock).
- **P53–P57 — numeric + ArithError** (float, Int64,
Bool operators, ArithError discharge with both div_by_zero and
mod_by_zero arms, wrap-on-overflow at i64 boundary).
- **P58–P60 — patterns** (3-arity tuple destructure, nested
constructor patterns, char literal patterns).
- **P61–P62 — misc** (assert builtin no-op path, multi-import
composition).
Notable adjustments from the original draft (per actual
verification):
- P24: removed division (requires ArithError); switched to a
positivity check that exercises the per-op A only.
- P28: simplified the run_state composition; final state of
incr+incr+decr is 1.
- P34: byte_array_alloc only takes uniform fill; switched to
string_to_bytes("ABC") to construct a ByteArray with specific
bytes. The deferred Result-returning string_from_bytes wrapper
doesn't exist in v1.
- P35: same — use string_from_bytes_validate + alloc primitive
pair directly (no Result wrapper).
- P37: filter's pred must be pure; wrap n%2==0 in a discharging
handler covering BOTH ArithError.div_by_zero and mod_by_zero
(E0142 requires exhaustive arms).
- P40: string equality goes through std.ordering's string_compare
returning Ordering — no string_eq builtin.
- P41: string_to_int_validate returns Int (error code), not
Option[Int].
- P51: oracle exit code pinned at 170 (verified deterministic
across runs of seed=42).
- P52: run_frozen_clock body row is ![Clock] only — IO must
happen outside the lambda.
- P53: switched float operands to (1.5 + 2.0 * 1.25 = 4.0) to
avoid IEEE 754 imprecision in the printed output.
- P56: ArithError handler must include both div_by_zero AND
mod_by_zero arms per E0142 exhaustiveness.
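The P53 operand choice can be checked outside Sigil. The following is a minimal Python sketch, assuming Sigil floats are IEEE 754 binary64 like Python's `float` (an assumption, not something this change states). It shows that `1.5`, `2.0` and `1.25` are exactly representable, so the sum is exactly `4.0`, while a decimal-looking pair such as `0.1 + 0.2` is not.

```python
from fractions import Fraction

# Exact rational arithmetic: 3/2 + 2 * 5/4 = 4
exact = Fraction(3, 2) + Fraction(2) * Fraction(5, 4)

# The same expression in binary floating point.
computed = 1.5 + 2.0 * 1.25

# Every operand is a dyadic rational, so converting the float
# back to a Fraction loses nothing.
operands_exact = (
    Fraction(1.5) == Fraction(3, 2)
    and Fraction(2.0) == Fraction(2)
    and Fraction(1.25) == Fraction(5, 4)
)

# Counter-example: 0.1 and 0.2 have no finite binary expansion.
inexact = 0.1 + 0.2

print(computed)   # 4.0
print(inexact)    # 0.30000000000000004
```

Operands whose fractional parts are sums of powers of two keep the printed oracle stable regardless of how the runtime formats floats.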
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
boldfield left a comment
Review: P21–P62 validation prompt bank
Overall: Well-structured, thorough, and carefully verified. The 42 prompts provide strong incremental coverage over the Plan D type-system surfaces, the Mem effect class, and most stdlib modules that P01–P20 left untouched. Formatting is consistent with established conventions. No must-fix issues.
Issue: P57 contradicts spec's stated Int range
P57 uses `let big: Int = 9223372036854775807;` (2^63 − 1, i64 max) and documents wrapping to i64 min. This was verified against the compiler and works.
Problem: the spec disagrees. §1 line 563 says:
> Range: `[-2^62, 2^62)` (63-bit tagged Int).
And §12 says:
> `Int` is 63-bit at FFI boundaries (one bit reserved for the heap-vs-immediate tag)
2^63 − 1 is well outside [-2^62, 2^62). The int_abs(i64::MIN) reference in §13.2 muddies this further by assuming full i64 range.
The validation harness runs prompts against a fresh LLM session given only spec/language.md. An LLM reading "Range: [-2^62, 2^62)" could legitimately refuse to emit a literal outside that range, even though the prompt tells it to. This makes P57 fragile against spec-compliant generation.
Options:
- Fix the spec — if the compiler actually accepts full i64 range, update §1/§12 to say so, then P57 is clean.
- Adjust P57 to use `4611686018427387903` (2^62 − 1, the stated max) and document what `stated_max + 1` produces. This tests overflow at the spec-documented boundary rather than the i64 boundary.
Either way, the spec and prompt should agree.
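To make the two candidate boundaries concrete, here is a minimal Python sketch of two's-complement wrap-around at each width. `wrap_signed` is a hypothetical helper written for this illustration, not part of the Sigil toolchain, and the 63-bit case assumes a tagged Int would wrap rather than trap, which the spec excerpt above does not say.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reduce value into the two's-complement range [-2^(bits-1), 2^(bits-1))."""
    modulus = 1 << bits
    half = 1 << (bits - 1)
    return (value + half) % modulus - half

I64_MAX = (1 << 63) - 1    # 9223372036854775807, the literal P57 uses
SPEC_MAX = (1 << 62) - 1   # 4611686018427387903, the max stated in the spec

# Overflow at the i64 boundary (what P57 currently exercises).
i64_wrapped = wrap_signed(I64_MAX + 1, 64)

# Overflow at the spec-documented 63-bit boundary (hypothetical model).
spec_wrapped = wrap_signed(SPEC_MAX + 1, 63)

print(i64_wrapped)    # -9223372036854775808
print(spec_wrapped)   # -4611686018427387904
```

The two options produce different oracles, so whichever boundary the spec settles on determines the expected output for P57.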
Observation: P51 oracle is implementation-coupled
P51 pins exit code to 170 (xorshift64 first draw from seed=42, mod 256). The notes say "verified deterministic across runs," which is correct — but determinism only holds while the PRNG implementation is unchanged. If the xorshift variant or seeding strategy changes, this prompt silently becomes a regression trap rather than a spec-validation tool.
Not a blocker — this is inherent to testing deterministic randomness. But consider adding a note like "oracle depends on the xorshift64 variant documented in §13.2" to make the coupling explicit for future maintainers.
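The coupling can be sketched in a few lines of Python. This assumes the widely used 13/7/17 shift triple and that the seed is used directly as the initial state; neither detail is confirmed here for `std/random.sigil`, and `xorshift64_step` is a name chosen for this illustration.

```python
MASK64 = (1 << 64) - 1  # keep the state within 64 bits

def xorshift64_step(state: int) -> int:
    """One xorshift64 step with the 13/7/17 shift triple (assumed variant)."""
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state

first_draw = xorshift64_step(42)   # seed used directly as state (assumption)
exit_code = first_draw % 256       # reduce to a process exit code

print(exit_code)   # 170
```

Under these assumptions the low byte of the first draw is 170, which agrees with the pinned oracle. A different shift triple, or any pre-mixing of the seed, would change that value, which is exactly why the oracle has to be re-pinned if the implementation changes.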
Minor: P25 and P44 both exercise catch
P25 imports std.raise and uses catch (to test row-polymorphic | e tails). P44 also imports std.raise and uses catch (to test the canonical discharger pattern). They test different aspects, so this isn't redundant, but the two could cross-reference each other in their notes to make the distinction explicit.
Verified correct
- All stdlib export names checked against source: every `import std.X` + builtin reference in P21–P62 matches the actual module exports.
- `string_to_int_validate` returning `Int` error code (not `Option`): confirmed.
- `filter`'s pure pred `(A) -> Bool ![]`: confirmed.
- `catch` in `std.raise` (not `std.result`): confirmed; P25/P44 import correctly.
- ArithError exhaustiveness (both `div_by_zero` + `mod_by_zero` arms): P37 and P56 both handle this correctly.
- No feature overlap with P01–P20; coverage is strictly incremental.
- Formatting conventions (heading style, field order, code blocks) are consistent with P01–P20.
Spec: fix Int range claims (PR #125 review item 1)

Three sites in spec/language.md claimed 63-bit Int (`[-2^62, 2^62)`), but the compiler actually accepts and operates on full signed i64. Verified empirically:
- `9223372036854775807` (i64 max) parses and prints round-trip clean.
- `9223372036854775807 + 1` wraps to i64 min.
- `9223372036854775808` fires E0050 with the message 'integer literal X is out of range for Int (i64)'; the compiler itself names Int as i64.

Updated:
- §1 line 563 (literal range)
- §3.1 line 635 (Int built-in description)
- §12 line 1403 (runtime model tagged-values description)

P57's '9223372036854775807' literal + wrap-on-overflow oracle is now consistent with the corrected spec.

Prompts: minor review notes:
- P51: added explicit note that the oracle is coupled to the xorshift64 variant + seeding strategy in std/random.sigil; if either changes, the oracle must be re-pinned. Makes the implementation-coupling explicit for future maintainers.
- P25/P44: added cross-reference notes distinguishing what each tests (P25 = row-poly | e tail mechanism; P44 = canonical discharger pattern without row-tail variation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Extends `spec/validation-prompts.md` with 42 new prompts (P21–P62) covering v1 surface previously untested. Every prompt's reference implementation was compiled and run against current main; oracles match observed output.
Total prompt bank: 62 (was 20).
Coverage groups
Notable adjustments from the original draft
The draft (~967 lines, in `/tmp/sigil-validation/extended-prompts-draft.md`) was written before verification. Reality diverged from the draft on several points; final prompts reflect what actually compiles and runs:
- P34: `byte_array_alloc` only takes uniform fill; switched to `string_to_bytes("ABC")` to construct a `ByteArray` with specific bytes. The deferred `Result`-returning `string_from_bytes` wrapper doesn't exist in v1.
- P35: use the `string_from_bytes_validate` + `string_from_bytes_alloc` primitive pair directly (no `Result` wrapper).
- P37: `filter`'s pred must be pure; wrap `n%2==0` in a discharging handler covering BOTH `ArithError.div_by_zero` and `ArithError.mod_by_zero` (E0142 requires exhaustive arms).
- P40: string equality goes through `std.ordering.string_compare`, which returns `Ordering`; there is no `string_eq` builtin.
- P41: `string_to_int_validate` returns `Int` (error code), not `Option[Int]`.
- P52: `run_frozen_clock` body row is `![Clock]` only, so IO must happen outside the lambda.
- P53: switched float operands to `(1.5 + 2.0 * 1.25 = 4.0)` to avoid IEEE 754 imprecision in the printed output.
- P56: the ArithError handler must include both `div_by_zero` AND `mod_by_zero` arms per E0142 exhaustiveness.
Verification
Each P21..P62's reference implementation lives at `/tmp/sigil-validation/verify-extended/Pxx.sigil` (workspace, not committed). All 42 compiled clean against `target/release/sigil` (built off current main, post-PR-#124 merge) and produced byte-exact output matching the documented oracle.
Out of scope
Per the original draft's "Coverage check" section, these surfaces remain intentionally uncovered:
- `std.fs` ops, `std.process.run`, `IO.read_line`: non-deterministic / depend on filesystem or subprocess state.
- `Continuation[OpRet, Ret]` user surface: outside core multi-shot `k(arg)` idiom.
🤖 Generated with Claude Code