[Spec] extend validation prompt bank with P21–P62 (42 prompts)#125

Merged: boldfield merged 2 commits into main from validation-prompts-extension on May 10, 2026

@boldfield
Owner

Summary

Extends spec/validation-prompts.md with 42 new prompts (P21–P62) covering parts of the v1 surface that were previously untested. Every prompt's reference implementation was compiled and run against current main; oracles match observed output.

Total prompt bank: 62 (was 20).

Coverage groups

  • P21–P26 — Plan D shipped surfaces: tuples, std.pair, generic effect rows, per-op generics, row-poly fn params, conditional k-call. Closes the gap where Plan D's type-system + handler features had no validation prompts.
  • P27–P29 — handler features: return arm, multi-arm dispatch with std.state, nested handlers on distinct effects.
  • P30–P33 — Mem effect surface: MutArray, MutArray in-place sum, StringBuilder, MutByteArray + byte conversion. Entire Mem class was previously un-exercised.
  • P34–P35 — ByteArray: immutable checksum + UTF-8 validate + alloc roundtrip.
  • P36–P49 — stdlib usage: list map/filter/fold/sort, option unwrap, result match, string ops, char classifiers, format, raise.catch, state.run_state, choose.all_choices / first_choice, immutable array, persistent map.
  • P50–P52 — env/random/clock effects: env_var, deterministic xorshift via run_seeded_random, run_frozen_clock.
  • P53–P57 — numeric + ArithError: float, Int64, Bool operators, ArithError discharge with both div_by_zero and mod_by_zero arms, wrap-on-overflow at i64 boundary.
  • P58–P60 — patterns: 3-arity tuple destructure, nested constructor patterns, char literal patterns.
  • P61–P62 — misc: assert builtin no-op path, multi-import composition.

Notable adjustments from the original draft

The draft (~967 lines, in /tmp/sigil-validation/extended-prompts-draft.md) was written before verification. Reality diverged from the draft on several points; final prompts reflect what actually compiles and runs:

  • P24: dropped division (requires ArithError); switched to a positivity check that exercises only the per-op generic parameter A.
  • P28: simplified the run_state composition; final state of incr+incr+decr is 1.
  • P34: byte_array_alloc only takes uniform fill; switched to string_to_bytes("ABC") to construct a ByteArray with specific bytes. The deferred Result-returning string_from_bytes wrapper doesn't exist in v1.
  • P35: same constraint as P34; use the string_from_bytes_validate + string_from_bytes_alloc primitive pair directly (no Result wrapper).
  • P37: filter's pred must be pure; wrap n%2==0 in a discharging handler covering BOTH ArithError.div_by_zero and ArithError.mod_by_zero (E0142 requires exhaustive arms).
  • P40: string equality goes through std.ordering.string_compare returning Ordering — no string_eq builtin.
  • P41: string_to_int_validate returns Int (error code), not Option[Int].
  • P51: oracle exit code pinned at 170 (verified deterministic across repeated runs with seed=42).
  • P52: run_frozen_clock body row is ![Clock] only — IO must happen outside the lambda.
  • P53: switched float operands to (1.5 + 2.0 * 1.25 = 4.0) to avoid IEEE 754 imprecision in the printed output.
  • P56: ArithError handler must include both div_by_zero AND mod_by_zero arms per E0142 exhaustiveness.
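
The P53 operand choice can be sanity-checked outside Sigil. The sketch below uses Python floats, which are IEEE 754 binary64 on mainstream platforms. Whether Sigil's float has the same width is an assumption here, and the 0.1 + 0.2 counterexample is hypothetical (it is not taken from the draft).

```python
# Illustrative check in Python, not Sigil.
# Assumption: the float type under test behaves like IEEE 754 binary64.

def p53_expression() -> float:
    # 1.5, 2.0, 1.25, the product 2.5 and the sum 4.0 are all exactly
    # representable in binary, so no rounding occurs at any step.
    return 1.5 + 2.0 * 1.25

def inexact_example() -> float:
    # Hypothetical counterexample: 0.1 and 0.2 have no finite binary
    # expansion, so the printed sum exposes a rounding error.
    return 0.1 + 0.2

print(p53_expression())   # 4.0
print(inexact_example())  # 0.30000000000000004
```

Operands built from small powers of two keep the printed oracle stable regardless of how the runtime formats floats with long fractional tails.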

Verification

The reference implementation for each of P21–P62 lives at /tmp/sigil-validation/verify-extended/Pxx.sigil (workspace, not committed). All 42 compiled cleanly against target/release/sigil (built off current main, post-PR-#124 merge) and produced byte-exact output matching the documented oracle.

Out of scope

Per the original draft's "Coverage check" section, these surfaces remain intentionally uncovered:

  • std.fs ops, std.process.run, IO.read_line — non-deterministic / depend on filesystem or subprocess state.
  • First-class Continuation[OpRet, Ret] user surface — outside core multi-shot k(arg) idiom.
  • Wrapper-fn-frame composition — needs a dedicated 2-3 prompt sub-bank for handler composition correctness; deferred.

🤖 Generated with Claude Code

Adds 42 new validation prompts covering v1 surface previously
untested by the P01–P20 bank. Each new prompt's reference
implementation was compiled and run against current main; oracles
match observed output.

Coverage groups:

- **P21–P26 — Plan D shipped surfaces** (tuples, std.pair,
  generic effect rows, per-op generics, row-poly fn params,
  conditional k-call). Closes the gap where Plan D's type-system
  + handler features had no validation prompts.
- **P27–P29 — handler features** (return arm, multi-arm dispatch
  with std.state, nested handlers on distinct effects).
- **P30–P33 — Mem effect surface** (MutArray, MutArray in-place
  sum, StringBuilder, MutByteArray + byte conversion). Entire
  Mem class was previously un-exercised.
- **P34–P35 — ByteArray** (immutable checksum + UTF-8 validate
  + alloc roundtrip).
- **P36–P49 — stdlib usage** (list map/filter/fold/sort, option
  unwrap, result match, string ops, char classifiers, format,
  raise.catch, state.run_state, choose.all_choices /
  first_choice, immutable array, persistent map).
- **P50–P52 — env/random/clock effects** (env_var, deterministic
  xorshift via run_seeded_random, run_frozen_clock).
- **P53–P57 — numeric + ArithError** (float, Int64,
  Bool operators, ArithError discharge with both div_by_zero and
  mod_by_zero arms, wrap-on-overflow at i64 boundary).
- **P58–P60 — patterns** (3-arity tuple destructure, nested
  constructor patterns, char literal patterns).
- **P61–P62 — misc** (assert builtin no-op path, multi-import
  composition).

Notable adjustments from the original draft (per actual
verification):

- P24: removed division (requires ArithError); switched to a
  positivity check that exercises the per-op A only.
- P28: simplified the run_state composition; final state of
  incr+incr+decr is 1.
- P34: byte_array_alloc only takes uniform fill; switched to
  string_to_bytes("ABC") to construct a ByteArray with specific
  bytes. The deferred Result-returning string_from_bytes wrapper
  doesn't exist in v1.
- P35: same — use string_from_bytes_validate + alloc primitive
  pair directly (no Result wrapper).
- P37: filter's pred must be pure; wrap n%2==0 in a discharging
  handler covering BOTH ArithError.div_by_zero and mod_by_zero
  (E0142 requires exhaustive arms).
- P40: string equality goes through std.ordering's string_compare
  returning Ordering — no string_eq builtin.
- P41: string_to_int_validate returns Int (error code), not
  Option[Int].
- P51: oracle exit code pinned at 170 (verified deterministic
  across runs of seed=42).
- P52: run_frozen_clock body row is ![Clock] only — IO must
  happen outside the lambda.
- P53: switched float operands to (1.5 + 2.0 * 1.25 = 4.0) to
  avoid IEEE 754 imprecision in the printed output.
- P56: ArithError handler must include both div_by_zero AND
  mod_by_zero arms per E0142 exhaustiveness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner Author

@boldfield left a comment

Review: P21–P62 validation prompt bank

Overall: Well-structured, thorough, and carefully verified. The 42 prompts provide strong incremental coverage over the Plan D type-system surfaces, the Mem effect class, and most stdlib modules that P01–P20 left untouched. Formatting is consistent with established conventions. No must-fix issues.

Issue: P57 contradicts spec's stated Int range

P57 uses let big: Int = 9223372036854775807; (2^63 − 1, i64 max) and documents wrapping to i64 min. This was verified against the compiler and works.

Problem: the spec disagrees. §1 line 563 says:

> Range: [-2^62, 2^62) (63-bit tagged Int).

And §12 says:

> Int is 63-bit at FFI boundaries (one bit reserved for the heap-vs-immediate tag)

2^63 − 1 is well outside [-2^62, 2^62). The int_abs(i64::MIN) reference in §13.2 muddies this further by assuming full i64 range.

The validation harness runs prompts against a fresh LLM session given only spec/language.md. An LLM reading "Range: [-2^62, 2^62)" could legitimately refuse to emit a literal outside that range, even though the prompt tells it to. This makes P57 fragile against spec-compliant generation.

Options:

  1. Fix the spec — if the compiler actually accepts full i64 range, update §1/§12 to say so, then P57 is clean.
  2. Adjust P57 to use 4611686018427387903 (2^62 − 1, the stated max) and document what stated_max + 1 produces. This tests overflow at the spec-documented boundary rather than the i64 boundary.

Either way, the spec and prompt should agree.
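
The size of the mismatch is easy to check with plain integer arithmetic. This is a sketch in Python (arbitrary-precision ints), not Sigil; the constant and function names are hypothetical and only restate the two ranges discussed above.

```python
# Illustrative arithmetic in Python, not Sigil.

SPEC_MIN = -(2 ** 62)     # inclusive lower bound of the range quoted from §1
SPEC_MAX = 2 ** 62 - 1    # largest value in the half-open range [-2^62, 2^62)
I64_MAX = 2 ** 63 - 1     # the literal P57 uses

def in_spec_range(n: int) -> bool:
    """True if n lies inside the 63-bit range the spec text states."""
    return SPEC_MIN <= n <= SPEC_MAX

print(SPEC_MAX)                  # 4611686018427387903
print(I64_MAX)                   # 9223372036854775807
print(in_spec_range(SPEC_MAX))   # True
print(in_spec_range(I64_MAX))    # False
```

P57's literal is roughly twice the stated maximum, so a generator that takes the spec text literally has grounds to reject it.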

Observation: P51 oracle is implementation-coupled

P51 pins exit code to 170 (xorshift64 first draw from seed=42, mod 256). The notes say "verified deterministic across runs," which is correct — but determinism only holds while the PRNG implementation is unchanged. If the xorshift variant or seeding strategy changes, this prompt silently becomes a regression trap rather than a spec-validation tool.

Not a blocker — this is inherent to testing deterministic randomness. But consider adding a note like "oracle depends on the xorshift64 variant documented in §13.2" to make the coupling explicit for future maintainers.
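
The coupling can be illustrated with a small sketch. This is Python, not the code in std/random.sigil, and the shift triple (13, 7, 17) is an assumption (a commonly used xorshift64 parameterization), as are the function names. Under that assumed triple the first draw from seed 42, reduced mod 256, comes out to 170, which agrees with the pinned oracle, though that agreement does not confirm what the real implementation does.

```python
# Illustrative sketch in Python, not the Sigil standard library.
# Assumption: a 64-bit xorshift step (left, right, left) with shifts (13, 7, 17).

MASK64 = (1 << 64) - 1

def xorshift64(state, shifts=(13, 7, 17)):
    """Apply one xorshift step to a 64-bit state and return the new state."""
    a, b, c = shifts
    x = state & MASK64
    x ^= (x << a) & MASK64   # left shift, truncated to 64 bits
    x ^= x >> b              # right shift
    x ^= (x << c) & MASK64   # left shift, truncated to 64 bits
    return x

def first_draw_mod_256(seed, shifts=(13, 7, 17)):
    """First draw from the seed, reduced to an exit-code-sized value."""
    return xorshift64(seed, shifts) % 256

print(first_draw_mod_256(42))               # 170
print(first_draw_mod_256(42, (17, 7, 13)))  # 42
```

Reordering the same three shift amounts changes the result from 170 to 42, which is the failure mode described above: the prompt would start failing even though the language semantics are unchanged.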

Minor: P25 and P44 both exercise catch

P25 imports std.raise and uses catch (to test row-polymorphic | e tails). P44 also imports std.raise and uses catch (to test the canonical discharger pattern). They test different aspects, so this isn't redundant, but the two could cross-reference each other in their notes to make the distinction explicit.

Verified correct

  • All stdlib export names checked against source — every import std.X + builtin reference in P21–P62 matches the actual module exports.
  • string_to_int_validate returning Int error code (not Option): confirmed.
  • filter's pure pred (A) -> Bool ![]: confirmed.
  • catch in std.raise (not std.result): confirmed; P25/P44 import correctly.
  • ArithError exhaustiveness (both div_by_zero + mod_by_zero arms): P37 and P56 both handle this correctly.
  • No feature overlap with P01–P20; coverage is strictly incremental.
  • Formatting conventions (heading style, field order, code blocks) are consistent with P01–P20.

Spec — fix Int range claims (PR #125 review item 1)

Three sites in spec/language.md claimed 63-bit Int (`[-2^62, 2^62)`),
but the compiler actually accepts and operates on full signed
i64. Verified empirically: `9223372036854775807` (i64 max) parses
and prints round-trip clean; `9223372036854775807 + 1` wraps to
i64 min; `9223372036854775808` fires E0050 with the message
'integer literal X is out of range for Int (i64)' — the compiler
itself names Int as i64.
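
The three observations above match ordinary two's-complement signed 64-bit behavior. A minimal emulation in Python, not Sigil; the helper names are hypothetical and the code models only the arithmetic, not the compiler's E0050 check itself:

```python
# Illustrative emulation in Python, not Sigil.

I64_MIN = -(2 ** 63)
I64_MAX = 2 ** 63 - 1

def wrap_i64(n: int) -> int:
    """Reduce an arbitrary integer to two's-complement signed 64-bit."""
    return (n + 2 ** 63) % 2 ** 64 - 2 ** 63

def literal_in_range(n: int) -> bool:
    """Model of the range a full-i64 Int literal must fall in."""
    return I64_MIN <= n <= I64_MAX

print(wrap_i64(I64_MAX + 1))                   # -9223372036854775808
print(literal_in_range(9223372036854775807))   # True
print(literal_in_range(9223372036854775808))   # False
```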

Updated:
- §1 line 563 (literal range)
- §3.1 line 635 (Int built-in description)
- §12 line 1403 (runtime model tagged-values description)

P57's '9223372036854775807' literal + wrap-on-overflow oracle is
now consistent with the corrected spec.

Prompts — minor review notes:

- P51: added explicit note that the oracle is coupled to the
  xorshift64 variant + seeding strategy in std/random.sigil; if
  either changes, the oracle must be re-pinned. Makes the
  implementation-coupling explicit for future maintainers.
- P25/P44: added cross-reference notes distinguishing what each
  tests (P25 = row-poly | e tail mechanism; P44 = canonical
  discharger pattern without row-tail variation).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@boldfield merged commit 9afd645 into main on May 10, 2026
4 checks passed
@boldfield deleted the validation-prompts-extension branch on May 10, 2026 01:33
