Boxed Float (IEEE 754 f64) type by boldfield · Pull Request #101 · boldfield/sigil

boldfield · 2026-05-06T05:30:44Z

Summary

- Add Float as a boxed IEEE 754 f64 type following the Int64 boxed-scalar pattern
- Header constant TAG_FLOAT = 0x0B, 21 runtime FFI primitives (arithmetic, comparison, math, conversion)
- Float literal syntax in lexer/parser (3.14, 1e10, 2.5e-3), type system integration, full codegen wiring
- float_to_string always includes .0 for whole numbers (e.g., 4.0 not 4); inf/NaN unchanged
- 10 e2e tests + runtime unit tests covering the full surface
- spec/language.md updated: Float in literal syntax, type table, expression forms, stdlib reference, runtime model; removed from v1 limits

Key design decisions

- ABI: sigil_float_box accepts an i64 bit pattern (not f64) to stay in Cranelift's integer register class; codegen uses f64::to_bits() as i64 → iconst(I64) → call
- Division: IEEE 754 semantics (nonzero / 0 → ±Inf, 0 / 0 → NaN; never an abort)
- float_to_string formatting: appends .0 to whole numbers so Float values are always visually distinct from Int
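
The bit-pattern calling convention can be illustrated in plain Rust. This is only a sketch under stated assumptions: `float_box_from_bits`, `float_unbox`, and `literal_bits` are hypothetical names, and a `Box<f64>` stands in for the tagged heap object that `sigil_float_box` actually allocates.

```rust
/// Hypothetical stand-in for `sigil_float_box`: receives the f64's bit
/// pattern as an i64 so the argument travels in an integer register.
fn float_box_from_bits(bits: i64) -> Box<f64> {
    // Pure reinterpretation of the 64 bits; no numeric conversion happens.
    Box::new(f64::from_bits(bits as u64))
}

/// Hypothetical unbox helper: reads the payload back out.
fn float_unbox(boxed: &f64) -> f64 {
    *boxed
}

/// What codegen would compute at compile time for a literal before
/// emitting `iconst(I64, bits)` followed by the call.
fn literal_bits(value: f64) -> i64 {
    value.to_bits() as i64
}
```

Because both casts only reinterpret bits, every value survives the round trip, including -0.0 and NaN payloads, which a numeric `f64 as i64` conversion would destroy.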

Test plan

- 10 e2e tests: literals, arithmetic, negation, div-by-zero→inf, from_int round-trip, comparisons, floor/ceil, string parse/validate, NaN≠NaN, doc-only import
- Runtime unit tests: boxing, arithmetic, comparison, math, conversion, to_string formatting (3.14, 4.0, inf, NaN)
- Full workspace test suite passes (330/330; 3 pre-existing perf-floor flakes excluded)
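
The div-by-zero and NaN≠NaN cases in the test plan follow directly from IEEE 754, which Rust's `f64` also implements. A minimal sketch of the expected behavior, using illustrative helper names (`float_div` and `float_eq` here are plain Rust functions, not the runtime's FFI entry points):

```rust
/// Illustrative division with IEEE 754 semantics: a zero divisor does
/// not trap, it produces an infinity or NaN.
fn float_div(a: f64, b: f64) -> f64 {
    a / b
}

/// Illustrative equality: NaN compares unequal to everything,
/// itself included.
fn float_eq(a: f64, b: f64) -> bool {
    a == b
}
```

With these, `float_div(1.0, 0.0)` is positive infinity, `float_div(-1.0, 0.0)` is negative infinity, `float_div(0.0, 0.0)` is NaN, and `float_eq(f64::NAN, f64::NAN)` is false.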

Implements plan: 2026-05-05-sigil-float.md

🤖 Generated with Claude Code

Add TAG_FLOAT = 0x0B, FloatAllocCount/FloatAllocBytes counters, and runtime/src/float.rs with 21 FFI primitives: box/unbox, 5 arithmetic, 5 comparison, 4 math, and 5 conversion ops. All follow the Int64 boxed-scalar pattern (atomic alloc, count=1, bitmap=0).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add FloatLit(f64) token and Expr::FloatLit(f64, Span) AST variant. The lexer recognizes `3.14`, `1e10`, `3.14e-2` as float literals (requires digits before and after the dot). Parser maps the token and constant-folds unary negation on float literals. All match arms across resolve, monomorphize, closure_convert, color, elaborate, typecheck, and codegen updated for exhaustiveness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Register opaque Float type in builtin_types(). Add register_builtin_float_schemes() with 16 primitives: 4 arithmetic, 5 comparison, 4 math, 5 conversion/string ops. Float literal inference (Expr::FloatLit → Ty::User("Float")) was already added in Task F2's exhaustiveness pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add 20 FFI declarations for float runtime primitives to BuiltinFuncIds/BuiltinFuncRefs. Float literals lower via f64::to_bits() → iconst → sigil_float_box (bit-pattern calling convention). All 19 callable primitives dispatch through lower_call with correct stackmap placeholders for allocating ops. Fix sigil_float_box to accept i64 bit pattern (integer register class) instead of f64, matching codegen's iconst path. Add std/float.sigil (doc-only) and imports.rs BUILTIN_INJECTED entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

10 e2e tests covering float literals, arithmetic, negation, div-by-zero→inf, from_int round-trip, comparisons, floor/ceil, string parse/validate, NaN≠NaN, doc-only import. float_to_string now appends ".0" to whole-number results so Float values are always visually distinguishable from Int (inf/NaN unchanged). Runtime unit tests extended to cover the formatting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
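
The formatting rule can be sketched in a few lines of Rust. This is an assumption-laden illustration, not the runtime's `float_to_string`: it relies on Rust's `Display` for `f64`, which prints whole numbers without a fractional part and never uses exponent notation.

```rust
/// Hypothetical formatter illustrating the ".0" rule.
fn float_to_string(x: f64) -> String {
    let s = x.to_string();
    // A finite value whose text has no '.' is a whole number: append
    // ".0" so it cannot be mistaken for an Int. inf/NaN pass through.
    if x.is_finite() && !s.contains('.') {
        format!("{s}.0")
    } else {
        s
    }
}
```

Under this sketch `4.0` renders as `4.0`, `3.125` stays `3.125`, and the non-finite values keep Rust's spellings `inf` and `NaN`.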

Add Float to: literal syntax (§1), type table (§3), expression forms (§4.1), stdlib reference (§13), runtime model (§12). Remove "no Float type" from v1 limits (§14).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

boldfield

Code Review: Boxed Float Type

Clean implementation that mirrors the established Int64 boxed-scalar pattern. Correctness looks good across all layers. A few issues worth addressing:

Issues

1. Lexer: greedy exponent consumption produces misleading errors

The exponent branch (line ~164 of lexer.rs) unconditionally consumes e/E and an optional sign without verifying that exponent digits follow. Input like 1e or 1e+ gets consumed and then fails with "float literal 1e is out of range" — but it's not out of range, it's malformed.

A simple fix: peek ahead for at least one digit before committing to the exponent parse. Otherwise, leave the e unconsumed and treat the preceding part as an integer literal. This matters because .sigil source with a typo (1e) currently eats characters that belong to the next token and gives a confusing diagnostic.

2. Dead code in codegen: make_float_binop takes unused name parameter

let make_float_binop = |name: &str, sig_holder: &mut Signature| {
    // ...
    let _ = name;
};

The name parameter is immediately discarded. The closure doesn't need it — drop the parameter.

Observations (not blocking)

- float_to_int clamping is redundant with Rust's saturating `as` semantics: since Rust 1.45, `f64 as i64` returns 0 for NaN, i64::MAX for values above the i64 range, and i64::MIN for values below it. The explicit checks make the contract more visible to readers, so they are not wrong; removing them just wouldn't change behavior.
- "Plan D" comments appear throughout the codegen additions. The PR summary and branch name suggest this is a Plan C continuation (Float type for the Sigil language). If "Plan D" is intentional naming, ignore this; just flagging potential naming drift.

Verdict

Solid work. The lexer exponent greediness is the only thing I'd want fixed before merge — it'll bite someone debugging a typo in float-heavy code and getting a confusing diagnostic. The dead parameter is a one-line cleanup.
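
The float_to_int observation above is easy to check in isolation. The sketch below compares an explicit clamp, written in the spirit of the checks the review describes (the runtime's actual code is not shown on this page, so `float_to_int_explicit` is hypothetical), against the bare cast.

```rust
/// Explicit clamping; a hypothetical reconstruction of the pattern
/// the review calls redundant.
fn float_to_int_explicit(x: f64) -> i64 {
    if x.is_nan() {
        0
    } else if x >= i64::MAX as f64 {
        i64::MAX
    } else if x <= i64::MIN as f64 {
        i64::MIN
    } else {
        x as i64
    }
}

/// The bare cast: saturating, with NaN mapped to 0, since Rust 1.45.
fn float_to_int_cast(x: f64) -> i64 {
    x as i64
}
```

For NaN, both infinities, out-of-range magnitudes, and ordinary values, the two functions return the same result, which is why the explicit checks serve as documentation rather than as behavior.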

1. Lexer: exponent branch now peeks ahead for at least one digit before committing to consume `e`/`E` and optional sign. `1e` lexes as IntLit(1) + Ident("e"), not a malformed float. Three new lexer unit tests pin the behavior.
2. Codegen: remove unused `name` parameter from `make_float_binop` closure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
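
The peek-ahead fix can be sketched as a small standalone number lexer. This is not the code in lexer.rs: the `Token` enum, the `lex_number` function, and its byte-slice cursor are assumptions made for illustration, and range errors are ignored.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    IntLit(i64),
    FloatLit(f64),
}

/// Lex one numeric literal starting at `pos`; returns the token and the
/// position just past it. Assumes `src[pos]` is an ASCII digit.
fn lex_number(src: &[u8], mut pos: usize) -> (Token, usize) {
    let start = pos;
    let mut is_float = false;
    while pos < src.len() && src[pos].is_ascii_digit() {
        pos += 1;
    }
    // Fraction: consume the dot only if a digit follows it.
    if pos + 1 < src.len() && src[pos] == b'.' && src[pos + 1].is_ascii_digit() {
        is_float = true;
        pos += 1;
        while pos < src.len() && src[pos].is_ascii_digit() {
            pos += 1;
        }
    }
    // Exponent: look past `e`/`E` and an optional sign, and commit only
    // if at least one digit follows. Otherwise `e` stays unconsumed.
    if pos < src.len() && (src[pos] == b'e' || src[pos] == b'E') {
        let mut look = pos + 1;
        if look < src.len() && (src[look] == b'+' || src[look] == b'-') {
            look += 1;
        }
        if look < src.len() && src[look].is_ascii_digit() {
            is_float = true;
            pos = look;
            while pos < src.len() && src[pos].is_ascii_digit() {
                pos += 1;
            }
        }
    }
    let text = std::str::from_utf8(&src[start..pos]).expect("ASCII only");
    let tok = if is_float {
        Token::FloatLit(text.parse().expect("well-formed float"))
    } else {
        Token::IntLit(text.parse().expect("well-formed int"))
    };
    (tok, pos)
}
```

On the input `1e` the function returns `IntLit(1)` and stops at position 1, leaving the `e` for the identifier lexer, which matches the behavior the commit message describes.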

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Change test values from 3.14/2.718 (approx PI/E) to 3.125/2.75 to avoid clippy::approx_constant lint
- Add SAFETY comments on .as_ptr() calls for interior-pointer check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* [Task CH1] Char header tag + runtime primitives

Adds TAG_CHAR=0x0C and a complete `runtime/src/char.rs` module covering boxing, equality/ordering, conversion, ASCII classifiers, ASCII case folding, and the load-bearing UTF-8 walkers (`string_chars`, `string_char_at`, `string_from_chars`). `string_chars` and `string_from_chars` accept Cons / Nil header words and discriminants from the codegen call site so the runtime stays free of compile-time `List[Char]` layout knowledge while still emitting well-typed `List[Char]` values. Lossy UTF-8 decode emits U+FFFD per invalid byte and resyncs at the next valid leading byte.

Adds `CharAllocCount` / `CharAllocBytes` counters; 32 unit tests cover box/unbox round-trips for ASCII / 2-byte / 3-byte / 4-byte codepoints, every classifier, ASCII case passthrough on non-ASCII, `int_to_char_validate` boundary cases (surrogates, > 0x10FFFF, negative), `string_chars` on valid + invalid UTF-8 input, codepoint-indexed `string_char_at`, and `string_from_chars` round-trip through `string_chars`.

* [Task CH2] Lexer + parser: Char literal extensions

Extends the existing `'x'` Char literal lexer with the full plan-spec escape and validity surface:

- New escape sequences: `\"`, `\0`, `\u{HEX}` (1–6 hex digits).
- Bare multi-byte codepoints (`'é'`, `'中'`, `'😀'`) decoded via a new UTF-8 peek/advance helper pair on the byte-based cursor; the prior `self.src[pos] as char` shortcut would otherwise see N source bytes and reject as multi-codepoint.
- Compile-time rejection of `\u{...}` values > 0x10FFFF and any surrogate `0xD800..=0xDFFF` (E0010 with descriptive message).
- Compile-time rejection of empty `\u{}` and >6-hex-digit escapes.
- Multi-codepoint literal bodies (`'ab'`, `'ab\u{41}'`) now produce the targeted "Char literal must be a single codepoint; got N" diagnostic rather than the prior "expected closing `'`" surface.

Token type stays `CharLit(char)`: Rust's `char` already enforces the post-validation invariant (no surrogates, ≤ 0x10FFFF). Existing parser-side mapping (`Token::CharLit` → `Expr::CharLit`) is unchanged. 15 new lexer tests cover every escape, each Unicode boundary (0xD7FF / 0xE000 / 0x10FFFF), each rejection case (empty braces, too many digits, out-of-range, low/high surrogates), and the three multi-byte bare-codepoint widths.

* [Task CH3] Char builtin schemes: 19 user-facing primitives

Registers the user-facing Char primitives via a new `register_builtin_char_schemes(tc)` mirroring `register_builtin_float_schemes`:

- 5 equality / ordering: char_eq, char_lt, char_le, char_gt, char_ge, each `(Char, Char) -> Bool`
- 3 conversion: char_to_int `(Char) -> Int`, int_to_char `(Int) -> Option[Char]`, char_to_string `(Char) -> String`
- 5 ASCII classifiers: is_ascii, is_ascii_digit, is_ascii_alpha, is_ascii_alphanumeric, is_ascii_whitespace, each `(Char) -> Bool`
- 2 ASCII case: to_lower_ascii, to_upper_ascii, each `(Char) -> Char`
- 4 string codepoint ops: string_chars `(String) -> List[Char]`, string_char_at `(String, Int) -> Option[Char]`, string_from_chars `(List[Char]) -> String`

Char itself is already a Ty::Char primitive (pre-existing) and literal type inference (`Expr::CharLit -> Ty::Char`) is unchanged. The validator helpers `int_to_char_validate` / `string_char_at_validate` are codegen-internal and not registered as user-callable schemes; codegen will lower the user-facing `int_to_char` / `string_char_at` to the validate-then-construct pattern in CH4. 13 typecheck unit tests cover each user-facing scheme: round-trip type inference, Option[Char] / List[Char] return types, E0044 on wrong-argument-type calls, E0045 on Char-vs-Int annotation mismatch, and Char literal default inference.
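
The validity rules that `int_to_char` and the `\u{HEX}` escape enforce (no negatives, no surrogates, nothing above 0x10FFFF) coincide with the invariant of Rust's own `char`. A minimal model of the user-facing `(Int) -> Option[Char]` contract, assuming Sigil's Int is a 64-bit signed integer and using a Rust `Option<char>` in place of the boxed `Option[Char]`:

```rust
use std::convert::TryFrom;

/// Hypothetical model of `int_to_char`; not the runtime's validator.
fn int_to_char(n: i64) -> Option<char> {
    // `u32::try_from` rejects negatives and values too large for u32;
    // `char::from_u32` rejects surrogates and anything above 0x10FFFF.
    u32::try_from(n).ok().and_then(char::from_u32)
}
```

The boundary values named in the CH2 lexer tests (0xD7FF, 0xE000, 0x10FFFF) are accepted by this model, while 0xD800, 0xDFFF, 0x110000, and -1 are rejected.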
* [Task CH4] Codegen: boxed Char + 19-primitive dispatch

Converts Char from an I32 codepoint immediate to a `TAG_CHAR`-headed heap pointer (mirroring Float / Int64 / String) and wires the 19 user-facing Char primitives through `lower_call`.

Type-mapping changes:

- `cranelift_ty_for_type_expr("Char")` and `cranelift_ty_of_ty(Ty::Char)` return `pointer_ty` instead of `types::I32`.
- `Expr::CharLit(c, _)` lowers to `iconst(I64, codepoint) → sigil_char_box → pointer_ty` (with stackmap placeholder), no longer bare `iconst(I32, c)`.
- `type_of_expr(Expr::CharLit)` returns `pointer_ty`.
- `Pattern::CharLit(c)` loads the u32 codepoint at offset 8 from the boxed Char scrutinee, zero-extends to i64, and tests for equality against the literal codepoint.

Primitive dispatch:

- 17 simple FFI dispatch arms in `lower_call` for char_eq / lt / le / gt / ge / char_to_int / char_to_string / is_ascii(_digit / _alpha / _alphanumeric / _whitespace) / to_lower_ascii / to_upper_ascii.
- `int_to_char` and `string_char_at` lower to a validate-then-construct pattern: validator → `brif(ok==0)` → Some/None branches → merge block with a `pointer_ty` block-param. Some-branch builds via `lower_ctor_alloc(Option$$Char, some_idx, [char_ptr])`; None via `lower_ctor_alloc(..., none_idx, [])`.
- `string_chars` and `string_from_chars` thread the codegen-computed `List$$Char` Cons / Nil header words and discriminants to the runtime as i64 immediates; runtime stamps them through `sigil_alloc` to build well-typed `List[Char]` cells.

`option_layout_for(Ty)` and `list_char_layout_immediates()` are the two private helpers that resolve the monomorphized layout via `mangle_type` / `mangle_ctor` and the `ctor_index` / `type_layouts` side-tables. `type_of_expr` predictions extended with the 19 Char op return types (I8 for boolean classifiers / comparators, I64 for `char_to_int`, `pointer_ty` for the rest). Globals set extended with the 17 user-callable surface names (the validator helpers `int_to_char_validate` / `string_char_at_validate` are codegen-internal only).

* [Task CH4] Slot widening + GC bitmap for boxed Char

Boxed Char (TAG_CHAR) is a heap pointer; the closure-record / sum-ctor / tuple-element slot-widening logic pre-Plan-C-addendum treated `EnvSlotKind::Char` and `Ty::Char` as a sub-word I32 primitive (uextend on store, ireduce(I32) on load). Both directions must drop for boxed Char: the slot already holds a `pointer_ty` value, and narrowing a pointer to I32 corrupts it.

Updates:

- `EnvSlotKind::is_pointer()` now matches `Char` alongside `String / Closure / User`, so closure-record bitmap bits are set correctly for boxed-Char captures.
- Every match site that branched on `EnvSlotKind::Char` (closure store / load, synth-cont capture pack / unpack, post-arm-k capture pack, tail-prefix-let widen, narrow_for_kind helpers, `type_of_expr` for `Expr::ClosureEnvLoad`) routes Char through the pointer-typed branch.
- `Ty::Char` in `load_field_value` and the tuple-pattern element loader drops the I32 narrow.
- `is_gc_pointer_ty(Ty::Char)` returns true so sum-type field bitmaps mark Char fields for GC tracing (cosmetic on Boehm's conservative scan, load-bearing for any future precise GC).

Existing e2e tests that exercised Char-as-I32 semantics:

- `perform_side_narrow_to_char_value_checked`: rewrites `c == 'Y'` to `char_eq(c, 'Y')` (pointer equality wouldn't match codepoint equality for boxed Chars).
- `cps_abi_captures_bearing_with_char_capture_exercises_widen_narrow_symmetry`: same `==` → `char_eq` rewrite; the test still pins capture flow through synth-cont closure records, but with no width narrowing.
- `arm_reads_char_arg_branches_on_codepoint`: same `==` → `char_eq` rewrite; the I32 → I8 split with the Bool test collapses (both are now pointer-store paths).
- `task_78_5_g4_approach6_b_neq_r_pointer_return_arm_through_char_body`: B != R width discrepancy collapses (B = Char = pointer_ty, R = String = pointer_ty); test docstring updated to reflect post-addendum reality. Test still pins the wrapper composition end-to-end.

* [Task CH5] e2e tests + std/char.sigil + std/string.sigil + PLAN_C docs

- 14 e2e tests cover the full Char surface: char_literal_round_trips_via_to_string, char_codepoint_round_trip, int_to_char_rejects_out_of_range, int_to_char_rejects_surrogate, int_to_char_accepts_valid, is_ascii_classifiers_basic, to_lower_upper_ascii_passthrough, string_chars_ascii, string_chars_multibyte, string_char_at_codepoint_index, string_from_chars_round_trip, char_pattern_match_against_literal, char_eq_distinguishes_different_codepoints, char_doc_only_import.
- std/char.sigil ships as a documentation-only file (added to BUILTIN_INJECTED skip-list). Covers literal syntax, the 19 user-facing primitives, the ASCII-only v1 scope + v2 closure path, the byte-vs-codepoint cross-reference to std/string.sigil, and a worked count_digits example.
- std/string.sigil doc preamble rewritten: the "Codepoint-aware variants deferred" note becomes a positive byte-vs-codepoint surface description; the new codepoint ops surface (string_chars, string_char_at, string_from_chars) is added to the builtin operations table with a cross-reference to std/char.sigil.
- PLAN_C_PROGRESS.md gets a Task CH section documenting the addendum's runtime + compiler + stdlib + test surface and the pre-existing Char-as-I32 → boxed-Char representation switch.
- PLAN_C_DEVIATIONS.md Task 68 entry's "deferred class 1" (codepoint-aware ops) and "deferred class 5" (char_to_int / int_to_char) marked CLOSED by this addendum. Class 3 (Float helpers) marked CLOSED by PR #101 (already shipped). Class 2 (codepoint-aware string_split / string_replace) remains deferred; it depends on stdlib namespace qualification, not Char.
* [Task CH6] Spec update: Char primitive + codepoint string ops

Updates `spec/language.md` for the Plan C addendum:

- §1 (Lexical structure / literals): expanded `Char` literal entry covers the boxed representation (TAG_CHAR=0x0C, 16 bytes), the closed codepoint range (0x000000..=0x10FFFF excluding surrogates), the full escape table (`\n`, `\t`, `\r`, `\\`, `\'`, `\"`, `\0`, `\u{HEX}` 1–6 hex digits), bare-codepoint UTF-8 literals, and parse-time rejection of multi-codepoint / out-of-range / surrogate inputs. Calls out the deliberate absence of `==` / `<` operator overloading on Char.
- §3.1 (Built-in types): `Char` row updated to "Boxed Unicode codepoint (TAG_CHAR=0x0C, 21-bit codepoint payload)", replacing the pre-addendum "1-byte codepoint" placeholder.
- §3.1.1 (new subsection): full `Char` reference covering literal syntax, the 19 user-facing operations grouped by purpose (equality / ordering, conversion, ASCII classifiers, ASCII case), the codepoint-aware string operations (`string_chars`, `string_char_at`, `string_from_chars`), the byte-vs-codepoint surface coexistence, and a worked `count_digits` example.
- §13 (Stdlib reference): adds a `std.char` row; extends the `std.string` row with the codepoint-indexed surface.
- §14 (v1 limits): new §14.1 "Deferred to follow-up plans" table documenting closure paths for codepoint-aware `string_split` / `string_replace`, Unicode-aware classifiers / case / normalization, and a hypothetical strict-UTF-8 `string_chars_strict`.

* [Task CH4 fix] is_gc_pointer_ty test + e2e UTF-8 source bytes

CI on PR #105 surfaced two regressions from the boxed-Char representation switch:

1. `is_gc_pointer_ty_matches_expected_types` (layout.rs unit test) pinned `!is_gc_pointer_ty(Ty::Char)`. Pre-addendum Char was an I32 immediate, so it correctly wasn't a GC pointer. Updated to assert `is_gc_pointer_ty(Ty::Char)` instead, with a Plan-C-addendum justification comment.
2. Three CH5 e2e tests embedded `\u{HEX}` escapes inside `"..."` string literals. Sigil's *string* lexer accepts only `\n` / `\t` / `\r` / `\\` / `\"`; the `\u{...}` escape lives in *char* literals only. Rewrote the tests to use bare UTF-8 bytes in source (`"héllo"`, `"héllo 😀"`). The bytes are valid UTF-8 that Rust's source-tree handling preserves into the embedded test string verbatim, and Sigil's runtime treats `String` as a raw UTF-8 byte buffer.

* [Task CH4 fix 2] String lexer UTF-8 preservation + e2e mono trigger

Two more CI failures fixed:

1. Sigil's string lexer pre-Plan-C-addendum read source bytes as Latin-1 chars (`self.src[pos] as char`) and pushed them to a Rust `String` via `String::push(char)`, which UTF-8 re-encodes. Multi-byte source sequences were double-encoded: `é` (0xC3 0xA9) became 4 bytes (0xC3 0x83 0xC2 0xA9) inside the resulting StringLit. This regressed nothing pre-addendum because `string_chars` / `string_char_at` didn't exist; the addendum surfaces the bug immediately. `take_string_lit` now uses the `peek_utf8` / `advance_utf8` helpers (added in CH2) to decode multi-byte source bytes as a single codepoint and push that codepoint verbatim. Two new lexer tests pin the round-trip for 2-byte `é` and 4-byte `😀`.
2. `string_from_chars_round_trip` had no explicit `List[Char]` annotation in its source, so monomorphize never saw the type and codegen panicked with "ctor `Cons$$Char` not registered". Added a `let xs: List[Char] = string_chars(s)` binding to trigger mono via the type annotation's Apply node, mirroring the working `string_chars_multibyte` test's shape.

* [Review] Address PR #105 review items 1–5 + 7

PR #105 review (boldfield, 2026-05-07): one MUST-FIX, four follow-ups, two deferred. All five non-deferred items addressed plus a small note for item 7.

**1 (MUST-FIX): Reject `==` / `!=` on `Char` at typecheck.** Pre-fix, `'a' == 'a'` typechecked and lowered to `icmp Equal l r` on boxed Char pointers (pointer identity, not codepoint equality), silently returning `false` at runtime. New typecheck arm in `check_binop`'s `BinOp::Eq | BinOp::NotEq` rejects `Ty::Char` operands with E0060 pointing at `char_eq(a, b)` (or `!char_eq(a, b)` for `!=`). Two new typecheck unit tests pin the rejection. Float / Int64 (also boxed) inherit the same trap; generalizing the rejection across all heap-boxed primitives is queued as a follow-up since the existing `float_eq` / `int64_eq` discipline currently hides the bug.

**2: Refactor `string_char_at` to early-exit shared helper.** Pre-refactor, `sigil_string_char_at_validate` and `sigil_string_char_at` each called `decode_codepoints_lossy(slice)` (full-pass + Vec allocation), making `for i in 0..n: char_at(s, i)` O(n²) with O(n) transient allocations per call. New shared helper `find_nth_codepoint(bytes, idx) -> Option<u32>` walks bytes front-to-back with early-exit; both entry points use it. Each call is now O(idx + decode_cost) and allocates nothing on the hot path.

**3: `lower_ctor_alloc` comment fix.** Dropped `Char` from the "sub-word primitives (Bool, Byte, Char, Unit)" widen-on-store comment; boxed Char takes the pass-through branch.

**4: Decoder overlong-rejection tests.** Two new runtime tests: `string_chars_overlong_2byte_replaces` (`[0xC0, 0x80]` 2-byte overlong of U+0000) and `string_chars_overlong_3byte_surrogate_replaces` (`[0xED, 0xA0, 0x80]` 3-byte UTF-8 form of surrogate U+D800). Each pins a distinct decoder branch.

**5: Lexer multi-codepoint count uses codepoint-step.** The "Char literal must be a single codepoint; got N" diagnostic's count loop now uses `peek_utf8` / `advance_utf8`, parity with the literal-body decoder above. New lexer test pins `'aé'` reports "got 2", not "got 3".

**7 (DEFERRED note): U+FFFD merging.** Sigil v1's per-byte replacement (matching `String::from_utf8_lossy`) is now documented in `std/char.sigil` alongside a forward reference to the v2 `string_chars_strict` follow-up. Item 6 (Char in `is_gc_pointer_ty`, a note for v2 precise GC) is deferred per reviewer.

* [Review 2/3/4] Address all 4 outstanding boldfield reviews

Combined fixes for:

- **Review B item 3** (PR review 4246507154, 17:59): comment style
- **Review C items 9 + 10** (issue comment 4399992055, 18:30): Float/Int64 == extension + e2e lossy-decode test
- **Review D items 1–4** (PR review 4246835141, 18:46): E0060 both operands, decoder dedup, spec §3.4 typo, std/string duplicate

### B item 3: Plan-C-addendum comment spam pruned

Deleted 13 redundant single-line `// Plan C addendum (Char) — boxed Char is pointer-typed.` comments adjacent to `EnvSlotKind::Char | ...` match arms across `compiler/src/codegen.rs`. The match-arm context alone makes the change self-evident; the load-bearing ones (literal lowering, type-mapping fns, dispatch arms, helpers, FuncIds struct, runtime counters, ast.rs `is_pointer()`, layout.rs `is_gc_pointer_ty`) stay.

### C item 9: Heap-boxed-primitive == rejection extended to Float / Int64

The earlier Char-only E0060 rejection now generalizes to all three heap-boxed primitives. New `BoxedPrim` enum + `boxed_primitive_eq_name` helper drive the typecheck arm; per-type suggestion + ordering hint string. Float adds the NaN-aware caveat in the error message. Four new typecheck unit tests pin Float / Int64 `==` and `!=` rejection.

### C item 10: e2e test for user-visible lossy UTF-8 decode

`string_chars_invalid_utf8_replaces` constructs a `ByteArray` with a known-invalid byte (`0xFF`) via `byte_array_alloc` + `byte_array_concat`, bypasses validation via `string_from_bytes_alloc` (the post-validation primitive copies bytes verbatim), and verifies `string_chars` emits U+FFFD (65533) for the invalid byte. Closes the runtime → user-program coverage gap.

### D item 1: E0060 char check now guards both operands

The earlier check only inspected `lt.as_ref()`. Now both `lt` and `rt` are checked via `lt_boxed.or(rt_boxed)`, so `42 == 'a'` and similar shapes still fire the named-function suggestion even when LHS is non-Char or `None`.

### D item 2: UTF-8 decoder deduplicated

Extracted `decode_next_codepoint(bytes, offset) -> (cp, len)` as the single source of truth for Sigil's lossy UTF-8 decode. `decode_codepoints_lossy` (drives `string_chars`) and `find_nth_codepoint` (drives `string_char_at`) both step through it, so codepoint-boundary agreement is now identical-by-construction rather than agree-by-coincidence. New runtime test `string_char_at_overlong_replaces` exercises `find_nth_codepoint` on `[0xC0, 0x80, b'a']` (overlong + ASCII) to pin that the two entry points produce the same codepoint count for invalid input.

### D item 3: spec §3.4 → §3.1.1

The Char literal entry's "use the named functions (§3.4)" pointer referenced "Inference rules (overview)"; corrected to §3.1.1 ("Char and codepoint string operations") which is the new subsection.

### D item 4: `std/string.sigil` duplicate removed

The byte-indexed surface preamble listed `string_byte_at` twice; replaced the second occurrence with `string_length`.

### Out of scope (acknowledged in PR reply)

- D non-blocking observation #1 (test gap on find_nth_codepoint with invalid input): closed by `string_char_at_overlong_replaces` above.
- D non-blocking observation #2 (`\u{HEX}` / `\0` not supported in string literals): tracked as a separate follow-up since the current behavior produces a clear E0010, not a silent miss.
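
The decoder deduplication in D item 2 can be sketched as follows. The three function names and the `(cp, len)` / `Option<u32>` shapes come from the commit message; the bodies are assumptions. In particular, this sketch delegates validation to `std::str::from_utf8` and replaces exactly one byte per failure, following the "U+FFFD per invalid byte" description in CH1, and the `usize` index type is a guess.

```rust
/// Decode one codepoint at `offset`, returning `(codepoint, byte_len)`.
/// Invalid input yields U+FFFD and consumes a single byte.
/// Assumes `offset < bytes.len()`.
fn decode_next_codepoint(bytes: &[u8], offset: usize) -> (u32, usize) {
    // A valid UTF-8 scalar is 1 to 4 bytes long. Try each prefix length
    // and let the standard library reject overlongs and surrogates.
    for len in 1..=4 {
        if let Some(chunk) = bytes.get(offset..offset + len) {
            if let Ok(s) = std::str::from_utf8(chunk) {
                if let Some(c) = s.chars().next() {
                    return (c as u32, len);
                }
            }
        }
    }
    (0xFFFD, 1)
}

/// Full pass: collects every codepoint (the `string_chars` path).
fn decode_codepoints_lossy(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < bytes.len() {
        let (cp, len) = decode_next_codepoint(bytes, offset);
        out.push(cp);
        offset += len;
    }
    out
}

/// Early exit: stops at the requested index without allocating
/// (the `string_char_at` path).
fn find_nth_codepoint(bytes: &[u8], idx: usize) -> Option<u32> {
    let mut offset = 0;
    let mut seen = 0;
    while offset < bytes.len() {
        let (cp, len) = decode_next_codepoint(bytes, offset);
        if seen == idx {
            return Some(cp);
        }
        seen += 1;
        offset += len;
    }
    None
}
```

Because both walkers advance through the same `decode_next_codepoint`, they cannot disagree about where a codepoint starts or ends, even on invalid input such as `[0xC0, 0x80, b'a']`.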

boldfield and others added 6 commits May 5, 2026 21:38

[Task F5] Update spec/language.md for Float type

5c859fd

boldfield commented May 6, 2026

boldfield and others added 3 commits May 5, 2026 22:36

Spec: clarify float exponent digit requirement and pattern exclusion

b716f15

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

boldfield merged commit 1fd9e0a into main May 6, 2026
4 checks passed

boldfield deleted the plan-c-float-type branch May 6, 2026 05:52

boldfield merged 9 commits into main from plan-c-float-type
