Boxed Float (IEEE 754 f64) type #101
Add TAG_FLOAT = 0x0B, FloatAllocCount/FloatAllocBytes counters, and runtime/src/float.rs with 21 FFI primitives: box/unbox, 5 arithmetic, 5 comparison, 4 math, and 5 conversion ops. All follow the Int64 boxed-scalar pattern (atomic alloc, count=1, bitmap=0). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add FloatLit(f64) token and Expr::FloatLit(f64, Span) AST variant. The lexer recognizes `3.14`, `1e10`, `3.14e-2` as float literals (requires digits before and after the dot). Parser maps the token and constant-folds unary negation on float literals. All match arms across resolve, monomorphize, closure_convert, color, elaborate, typecheck, and codegen updated for exhaustiveness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Register opaque Float type in builtin_types(). Add
register_builtin_float_schemes() with 16 primitives: 4 arithmetic,
5 comparison, 4 math, 5 conversion/string ops. Float literal
inference (Expr::FloatLit → Ty::User("Float")) already added in
Task F2's exhaustiveness pass.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add 20 FFI declarations for float runtime primitives to BuiltinFuncIds/BuiltinFuncRefs. Float literals lower via f64::to_bits() → iconst → sigil_float_box (bit-pattern calling convention). All 19 callable primitives dispatch through lower_call with correct stackmap placeholders for allocating ops. Fix sigil_float_box to accept i64 bit pattern (integer register class) instead of f64, matching codegen's iconst path. Add std/float.sigil (doc-only) and imports.rs BUILTIN_INJECTED entry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
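The bit-pattern calling convention described above can be sketched in plain Rust. This is only an illustration under stated assumptions: `literal_to_iconst_payload` and `payload_to_f64` are hypothetical names, and the real `sigil_float_box` also allocates the boxed heap value, which is omitted here.

```rust
/// Hypothetical helper: the integer a compiler could hand to `iconst(I64, ...)`
/// for a float literal. The f64 is reinterpreted bit-for-bit, not converted.
fn literal_to_iconst_payload(lit: f64) -> i64 {
    lit.to_bits() as i64
}

/// Hypothetical runtime side: reinterpret the integer argument as an f64.
/// (The actual runtime primitive would store this value in a boxed heap cell.)
fn payload_to_f64(bits: i64) -> f64 {
    f64::from_bits(bits as u64)
}
```

Because the value travels as an integer, NaN payloads and the sign of zero survive the trip unchanged, and the call never touches a float register.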
10 e2e tests covering float literals, arithmetic, negation, div-by-zero→inf, from_int round-trip, comparisons, floor/ceil, string parse/validate, NaN≠NaN, doc-only import. float_to_string now appends ".0" to whole-number results so Float values are always visually distinguishable from Int (inf/NaN unchanged). Runtime unit tests extended to cover the formatting. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
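The `.0` formatting rule can be sketched as follows. This is a guess at the shape, not the runtime's code: `float_to_string_sketch` is a hypothetical name, and it assumes the runtime starts from Rust's `Display` output for `f64`.

```rust
/// Hypothetical sketch: format an f64 so that whole numbers keep a ".0" suffix.
/// Non-finite values (inf, -inf, NaN) are returned as Rust prints them.
fn float_to_string_sketch(x: f64) -> String {
    let s = format!("{}", x);
    if x.is_finite() && !s.contains('.') {
        // `Display` prints 4.0 as "4"; add the suffix so it reads as a Float.
        format!("{}.0", s)
    } else {
        s
    }
}
```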
Add Float to: literal syntax (§1), type table (§3), expression forms (§4.1), stdlib reference (§13), runtime model (§12). Remove "no Float type" from v1 limits (§14). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
boldfield
left a comment
Code Review: Boxed Float Type
Clean implementation that mirrors the established Int64 boxed-scalar pattern. Correctness looks good across all layers. A few issues worth addressing:
Issues
1. Lexer: greedy exponent consumption produces misleading errors
The exponent branch (line ~164 of lexer.rs) unconditionally consumes e/E and an optional sign without verifying that exponent digits follow. Input like 1e or 1e+ gets consumed and then fails with "float literal 1e is out of range" — but it's not out of range, it's malformed.
A simple fix: peek ahead for at least one digit before committing to the exponent parse. Otherwise, leave the e unconsumed and treat the preceding part as an integer literal. This matters because .sigil source with a typo (1e) currently eats characters that belong to the next token and gives a confusing diagnostic.
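A minimal sketch of the peek-ahead, assuming a byte-slice cursor like the one the lexer appears to use. `exponent_len` is a hypothetical helper, not a function from `lexer.rs`: it reports how many bytes a well-formed exponent occupies and returns 0 when nothing should be consumed.

```rust
/// Hypothetical helper: length in bytes of a well-formed exponent starting at
/// `pos` (`e`/`E`, optional sign, at least one digit), or 0 if there is none.
fn exponent_len(src: &[u8], pos: usize) -> usize {
    let mut i = pos;
    match src.get(i) {
        Some(b'e') | Some(b'E') => i += 1,
        _ => return 0,
    }
    if matches!(src.get(i), Some(b'+') | Some(b'-')) {
        i += 1;
    }
    let digits_start = i;
    while matches!(src.get(i), Some(b) if b.is_ascii_digit()) {
        i += 1;
    }
    // No digits after the marker: report 0 so the caller leaves `e` unconsumed.
    if i == digits_start { 0 } else { i - pos }
}
```

With this shape the caller only advances the cursor when the result is non-zero, so `1e` and `1e+` fall back to an integer literal followed by whatever tokens come next.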
2. Dead code in codegen: make_float_binop takes unused name parameter
let make_float_binop = |name: &str, sig_holder: &mut Signature| {
    // ...
    let _ = name;
};
The `name` parameter is immediately discarded. The closure doesn't need it, so drop the parameter.
Observations (not blocking)
- `float_to_int` clamping is redundant with Rust's saturating `as` semantics (`f64 as i64` already returns 0 for NaN, `i64::MAX` for overflow, `i64::MIN` for underflow since Rust 1.45). The explicit checks make the contract more visible to readers though, so not wrong; just worth knowing that removing them wouldn't change behavior.
- "Plan D" comments appear throughout the codegen additions. The PR summary and branch name suggest this is a Plan C continuation (Float type for the Sigil language). If "Plan D" is intentional naming, ignore this; just flagging potential naming drift.
Verdict
Solid work. The lexer exponent greediness is the only thing I'd want fixed before merge — it'll bite someone debugging a typo in float-heavy code and getting a confusing diagnostic. The dead parameter is a one-line cleanup.
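On the non-blocking `float_to_int` observation in the review, the equivalence can be checked directly. The sketch below compares an explicit clamp with a bare cast; both function names are hypothetical, and the explicit version is only a guess at what the runtime's checks look like.

```rust
/// Hypothetical explicit version: spell out the NaN and range handling.
fn float_to_int_explicit(x: f64) -> i64 {
    if x.is_nan() {
        0
    } else if x >= i64::MAX as f64 {
        i64::MAX
    } else if x <= i64::MIN as f64 {
        i64::MIN
    } else {
        x as i64 // in range: truncates toward zero
    }
}

/// Bare cast: float-to-int `as` saturates and maps NaN to 0 (Rust 1.45+).
fn float_to_int_cast(x: f64) -> i64 {
    x as i64
}
```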
1. Lexer: exponent branch now peeks ahead for at least one digit
before committing to consume `e`/`E` and optional sign. `1e`
lexes as IntLit(1) + Ident("e"), not a malformed float.
Three new lexer unit tests pin the behavior.
2. Codegen: remove unused `name` parameter from `make_float_binop`
closure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change test values from 3.14/2.718 (approx PI/E) to 3.125/2.75 to avoid clippy::approx_constant lint - Add SAFETY comments on .as_ptr() calls for interior-pointer check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* [Task CH1] Char header tag + runtime primitives
Adds TAG_CHAR=0x0C and a complete `runtime/src/char.rs` module
covering boxing, equality/ordering, conversion, ASCII classifiers,
ASCII case folding, and the load-bearing UTF-8 walkers
(`string_chars`, `string_char_at`, `string_from_chars`).
`string_chars` and `string_from_chars` accept Cons / Nil header
words and discriminants from the codegen call site so the runtime
stays free of compile-time `List[Char]` layout knowledge while
still emitting well-typed `List[Char]` values. Lossy UTF-8 decode
emits U+FFFD per invalid byte and resyncs at the next valid leading
byte.
Adds `CharAllocCount` / `CharAllocBytes` counters; 32 unit tests
cover box/unbox round-trips for ASCII / 2-byte / 3-byte / 4-byte
codepoints, every classifier, ASCII case passthrough on non-ASCII,
`int_to_char_validate` boundary cases (surrogates, > 0x10FFFF,
negative), `string_chars` on valid + invalid UTF-8 input, codepoint-
indexed `string_char_at`, and `string_from_chars` round-trip through
`string_chars`.
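The boundary cases listed for `int_to_char_validate` correspond to the Unicode scalar-value range. A minimal sketch, assuming the validator takes a signed 64-bit integer and returns a boolean (the real FFI signature is not shown in this commit message); `int_to_char_sketch` is a hypothetical user-level wrapper:

```rust
/// Sketch of the validity rule: non-negative, at most 0x10FFFF, not a surrogate.
/// The actual runtime signature and return type are assumptions.
fn int_to_char_validate(n: i64) -> bool {
    (0..=0x10FFFF).contains(&n) && !(0xD800..=0xDFFF).contains(&n)
}

/// Hypothetical wrapper: validate first, then construct the value.
fn int_to_char_sketch(n: i64) -> Option<char> {
    if int_to_char_validate(n) {
        char::from_u32(n as u32)
    } else {
        None
    }
}
```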
* [Task CH2] Lexer + parser — Char literal extensions
Extends the existing `'x'` Char literal lexer with the full plan-spec
escape and validity surface:
- New escape sequences: `\"`, `\0`, `\u{HEX}` (1–6 hex digits).
- Bare multi-byte codepoints (`'é'`, `'中'`, `'😀'`) decoded via a
new UTF-8 peek/advance helper pair on the byte-based cursor; the
prior `self.src[pos] as char` shortcut would otherwise see N
source bytes and reject as multi-codepoint.
- Compile-time rejection of `\u{...}` values > 0x10FFFF and any
surrogate `0xD800..=0xDFFF` (E0010 with descriptive message).
- Compile-time rejection of empty `\u{}` and >6-hex-digit escapes.
- Multi-codepoint literal bodies (`'ab'`, `'ab\u{41}'`) now produce
the targeted "Char literal must be a single codepoint; got N"
diagnostic rather than the prior "expected closing `'`" surface.
Token type stays `CharLit(char)` — Rust's `char` already enforces
the post-validation invariant (no surrogates, ≤ 0x10FFFF). Existing
parser-side mapping (`Token::CharLit` → `Expr::CharLit`) is unchanged.
15 new lexer tests cover every escape, each Unicode boundary
(0xD7FF / 0xE000 / 0x10FFFF), each rejection case (empty braces,
too many digits, out-of-range, low/high surrogates), and the three
multi-byte bare-codepoint widths.
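The `\u{HEX}` rules above can be illustrated with a small standalone function. `parse_unicode_escape` is a hypothetical helper that receives only the text between the braces; the real lexer works on a byte cursor and reports E0010, neither of which is modelled here.

```rust
/// Hypothetical sketch: validate the body of a `\u{...}` escape.
/// Accepts 1 to 6 hex digits naming a Unicode scalar value.
fn parse_unicode_escape(hex: &str) -> Result<char, String> {
    if hex.is_empty() {
        return Err("empty \\u{} escape".to_string());
    }
    if hex.len() > 6 {
        return Err("\\u{...} escape has more than 6 hex digits".to_string());
    }
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err("\\u{...} escape contains a non-hex character".to_string());
    }
    let value = u32::from_str_radix(hex, 16).map_err(|e| e.to_string())?;
    // `char::from_u32` returns None for surrogates and values above 0x10FFFF.
    char::from_u32(value)
        .ok_or_else(|| format!("U+{:04X} is a surrogate or exceeds 0x10FFFF", value))
}
```

Returning a Rust `char` is what lets the token type stay `CharLit(char)`: once the conversion succeeds, the value cannot be a surrogate or out of range.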
* [Task CH3] Char builtin schemes — 19 user-facing primitives
Registers the user-facing Char primitives via a new
`register_builtin_char_schemes(tc)` mirroring
`register_builtin_float_schemes`:
- 5 equality / ordering: char_eq, char_lt, char_le, char_gt, char_ge
— `(Char, Char) -> Bool`
- 3 conversion: char_to_int `(Char) -> Int`, int_to_char
`(Int) -> Option[Char]`, char_to_string `(Char) -> String`
- 5 ASCII classifiers: is_ascii, is_ascii_digit, is_ascii_alpha,
is_ascii_alphanumeric, is_ascii_whitespace — `(Char) -> Bool`
- 2 ASCII case: to_lower_ascii, to_upper_ascii — `(Char) -> Char`
- 4 string codepoint ops: string_chars `(String) -> List[Char]`,
string_char_at `(String, Int) -> Option[Char]`, string_from_chars
`(List[Char]) -> String`
Char itself is already a Ty::Char primitive (pre-existing) and
literal type inference (`Expr::CharLit -> Ty::Char`) is unchanged.
The validator helpers `int_to_char_validate` /
`string_char_at_validate` are codegen-internal — not registered as
user-callable schemes; codegen will lower the user-facing
`int_to_char` / `string_char_at` to the validate-then-construct
pattern in CH4.
13 typecheck unit tests cover each user-facing scheme: round-trip
type inference, Option[Char] / List[Char] return types, E0044 on
wrong-argument-type calls, E0045 on Char-vs-Int annotation
mismatch, and Char literal default inference.
* [Task CH4] Codegen — boxed Char + 19-primitive dispatch
Converts Char from an I32 codepoint immediate to a `TAG_CHAR`-headed
heap pointer (mirroring Float / Int64 / String) and wires the 19
user-facing Char primitives through `lower_call`.
Type-mapping changes:
- `cranelift_ty_for_type_expr("Char")` and `cranelift_ty_of_ty(Ty::Char)`
return `pointer_ty` instead of `types::I32`.
- `Expr::CharLit(c, _)` lowers to `iconst(I64, codepoint) →
sigil_char_box → pointer_ty` (with stackmap placeholder), no longer
bare `iconst(I32, c)`.
- `type_of_expr(Expr::CharLit)` returns `pointer_ty`.
- `Pattern::CharLit(c)` loads the u32 codepoint at offset 8 from the
boxed Char scrutinee, zero-extends to i64, and tests for equality
against the literal codepoint.
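The pattern-match lowering above reads the codepoint from a fixed offset in the boxed value. The sketch below models that in plain Rust rather than Cranelift IR. The struct layout is an assumption inferred from "offset 8" (an 8-byte header followed by a `u32` payload, padded to 16 bytes); the field names, the position of the tag within the header, and the function names are all hypothetical.

```rust
const TAG_CHAR: u64 = 0x0C;

/// Assumed layout of a boxed Char: 8-byte header, then the codepoint at offset 8.
#[repr(C)]
struct BoxedChar {
    header: u64,    // assumed header word; how the tag is packed is a guess
    codepoint: u32, // payload read by the pattern match
    _pad: u32,      // padding up to 16 bytes
}

/// Hypothetical stand-in for boxing a codepoint.
fn box_char(c: char) -> Box<BoxedChar> {
    Box::new(BoxedChar { header: TAG_CHAR, codepoint: c as u32, _pad: 0 })
}

/// Mirrors the described lowering: load the u32, zero-extend to i64, compare.
fn char_lit_matches(scrutinee: &BoxedChar, literal: char) -> bool {
    let loaded = scrutinee.codepoint as i64; // u32 to i64 is a zero-extension
    loaded == literal as u32 as i64
}
```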
Primitive dispatch:
- 17 simple FFI dispatch arms in `lower_call` for char_eq / lt / le /
gt / ge / char_to_int / char_to_string / is_ascii(_digit / _alpha /
_alphanumeric / _whitespace) / to_lower_ascii / to_upper_ascii.
- `int_to_char` and `string_char_at` lower to a validate-then-construct
pattern: validator → `brif(ok==0)` → Some/None branches → merge
block with a `pointer_ty` block-param. Some-branch builds via
`lower_ctor_alloc(Option$$Char, some_idx, [char_ptr])`; None via
`lower_ctor_alloc(..., none_idx, [])`.
- `string_chars` and `string_from_chars` thread the codegen-computed
`List$$Char` Cons / Nil header words and discriminants to the
runtime as i64 immediates; runtime stamps them through `sigil_alloc`
to build well-typed `List[Char]` cells.
`option_layout_for(Ty)` and `list_char_layout_immediates()` are the
two private helpers that resolve the monomorphized layout via
`mangle_type` / `mangle_ctor` and the `ctor_index` / `type_layouts`
side-tables.
`type_of_expr` predictions extended with the 19 Char op return types
(I8 for boolean classifiers / comparators, I64 for `char_to_int`,
`pointer_ty` for the rest). Globals set extended with the 17 user-
callable surface names (the validator helpers
`int_to_char_validate` / `string_char_at_validate` are codegen-
internal only).
* [Task CH4] Slot widening + GC bitmap for boxed Char
Boxed Char (TAG_CHAR) is a heap pointer; the closure-record /
sum-ctor / tuple-element slot-widening logic pre-Plan-C-addendum
treated `EnvSlotKind::Char` and `Ty::Char` as a sub-word I32
primitive (uextend on store, ireduce(I32) on load). Both directions
must drop for boxed Char — the slot already holds a `pointer_ty`
value, and narrowing a pointer to I32 corrupts it.
Updates:
- `EnvSlotKind::is_pointer()` now matches `Char` alongside
`String / Closure / User`, so closure-record bitmap bits are
set correctly for boxed-Char captures.
- Every match site that branched on `EnvSlotKind::Char` (closure
store / load, synth-cont capture pack / unpack, post-arm-k
capture pack, tail-prefix-let widen, narrow_for_kind helpers,
`type_of_expr` for `Expr::ClosureEnvLoad`) routes Char through
the pointer-typed branch.
- `Ty::Char` in `load_field_value` and the tuple-pattern element
loader drops the I32 narrow.
- `is_gc_pointer_ty(Ty::Char)` returns true so sum-type field
bitmaps mark Char fields for GC tracing (cosmetic on Boehm's
conservative scan, load-bearing for any future precise GC).
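To make the bitmap consequence concrete, here is a simplified model of how `is_pointer()` could drive a closure-record bitmap. The variant list is an assumption (only `Char`, `String`, `Closure`, and `User` are named above, and the real variants may carry data), and `closure_record_bitmap` is a hypothetical function limited to 64 slots.

```rust
/// Simplified stand-in for the compiler's slot kinds; `Int` and `Bool` are
/// assumed non-pointer variants added for illustration.
#[derive(Clone, Copy)]
enum EnvSlotKind {
    Int,
    Bool,
    Char,
    String,
    Closure,
    User,
}

impl EnvSlotKind {
    /// After the change, boxed Char counts as a pointer slot.
    fn is_pointer(self) -> bool {
        matches!(
            self,
            EnvSlotKind::Char | EnvSlotKind::String | EnvSlotKind::Closure | EnvSlotKind::User
        )
    }
}

/// Hypothetical bitmap builder: bit i is set when slot i holds a pointer.
fn closure_record_bitmap(slots: &[EnvSlotKind]) -> u64 {
    slots.iter().enumerate().fold(0u64, |bits, (i, kind)| {
        if kind.is_pointer() { bits | (1u64 << i) } else { bits }
    })
}
```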
Existing e2e tests that exercised Char-as-I32 semantics:
- `perform_side_narrow_to_char_value_checked`: rewrites
`c == 'Y'` to `char_eq(c, 'Y')` (pointer equality wouldn't
match codepoint equality for boxed Chars).
- `cps_abi_captures_bearing_with_char_capture_exercises_widen_-
narrow_symmetry`: same `==` → `char_eq` rewrite; the test
still pins capture flow through synth-cont closure records,
but with no width narrowing.
- `arm_reads_char_arg_branches_on_codepoint`: same `==` →
`char_eq` rewrite; the I32 → I8 split with the Bool test
collapses (both are now pointer-store paths).
- `task_78_5_g4_approach6_b_neq_r_pointer_return_arm_through_-
char_body`: B != R width discrepancy collapses (B = Char =
pointer_ty, R = String = pointer_ty); test docstring updated
to reflect post-addendum reality. Test still pins the wrapper
composition end-to-end.
* [Task CH5] e2e tests + std/char.sigil + std/string.sigil + PLAN_C docs
- 14 e2e tests cover the full Char surface: char_literal_round_trips_via_to_string,
char_codepoint_round_trip, int_to_char_rejects_out_of_range,
int_to_char_rejects_surrogate, int_to_char_accepts_valid,
is_ascii_classifiers_basic, to_lower_upper_ascii_passthrough,
string_chars_ascii, string_chars_multibyte,
string_char_at_codepoint_index, string_from_chars_round_trip,
char_pattern_match_against_literal, char_eq_distinguishes_different_codepoints,
char_doc_only_import.
- std/char.sigil ships as a documentation-only file (added to
BUILTIN_INJECTED skip-list). Covers literal syntax, the 19
user-facing primitives, the ASCII-only v1 scope + v2 closure path,
the byte-vs-codepoint cross-reference to std/string.sigil, and a
worked count_digits example.
- std/string.sigil doc preamble rewritten — the "Codepoint-aware
variants deferred" note becomes a positive byte-vs-codepoint
surface description; the new codepoint ops surface
(string_chars, string_char_at, string_from_chars) is added to the
builtin operations table with a cross-reference to std/char.sigil.
- PLAN_C_PROGRESS.md gets a Task CH section documenting the
addendum's runtime + compiler + stdlib + test surface and the
pre-existing Char-as-I32 → boxed-Char representation switch.
- PLAN_C_DEVIATIONS.md Task 68 entry's "deferred class 1"
(codepoint-aware ops) and "deferred class 5"
(char_to_int / int_to_char) marked CLOSED by this addendum.
Class 3 (Float helpers) marked CLOSED by PR #101 (already shipped).
Class 2 (codepoint-aware string_split / string_replace) remains
deferred — depends on stdlib namespace qualification, not Char.
* [Task CH6] Spec update — Char primitive + codepoint string ops
Updates `spec/language.md` for the Plan C addendum:
- §1 (Lexical structure / literals): expanded `Char` literal entry
covers the boxed representation (TAG_CHAR=0x0C, 16 bytes), the
closed codepoint range (0x000000..=0x10FFFF excluding surrogates),
the full escape table (`\n`, `\t`, `\r`, `\\`, `\'`, `\"`, `\0`,
`\u{HEX}` 1–6 hex digits), bare-codepoint UTF-8 literals, and
parse-time rejection of multi-codepoint / out-of-range / surrogate
inputs. Calls out the deliberate absence of `==` / `<` operator
overloading on Char.
- §3.1 (Built-in types): `Char` row updated to "Boxed Unicode
codepoint (TAG_CHAR=0x0C, 21-bit codepoint payload)" — replaces
the pre-addendum "1-byte codepoint" placeholder.
- §3.1.1 (new subsection): full `Char` reference covering literal
syntax, the 19 user-facing operations grouped by purpose
(equality / ordering, conversion, ASCII classifiers, ASCII case),
the codepoint-aware string operations (`string_chars`,
`string_char_at`, `string_from_chars`), the byte-vs-codepoint
surface coexistence, and a worked `count_digits` example.
- §13 (Stdlib reference): adds a `std.char` row; extends the
`std.string` row with the codepoint-indexed surface.
- §14 (v1 limits): new §14.1 "Deferred to follow-up plans" table
documenting closure paths for codepoint-aware `string_split` /
`string_replace`, Unicode-aware classifiers / case / normalization,
and a hypothetical strict-UTF-8 `string_chars_strict`.
* [Task CH4 fix] is_gc_pointer_ty test + e2e UTF-8 source bytes
CI on PR #105 surfaced two regressions from the boxed-Char
representation switch:
1. `is_gc_pointer_ty_matches_expected_types` (layout.rs unit test)
pinned `!is_gc_pointer_ty(Ty::Char)` — pre-addendum Char was an
I32 immediate, so it correctly wasn't a GC pointer. Updated to
assert `is_gc_pointer_ty(Ty::Char)` instead, with a Plan-C-
addendum justification comment.
2. Three CH5 e2e tests embedded `\u{HEX}` escapes inside `"..."`
string literals. Sigil's *string* lexer accepts only
`\n / \t / \r / \\\\ / \\\"`; the `\u{...}` escape lives in
*char* literals only. Rewrote the tests to use bare UTF-8 bytes
in source (`"héllo"`, `"héllo 😀"`) — the bytes are valid UTF-8
that Rust's source-tree handling preserves into the embedded
test string verbatim, and Sigil's runtime treats `String` as a
raw UTF-8 byte buffer.
* [Task CH4 fix 2] String lexer UTF-8 preservation + e2e mono trigger
Two more CI failures fixed:
1. Sigil's string lexer pre-Plan-C-addendum read source bytes as
Latin-1 chars (`self.src[pos] as char`) and pushed them to a
Rust `String` via `String::push(char)`, which UTF-8 re-encodes.
Multi-byte source sequences were double-encoded: `é` (0xC3 0xA9) became
4 bytes (0xC3 0x83 0xC2 0xA9) inside the resulting StringLit.
This regressed nothing pre-addendum because `string_chars` /
`string_char_at` didn't exist; the addendum surfaces the bug
immediately. `take_string_lit` now uses the `peek_utf8` /
`advance_utf8` helpers (added in CH2) to decode multi-byte
source bytes as a single codepoint and push that codepoint
verbatim. Two new lexer tests pin the round-trip for 2-byte
`é` and 4-byte `😀`.
2. `string_from_chars_round_trip` had no explicit `List[Char]`
annotation in its source, so monomorphize never saw the type
and codegen panicked with "ctor `Cons$$Char` not registered".
Added a `let xs: List[Char] = string_chars(s)` binding to
trigger mono via the type annotation's Apply node, mirroring
the working `string_chars_multibyte` test's shape.
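The double-encoding described in item 1 is easy to reproduce in isolation. The two functions below are hypothetical stand-ins, not the lexer's `take_string_lit`: the first pushes each source byte as if it were a Latin-1 character, the second steps by codepoint.

```rust
/// Buggy shape: every byte becomes its own `char`, and `String::push`
/// re-encodes each one as UTF-8, so multi-byte sequences grow.
fn lex_bytes_as_latin1(src: &[u8]) -> String {
    let mut out = String::new();
    for &b in src {
        out.push(b as char);
    }
    out
}

/// Fixed shape: decode whole codepoints, then push them unchanged.
/// Returns None if the source bytes are not valid UTF-8.
fn lex_bytes_as_utf8(src: &[u8]) -> Option<String> {
    let s = std::str::from_utf8(src).ok()?;
    let mut out = String::new();
    for c in s.chars() {
        out.push(c);
    }
    Some(out)
}
```

For the two source bytes of `é`, the first function yields the four bytes 0xC3 0x83 0xC2 0xA9, while the second preserves 0xC3 0xA9.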
* [Review] Address PR #105 review items 1–5 + 7
PR #105 review (boldfield, 2026-05-07): one MUST-FIX, four follow-
ups, two deferred. All five non-deferred items addressed plus a
small note for item 7.
**1 (MUST-FIX) — Reject `==` / `!=` on `Char` at typecheck.** Pre-
fix, `'a' == 'a'` typechecked and lowered to `icmp Equal l r` on
boxed Char pointers — pointer identity, not codepoint equality —
silently returning `false` at runtime. New typecheck arm in
`check_binop`'s `BinOp::Eq | BinOp::NotEq` rejects `Ty::Char`
operands with E0060 pointing at `char_eq(a, b)` (or
`!char_eq(a, b)` for `!=`). Two new typecheck unit tests pin the
rejection. Float / Int64 (also boxed) inherit the same trap;
generalizing the rejection across all heap-boxed primitives is
queued as a follow-up since the existing `float_eq` / `int64_eq`
discipline currently hides the bug.
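The trap in item 1 can be shown with ordinary Rust boxes, used here as a stand-in for boxed Sigil values. Both function names are illustrative only.

```rust
/// Identity comparison: true only when both references name the same allocation.
fn pointer_eq(a: &u32, b: &u32) -> bool {
    std::ptr::eq(a, b)
}

/// Value comparison: what `char_eq` is meant to express.
fn codepoint_eq(a: &u32, b: &u32) -> bool {
    *a == *b
}
```

Two separately boxed copies of `'a'` compare unequal by identity and equal by value, which is why `'a' == 'a'` silently returned `false` before the typecheck rejection.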
**2 — Refactor `string_char_at` to early-exit shared helper.** Pre-
refactor `sigil_string_char_at_validate` and `sigil_string_char_at`
each called `decode_codepoints_lossy(slice)` (full-pass + Vec
allocation) — making `for i in 0..n: char_at(s, i)` O(n²) with O(n)
transient allocations per call. New shared helper
`find_nth_codepoint(bytes, idx) -> Option<u32>` walks bytes
front-to-back with early-exit; both entry points use it. Each call
is now O(idx + decode_cost) and allocates nothing on the hot path.
**3 — `lower_ctor_alloc` comment fix.** Dropped `Char` from the
"sub-word primitives (Bool, Byte, Char, Unit)" widen-on-store
comment; boxed Char takes the pass-through branch.
**4 — Decoder overlong-rejection tests.** Two new runtime tests:
`string_chars_overlong_2byte_replaces` (`[0xC0, 0x80]` 2-byte
overlong of U+0000) and `string_chars_overlong_3byte_surrogate_-
replaces` (`[0xED, 0xA0, 0x80]` 3-byte UTF-8 form of surrogate
U+D800). Each pins a distinct decoder branch.
**5 — Lexer multi-codepoint count uses codepoint-step.** The
"Char literal must be a single codepoint; got N" diagnostic's
count loop now uses `peek_utf8` / `advance_utf8`, parity with the
literal-body decoder above. New lexer test pins
`'aé'` reports "got 2", not "got 3".
**7 (DEFERRED note) — U+FFFD merging.** Sigil v1's per-byte
replacement (matching `String::from_utf8_lossy`) is now
documented in `std/char.sigil` alongside a forward reference to
the v2 `string_chars_strict` follow-up.
Item 6 (Char in `is_gc_pointer_ty` — note for v2 precise GC) is
deferred per reviewer.
* [Review 2/3/4] Address all 4 outstanding boldfield reviews
Combined fixes for:
- **Review B item 3** (PR review 4246507154, 17:59) — comment style
- **Review C items 9 + 10** (issue comment 4399992055, 18:30) —
Float/Int64 == extension + e2e lossy-decode test
- **Review D items 1–4** (PR review 4246835141, 18:46) — E0060 both
operands, decoder dedup, spec §3.4 typo, std/string duplicate
### B item 3 — Plan-C-addendum comment spam pruned
Deleted 13 redundant single-line `// Plan C addendum (Char) — boxed
Char is pointer-typed.` comments adjacent to `EnvSlotKind::Char | ...`
match arms across `compiler/src/codegen.rs`. The match-arm context
alone makes the change self-evident; the load-bearing ones (literal
lowering, type-mapping fns, dispatch arms, helpers, FuncIds struct,
runtime counters, ast.rs `is_pointer()`, layout.rs
`is_gc_pointer_ty`) stay.
### C item 9 — Heap-boxed-primitive == rejection extended to Float / Int64
The earlier Char-only E0060 rejection now generalizes to all three
heap-boxed primitives. New `BoxedPrim` enum + `boxed_primitive_-
eq_name` helper drive the typecheck arm; per-type suggestion +
ordering hint string. Float adds the NaN-aware caveat in the error
message. Four new typecheck unit tests pin Float / Int64 `==` and
`!=` rejection.
### C item 10 — e2e test for user-visible lossy UTF-8 decode
`string_chars_invalid_utf8_replaces` constructs a `ByteArray` with
a known-invalid byte (`0xFF`) via `byte_array_alloc` +
`byte_array_concat`, bypasses validation via
`string_from_bytes_alloc` (the post-validation primitive copies
bytes verbatim), and verifies `string_chars` emits U+FFFD (65533)
for the invalid byte. Closes the runtime → user-program coverage
gap.
### D item 1 — E0060 char check now guards both operands
The earlier check only inspected `lt.as_ref()`. Now both `lt` and
`rt` are checked via `lt_boxed.or(rt_boxed)`, so `42 == 'a'` and
similar shapes still fire the named-function suggestion even when
LHS is non-Char or `None`.
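A compact model of the both-operands check, combining it with the `BoxedPrim` generalization from C item 9. Everything here is a simplified assumption: the `Ty` enum is a stand-in for the compiler's type representation, `boxed_prim_of` and `eq_rejection` are hypothetical names, and the signature given to `boxed_primitive_eq_name` is a guess.

```rust
/// Simplified stand-in for the typechecker's types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty { Int, Bool, Char, Float, Int64 }

#[derive(Debug, Clone, Copy, PartialEq)]
enum BoxedPrim { Char, Float, Int64 }

/// Classify an operand type; `None` covers unknown and unboxed types.
fn boxed_prim_of(ty: Option<Ty>) -> Option<BoxedPrim> {
    match ty? {
        Ty::Char => Some(BoxedPrim::Char),
        Ty::Float => Some(BoxedPrim::Float),
        Ty::Int64 => Some(BoxedPrim::Int64),
        _ => None,
    }
}

/// Named function to suggest in place of `==` (assumed signature).
fn boxed_primitive_eq_name(p: BoxedPrim) -> &'static str {
    match p {
        BoxedPrim::Char => "char_eq",
        BoxedPrim::Float => "float_eq",
        BoxedPrim::Int64 => "int64_eq",
    }
}

/// Returns the suggestion if either operand is a heap-boxed primitive.
fn eq_rejection(lt: Option<Ty>, rt: Option<Ty>) -> Option<&'static str> {
    let lt_boxed = boxed_prim_of(lt);
    let rt_boxed = boxed_prim_of(rt);
    lt_boxed.or(rt_boxed).map(boxed_primitive_eq_name)
}
```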
### D item 2 — UTF-8 decoder deduplicated
Extracted `decode_next_codepoint(bytes, offset) -> (cp, len)` as
the single source of truth for Sigil's lossy UTF-8 decode.
`decode_codepoints_lossy` (drives `string_chars`) and
`find_nth_codepoint` (drives `string_char_at`) both step through
it, so codepoint-boundary agreement is now identical-by-
construction rather than agree-by-coincidence. New runtime test
`string_char_at_overlong_replaces` exercises `find_nth_codepoint`
on `[0xC0, 0x80, b'a']` (overlong + ASCII) to pin that the two
entry points produce the same codepoint count for invalid input.
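The shared-decoder arrangement can be sketched as below. The three function names and the `(cp, len)` return shape come from the description above; the bodies are an independent reconstruction of a per-byte lossy decoder, not the code in `runtime/src/char.rs`, and they assume the caller passes an in-bounds offset.

```rust
const REPLACEMENT: u32 = 0xFFFD;

/// Decode one codepoint at `offset` (must be < bytes.len()). Any invalid
/// sequence yields U+FFFD and consumes exactly one byte.
fn decode_next_codepoint(bytes: &[u8], offset: usize) -> (u32, usize) {
    let b0 = bytes[offset];
    let (len, init, min): (usize, u32, u32) = match b0 {
        0x00..=0x7F => return (b0 as u32, 1),
        0xC0..=0xDF => (2, (b0 & 0x1F) as u32, 0x80),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32, 0x800),
        0xF0..=0xF7 => (4, (b0 & 0x07) as u32, 0x1_0000),
        _ => return (REPLACEMENT, 1), // stray continuation or invalid lead byte
    };
    if offset + len > bytes.len() {
        return (REPLACEMENT, 1); // truncated sequence
    }
    let mut cp = init;
    for &b in &bytes[offset + 1..offset + len] {
        if b & 0xC0 != 0x80 {
            return (REPLACEMENT, 1); // not a continuation byte
        }
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    let is_surrogate = (0xD800..=0xDFFF).contains(&cp);
    if cp < min || cp > 0x10FFFF || is_surrogate {
        return (REPLACEMENT, 1); // overlong, out of range, or surrogate
    }
    (cp, len)
}

/// Full pass: collects every codepoint (the shape behind `string_chars`).
fn decode_codepoints_lossy(bytes: &[u8]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < bytes.len() {
        let (cp, len) = decode_next_codepoint(bytes, offset);
        out.push(cp);
        offset += len;
    }
    out
}

/// Early-exit walk: stops at index `idx` without allocating.
fn find_nth_codepoint(bytes: &[u8], idx: usize) -> Option<u32> {
    let mut offset = 0;
    let mut seen = 0;
    while offset < bytes.len() {
        let (cp, len) = decode_next_codepoint(bytes, offset);
        if seen == idx {
            return Some(cp);
        }
        seen += 1;
        offset += len;
    }
    None
}
```

Since both walkers advance by the same `len`, the index passed to `find_nth_codepoint` always refers to the same element that `decode_codepoints_lossy` would place at that position, including on invalid input.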
### D item 3 — spec §3.4 → §3.1.1
The Char literal entry's "use the named functions (§3.4)" pointer
referenced "Inference rules (overview)"; corrected to §3.1.1
("Char and codepoint string operations") which is the new
subsection.
### D item 4 — `std/string.sigil` duplicate removed
The byte-indexed surface preamble listed `string_byte_at` twice;
replaced the second occurrence with `string_length`.
### Out of scope (acknowledged in PR reply)
- D non-blocking observation #1 (test gap on find_nth_codepoint with
invalid input): closed by `string_char_at_overlong_replaces` above.
- D non-blocking observation #2 (`\\u{HEX}` / `\\0` not supported in
string literals): tracked as a separate follow-up since the
current behavior produces a clear E0010, not a silent miss.
Summary
- `Float` as a boxed IEEE 754 f64 type following the `Int64` boxed-scalar pattern
- `TAG_FLOAT = 0x0B`, 21 runtime FFI primitives (arithmetic, comparison, math, conversion)
- Float literals (`3.14`, `1e10`, `2.5e-3`), type system integration, full codegen wiring
- `float_to_string` always includes `.0` for whole numbers (e.g., `4.0` not `4`); `inf`/`NaN` unchanged
- `spec/language.md` updated: Float in literal syntax, type table, expression forms, stdlib reference, runtime model; removed from v1 limits
Key design decisions
- `sigil_float_box` accepts `i64` bit pattern (not `f64`) to stay in Cranelift's integer register class; codegen uses `f64::to_bits() as i64` → `iconst(I64)` → call
- `float_to_string` formatting: appends `.0` to whole numbers so Float values are always visually distinct from Int
Test plan
Implements plan: 2026-05-05-sigil-float.md
🤖 Generated with Claude Code