Skip to content

fix(runtime): built-ins/String test262 parity#4754

Merged
proggeramlug merged 1 commit into
mainfrom
string-parity
Jun 7, 2026
Merged

fix(runtime): built-ins/String test262 parity#4754
proggeramlug merged 1 commit into
mainfrom
string-parity

Conversation

@proggeramlug

Copy link
Copy Markdown
Contributor

Lifts built-ins/String test262 parity from 752 → 814 passing (runtime-fail 160 → 102, compile-fail 20 → 16), zero regressions (no test that passed before now fails). Differential vs Node on the pinned corpus (tc39/test262 @ 4249661388).

Root causes fixed

1. String.prototype.split(separator, limit) argument coercion. Split skipped ToString(separator) / ToUint32(limit) and treated an undefined separator as an empty-string separator (per-character split) instead of returning [wholeString]. New js_string_split_value takes the boxed separator + limit and follows §22.1.3.21 exactly: ToUint32(limit) runs before ToString(separator) (spec order — either may run user valueOf/toString and throw), undefined separator → [S], limit === 0[], and a RegExp separator delegates to the regex splitter. (~22 tests)

2. indexOf / lastIndexOf argument coercion. An object searchString was bit-cast as a string handle (→ garbage → -1) and the position was raw-fptosid (skipping valueOf). Now ToString(searchString) via js_string_coerce runs before ToNumber(position) via js_string_index_to_i32 / js_number_coerce. (~18 tests)

3. search / match regex coercion. Both required an actual RegExp argument (returned -1/null for a string/undefined/{toString} arg). New js_string_search_value / js_string_match_value coerce any non-RegExp arg via RegExpCreate(ToString(arg)), with undefined → the empty /(?:)/ regex; match/search now also accept 0 args. match(undefined) previously SIGSEGVd — a null pattern pointer was dereferenced by lookup_fancy_regex; now an empty StringHeader is built. (~30 tests)

4. trim / trimStart / trimEnd whitespace set. Used Rust’s Unicode White_Space, which excludes U+FEFF (BOM — a JS WhiteSpace) and includes U+0085 (NEL — not JS whitespace). Added an is_js_whitespace predicate matching ECMA-262 §22.1.3.32 (WhiteSpace + LineTerminator). (~7 tests)

Files

  • crates/perry-codegen/src/lower_string_method.rs — needle/position/separator/regex-arg coercion + 0-arg match/search
  • crates/perry-codegen/src/runtime_decls/strings_part2.rs — declare js_string_search_value / js_string_match_value (the js_string_split_value decl already landed on main)
  • crates/perry-runtime/src/object/native_call_method.rs — generic-this dispatch arms route through the coercing entries
  • crates/perry-runtime/src/regex.rscoerce_search_arg_to_regex + js_string_search_value / js_string_match_value
  • crates/perry-runtime/src/string/slice_ops.rsis_js_whitespace trim
  • crates/perry-runtime/src/string/split.rsjs_string_split_value

Validation

Default compile path (no env flags). built-ins/String: 752 → 814 pass, 0 regressions. Verified manually against Node for split/indexOf/lastIndexOf/search/match/trim incl. the previously-crashing "abc".match(undefined).

Argument-coercion and whitespace fixes across String.prototype methods,
lifting built-ins/String from 752 to 814 passing (zero regressions).

Root causes:
- split(separator, limit) skipped ToString(separator)/ToUint32(limit) and
  treated undefined separator as empty-string (per-char split). New
  js_string_split_value takes boxed args: ToUint32(limit) before
  ToString(separator) (spec order; either may throw), undefined -> [S],
  limit 0 -> [], RegExp-separator delegation.
- indexOf/lastIndexOf bit-cast an object searchString as a string handle and
  raw-fptosi'd the position (skipping valueOf). Now ToString(searchString)
  via js_string_coerce before ToNumber(position) via js_string_index_to_i32
  / js_number_coerce.
- search/match required a RegExp arg (returned -1/null otherwise). New
  js_string_search_value/js_string_match_value coerce any non-RegExp arg via
  RegExpCreate(ToString(arg)); undefined -> empty /(?:)/. match/search now
  accept 0 args. (match(undefined) previously SIGSEGV'd: a null pattern
  pointer was deref'd by lookup_fancy_regex — now an empty StringHeader.)
- trim/trimStart/trimEnd used Rust's Unicode White_Space, which excludes
  U+FEFF (BOM, a JS WhiteSpace) and includes U+0085 (NEL, not JS). Added an
  is_js_whitespace predicate matching ECMA-262 §22.1.3.32.

Files: lower_string_method.rs (codegen), runtime_decls/strings_part2.rs,
native_call_method.rs, regex.rs, string/slice_ops.rs, string/split.rs.

built-ins/String: 752 -> 814 pass; 160 -> 102 runtime-fail; 20 -> 16
compile-fail; 0 regressions.
@proggeramlug proggeramlug merged commit b60dd9d into main Jun 7, 2026
13 checks passed
@proggeramlug proggeramlug deleted the string-parity branch June 7, 2026 10:01
proggeramlug pushed a commit that referenced this pull request Jun 7, 2026
Follow-up to #4754. built-ins/String 814 -> 840 passing, 0 regressions.

- String.prototype.search dispatch arm returned a NaN-boxed INT32 instead of
  a raw f64, so `new String(s).search(x) === N` (and generic-this
  `__o.search = String.prototype.search` forms) strict-compared unequal to a
  number literal even though the index was correct. Return `i32 as f64`,
  matching the indexOf arm. (search 19/31 -> 31/31)
- new String(...) wrapper (type Named("String")) mis-routed the Array/String
  shared methods indexOf/includes/slice/lastIndexOf to ArrayIndexOf/etc.
  (read the wrapper as an array -> -1/false). Treat the boxed String wrapper
  as a string in try_local_array_methods so they route to the ToString-
  coercing string dispatch.
- replace/replaceAll did not ToString-coerce a non-RegExp searchValue or a
  non-function replaceValue (object args with throwing toString didn't throw,
  per ECMA-262 §22.1.3.19 order: searchValue before replaceValue).
- concat bit-cast its args as string handles instead of ToString-coercing
  them, dropping `undefined`/booleans ("lego".concat(undefined) -> "lego").
  Now ToString-coerces each non-static-string arg (§22.1.3.5).
- padStart/padEnd bit-cast a non-string fillString, dropping it. New
  js_string_pad_fill ToString-coerces the fill (undefined -> default " ").

Files: native_call_method.rs, string/pad.rs (runtime),
lower_string_method.rs, runtime_decls/strings_part2.rs (codegen),
lower/expr_call/local_array_methods.rs (HIR).
proggeramlug added a commit that referenced this pull request Jun 7, 2026
…4755)

Follow-up to #4754. built-ins/String 814 -> 840 passing, 0 regressions.

- String.prototype.search dispatch arm returned a NaN-boxed INT32 instead of
  a raw f64, so `new String(s).search(x) === N` (and generic-this
  `__o.search = String.prototype.search` forms) strict-compared unequal to a
  number literal even though the index was correct. Return `i32 as f64`,
  matching the indexOf arm. (search 19/31 -> 31/31)
- new String(...) wrapper (type Named("String")) mis-routed the Array/String
  shared methods indexOf/includes/slice/lastIndexOf to ArrayIndexOf/etc.
  (read the wrapper as an array -> -1/false). Treat the boxed String wrapper
  as a string in try_local_array_methods so they route to the ToString-
  coercing string dispatch.
- replace/replaceAll did not ToString-coerce a non-RegExp searchValue or a
  non-function replaceValue (object args with throwing toString didn't throw,
  per ECMA-262 §22.1.3.19 order: searchValue before replaceValue).
- concat bit-cast its args as string handles instead of ToString-coercing
  them, dropping `undefined`/booleans ("lego".concat(undefined) -> "lego").
  Now ToString-coerces each non-static-string arg (§22.1.3.5).
- padStart/padEnd bit-cast a non-string fillString, dropping it. New
  js_string_pad_fill ToString-coerces the fill (undefined -> default " ").

Files: native_call_method.rs, string/pad.rs (runtime),
lower_string_method.rs, runtime_decls/strings_part2.rs (codegen),
lower/expr_call/local_array_methods.rs (HIR).

Co-authored-by: Ralph Küpper <ralph@skelpo.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant