Description
Describe the bug
decodeURI and decodeURIComponent return incorrect results for non-BMP characters such as emoji (\uD83D\uDE00). Boa's implementation of the URI Decode abstract operation reads a Unicode code point, but step 4.b of the ECMAScript spec requires reading the UTF-16 code unit at index k directly.
To Reproduce
decodeURI('\uD83D\uDE00') === '\uD83D\uDE00'
// returns false in Boa, expected true
decodeURIComponent('\uD83D\uDE00') === '\uD83D\uDE00'
// returns false in Boa, expected true
Verified in Node.js:
node -e "console.log(decodeURI('\uD83D\uDE00') === '\uD83D\uDE00')"
# true
node -e "console.log(decodeURIComponent('\uD83D\uDE00') === '\uD83D\uDE00')"
# true
Expected behavior
Both comparisons should evaluate to true: decodeURI and decodeURIComponent should return non-BMP input unchanged, matching Node.js and Chrome. Step 4.b of the ECMAScript Decode abstract operation states that the code unit at index k should be read, not the code point.
Build environment:
- OS: Windows 11
- Version: 10.0.26200
- Target triple: x86_64-pc-windows-msvc
- Rustc version: rustc 1.94.0 (4a4ef493e 2026-03-02)
Additional context
The root cause is in core/engine/src/builtins/uri/mod.rs, where code_point_at is used instead of code_unit_at. A surrogate pair like \uD83D\uDE00 is two UTF-16 code units, but code_point_at returns the combined code point U+1F600, which is then truncated. The spec requires processing each code unit individually.
Spec reference: https://tc39.es/ecma262/#sec-decode