decodeURI/decodeURIComponent return incorrect results for non-BMP characters (emoji) #5166

@HiteshShonak

Describe the bug
decodeURI and decodeURIComponent return incorrect results for non-BMP characters such as the emoji U+1F600 (\uD83D\uDE00). Boa's implementation of the Decode abstract operation reads a Unicode code point at index k, whereas step 4.b of the ECMAScript specification requires reading a single UTF-16 code unit at that index.

To Reproduce

decodeURI('\uD83D\uDE00') === '\uD83D\uDE00'
// returns false in Boa, expected true

decodeURIComponent('\uD83D\uDE00') === '\uD83D\uDE00'
// returns false in Boa, expected true

Verified in Node.js:

node -e "console.log(decodeURI('\uD83D\uDE00') === '\uD83D\uDE00')"
# true

node -e "console.log(decodeURIComponent('\uD83D\uDE00') === '\uD83D\uDE00')"
# true

Expected behavior
Both decodeURI and decodeURIComponent should return non-BMP input unchanged, so both comparisons above should evaluate to true, matching Node.js and Chrome. Step 4.b of the ECMAScript Decode abstract operation reads the code unit at index k, not the code point.
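To make the expected behavior concrete, here is a minimal sketch of a Decode-style loop that advances one UTF-16 code unit at a time. It is an illustration only, not Boa's code and not the full spec algorithm: `decodeSketch` is a hypothetical name, it has no preserve set (so it behaves like decodeURIComponent), and it deliberately rejects multi-byte escapes to stay short.

```javascript
// Simplified sketch of a Decode-style loop. Hypothetical helper, not an engine API.
// Only single-byte (ASCII) escapes such as %20 are handled.
function decodeSketch(string) {
  let result = '';
  let k = 0;
  while (k < string.length) {
    // Read exactly one UTF-16 code unit at index k, never a whole code point.
    const unit = string.charCodeAt(k);
    let segment = String.fromCharCode(unit);
    if (unit === 0x25) { // '%'
      const hex = string.slice(k + 1, k + 3);
      if (!/^[0-9A-Fa-f]{2}$/.test(hex)) {
        throw new URIError('malformed escape sequence');
      }
      const byte = parseInt(hex, 16);
      if (byte >= 0x80) {
        // The real algorithm decodes UTF-8 sequences here; omitted in this sketch.
        throw new Error('multi-byte escapes are out of scope for this sketch');
      }
      segment = String.fromCharCode(byte);
      k += 2; // skip the two hex digits
    }
    result += segment;
    k += 1;
  }
  return result;
}

const emoji = '\uD83D\uDE00';
console.log(decodeSketch(emoji) === emoji);                  // true
console.log(decodeSketch('a%20' + emoji) === 'a ' + emoji);  // true
```

Because each half of the surrogate pair is copied as its own code unit, the emoji passes through untouched, which is the result the reproduction expects.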

Build environment:

  • OS: Windows 11
  • Version: 10.0.26200
  • Target triple: x86_64-pc-windows-msvc
  • Rustc version: rustc 1.94.0 (4a4ef493e 2026-03-02)

Additional context
The root cause is in core/engine/src/builtins/uri/mod.rs, where code_point_at is used instead of code_unit_at. A surrogate pair such as \uD83D\uDE00 is two UTF-16 code units, but code_point_at returns the combined code point U+1F600, which is then truncated. The spec requires processing each code unit individually.
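The difference between the two reads can be shown with JavaScript's own string methods, used here as stand-ins for Boa's Rust helpers. The 16-bit mask is an assumption about what "truncated" might look like in practice; it is not taken from Boa's source.

```javascript
// Illustration only: JavaScript string methods, not Boa's Rust internals.
const s = '\uD83D\uDE00';

// Reading by code unit, as the spec's Decode loop does: two 16-bit values.
const units = [s.charCodeAt(0), s.charCodeAt(1)];

// Reading by code point at index 0 combines the surrogate pair into one value.
const codePoint = s.codePointAt(0);

// Assumption: narrowing the code point to 16 bits drops the high bits.
const truncated = codePoint & 0xFFFF;

console.log(units.map((u) => u.toString(16))); // [ 'd83d', 'de00' ]
console.log(codePoint.toString(16));           // 1f600
console.log(truncated.toString(16));           // f600
```

Under that assumption the first code unit would come out as 0xF600 rather than 0xD83D, so the decoded string could no longer compare equal to the input.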

Spec reference: https://tc39.es/ecma262/#sec-decode
