Skip to content

[java] Fix JSON parser EOF sentinel collision with U+FFFF#17737

Merged
shs96c merged 2 commits into
SeleniumHQ:trunkfrom
shs96c:shs-json-fix-eof-sentinel
Jul 2, 2026
Merged

[java] Fix JSON parser EOF sentinel collision with U+FFFF#17737
shs96c merged 2 commits into
SeleniumHQ:trunkfrom
shs96c:shs-json-fix-eof-sentinel

Conversation

@shs96c

@shs96c shs96c commented Jul 2, 2026

Copy link
Copy Markdown
Member

Input.EOF was (char) -1, i.e. 0xFFFF — a valid UTF-16 code unit. Any JSON string containing U+FFFF (literally or as ) was mis-reported as an unterminated string.

  • Switch Input.peek() / Input.read() to return int, with -1 as the EOF sentinel (matching java.io.Reader.read()).
  • Update JsonInput callers to handle the wider return type (cast to char at appending/formatting sites).

Input used (char) -1 as its EOF sentinel. That value is 0xFFFF - a valid
Unicode code unit that can legitimately appear in JSON. Any string
containing it was mis-reported as an unterminated string.

Switch peek()/read() to return int with -1 as the sentinel (matching
Reader.read()) so the sentinel cannot collide with a valid UTF-16 code
unit.
@selenium-ci selenium-ci added the C-java Java Bindings label Jul 2, 2026
@shs96c shs96c marked this pull request as ready for review July 2, 2026 10:37
@qodo-code-review

Copy link
Copy Markdown
Contributor

PR Summary by Qodo

Fix Java JSON EOF sentinel collision with U+FFFF

🐞 Bug fix 🧪 Tests 🕐 20-40 Minutes

Grey Divider

AI Description

• Replace in-band char EOF sentinel with int -1 to avoid U+FFFF collisions
• Update JSON parser call sites to handle int reads and cast only when appending/formatting
• Add regression tests proving U+FFFF parses correctly (literal and \uFFFF escape)
Diagram

graph TD
  T(["JsonInputTest"]) --> J["JsonInput"] --> I["Input"] --> R["java.io.Reader"]
  J --> E["JsonException"]
Loading
High-Level Assessment

The chosen approach—switching Input.peek()/read() to int with -1 EOF—matches java.io.Reader semantics and fully removes the possibility of colliding with any UTF-16 code unit (including U+FFFF). Alternatives like keeping char returns with an out-of-band boolean/flag or reserving a different char sentinel still risk edge cases or complicate the API, so the current change is the most robust and idiomatic fix.

Files changed (3) +41 / -22

Bug fix (2) +28 / -22
Input.javaUse int -1 EOF sentinel and return int from peek/read +13/-6

Use int -1 EOF sentinel and return int from peek/read

• Replaces the previous char-based EOF sentinel (0xFFFF) with an int EOF sentinel (-1). Updates Input.peek() and Input.read() to return int, aligning with Reader.read() and preventing collisions with valid UTF-16 code units such as U+FFFF.

java/src/org/openqa/selenium/json/Input.java

JsonInput.javaAdapt parser to int-based Input reads and cast at use sites +15/-16

Adapt parser to int-based Input reads and cast at use sites

• Updates JsonInput to consume int values from Input.read() and only cast to char when appending to StringBuilder or formatting error messages. Preserves existing parsing logic while ensuring EOF detection no longer conflicts with literal U+FFFF in strings.

java/src/org/openqa/selenium/json/JsonInput.java

Tests (1) +13 / -0
JsonInputTest.javaAdd regression test for parsing U+FFFF in JSON strings +13/-0

Add regression test for parsing U+FFFF in JSON strings

• Adds coverage asserting that a JSON string containing the literal U+FFFF character and the "\uFFFF" escape both parse successfully. Prevents regression of the prior behavior where U+FFFF was misinterpreted as end-of-input and reported as an unterminated string.

java/test/org/openqa/selenium/json/JsonInputTest.java

@qodo-code-review

qodo-code-review Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📜 Skill insights (0)

Context used
✅ Compliance rules (platform): 18 rules

Grey Divider


Remediation recommended

1. U+FFFF literal in test ✓ Resolved 🐞 Bug ☼ Reliability
Description
JsonInputTest embeds U+FFFF as a literal glyph ("�") in a Java string, which can be fragile across
environments/editors when the Java source encoding isn’t explicitly pinned for the Bazel target. The
same coverage can be achieved more portably by using a Java unicode escape ("\uFFFF") or
constructing the string from a char.
Code

java/test/org/openqa/selenium/json/JsonInputTest.java[R295-304]

+  void shouldReadU_FFFF_AsALiteralCharacterAndNotEndOfInput() {
+    // U+FFFF is a valid (non-)character; historically it collided with an in-band EOF sentinel
+    // and was mis-reported as an unterminated string.
+    try (JsonInput input = newInput("\"a�b\"")) {
+      assertThat(input.nextString()).isEqualTo("a�b");
+    }
+
+    try (JsonInput input = newInput("\"\\uFFFF\"")) {
+      assertThat(input.nextString()).isEqualTo("�");
+    }
Evidence
The new test includes a literal non-ASCII U+FFFF glyph in the Java source, and the Bazel test
target’s javacopts only sets -parameters (no explicit -encoding), which makes such literals
more fragile/less portable than using \uFFFF escapes.

java/test/org/openqa/selenium/json/JsonInputTest.java[294-305]
java/test/org/openqa/selenium/json/BUILD.bazel[18-28]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`JsonInputTest` contains a literal U+FFFF character (rendered as `�`) inside a Java string literal. This makes the test dependent on how the source file is encoded/handled by tooling, even though the test intent is simply “the input contains UTF-16 code unit 0xFFFF”.

### Issue Context
- The Bazel target for these tests does not specify `-encoding` in `javacopts`, so source encoding is not explicitly pinned at the target level.

### Fix Focus Areas
- Replace the literal `�` occurrences with a portable Java escape (e.g. `"\"a\uFFFFb\""` and expected `"a\uFFFFb"`), or build the string with `new String(new char[]{'a','\uFFFF','b'})`.

- java/test/org/openqa/selenium/json/JsonInputTest.java[295-304]
- java/test/org/openqa/selenium/json/BUILD.bazel[19-28]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment thread java/test/org/openqa/selenium/json/JsonInputTest.java
Avoids depending on the source file's byte encoding when reading the test.
@qodo-code-review

Copy link
Copy Markdown
Contributor

Code review by qodo was updated up to the latest commit de2ada0

@shs96c shs96c merged commit 3e8e318 into SeleniumHQ:trunk Jul 2, 2026
43 checks passed
@shs96c shs96c deleted the shs-json-fix-eof-sentinel branch July 2, 2026 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-java Java Bindings

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants