fix: lexing utf-8 characters #2836

tstirrat15 · 2026-01-21T16:31:52Z

Description

See #2835. A user hit a lexer panic caused by non-ASCII characters in string literals. This PR fixes it by changing how the lexer peeks ahead and how it lexes string literals.

The code implicitly assumed that all peeked or accepted runes had the same byte width. That does not hold for UTF-8 strings containing non-ASCII characters. l.backup() steps back by the currently set width, which is the width of the most recently read rune, not the width of the rune being backed over. Calling it several times in a row can therefore land in the middle of a multi-byte rune or before the start of the input. Saving the start position and restoring it avoids this problem.

Changes

- Add parser test
- Add lexer test
- Change lexer logic to restore a saved starting position rather than calling backup multiple times
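A sketch of the approach from the Changes list, under the same assumptions as a Pike-style lexer: save the starting byte offset before reading ahead and assign it back, instead of calling backup once per rune. The names lexer, peekN, and lexStringLiteral are hypothetical, as is the rule that a newline ends an unterminated literal. This is not the actual SpiceDB change.

```go
package main

import "unicode/utf8"

// eof is a hypothetical sentinel returned when the input is exhausted.
const eof rune = -1

// lexer is a hypothetical, stripped-down lexer (not SpiceDB's real type).
type lexer struct {
	input string
	pos   int // byte offset of the next rune to read
	width int // byte width of the most recently read rune
}

// next decodes the rune at pos, records its byte width, and advances pos.
func (l *lexer) next() rune {
	if l.pos >= len(l.input) {
		l.width = 0
		return eof
	}
	r, w := utf8.DecodeRuneInString(l.input[l.pos:])
	l.width = w
	l.pos += w
	return r
}

// peekN returns the next n runes without consuming them. It restores the
// saved start offset, so it is correct for runes of any byte width.
func (l *lexer) peekN(n int) []rune {
	start := l.pos
	out := make([]rune, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, l.next())
	}
	l.pos = start
	return out
}

// lexStringLiteral tries to consume a double-quoted literal at pos. On
// success it returns the literal (quotes included) and leaves pos after
// the closing quote. If there is no literal, or it is unterminated, it
// restores pos to where it started.
func (l *lexer) lexStringLiteral() (string, bool) {
	start := l.pos
	if l.next() != '"' {
		l.pos = start
		return "", false
	}
	for {
		switch l.next() {
		case '"':
			return l.input[start:l.pos], true
		case eof, '\n':
			l.pos = start
			return "", false
		}
	}
}
```

Restoring an absolute offset means the lexer never has to remember how wide each individual rune was, which is what made repeated backup calls unsafe.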

Testing

Review. See that tests pass.

codecov · 2026-01-21T16:35:20Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.45%. Comparing base (737bdef) to head (a5edb21).

❌ Your project check has failed because the head coverage (74.45%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2836      +/-   ##
==========================================
- Coverage   74.48%   74.45%   -0.03%     
==========================================
  Files         482      482              
  Lines       56555    56560       +5     
==========================================
- Hits        42119    42104      -15     
- Misses      11482    11497      +15     
- Partials     2954     2959       +5


josephschorr

LGTM

tstirrat15 requested a review from a team as a code owner January 21, 2026 16:31

github-actions bot added area/schema Affects the Schema Language area/tooling Affects the dev or user toolchain (e.g. tests, ci, build tools) labels Jan 21, 2026

tstirrat15 force-pushed the 2835-lexing-utf-8-chars branch 2 times, most recently from 8e51a5d to a591d7c

fix: lexing utf-8 characters

a5edb21

tstirrat15 force-pushed the 2835-lexing-utf-8-chars branch from a591d7c to a5edb21

josephschorr approved these changes Jan 21, 2026

tstirrat15 enabled auto-merge January 21, 2026 16:57

tstirrat15 added this pull request to the merge queue Jan 21, 2026

github-merge-queue bot pushed a commit that referenced this pull request Jan 21, 2026

fix: lexing utf-8 characters (#2836)

6b2b2ea

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 21, 2026

tstirrat15 added this pull request to the merge queue Jan 21, 2026

Merged via the queue into main with commit dc79633 Jan 21, 2026
44 of 45 checks passed

tstirrat15 deleted the 2835-lexing-utf-8-chars branch January 21, 2026 17:31

github-actions bot locked and limited conversation to collaborators Jan 21, 2026
