Conversation

Contributor

@tstirrat15 tstirrat15 commented Jan 21, 2026

Fixes #2835

Description

See #2835. A user reported that non-ASCII characters in string literals caused a lexer panic. This PR fixes the panic by changing how peeking and string-literal lexing work.

The lexer implicitly assumed that all peeked or accepted runes have the same byte width, which does not hold for Unicode strings: l.backup() steps back by the currently set width rather than by the width of the rune being backed up over. Tracking the start position manually and restoring to it avoids the problem.
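
To make the failure mode concrete, here is a minimal sketch of the width mismatch. It is not the project's actual lexer: `miniLexer`, its fields, and `naiveBackupTwice` are hypothetical names, and the only assumption carried over from the description is that backup subtracts a single stored width.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// miniLexer is a hypothetical stand-in for the real lexer; it keeps only
// the state needed to show the width problem.
type miniLexer struct {
	input string
	pos   int // byte offset of the next rune to read
	width int // byte width of the most recently read rune
}

// next reads one rune and records its byte width.
func (l *miniLexer) next() rune {
	r, w := utf8.DecodeRuneInString(l.input[l.pos:])
	l.width = w
	l.pos += w
	return r
}

// backup steps back by the stored width, regardless of which rune
// is actually being backed up over.
func (l *miniLexer) backup() {
	l.pos -= l.width
}

// naiveBackupTwice reads two runes, calls backup twice, and returns the
// resulting byte position. A correct rewind would always return 0.
func naiveBackupTwice(input string) int {
	l := &miniLexer{input: input}
	l.next()
	l.next()
	l.backup()
	l.backup()
	return l.pos
}

func main() {
	fmt.Println(naiveBackupTwice("ab"))  // 0: both runes are 1 byte wide
	fmt.Println(naiveBackupTwice("a中")) // -2: 1 + 3 bytes read, 3 + 3 bytes subtracted
}
```

A negative or otherwise misaligned position like this is the kind of state that leads to a "slice bounds out of range" panic once the lexer slices its input.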

Changes

  • Add parser test
  • Add lexer test
  • Change lexer logic to restore to a starting position rather than calling backup multiple times
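
The last change can be illustrated with a small sketch of a multi-rune peek that saves the start position and restores it afterwards. The names `sketchLexer`, `peekN`, and `peekThenPos` are hypothetical and chosen for illustration; the real lexer's types and method names may differ.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sketchLexer is a hypothetical, simplified lexer state.
type sketchLexer struct {
	input string
	pos   int // byte offset of the next rune to read
	width int // byte width of the most recently read rune
}

// advance reads one rune, updating pos and width.
func (l *sketchLexer) advance() rune {
	r, w := utf8.DecodeRuneInString(l.input[l.pos:])
	l.width = w
	l.pos += w
	return r
}

// peekN returns up to n upcoming runes as a string without consuming them.
// Instead of calling a width-based backup n times, it records the starting
// state and restores it, so mixed-width runes cannot misalign the position.
func (l *sketchLexer) peekN(n int) string {
	startPos, startWidth := l.pos, l.width
	for i := 0; i < n && l.pos < len(l.input); i++ {
		l.advance()
	}
	peeked := l.input[startPos:l.pos]
	l.pos, l.width = startPos, startWidth
	return peeked
}

// peekThenPos peeks n runes from the start of input and reports the
// lexer position afterwards, which should be unchanged.
func peekThenPos(input string, n int) (string, int) {
	l := &sketchLexer{input: input}
	peeked := l.peekN(n)
	return peeked, l.pos
}

func main() {
	peeked, pos := peekThenPos(`"中文"`, 2)
	fmt.Printf("peeked=%q pos=%d\n", peeked, pos)
}
```

Restoring a saved offset is independent of rune widths, which is why it holds up for both ASCII and multi-byte input.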

Testing

Review. See that tests pass.

@tstirrat15 tstirrat15 requested a review from a team as a code owner January 21, 2026 16:31
@github-actions github-actions bot added the area/schema (Affects the Schema Language) and area/tooling (Affects the dev or user toolchain, e.g. tests, ci, build tools) labels Jan 21, 2026
codecov bot commented Jan 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.45%. Comparing base (737bdef) to head (a5edb21).

❌ Your project check has failed because the head coverage (74.45%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2836      +/-   ##
==========================================
- Coverage   74.48%   74.45%   -0.03%     
==========================================
  Files         482      482              
  Lines       56555    56560       +5     
==========================================
- Hits        42119    42104      -15     
- Misses      11482    11497      +15     
- Partials     2954     2959       +5     

@tstirrat15 tstirrat15 force-pushed the 2835-lexing-utf-8-chars branch 2 times, most recently from 8e51a5d to a591d7c Compare January 21, 2026 16:46
@tstirrat15 tstirrat15 force-pushed the 2835-lexing-utf-8-chars branch from a591d7c to a5edb21 Compare January 21, 2026 16:56
Member

@josephschorr josephschorr left a comment

LGTM

@tstirrat15 tstirrat15 enabled auto-merge January 21, 2026 16:57
@tstirrat15 tstirrat15 added this pull request to the merge queue Jan 21, 2026
github-merge-queue bot pushed a commit that referenced this pull request Jan 21, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 21, 2026
@tstirrat15 tstirrat15 added this pull request to the merge queue Jan 21, 2026
Merged via the queue into main with commit dc79633 Jan 21, 2026
44 of 45 checks passed
@tstirrat15 tstirrat15 deleted the 2835-lexing-utf-8-chars branch January 21, 2026 17:31
@github-actions github-actions bot locked and limited conversation to collaborators Jan 21, 2026
Development

Successfully merging this pull request may close these issues.

Panic: "slice bounds out of range" in lexStringLiteral when parsing UTF-8/Chinese characters in Schema
