Some non-ASCII input mangled silently #4

Open
escalonn opened this issue Jun 5, 2022 · 1 comment

escalonn commented Jun 5, 2022

Due to a broken check in the codegolf function, non-Latin-1 characters (all those above U+00FF) at odd-numbered positions in the input string have their code point silently truncated to 8 bits, instead of raising an error so that the user can be notified.

To Reproduce

  1. Enter 'ज़' into the input box; this string has a non-Latin-1 character at index 1.
  2. Click the "Golf it" button.
  3. Observe printed output exec(bytes('嬧‧','u16')[2:])
  4. Verify that bytes('嬧‧','u16')[2:] evaluates to b"'[' ", which does not match the input code.
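The check in step 4 can be scripted. This is a minimal sketch, assuming the input is the precomposed letter U+095B between two single quotes (which is consistent with the `[` byte, 0x5B, in the result); the variable names are only illustrative. It uses the explicit little-endian codec, which yields the same bytes as `bytes(..., 'u16')[2:]` on a little-endian machine.

```python
# Output string printed by the site in step 3, written with escapes
# so the exact code points are unambiguous.
packed = "\u5b27\u2027"

# Input from step 1 (assumption: precomposed U+095B between single quotes).
original = "'\u095b'"

# 'utf-16-le' writes no BOM, so nothing has to be sliced off.
decoded = packed.encode("utf-16-le")

print(decoded)                       # b"'[' "
print(hex(ord(original[1]) & 0xFF))  # 0x5b, the code point of '['
print(decoded == original.encode())  # False
```

The low 8 bits of U+095B are 0x5B, which is why the second byte comes out as `[`.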

Expected behavior
An error message about non-ASCII characters should be displayed, as it already is for the input ' ज़' (a space added to move the character to an even-numbered position).

Environment

  • OS: Windows 10
  • Browser: Brave Version 1.39.111 Chromium: 102.0.5005.61 (Official Build) (64-bit)

Additional context
The code causing the issue is here
Effectively, c1 (the even-numbered character) is checked, but c2 is ignored and subsequently truncated.
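One possible shape for a fix is to validate every character before packing, rather than only the first of each pair. The following is a rough Python model, not the site's actual code (which is not reproduced in this issue): `pack_pairs` and its `limit` parameter are hypothetical names, and the padding with a space is inferred from the trailing space in the step 4 output.

```python
def pack_pairs(code: str, limit: int = 127) -> str:
    """Hypothetical model of the golfing step: pack two characters into one
    UTF-16 code unit, rejecting any character above `limit`."""
    # Check every character, not only the first of each pair.
    for index, char in enumerate(code):
        if ord(char) > limit:
            raise ValueError(
                f"character {char!r} (U+{ord(char):04X}) at index {index} "
                f"is above U+{limit:04X}"
            )
    if len(code) % 2:
        code += " "  # pad to an even length (inferred behaviour)
    # c1 becomes the low byte and c2 the high byte of each packed character.
    return "".join(
        chr(ord(c1) | (ord(c2) << 8)) for c1, c2 in zip(code[::2], code[1::2])
    )


# ASCII input round-trips through the packed form.
print(pack_pairs("print(1)").encode("utf-16-le"))  # b'print(1)'

# The input from the reproduction steps is now rejected at index 1.
try:
    pack_pairs("'\u095b'")
except ValueError as error:
    print(error)
```

With this structure the same error path is taken whether the offending character lands in an odd or an even position.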

Also
Handling of characters from the Latin-1 Supplement block (U+0080 to U+00FF) by this site is unclear. These are non-ASCII characters, but is there a reason to ban them from the input? Shouldn't the check really be > 255 instead of > 127?
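One detail that may bear on this question: the packed form carries exactly one byte per input character, so a Latin-1 Supplement character would arrive as its single Latin-1 byte, which differs from its UTF-8 encoding. The sketch below only demonstrates that byte-level difference for 'é' (U+00E9), picked as an arbitrary example and packed with the c1/c2 layout assumed above; whether the generated `exec` call would accept such bytes is not tested here.

```python
char = "\u00e9"  # 'é', an arbitrary Latin-1 Supplement character

# Pack 'é' as the low byte with an ASCII neighbour as the high byte
# (assumed layout, not taken from the site's code).
packed = chr(ord(char) | (ord("a") << 8))

payload = packed.encode("utf-16-le")  # bytes the golfed program would carry
print(payload)                        # b'\xe9a'
print((char + "a").encode("utf-8"))   # b'\xc3\xa9a'
```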

clemg (Owner) commented Jun 24, 2022

Good catch! This is indeed a problem.

Would you like to submit a PR?
