asciisanitizer.Sanitizer mishandled the unicode character #127

Closed
yin1999 opened this issue Aug 11, 2023 · 1 comment · Fixed by #128
Assignees
Labels
bug Something isn't working

Comments

@yin1999
Contributor

yin1999 commented Aug 11, 2023

We encountered an error when using the GitHub CLI to fetch commits in an MDN repository. I found that the error is caused by the sanitizer used by the GitHub CLI.

So I created a demo to reproduce the problem:

The plain text to transform:

�, plain text

When we read the plain text and transform it with the sanitizer, we get an error:

(screenshot of the error)

But this text is valid. I found that the error is returned here.

So I read the documentation of utf8.DecodeRune. It can also return utf8.RuneError when the bytes are decoded correctly, because a well-formed U+FFFD decodes to that same rune. If there is an actual decoding error, it returns (RuneError, 0) or (RuneError, 1).

So we can't tell whether there was a decoding error from the first return value alone. The text I used above contains this Unicode character, and the sanitizer mishandled it.

@samcoe
Contributor

samcoe commented Aug 11, 2023

@yin1999 Thanks for raising this issue. I was able to reproduce it; the sanitizer is in fact mishandling properly encoded \uFFFD Unicode characters coming from the API.
