Streaming multi-byte UTF8 characters not being parsed correctly #11

Closed
jlank opened this issue Mar 14, 2013 · 2 comments

Comments

jlank commented Mar 14, 2013

Hey @dscape! I was using another streaming parser (jsonparse) when I found this class of bug, so I checked clarinet to see if it was present there too. Turns out it is. To save GitHub some bytes of storage I'll link to the bug description at creationix/jsonparse.

In a nutshell, when you call .toString() on each buffer of a stream and a chunk boundary falls in the middle of a multi-byte UTF-8 character, the split character can't be decoded properly, and two replacement characters (U+FFFD) end up in the output instead. I haven't devised a specific solution for clarinet yet, but I know what needs to happen.

I also wrote a little demo repo so you can see the bug in action.

Guess I didn't save those storage bytes after all.
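
For illustration, the split-character failure described above can be reproduced with a minimal sketch like the one below. It is not taken from the linked demo repo or from clarinet's source: the sample JSON, the split point, and the variable names are assumptions chosen for the example. It splits a buffer in the middle of the two-byte character "é" and compares a naive per-chunk .toString() with Node's built-in StringDecoder, which holds back incomplete trailing bytes until the next chunk arrives.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical sample payload; "é" is encoded as two bytes in UTF-8: 0xC3 0xA9.
const whole = Buffer.from('{"name":"é"}', 'utf8');

// Simulate a stream boundary that lands between the two bytes of "é".
const splitAt = whole.indexOf(0xc3) + 1;
const chunk1 = whole.subarray(0, splitAt); // ends with the lead byte 0xC3
const chunk2 = whole.subarray(splitAt);    // starts with the continuation byte 0xA9

// Naive approach: decode each chunk on its own.
// Each half of the split character is invalid by itself.
const naive = chunk1.toString('utf8') + chunk2.toString('utf8');

// Buffered approach: StringDecoder keeps the incomplete lead byte
// and emits the character once the rest of it arrives.
const decoder = new StringDecoder('utf8');
const safe = decoder.write(chunk1) + decoder.write(chunk2) + decoder.end();

console.log(JSON.stringify(naive));
console.log(JSON.stringify(safe));
```

Whether StringDecoder is the right fix inside a given parser is a separate question; it is shown here only as one standard way to carry partial characters across chunk boundaries.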

jlank (author) commented Mar 14, 2013

Oh, also, if you're aware of any other semi-popular streaming JSON parsers, I'd be down to check those for the same issue. I searched npm but basically only came up with clarinet and jsonparse as the big ones in use.

dscape (owner) commented Mar 16, 2013

@jlank can you please reopen this with a pull request?

Thank you!
