Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Streaming multi-byte UTF8 characters not being parsed correctly #13

Open
jlank opened this Issue Mar 13, 2013 · 1 comment

Comments

Projects
None yet
2 participants
Contributor

jlank commented Mar 13, 2013

When streaming data into jsonparse that consists of multi-byte utf8 characters, if a data chunk splits a multi-byte character, jsonparse does not properly reconcile the character between data events. I wrote a quick demo repo to show this behavior and started writing blog post to explain the issue in more detail (not finished). In the meantime check the demo repo out, it has the current implementation and proposed patch working. For more context on this issue see this thread with @mikeal discussing where the "proper" place to reconcile / parse mutli-byte utf8 characters is. I already have a proposed fix written up for jsonparse with test cases, but wanted to open an issue first and get your feedback before I made a PR.

Thanks!

Owner

creationix commented Mar 13, 2013

I don't see any problem with the patch. Go ahead and send a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment