Handler not invoked correctly in multi-byte UTF8 sequences #162
Comments
@rtobar Gah, I knew I forgot to get around to fixing this. I can get you a hot fix for now and I'll work on optimizing it tomorrow.
I'm not in a rush, so don't worry about a hotfix. I just wanted to raise the issue, maybe also get the test added into the test suite for completeness.
Alright, in that case we can incorporate the test you provided and I'll get this fixed tomorrow. @vinniefalco I think what I'm going to do is when the string is unescaped, we will directly call the handler with the byte sequence from the input. Buffered string parsing is harder, so maybe we just don't reclip the stream and save the byte sequence in a member?
Now that number literals are implemented, I gave this library a second test round. However, I found a new problem (one that I didn't experience previously) that prevented me from advancing further.
The problem happens when one feeds multi-byte UTF8 sequences to a `basic_parser` one char at a time. In such situations `on_string_part` is called each time a sequence finishes, but its `string_view` parameter contains only the last byte of the sequence.
I think this is better illustrated with a test, so I implemented one. See rtobar@fde197b for a test that reproduces this problem. I'm 90% sure I'm doing things correctly, but please indicate if usage is not as intended.
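For readers without the linked commit at hand, here is a minimal, self-contained sketch of the expected behavior. It is not the library's code and does not use `basic_parser`: `utf8_chunker`, `recording_handler`, `utf8_length`, and `feed_bytewise` are hypothetical names made up for illustration. The sketch assumes C++17 and follows the idea floated in the comments of saving the bytes of an unfinished sequence in a member, so that the handler receives the whole sequence even when the input arrives one byte at a time. Continuation bytes are not validated, to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: length of the UTF-8 sequence introduced by lead
// byte `b`, or 0 if `b` cannot start a sequence.
inline std::size_t utf8_length(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0; // continuation byte or invalid lead byte
}

// Hypothetical handler that records every string part it receives.
struct recording_handler {
    std::vector<std::string> parts;
    void on_string_part(std::string_view s) { parts.emplace_back(s); }
};

// Toy stand-in for the string state of an incremental parser.
class utf8_chunker {
    char pending_[4] = {};   // bytes of a sequence split across writes
    std::size_t have_ = 0;   // number of bytes currently in pending_
    std::size_t need_ = 0;   // total length of the sequence in progress

public:
    // Consume one chunk of input; returns false on an invalid lead byte.
    template <class Handler>
    bool write(std::string_view chunk, Handler& h) {
        std::size_t i = 0;
        while (i < chunk.size()) {
            if (have_ == 0) {
                need_ = utf8_length(static_cast<unsigned char>(chunk[i]));
                if (need_ == 0) return false;
                // Whole sequence is available: pass the input bytes directly.
                if (chunk.size() - i >= need_) {
                    h.on_string_part(chunk.substr(i, need_));
                    i += need_;
                    continue;
                }
            }
            // Sequence straddles a chunk boundary: save the byte in a member.
            pending_[have_++] = chunk[i++];
            if (have_ == need_) {
                // Emit the complete sequence, not just its last byte.
                h.on_string_part(std::string_view(pending_, have_));
                have_ = 0;
            }
        }
        return true;
    }
};

// Feed `input` one byte at a time and return the parts the handler saw.
inline std::vector<std::string> feed_bytewise(std::string_view input) {
    recording_handler h;
    utf8_chunker p;
    for (char c : input) {
        if (!p.write(std::string_view(&c, 1), h)) break;
    }
    return h.parts;
}
```

Calling `feed_bytewise` on a string that mixes ASCII with two-byte and three-byte sequences should yield one part per code point, each holding all of its bytes. The behavior reported in this issue corresponds to the multi-byte parts containing only their final byte.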