Handler not invoked correctly in multi-byte UTF8 sequences #162

Closed
rtobar opened this issue Aug 17, 2020 · 3 comments

rtobar commented Aug 17, 2020

Now that number literals are implemented, I gave this library a second round of testing. However, I found a new problem (one I hadn't run into before) that prevented me from getting any further.

The problem occurs when multi-byte UTF-8 sequences are fed to a basic_parser one char at a time. In that case on_string_part is called each time a sequence finishes, but its string_view parameter contains only the last byte of the sequence rather than the whole sequence.

I think this is best illustrated with a test, so I wrote one. See rtobar@fde197b for a test that reproduces the problem. I'm 90% sure I'm using the parser correctly, but please let me know if this is not the intended usage.
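For readers without the linked commit at hand, here is a minimal sketch of why feeding one char at a time splits a character across calls. It is not Boost.JSON code and not the test from the commit; `utf8_sequence_length` and `count_sequences` are hypothetical helpers written only for illustration. The length check is simplified and does not reject overlong or out-of-range lead bytes such as 0xC0 or 0xF5.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in a UTF-8 sequence, judged from
// its lead byte alone. Returns 0 for a continuation byte (10xxxxxx) or a
// byte of the form 11111xxx. Simplified: overlong leads are not rejected.
inline std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Hypothetical helper: counts the sequences in `s` by hopping from lead
// byte to lead byte. Asserts that every lead byte is recognized and that
// no sequence runs past the end of the string.
inline std::size_t count_sequences(const std::string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
        assert(len != 0 && i + len <= s.size());
        i += len;
    }
    return n;
}
```

For example, the euro sign is the three bytes E2 82 AC. Fed one char at a time, the sequence only completes on the third call, and according to the report the handler then receives just the final AC byte instead of all three.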

@sdkrystian sdkrystian self-assigned this Aug 17, 2020
@sdkrystian sdkrystian added the Bug label Aug 17, 2020
@sdkrystian
Member

@rtobar Gah, I knew I'd forgotten to get around to fixing this. I can get you a hotfix for now, and I'll work on optimizing it tomorrow.

Author

rtobar commented Aug 17, 2020

I'm not in a rush, so don't worry about a hotfix. I just wanted to raise the issue, and maybe also get the test added to the test suite for completeness.

Member

sdkrystian commented Aug 17, 2020

Alright, in that case we can incorporate the test you provided and I'll get this fixed tomorrow.

@vinniefalco I think what I'm going to do is this: when the string is unescaped, we call the handler directly with the byte sequence from the input. Buffered string parsing is harder, so maybe we just don't reclip the stream and instead save the byte sequence in a member?
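One possible reading of "save the byte sequence in a member" can be sketched as follows. This is not the fix that was committed and not the Boost.JSON interface: `collecting_handler`, `utf8_accumulator`, `write_some` and `feed_one_byte_at_a_time` are hypothetical names, and `std::string` stands in for the library's string_view. The sketch is simplified in that it emits one part per sequence, ASCII included, where a real parser would batch adjacent bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a handler: records every part it is given.
struct collecting_handler
{
    std::vector<std::string> parts;
    void on_string_part(const std::string& s) { parts.push_back(s); }
};

// Hypothetical helper: keeps a partial UTF-8 sequence in a member between
// calls and hands the handler only complete sequences.
class utf8_accumulator
{
    char buf_[4] = {};       // bytes of the sequence seen so far
    std::size_t have_ = 0;   // how many bytes are stored in buf_
    std::size_t need_ = 0;   // total length of the current sequence, 0 if none

    static std::size_t length_from_lead(unsigned char c)
    {
        if (c < 0x80) return 1;
        if ((c & 0xE0) == 0xC0) return 2;
        if ((c & 0xF0) == 0xE0) return 3;
        if ((c & 0xF8) == 0xF0) return 4;
        return 0; // continuation byte or unsupported lead
    }

public:
    // Consumes `size` bytes; returns false on a malformed sequence.
    template<class Handler>
    bool write_some(const char* data, std::size_t size, Handler& h)
    {
        for (std::size_t i = 0; i < size; ++i) {
            unsigned char c = static_cast<unsigned char>(data[i]);
            if (need_ == 0) {
                need_ = length_from_lead(c);
                if (need_ == 0) return false;   // stray continuation byte
                have_ = 0;
            } else if ((c & 0xC0) != 0x80) {
                return false;                   // expected a continuation byte
            }
            buf_[have_++] = data[i];
            if (have_ == need_) {               // sequence complete: emit all of it
                h.on_string_part(std::string(buf_, have_));
                need_ = 0;
            }
        }
        return true;
    }

    // True while a sequence has started but not yet finished.
    bool pending() const { return need_ != 0; }
};

// Feeds `input` one char per call, as in the report, and returns the parts.
inline std::vector<std::string> feed_one_byte_at_a_time(const std::string& input)
{
    collecting_handler h;
    utf8_accumulator acc;
    for (char c : input) {
        bool ok = acc.write_some(&c, 1, h);
        assert(ok);
        (void)ok;
    }
    assert(!acc.pending());
    return h.parts;
}
```

With this approach the concatenation of all parts equals the original input regardless of how the input is chunked, which is the property the reported behavior breaks.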

sdkrystian added 3 commits to sdkrystian/json that referenced this issue Aug 21, 2020
sdkrystian added 4 commits to sdkrystian/json that referenced this issue Aug 22, 2020