utf8_string::get_num_codepoints returns number of bytes under certain circumstances with ::lut_active #16
Hi Vadim, I have little time at the moment, but I will have a look as soon as I can. All the best
Thanks, Jakob!
Does it work if you replace line 736 in tinyutf8.cpp
with
? I look forward to hearing from you!
Hi Jakob, I'm unavailable right now (relocating) :) Sorry, it will take me a day or two to get to that, but I will let you know. Best regards, Vadim
Yes, it works with my example! Thanks a lot for fixing it. Best regards, Vadim
Fixed bug in get_num_bytes, if lut is active (#16)
Hi Jakob, long time no talk :)! Sadly, the same problem strikes again, this time with a longer string, but once more the multibyte characters are the culprits. Here is an example:
It is supposed to return 104, but it returns 102. If you replace the smart quotes by
The workaround for me is to comment out the block starting at line 2701:
Hi Vadim! Indeed, I hope you're doing well! Cheers,
Hi Jakob, I'm good, thanks; I hope you're doing well too. Yes indeed, the issue is different now: the function is definitely not returning the number of bytes this time, but the codepoint count is still wrong. I naturally assumed the bug was in my own code before zeroing in on this part. I don't know where the number comes from, but there is definitely a problem here. Run the example and you'll see. Best regards, Vadim
I had a look at the code, and it was a small ">" to ">=" issue 😃
Does it work for you now as well?
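The thread does not show the affected code, so the following is only a hypothetical C++ sketch of how a lookup table of multibyte positions can speed up codepoint counting, and why a single boundary comparison matters. The table layout and the names `build_multibyte_lut`, `count_codepoints_lut` and `count_codepoints_naive` are assumptions for illustration; they are not tiny-utf8's actual data structures or API. The naive counter serves as a reference to cross-check the table-based result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Byte width of a UTF-8 sequence, derived from its lead byte (assumes valid UTF-8).
std::size_t utf8_width(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

// HYPOTHETICAL lookup table: ascending byte offsets of every multibyte codepoint.
// This is an illustration, not tiny-utf8's real LUT format.
std::vector<std::size_t> build_multibyte_lut(const std::string& s) {
    std::vector<std::size_t> lut;
    for (std::size_t i = 0; i < s.size(); i += utf8_width(static_cast<unsigned char>(s[i]))) {
        if (utf8_width(static_cast<unsigned char>(s[i])) > 1) lut.push_back(i);
    }
    return lut;
}

// Codepoints in the byte range [begin, end), assuming both ends lie on codepoint
// boundaries: start from the byte count and subtract the extra bytes of every
// multibyte codepoint that starts inside the range. A codepoint starting exactly
// at `begin` belongs to the range, so the lower bound must use `>=`; with `>`
// it would be skipped and the result would be too large.
std::size_t count_codepoints_lut(const std::string& s,
                                 const std::vector<std::size_t>& lut,
                                 std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= s.size());
    std::size_t count = end - begin;
    for (std::size_t off : lut) {
        if (off >= begin && off < end) {
            count -= utf8_width(static_cast<unsigned char>(s[off])) - 1;
        }
    }
    return count;
}

// Reference implementation: count bytes that are not continuation bytes (10xxxxxx).
std::size_t count_codepoints_naive(const std::string& s, std::size_t begin, std::size_t end) {
    std::size_t count = 0;
    for (std::size_t i = begin; i < end; ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For the 8-byte string “Hi” (`"\xE2\x80\x9CHi\xE2\x80\x9D"`), the table holds offsets 0 and 5, and both counters return 4 for the full range. Had the lower bound been written as `off > begin`, the opening quote at offset 0 would be skipped and the table-based count would come out as 6.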
Sorry about the belated reply, Jakob! (Crazy weeks, then I forgot about it.) I checked and indeed it works now. Thank you very much for the quick resolution. |
No probs! You're very welcome!
Hi Jakob,
It looks like the part of get_num_codepoints that uses lut_active has issues under some circumstances. The only way I could fix it was by disabling that part altogether. Here is the example:
The "." character is at position 24, but the return value is 26 because ’ is a "compound" (multibyte) codepoint.
Best regards,
Vadim
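To make the mismatch concrete: a codepoint index differs from a byte offset by the extra bytes of every multibyte character before it, and ’ (U+2019) is encoded as three bytes in UTF-8, which would explain a difference of 2 if one such character precedes the ".". The minimal C++ sketch below counts codepoints by skipping UTF-8 continuation bytes, assuming valid UTF-8 input. The helper names `is_continuation`, `count_codepoints` and `codepoint_index` are hypothetical and are not tiny-utf8 functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Number of codepoints in a valid UTF-8 string: every byte that is not a
// continuation byte starts a new codepoint.
std::size_t count_codepoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if (!is_continuation(c)) ++n;
    }
    return n;
}

// Codepoint index of the character starting at byte offset `byte_pos`,
// i.e. the number of codepoints that precede it.
std::size_t codepoint_index(const std::string& s, std::size_t byte_pos) {
    assert(byte_pos <= s.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < byte_pos; ++i) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) ++n;
    }
    return n;
}
```

For `"Don\xE2\x80\x99t panic."`, `size()` is 14 while `count_codepoints` returns 12, and the "." sits at byte offset 13 but at codepoint index 11: the same off-by-two gap as in the report.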