
Incorrect segment counting #18

Closed
MikeCarbone opened this issue Oct 5, 2021 · 4 comments · Fixed by #20
@MikeCarbone

Hello! Just wanted to let you know that the segment counting doesn't seem to be done correctly. I was building my own implementation of a segment counter and stumbled upon an inaccuracy.

Take this ridiculous string for example:
this is a ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]😘

The tool says this is 3 segments.

However, my understanding is that it should be 2 segments. The length of the string is 86, and the emoji triggers UCS-2 encoding. Once the message is switched to UCS-2, the special GSM-7 bracket should count as only one character, with a maximum of 67 characters per segment.

I believe the tool may be double counting the special GSM-7 characters, then not adjusting the count once UCS-2 is triggered.

I sent that string to Twilio and it does indeed count as two segments, not three.
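
To spell out the arithmetic behind the two results: the sketch below assumes the string has 10 leading characters ("this is a "), 74 brackets (inferred from the stated length of 86), and an emoji that occupies 2 UTF-16 code units. The second line is only a guess at how the tool could arrive at 3, assuming it counts each bracket twice while still applying the 67-character UCS-2 limit.

```latex
\begin{align*}
\text{expected (UCS-2):} \quad & 10 + 74 + 2 = 86,
  & \left\lceil \tfrac{86}{67} \right\rceil &= 2 \\
\text{brackets counted twice:} \quad & 10 + 2 \cdot 74 + 2 = 160,
  & \left\lceil \tfrac{160}{67} \right\rceil &= 3
\end{align*}
```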

Anyways, not terribly urgent, just figured I'd let you all know. Thanks for the tool! Really helped me wrap my head around the topic.

@vernig
Contributor

vernig commented Oct 5, 2021

Hey @MikeCarbone, you are absolutely right! Thanks for raising this issue. Let me work on a fix.

@MikeCarbone
Author

Nice! Thanks @vernig! I originally had the same bug in my implementation. I ended up fixing it by keeping track of normal GSM-7 characters and special GSM-7 characters in two separate counts, then doing the math after running through the string. If the message was GSM-7, I'd multiply the special-GSM count by 2, then add it to the normal count. If it was UCS-2 at the end, I'd just add the counts together for a total.
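
A minimal sketch of the two-counter approach described above, in Python. This is not the code from either implementation: the function name `count_segments` and the character-set constants are hypothetical, the extension set covers only the commonly listed characters, and the UCS-2 branch counts UTF-16 code units (an assumption chosen so the example string comes out at length 86). It also ignores the rule that an escape sequence or surrogate pair must not be split across segments.

```python
import math

# GSM-7 basic character set (default alphabet), written out for illustration.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# GSM-7 extension characters: each costs two septets (escape + character).
GSM7_EXTENDED = set("\f^{}\\[~]|€")


def count_segments(message: str) -> int:
    """Estimate the number of SMS segments for `message` (hypothetical helper)."""
    basic = 0        # characters from the basic GSM-7 set
    extended = 0     # characters from the GSM-7 extension table
    is_ucs2 = False  # set once any non-GSM-7 character is seen

    for ch in message:
        if ch in GSM7_BASIC:
            basic += 1
        elif ch in GSM7_EXTENDED:
            extended += 1
        else:
            is_ucs2 = True

    # Decide the cost only after the whole string has been scanned.
    if is_ucs2:
        # Every character costs one unit per UTF-16 code unit;
        # extension characters are no longer doubled.
        units = len(message.encode("utf-16-le")) // 2
        single_limit, multi_limit = 70, 67
    else:
        units = basic + 2 * extended
        single_limit, multi_limit = 160, 153

    if units <= single_limit:
        return 1
    return math.ceil(units / multi_limit)
```

Deferring the doubling until the encoding is known is what avoids the problem: a single non-GSM-7 character anywhere in the message changes the cost of every bracket before it.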

You probably don't need that, but figured I'd share. Thanks for the quick response! 👍

@vernig
Contributor

vernig commented Oct 5, 2021

I can see what you mean, @MikeCarbone, and thanks for sharing your implementation. I may adopt a very similar solution. Let me do some more tests and get back to you.

@vernig
Contributor

vernig commented Oct 12, 2021

🎉 This issue has been resolved in version 1.1.1 🎉

Your semantic-release bot 📦🚀
