Fix string comparison to avoid using the collator. #87
Benchmarking at Google has shown an up-to-1000x performance degradation from using the collator.
There's a complication here. The Collator does Unicode normalization, while String.equals does not. For example, if you build "á" from "a" plus the combining acute accent character, it will no longer be equal to the precomposed "á" with this patch, even though the two look exactly the same to the user. On the other hand, we don't need to know which string comes first, so maybe just doing Unicode normalization and then calling equals will solve this.
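To make the "normalize, then call equals" idea concrete, here is a minimal sketch using the JDK's `java.text.Normalizer`. The class and method names (`NormalizedEquality`, `equalsNormalized`) are hypothetical and not taken from the patch, and the choice of NFC as the normalization form is an assumption.

```java
import java.text.Normalizer;

// Hypothetical helper, not the code from this pull request.
final class NormalizedEquality {
    private NormalizedEquality() {}

    /** Compares two strings for equality after canonical composition (NFC, assumed here). */
    static boolean equalsNormalized(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e1"; // "á" as a single code point
        String combined = "a\u0301";   // "a" followed by COMBINING ACUTE ACCENT

        System.out.println(precomposed.equals(combined));            // false
        System.out.println(equalsNormalized(precomposed, combined)); // true
    }
}
```

This only covers canonical equivalence. Whatever else a Collator treats as equal at its configured strength is not reproduced by this sketch.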
Also, when it comes to backward compatibility, I don't know right now what other magic Collators do... For instance, whether different kinds of dashes are treated as equal, and such. So probably this will need a |
The collator used here ( |
Quickly checked on Windows 10, Oracle Java 8... it does (de)normalization apparently:
If whether this works depends on the JVM being OpenJDK or Oracle, or on who knows what other factor, that's another issue to fix in itself. If (de)normalization is not terribly slow, then I would probably choose doing that.
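If the cost of normalization is the concern, one possible way to keep it off the common path is sketched below. It assumes NFC as the target form, and the names (`FastNormalizedEquality`, `equalsNormalized`) are hypothetical. `Normalizer.isNormalized` is a standard JDK method; how much time it actually saves would have to be measured.

```java
import java.text.Normalizer;

// Hypothetical sketch: skip normalization when it cannot change the outcome.
final class FastNormalizedEquality {
    private FastNormalizedEquality() {}

    static boolean equalsNormalized(String a, String b) {
        // Fast path: identical code units are equal under any normalization.
        if (a.equals(b)) {
            return true;
        }
        boolean aIsNfc = Normalizer.isNormalized(a, Normalizer.Form.NFC);
        boolean bIsNfc = Normalizer.isNormalized(b, Normalizer.Form.NFC);
        // Two strings already in NFC that differ cannot be canonically equivalent.
        if (aIsNfc && bIsNfc) {
            return false;
        }
        // Slow path: normalize only the side(s) that need it.
        String na = aIsNfc ? a : Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = bIsNfc ? b : Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }
}
```

The early exits rely on NFC being a unique representative of each canonical equivalence class, so the result should match normalizing both strings unconditionally.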
Just to note - a very simplistic benchmark (https://gist.github.com/nolaviz/94455756da704892c4dca377062ab82a) shows the following results:
i.e., |
This should be more compatible with the previous behavior (Collator-based).
The last thing we have to figure out is whether we can always do this (normalize then |
I see you made this activated by |
I haven't found any concrete backward compatibility issues, but better safe than sorry.
…n to avoid using the collator
Benchmarking at Google has shown an up-to-1000x performance degradation from using the collator.