-
Notifications
You must be signed in to change notification settings - Fork 48
shared character toLowerCase function #809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The link validator issue seems a temp issue - the supposed broken link works for me. |
raboof
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change, but it'd be good to double-check that this doesn't allow sneaking in harmful characters that were sanitized out before. How did parboiled behave for non-7-bit-ascii characters?
|
|
||
| (char1 | char2) < 0x80 && | ||
| Character.toLowerCase(char1) == Character.toLowerCase(char2) && | ||
| CharUtils.toLowerCase(char1) == CharUtils.toLowerCase(char2) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 clearly safe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, in hindsight, this might be the riskiest change. CharUtils.toLowerCase only lower cases the 26 standard letters of the latin alphabet (the ones in 7 bit ASCII table) while java.lang.Character.toLowerCase seems to support other letters that appear in Unicode table. Let me re-review this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually with (char1 | char2) < 0x80 in the check, I think we can assume that it doesn't matter that Character.toLowerCase can handle non-ASCII chars.
| ///////////// helpers ///////////// | ||
|
|
||
| private def appendLowered(): Rule0 = rule { run(sb.append(CharUtils.toLowerCase(lastChar))) } | ||
| private def appendLowered(): Rule0 = rule { run(sb.append(CU.toLowerCase(lastChar))) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we assume lastChar is 7-bit ascii here? or are we OK with letting other characters through?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parboiled CharUtils only deals with the 26 standard letters of the latin alphabet (the ones in 7 bit ASCII table) - same as the pekko-http internal toLowerCase.
|
@raboof I've looked again and I think these changes are safe. wdyt? |
|
I'm not sure why d55eed0 makes sense? It seems 'correct' but looks like it'd be slower if anything? |
The main part d55eed0 was to change to calling byteChar function. This should get inlined by Hotspot compiler. It is used in lots of places already and this usage seems strange not to use it. I can remove this change though and come back to it. Maybe the correct thing to do is actually make the inlining of byteChar explicit (by declaring |
raboof
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok!
Update CharUtils.scala Update CharUtils.scala use byteChar Remove unused import from HttpHeaderParser.scala revert change to use byteChar fn Update HttpHeaderParser.scala
current code uses 3 different ways to lowercase a char
I did a jmh benchmark in my pekko-bench project
https://github.com/pjfanning/pekko-bench/blob/main/src/main/scala/example/LowerCaseBench.scala
And this showed that function in HttpHeaderParser was better than or matched the perf of the others on all Java versions.
Parboiled2 was always slower. The Java built-in was as good for Java 21 and 25 but slowest approach in java 17 and below.
This change switches all code to use the pekko-http internal method.