shared character toLowerCase function #809

pjfanning · 2025-10-01T15:11:45Z

The current code uses three different ways to lowercase a char:

the parboiled2 CharUtils method
an internal method on HttpHeaderParser
the Java built-in Character.toLowerCase

I ran a JMH benchmark in my pekko-bench project:
https://github.com/pjfanning/pekko-bench/blob/main/src/main/scala/example/LowerCaseBench.scala

The benchmark showed that the function in HttpHeaderParser beat or matched the performance of the others on all Java versions.
The parboiled2 method was always slower. The Java built-in was as fast on Java 21 and 25 but was the slowest approach on Java 17 and below.

This change switches all code to use the pekko-http internal method.

pjfanning · 2025-10-01T15:32:33Z

The link validator failure seems to be a temporary issue; the supposedly broken link works for me.

raboof

I like this change, but it'd be good to double-check that it doesn't allow sneaking in harmful characters that were sanitized out before. How did parboiled behave for characters outside 7-bit ASCII?

raboof · 2025-10-01T16:05:09Z

http-core/src/main/scala/org/apache/pekko/http/impl/model/parser/CommonActions.scala


        (char1 | char2) < 0x80 &&
-        Character.toLowerCase(char1) == Character.toLowerCase(char2) &&
+        CharUtils.toLowerCase(char1) == CharUtils.toLowerCase(char2) &&


👍 clearly safe

pjfanning · 2025-10-01

Actually, in hindsight, this might be the riskiest change. CharUtils.toLowerCase only lowercases the 26 standard letters of the Latin alphabet (the ones in the 7-bit ASCII table), while java.lang.Character.toLowerCase also handles other letters in Unicode. Let me re-review this change.

pjfanning · 2025-10-01

Actually, with (char1 | char2) < 0x80 in the check, both chars are guaranteed to be 7-bit ASCII, so I think we can assume it doesn't matter that Character.toLowerCase can handle non-ASCII chars.
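A minimal sketch of why the guard makes the swap behaviour-preserving. It assumes a hypothetical `asciiToLower` helper standing in for the pekko-http internal method, whose actual implementation is not shown in this thread. The check is exhaustive over the 128 ASCII code points, so under the `< 0x80` guard the two lowercase functions should be interchangeable:

```scala
object GuardedCompare {
  // Hypothetical ASCII-only lowercase (an assumption, not the pekko-http code):
  // only 'A'..'Z' are shifted, every other char is returned unchanged.
  def asciiToLower(c: Char): Char =
    if (c >= 'A' && c <= 'Z') (c + 0x20).toChar else c

  // Comparison in the shape of the diff above: the bitwise OR of two chars is
  // below 0x80 only if both chars are below 0x80, i.e. both are 7-bit ASCII.
  def equalsIgnoreCaseAscii(char1: Char, char2: Char): Boolean =
    (char1 | char2) < 0x80 && asciiToLower(char1) == asciiToLower(char2)

  // Exhaustively compare the ASCII-only helper with the Java built-in
  // across the whole 7-bit range.
  def agreeOnAscii: Boolean =
    (0 until 0x80).forall { i =>
      val c = i.toChar
      asciiToLower(c) == Character.toLowerCase(c)
    }
}
```

Any pair containing a non-ASCII char is rejected by the guard before either lowercase function is consulted.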

raboof · 2025-10-01T16:06:49Z

http-core/src/main/scala/org/apache/pekko/http/impl/model/parser/UriParser.scala

  ///////////// helpers /////////////

-  private def appendLowered(): Rule0 = rule { run(sb.append(CharUtils.toLowerCase(lastChar))) }
+  private def appendLowered(): Rule0 = rule { run(sb.append(CU.toLowerCase(lastChar))) }


Can we assume lastChar is 7-bit ASCII here? Or are we OK with letting other characters through?

pjfanning · 2025-10-01

Parboiled CharUtils only deals with the 26 standard letters of the Latin alphabet (the ones in the 7-bit ASCII table), the same as the pekko-http internal toLowerCase.
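To make the difference concrete, here is a small sketch comparing an ASCII-only lowercase with `java.lang.Character.toLowerCase` on non-ASCII input. The `asciiToLower` helper is hypothetical and only mirrors the behaviour described above; it is not copied from parboiled2 or pekko-http:

```scala
object LowerCaseComparison {
  // Hypothetical ASCII-only lowercase (an assumption for illustration).
  def asciiToLower(c: Char): Char =
    if (c >= 'A' && c <= 'Z') (c + 0x20).toChar else c

  val kelvinSign: Char = '\u212A' // KELVIN SIGN, a non-ASCII char
  val capitalEAcute: Char = '\u00C9' // LATIN CAPITAL LETTER E WITH ACUTE

  // The Java built-in follows Unicode case mappings, so the Kelvin sign
  // is mapped to the plain ASCII letter 'k'.
  val javaKelvin: Char = Character.toLowerCase(kelvinSign)

  // The ASCII-only helper passes non-ASCII chars through untouched.
  val asciiKelvin: Char = asciiToLower(kelvinSign)
  val asciiEAcute: Char = asciiToLower(capitalEAcute)
}
```

With an ASCII-only function, a non-ASCII char can never be turned into an ASCII letter by lowercasing, although it is also not rejected: it passes through unchanged.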

pjfanning · 2025-10-02T11:19:24Z

@raboof I've looked again and I think these changes are safe. What do you think?

raboof · 2025-10-06T20:49:53Z

I'm not sure why d55eed0 makes sense? It seems 'correct' but looks like it'd be slower if anything?

pjfanning · 2025-10-06T21:03:43Z

> I'm not sure why d55eed0 makes sense? It seems 'correct' but looks like it'd be slower if anything?

The main part of d55eed0 was to switch to calling the byteChar function, which should get inlined by the HotSpot compiler. It is already used in lots of places, so it seems strange not to use it here. I can remove this change, though, and come back to it. Maybe the correct thing to do is to make the inlining of byteChar explicit, by declaring inline def byteChar for Scala 3 and @inline def byteChar for Scala 2, which means splitting package.scala into Scala 2 and Scala 3 versions.
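For illustration, a minimal sketch of the Scala 2 side of that idea. The `byteChar` below is hypothetical: it reads from a plain byte array, and the real pekko-http signature is not shown in this thread. In Scala 2, `@inline` is only a hint that the compiler's optimizer may act on when it is enabled. Scala 3's `inline def` (noted in the comment) is inlined at compile time but does not parse in Scala 2, which is why separate source files would be needed:

```scala
object ByteCharSketch {
  // Hypothetical stand-in for byteChar (an assumption, not the pekko-http code):
  // read the byte at index ix and widen it to a Char, masking with 0xFF so
  // that negative bytes map to 0x80..0xFF instead of being sign-extended.
  //
  // A Scala 3 source file would instead declare:
  //   inline def byteChar(input: Array[Byte], ix: Int): Char = ...
  @inline final def byteChar(input: Array[Byte], ix: Int): Char =
    (input(ix) & 0xFF).toChar
}
```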

raboof

ok!

pjfanning requested review from He-Pin and raboof October 1, 2025 15:50

raboof reviewed Oct 1, 2025

pjfanning marked this pull request as draft October 1, 2025 18:44

pjfanning marked this pull request as ready for review October 1, 2025 20:49

pjfanning added this to the 2.0.0 milestone Oct 2, 2025

raboof approved these changes Oct 6, 2025

shared character toLowerCase function

ba49e3b

Update CharUtils.scala
Update CharUtils.scala
use byteChar
Remove unused import from HttpHeaderParser.scala
revert change to use byteChar fn
Update HttpHeaderParser.scala

pjfanning force-pushed the lowercase branch from 28ee018 to ba49e3b Compare October 6, 2025 21:17

pjfanning merged commit a6b03ea into apache:main Oct 6, 2025
5 checks passed

pjfanning deleted the lowercase branch October 6, 2025 21:35
