Skip to content

Optimize lstripChars, rstripChars, and stripChars via specialization#236

Merged
stephenamar-db merged 4 commits intodatabricks:masterfrom
JoshRosen:specialize-more-builtins
Dec 12, 2024
Merged

Optimize lstripChars, rstripChars, and stripChars via specialization#236
stephenamar-db merged 4 commits intodatabricks:masterfrom
JoshRosen:specialize-more-builtins

Conversation

@JoshRosen
Copy link
Copy Markdown
Contributor

@JoshRosen JoshRosen commented Dec 12, 2024

This PR optimizes lstripChars / rstripChars / stripChars built-in functions by using the specialization framework from #119 / 0bd255a to pre-compile and re-use Pattern instances when the replacement / strip arguments are constants.

In Java, String.replaceAll() compiles and uses a Pattern under the hood and this is relatively expensive; specialization lets us save this cost when the chars-to-be-stripped are constant.

Testing

Correctness: ran all tests with a manual change to disable static function application optimizations (a prerequisite required to achieve full test coverage, as the static application prevents specialization from kicking in during most tests). In a followup, I think we should explore adding flags for optimizer features and running all tests with/without optimization.

Performance: ran benchmarks on a large file and with this change I save ~11% of allocated bytes and ~8% of wall time. I ran that same file through RunProfiler and saw a large speedup in stripChars, costing ~619ns/call before and ~154ns/call after.

@JoshRosen JoshRosen changed the title Optimize strReplace, lstripChars, rstripChars, and stripChars via specialization Optimize lstripChars, rstripChars, and stripChars via specialization Dec 12, 2024
@JoshRosen
Copy link
Copy Markdown
Contributor Author

I considered specializing strReplace but doing so doesn't help (and is actually slower) on JDK 9+ because https://bugs.openjdk.org/browse/JDK-8058779 optimized java.lang.String.replace(CharSequence, CharSequence).

@stephenamar-db stephenamar-db merged commit f5bf391 into databricks:master Dec 12, 2024
@JoshRosen JoshRosen deleted the specialize-more-builtins branch December 31, 2024 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants