New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use StringBuilder in String.replace (2x faster) #3714
Conversation
Thanks for taking time and thoroughly benchmarking this. Could you also open a corresponding issue in a tracker? |
Could you also post benchmark results with |
Your welcome, |
Edit: in case of no substitution |
I pushed the relative commit to the repository |
val oldValueLength = oldValue.length | ||
val searchStep = oldValueLength.coerceAtLeast(1) | ||
val newLengthHint = length - oldValueLength + newValue.length | ||
if (newLengthHint < 0) throw OutOfMemoryError() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a comment explaining how this overflow proves that it's impossible to build the desired string, and maybe even include some of the length information in a message in the OutOfMemoryError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a comment explaining how this overflow proves that it's impossible to build the desired string
Something like: "detect Int
overflow"?
maybe even include some of the length information in a message in the OutOfMemoryError
Can you provide a suggestion?
Please consider that every append
in this function can cause a similar error, this hint works wello only for a single replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expand on it a bit more: "even a single replacement would create a String with more than Int.MAX_VALUE characters, not allowed"
With your point about subsequent replacements causing the same error, I suppose that the errors should appear uniform, so OutOfMemoryError()
is fine.
I've run the provided benchmark and got somewhat similar results. After that I decided to perform a more extensive benchmark: i.e to test each implementation individually (manual, regex-based, delegating to the platform, the current one in stdlib) varying the following parameters: JDK version (8 or 11), the length of an input string, the number of occurrences of a string being searched, case sensitivity, the amount of backtracking required. The benchmarks and their results can be found here: https://github.com/ilya-g/string-replace-benchmark Based on these findings I'd recommend taking the manual implementation for case-sensitive replacement and the regex-based (but with manual replacement, see |
So, if there are no objections, I'll take the proposed implementation by @fvasco and modify it according to the conclusions from the last benchmark.
|
Regardless the benchmark results, I wish to add that the platform implementation reduces the binary size and delegates the responsibility. |
I've merged it manually and applied the proposed changes on top of it: fafb4a7~2...fafb4a7 Thanks again for the PR. |
...and kudos to all discussion creator and contributors |
The current
String.replace
implementation usesplitToSequence
, it is pretty slow.I made a test project to improve
String.replace
's performance, available here: https://github.com/fvasco/kotlin-string-replace/blob/master/src/main/kotlin/org/sample/MyBenchmark.ktResults are:
The regex version is a little faster, but it can be considered in order to reduce the bytecode.
The
new
version is the case-insenditive translation of the Java native version.In both version the platform implementation has been preferred in order to delegate the maintenance.
Original discussion here: https://discuss.kotlinlang.org/t/string-replace-implementation-is-very-poor/19026