Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use StringBuilder in String.replace (2x faster) #3714

Closed
wants to merge 1 commit into from

Conversation

fvasco
Copy link
Contributor

@fvasco fvasco commented Sep 10, 2020

The current String.replace implementation use splitToSequence, it is pretty slow.

I made a test project to improve String.replace's performance, available here: https://github.com/fvasco/kotlin-string-replace/blob/master/src/main/kotlin/org/sample/MyBenchmark.kt

Results are:

Benchmark                     Mode  Cnt        Score       Error  Units
MyBenchmark.replaceNew       thrpt    5  1412205,791 ± 12644,734  ops/s
MyBenchmark.replaceOriginal  thrpt    5   697439,698 ±  5664,105  ops/s
MyBenchmark.replaceRegex     thrpt    5  1098579,545 ± 11091,188  ops/s

The regex version is a little faster, but it can be considered in order to reduce the bytecode.
The new version is the case-insenditive translation of the Java native version.
In both version the platform implementation has been preferred in order to delegate the maintenance.

Original discussion here: https://discuss.kotlinlang.org/t/string-replace-implementation-is-very-poor/19026

@ilya-g
Copy link
Member

ilya-g commented Sep 10, 2020

Thanks for taking time and thoroughly benchmarking this. Could you also open a corresponding issue in a tracker?

@ilya-g
Copy link
Member

ilya-g commented Sep 10, 2020

Could you also post benchmark results with ignoreCase = false?

@fvasco
Copy link
Contributor Author

fvasco commented Sep 10, 2020

Your welcome,
see https://youtrack.jetbrains.com/issue/KT-41799

@fvasco
Copy link
Contributor Author

fvasco commented Sep 11, 2020

Could you also post benchmark results with ignoreCase = false?

Benchmark                     Mode  Cnt         Score       Error  Units
MyBenchmark.replaceNew       thrpt    5  45009927,400 ± 50963,601  ops/s
MyBenchmark.replaceOriginal  thrpt    5   4118892,481 ± 21614,632  ops/s
MyBenchmark.replaceRegex     thrpt    5  44966941,400 ± 85611,686  ops/s

Edit: in case of no substitution

@fvasco
Copy link
Contributor Author

fvasco commented Sep 11, 2020

Could you also post benchmark results with ignoreCase = false?

Benchmark                     Mode  Cnt        Score        Error  Units
MyBenchmark.replaceNew       thrpt    5  5812212,786 ±  64853,335  ops/s
MyBenchmark.replaceOriginal  thrpt    5  2636245,811 ± 188415,757  ops/s
MyBenchmark.replaceRegex     thrpt    5  5761148,430 ±  39618,165  ops/s

I pushed the relative commit to the repository

val oldValueLength = oldValue.length
val searchStep = oldValueLength.coerceAtLeast(1)
val newLengthHint = length - oldValueLength + newValue.length
if (newLengthHint < 0) throw OutOfMemoryError()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment explaining how this overflow proves that it's impossible to build the desired string, and maybe even include some of the length information in a message in the OutOfMemoryError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment explaining how this overflow proves that it's impossible to build the desired string

Something like: "detect Int overflow"?

maybe even include some of the length information in a message in the OutOfMemoryError

Can you provide a suggestion?
Please consider that every append in this function can cause a similar error, this hint works wello only for a single replacement.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expand on it a bit more: "even a single replacement would create a String with more than Int.MAX_VALUE characters, not allowed"
With your point about subsequent replacements causing the same error, I suppose that the errors should appear uniform, so OutOfMemoryError() is fine.

@ilya-g
Copy link
Member

ilya-g commented Sep 18, 2020

I've run the provided benchmark and got somewhat similar results. After that I decided to perform a more extensive benchmark: i.e to test each implementation individually (manual, regex-based, delegating to the platform, the current one in stdlib) varying the following parameters: JDK version (8 or 11), the length of an input string, the number of occurrences of a string being searched, case sensitivity, the amount of backtracking required.

The benchmarks and their results can be found here: https://github.com/ilya-g/string-replace-benchmark

Based on these findings I'd recommend taking the manual implementation for case-sensitive replacement and the regex-based (but with manual replacement, see replaceRegex1) for case-insensitive replacement.

@ilya-g ilya-g self-assigned this Sep 19, 2020
@ilya-g
Copy link
Member

ilya-g commented Sep 28, 2020

So, if there are no objections, I'll take the proposed implementation by @fvasco and modify it according to the conclusions from the last benchmark.

Based on these findings I'd recommend taking the manual implementation for case-sensitive replacement and the regex-based (but with manual replacement, see replaceRegex1) for case-insensitive replacement.

@fvasco
Copy link
Contributor Author

fvasco commented Sep 28, 2020

Regardless the benchmark results, I wish to add that the platform implementation reduces the binary size and delegates the responsibility.

@ilya-g
Copy link
Member

ilya-g commented Oct 3, 2020

I've merged it manually and applied the proposed changes on top of it: fafb4a7~2...fafb4a7

Thanks again for the PR.

@ilya-g ilya-g closed this Oct 3, 2020
@fvasco
Copy link
Contributor Author

fvasco commented Oct 5, 2020

...and kudos to all discussion creator and contributors

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants