Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String.IndexOf(string, int) is VERY slow on linux #41284

Closed
laloutre87 opened this issue Aug 24, 2020 · 9 comments
Closed

String.IndexOf(string, int) is VERY slow on linux #41284

laloutre87 opened this issue Aug 24, 2020 · 9 comments
Labels

Comments

@laloutre87
Copy link

laloutre87 commented Aug 24, 2020

Description

Code that was running fine previously on mono has became extremely slow with .Net Core 3.1 .
I found the bottleneck to be the String.IndexOf(string, int ) function.

Configuration

dotnet --version
3.1.401

debian_version 10.4
4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux

Regression?

Regression compared to Mono 6.10.

Data

Small test program :
BugIndexOf.zip

Which displays on my windows laptop :

IndexOf(string, ... ) , found ";" 10000 times in 0,0206797 seconds
IndexOf(char, ... ) , found ';' 10000 times in 0,0006354 seconds

And on the linux server :

IndexOf(string, ... ) , found ";" 10000 times in 379,0305622 seconds
IndexOf(char, ... ) , found ';' 10000 times in 0,0006657 seconds

And the real-life scenario that affected me was the usage of this small piece of code (granted : not very optimized code !) made to deserialize data serialized by PHP, where deserializing the data took 70minutes instead of less than 2 seconds previously (Mono 6.10, same machine, same OS version, same code, same data). The code can be found here : https://gist.github.com/xiangwan/1225981/1f6d12679fe510ff241468561e0f80fa757f8db4

@laloutre87 laloutre87 added the tenet-performance Performance related issue label Aug 24, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Aug 24, 2020
@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost
Copy link

ghost commented Aug 24, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.

@GrabYourPitchforks
Copy link
Member

Marking this as globalization because I'd bet this is a culture-aware comparison, and we know that ICU is by-design slow when it comes to calling IndexOf in a loop.

@tarekgh
Copy link
Member

tarekgh commented Aug 24, 2020

This is covered in the other issue #40942

@tarekgh tarekgh closed this as completed Aug 24, 2020
@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Aug 24, 2020
@laloutre87
Copy link
Author

Yes, it's globalization if I believe my "dotnet trace" , but #40942 makes it sound like it's not going to be fixed in the 3.x branch. Isn't 3.1 supposed to be LTS ?

@tarekgh
Copy link
Member

tarekgh commented Aug 24, 2020

Yes, it's globalization if I believe my "dotnet trace" , but #40942 makes it sound like it's not going to be fixed in the 3.x branch. Isn't 3.1 supposed to be LTS ?

We expect to fix this will be a big change which can be risky for 3.1. Also, as we depend on ICU, so we are limited by the ICU performance too. I am wondering, did you try to use IndexOf with Ordinal option and look if this can help you in your scenario?

@laloutre87
Copy link
Author

Well, in the case I was mentioning, which was a script from someone else on github for deserializing data serialized by php, I could replace by .indexOf(char , int ) , the use of .IndexOf(String, int ) was a mistake in the first place. So this specific problem is solved, but .indexOf ( ) is quite the central function in any language, and when it's suddently 10000x slower than can be expected, it hurts a bit -_-' .
And, yes it's always possible, even if quite long, to go around the whole codebase to switch to "Ordinal" comparaison mode ... in your own code. 3rd party librairies however ....

@tarekgh
Copy link
Member

tarekgh commented Aug 24, 2020

10000x slower than can be expected

I think you are comparing 2 different things here. you cannot assume using char in the functionality to be the base line you measure against when using 'string'. #40942 has more accurate numbers. I agree IndexOf still slow though so I am not arguing that.

@laloutre87
Copy link
Author

I was comparing to the speed of .IndexOf(string, ... ) on windows (on a slow laptop ^^), and also to my real life scenario which was less than 2 sec deserialization on mono, and more than 70 minutes (i cut at this point) with the same code, same server, with .net core .

And I do understand that the windows version doesn't have (yet), this "ICU" thing, which is apparently cause the slow down, so it may feel unfair to compare, but people running .net core on linux are likely to be (at some point, when they will feel secure enough to migrate, it took me quite a while) the same people that are running programs with Mono since years, and they are going to expect similar performances when migrating their code.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants