Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major decrease in speed after upgrading from v0.1.5 to v0.1.7 #77

Closed
waltertamboer opened this issue Jul 13, 2018 · 10 comments
Closed

Major decrease in speed after upgrading from v0.1.5 to v0.1.7 #77

waltertamboer opened this issue Jul 13, 2018 · 10 comments

Comments

@waltertamboer
Copy link

We have an application where we generate a comparison between two files. This file is pretty big and on v0.1.5 this took 21 seconds. After upgrading to v0.1.7 the comparison took 26 minutes.

After looking at v0.1.5...v0.1.7 - I suspect the multibyte functions to be the cause of this but I did not dug any further into this issue. There probably is some inefficient part in the code that causes this insane increase of time but I have not enough domain knowledge to pinpoint what's going on.

I'm more than happy to help out on this. Just let me know what I can do.

@SavageTiger
Copy link
Contributor

SavageTiger commented Jul 13, 2018

Hi Walter,

I am pretty sure you are right that the multibyte support is causing a performance penalty.
This was also noted when accepting the pull request, I think this is a trade of that has to be made.

My advice for now, make sure you are using atleast php7, maybe consider doing the diffing in a CLI or as a microservice.

@waltertamboer
Copy link
Author

waltertamboer commented Jul 13, 2018

Thanks for your advise Sven. We are indeed on PHP 7 and the diff is generated in the background via a Beanstalkd job. The problem is that it's unacceptable for our customers to wait this long so I'd love to find a solution for this.

@jschroed91
Copy link
Member

@SavageTiger @waltertamboer Do we think adding in a config option to disable multibyte support is something worth doing?

I think that would be an option here that is something we could do immediately, but not the ideal solution - ideal solution would be to make some of the performance improvements that require larger-scale changes... those would take some time

@etaunknown
Copy link

Some users with timeout issues reverted back the the 0.1.5 with significant performance increases with the following hack:

protected function isOnlyWhitespace($str) { // Slightly faster then using preg_match // return $str !== '' && (mb_strlen(trim($str)) === 0); return $str !== '' && (strlen(trim($str)) === 0); }

Some profiling info that pin-pointed the suspected issue

https://www.drupal.org/project/diff/issues/3030979#comment-12964298

@SavageTiger
Copy link
Contributor

See PR: #81

jschroed91 pushed a commit that referenced this issue Feb 20, 2019
* Updated the performance fixture

* Added MbStringUtil as a wrapper around string functions to use mb_* functions only when necessary

* Added mutlibyte test

* Cleanup: Cleanup using Tidy extension code path was disabled a long time ago, pretty sure that it can be removed

* Removed the overhead of checking using strlen(), using strict string compare is about 30% faster

* Added strict types and reduced the call graph, increased the performance by about 3%
@jschroed91
Copy link
Member

Fixes from #81 are included in new release v0.1.9

@waltertamboer
Copy link
Author

Thanks! I'll try to test how these changes perform in our application asap.

@SavageTiger
Copy link
Contributor

Josh packagist still shows 0.1.8 as the latest version. Not sure why

@jschroed91
Copy link
Member

Hm, the connector must be outdated. Will update!

@jschroed91
Copy link
Member

@SavageTiger New release is now updated on packagist

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants