A few optimizations #280
Conversation
This substantially speeds up the program under Windows (about twice as fast) without any changes to behavior or results.
…repeatedly. This brings a ~40% speedup* on top of the previous commit. (* Note: I'm basing these timings on cProfile runs, but the speedup holds quite nicely on "normal" runs.)
This changes how things are processed and speeds things up a bit. However, shutil.copy2() now takes a significant share of this function's time, probably due to some of the newer artifacts.
Helps another ~10% or so in running time (we were normcasing every filepath over and over again for every artifact).
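The caching described here can be sketched with functools.lru_cache; the helper name below is hypothetical, not ALEAPP's actual code:

```python
import functools
import os.path

# Hypothetical helper: os.path.normcase is a pure function, so its
# results can be memoized. Calling it once per unique path instead of
# once per (path, artifact) pair avoids repeating the same work.
@functools.lru_cache(maxsize=None)
def cached_normcase(path: str) -> str:
    return os.path.normcase(path)
```

On Windows, normcase lowercases the path and flips forward slashes to backslashes, which is why it shows up prominently in Windows profiles; on POSIX it is nearly a no-op, so the cache mostly matters on Windows.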
Makes this function about twice as fast: replaced a for loop that wrote one byte at a time with a block write, plus a condition for the (rare) case where the copy reaches past the end of the output written so far and part of the written-out output has to be repeated.
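The change described reads like the standard LZ77-style back-reference copy used by snappy's copy elements; a sketch under that assumption (function and variable names are hypothetical, not the PR's code):

```python
def copy_backref(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes starting `offset` bytes back from the end
    of `out`, as in LZ77-style decompression.

    When the referenced span is fully available, a single slice copy
    replaces a byte-at-a-time loop; the rare overlapping case repeats
    the already-written chunk until `length` bytes are produced.
    """
    start = len(out) - offset
    if offset >= length:
        # Fast path: source span already exists in full, one block write.
        out += out[start:start + length]
    else:
        # Rare overlap: the copy would read bytes it is still producing,
        # so repeat the available chunk to cover `length` bytes.
        chunk = out[start:]
        reps, rem = divmod(length, offset)
        out += chunk * reps + chunk[:rem]
```

For example, with `out = bytearray(b"ab")`, `copy_backref(out, 1, 4)` repeats the last byte four times, yielding `b"abbbbb"` — the overlapping case the commit mentions.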
I checked the fnmatch.filter() code in the standard library, and went with the same style used in the other seekers. Basically, fnmatch.filter() does the same thing the other seekers were already doing, so the behavior is the same (I haven't tested it, though).
@bconstanzo do you have the tests you used for this? Just for reference. Real results without usagestats on Linux:
Under Windows 10 Home 64-bit, I set up a virtual env with Python 3.10.3 and all the dependencies from the requirements.txt file. Then it was just a matter of cloning ALEAPP on Sunday night, and I ran:
I have a Ryzen 4600H with 24 GB of RAM and ran against a SATA HDD to cover the worst-case scenario, though I also tested against an NVMe drive and it ran just as fast. For profiling I'd run it under cProfile. The timings and speedups I mentioned in the commits are based on what I saw during the profile runs (which under cProfile ran about half as fast as normal) and on the time reported by the tool. Right now I just benchmarked with a very simple script:
And it gave me 477 seconds for a clone of ALEAPP (which the tool reported as 6 minutes and 34 seconds) and 202 seconds for the patched version (which the tool reported as 2 minutes sharp). That is just shy of 2.4x faster, and there is clearly something off with the reported time.
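The exact profiling commands aren't reproduced on this page; in general, profiling a callable with the standard library's cProfile and summarizing the results with pstats looks like the following (the work() function is a stand-in for the real entry point):

```python
import cProfile
import io
import pstats

def work():
    # Stand-in for the real entry point (e.g. the tool's CLI main);
    # any callable can be profiled the same way.
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the top entries sorted by cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)

report = stream.getvalue()
print("function calls" in report)  # prints True
```

Note that, as the author observes, running under cProfile roughly halves execution speed, so wall-clock comparisons should come from separate unprofiled runs.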
I've been profiling and testing a few changes that have a major impact on the Windows performance of ALEAPP.
The reported time goes down by about 3x in my benchmarks. The changes don't affect behavior as far as I could test; the results are consistent between runs, it just goes faster.
Mainly this is achieved by putting a few caches in place (functools.lru_cache with maxsize set to None) and by avoiding duplicated work (fnmatch.fnmatch usage is replaced by a "deconstructed" version of it). I tried to keep the code as simple and readable as possible, while netting some nice speedups.
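The "deconstructed" fnmatch presumably means performing fnmatch.fnmatch's internal steps (normcase both sides, translate the pattern to a regex) once per pattern rather than on every call; a sketch of that idea, not the PR's actual code:

```python
import fnmatch
import os.path
import re

def make_matcher(pattern: str):
    """Build a reusable matcher for one glob pattern.

    Hypothetical sketch: fnmatch.fnmatch normcases both arguments and
    (re)derives a regex from the pattern on each call. Doing the
    translation once and matching pre-normcased paths avoids that
    repeated work when the same pattern is checked against many paths.
    """
    regex = re.compile(fnmatch.translate(os.path.normcase(pattern)))

    def match(normcased_path: str) -> bool:
        return regex.match(normcased_path) is not None

    return match

# The glob below is illustrative, not an actual ALEAPP artifact pattern.
match_db = make_matcher("*/app_webview/*.db")
```

Callers must pass paths that have already been normcased (e.g. via the cached normcase helper), which keeps both sides of the comparison consistent across platforms.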
There's also a small bit that specifically covers snappy decompression, where I managed to improve the algorithm just a bit: enough for a 4-5% performance increase, without resorting to overly complicated code.
Methodology: