
A few optimizations #280

Merged
abrignoni merged 7 commits into abrignoni:master from bconstanzo:master on Aug 16, 2022


bconstanzo (Contributor) commented Aug 16, 2022

I've been profiling and testing a few changes that have a major impact on the Windows performance of ALEAPP.

The reported run time goes down by about 3x in my benchmarks. As far as I could test, the changes don't affect behavior: results are consistent between runs, it just goes faster.

This is mainly achieved by adding a few caches (functools.lru_cache with maxsize set to None) and by avoiding duplicated work (fnmatch.fnmatch calls are replaced with a "deconstructed" version of it). I tried to keep the code as simple and readable as possible while still netting some nice speedups.
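
As a rough illustration of the caching idea (not ALEAPP's actual code), the sketch below memoizes a hypothetical `normalize` path helper with `functools.lru_cache(maxsize=None)`. Repeated calls with an argument already seen are answered from the cache, which `cache_info()` makes visible.

```python
import functools
import os


@functools.lru_cache(maxsize=None)  # unbounded: every distinct argument is kept
def normalize(path: str) -> str:
    # Hypothetical helper: the same path gets normalized many times during a run,
    # so memoizing it turns repeat calls into a dictionary lookup.
    return os.path.normcase(os.path.normpath(path))


paths = ["data/app/a.db", "data/app/b.db", "data/app/a.db"]
for _ in range(1000):
    for p in paths:
        normalize(p)

print(normalize.cache_info())
# CacheInfo(hits=2998, misses=2, maxsize=None, currsize=2)
```

On Python 3.9+ `functools.cache` is shorthand for the same thing. An unbounded cache trades memory for speed, which is reasonable when the set of distinct inputs (for example, the paths in one extraction) is finite.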

There's also a small change specific to snappy decompression, where I managed to improve the algorithm a bit: enough to get a 4-5% performance increase without resorting to overly complicated code.
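
Snappy encodes repeated data as back-references: copy `length` bytes starting `offset` bytes back from the end of the output. A minimal sketch of the kind of change described in the snappy commit, assuming the decompressor appends to a `bytearray` (the function names are hypothetical and this is not the PR's code): the common case becomes a single slice copy, and only when `length > offset`, so the copy overlaps bytes it is still producing, does it repeat the last `offset` bytes as a pattern.

```python
def copy_back_reference(out: bytearray, offset: int, length: int) -> None:
    """Append `length` bytes copied from `offset` bytes before the end of `out`."""
    if not 0 < offset <= len(out):
        raise ValueError(f"invalid back-reference offset: {offset}")
    start = len(out) - offset
    if length <= offset:
        # Common case: the whole source range is already written, one slice does it.
        out += out[start:start + length]
    else:
        # Rare case: the copy runs past the current end of the output, so a
        # byte-by-byte copy keeps re-reading bytes it just wrote. That is the
        # same as repeating the last `offset` bytes as a pattern.
        pattern = out[start:]  # exactly `offset` bytes
        repeats, remainder = divmod(length, offset)
        out += pattern * repeats + pattern[:remainder]


def copy_back_reference_naive(out: bytearray, offset: int, length: int) -> None:
    # One byte per iteration, like the loop the optimized version replaces.
    for _ in range(length):
        out.append(out[-offset])


buf = bytearray(b"abc")
copy_back_reference(buf, 2, 5)
print(buf)  # bytearray(b'abcbcbcb')
```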

Methodology:

  • Ran ALEAPP against Josh Hickman's Android 12 test image.
  • Checked reported run time.
  • Profiled everything and analyzed the call graphs and functions (with cProfile, gprof2dot, and the amazing line_profiler).
  • Got millions of files and hundreds of gigabytes of storage space used up on my disk.
  • Found out a few spots where things could be improved.
  • Rinse and repeat.

This substantially speeds up the program under Windows (about twice as fast) without any changes to the behavior and results.
…epeteadly

This brings a ~40% speedup* on top of the previous commit.

* Note: I'm basing these timings on cProfile runs, but they hold up quite nicely on "normal" runs.
This changes how things are processed and speeds things up a bit; however, shutil.copy2() now takes a significant share of this function's time, probably due to some of the newer artifacts.
Cuts running time by roughly another 10% (we were normcasing every file path over and over again for every artifact).
Makes this function about twice as fast.

Replaced a for loop that wrote one byte at a time with a single larger write, plus a condition for the (rare) case where the copy extends past the current end of the output and part of the already written-out output has to be repeated.
I checked the fnmatch.filter() code in the standard library and went with the same style used in the other seekers. fnmatch.filter() basically does what the other seekers were already doing, so the behavior should be the same (I haven't tested it, though).
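
A sketch of the "deconstructed" fnmatch idea, under stated assumptions: `fnmatch.fnmatch(name, pat)` normcases both arguments and then runs a regex match on every call. When many artifact patterns are checked against the same file list, each path can be normcased once up front and each pattern translated and compiled once, which is essentially what `fnmatch.filter()` does for a single pattern. `find_matches` and the sample paths are hypothetical; ALEAPP's seekers are structured differently.

```python
import fnmatch
import os
import re


def find_matches(paths: list[str], patterns: list[str]) -> dict[str, list[str]]:
    # Normcase every path once, instead of once per (path, pattern) pair.
    normed = [(os.path.normcase(p), p) for p in paths]
    results = {}
    for pattern in patterns:
        # Per-pattern work, done once: normcase, wildcard -> regex, compile.
        match = re.compile(fnmatch.translate(os.path.normcase(pattern))).match
        # Return the original (non-normcased) paths, as fnmatch.filter() does.
        results[pattern] = [orig for norm, orig in normed if match(norm)]
    return results


paths = [
    "data/data/com.example/databases/main.db",
    "data/data/com.example/files/notes.txt",
    "data/system/usagestats/0/daily/1660600000000",
]
patterns = ["*/databases/*.db", "*/usagestats/*"]
for pattern, hits in find_matches(paths, patterns).items():
    print(pattern, hits)
```

The results should agree with calling `fnmatch.fnmatch()` per pair; only the repeated normcasing and pattern handling is hoisted out of the inner loop.
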

jijames (Contributor) commented Aug 16, 2022

@bconstanzo do you have the tests you used for this? Just for reference.
@abrignoni looks great.

Real results without usagestats on Linux:
Before
real 0m24.984s
user 0m9.424s
sys 0m0.541s

After
real 0m22.285s
user 0m9.364s
sys 0m0.469s
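
A quick check of the Linux numbers above. The `parse_time` helper is a hypothetical convenience for the `XmY.YYYs` format printed by the shell's `time` keyword; the values are the ones reported in the comment.

```python
import re


def parse_time(value: str) -> float:
    # Parse the "XmY.YYYs" format printed by the shell's `time` keyword.
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if m is None:
        raise ValueError(f"unexpected time format: {value!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


before = parse_time("0m24.984s")
after = parse_time("0m22.285s")
reduction = (before - after) / before
print(f"real: {before:.3f}s -> {after:.3f}s ({reduction:.1%} less, {before / after:.2f}x)")
# real: 24.984s -> 22.285s (10.8% less, 1.12x)
```

User CPU time (9.424s vs 9.364s) barely changes in these two runs, so most of the difference shows up in the real time.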

abrignoni merged commit f7541e8 into abrignoni:master on Aug 16, 2022

bconstanzo (Contributor, Author):

Under Windows 10 Home 64-bit, I set up a virtual environment with Python 3.10.3 and all the dependencies from the requirements.txt file.

Then I just cloned ALEAPP on Sunday night and ran:

python aleapp.py -t tar -i path_to_image -o path_for_reports

My machine has a Ryzen 4600H and 24 GB of RAM. I ran against a SATA HDD to go with the worst-case scenario, though I also tested against an NVMe drive and it ran just as fast.

For profiling I'd run it as python -m cProfile -o profile_file.pstats aleapp.py ... or add the @profile decorator and go with kernprof -lv aleapp.py ...
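
For reference, a minimal sketch of producing and reading a `.pstats` file with the standard library only. `slow_workload` is a hypothetical stand-in for the code being profiled; the in-process `cProfile.Profile` calls produce the same kind of file as the `python -m cProfile -o ...` command line.

```python
import cProfile
import os
import pstats
import tempfile


def slow_workload(n: int) -> int:
    # Hypothetical stand-in for the code being profiled.
    return sum(i * i for i in range(n))


def profile_to_file(path: str) -> None:
    # In-process counterpart of `python -m cProfile -o <path> script.py`.
    profiler = cProfile.Profile()
    profiler.enable()
    slow_workload(200_000)
    profiler.disable()
    profiler.dump_stats(path)


def report(path: str, limit: int = 10) -> pstats.Stats:
    stats = pstats.Stats(path)
    # Sort by cumulative time to see which call trees dominate, then print the top entries.
    stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(limit)
    return stats


out_path = os.path.join(tempfile.mkdtemp(), "profile_file.pstats")
profile_to_file(out_path)
stats = report(out_path)
```

The same file can be fed to gprof2dot (`-f pstats`) to get a call-graph view.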

The timings and speedups I mentioned in the commit messages are based on what I saw during the profiling runs (which under cProfile ran at about half their normal speed) and on the time reported by the tool.

Right now I just benchmarked with a very simple script:

```python
import time
import subprocess

t0 = time.perf_counter()
p = subprocess.run(
    r'python aleapp.py -t tar -i "D:\Test\xLEAPP\test_data\Magnet Acquire\Android 12 - Data.tar" -o "D:\Test\xLEAPP\output\aleapp-magnet"',
    shell=True
)
t1 = time.perf_counter()

print()
print()
print(f"Time taken: {t1 - t0}")
```

It gave me 477 seconds for a fresh clone of ALEAPP (which the tool reported as 6 minutes 34 seconds) and 202 seconds for the patched version (which the tool reported as exactly 2 minutes). That is just shy of 2.4x faster, and there's clearly something off with the reported time.
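
Putting those numbers side by side: the wall-clock ratio works out to about 2.36x, while the tool's own figures give about 3.28x, which lines up with the "about 3x" reported time in the PR description. In both runs the reported time is roughly 82 to 83 seconds below the wall-clock time; the thread doesn't say where that gap comes from.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    return minutes * 60 + seconds


# Wall-clock times measured with the benchmark script.
wall_before, wall_after = 477.0, 202.0
# Times reported by ALEAPP itself.
reported_before = to_seconds(6, 34)  # 394 s
reported_after = to_seconds(2, 0)    # 120 s

wall_speedup = wall_before / wall_after
reported_speedup = reported_before / reported_after
gap_before = wall_before - reported_before
gap_after = wall_after - reported_after

print(f"wall-clock speedup: {wall_speedup:.2f}x")      # 2.36x
print(f"reported speedup:   {reported_speedup:.2f}x")  # 3.28x
print(f"unreported time:    {gap_before:.0f} s before, {gap_after:.0f} s after")  # 83 s, 82 s
```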
