
Add performance benchmark #181

Merged
merged 1 commit into from
May 28, 2019

Conversation

dgzlopes
Contributor

Signed-off-by: Daniel González Lopes danielgonzalezlopes@gmail.com

Added a performance benchmark for the scan stage, as discussed in #130. Is the startTimeMeasurement() call in the proper place?

On another note, I had to use an external monotonic [0] package, since time.monotonic() is only available on Python 3.3+ [1].

[0] https://github.com/atdt/monotonic
[1] https://docs.python.org/3/library/time.html
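The usual way to handle this Python 2/3 split is a try/except import shim; a minimal sketch of the idea (not the exact code from this PR, and the helper names are illustrative):

```python
try:
    # time.monotonic() is in the standard library on Python 3.3+
    from time import monotonic
except ImportError:
    # fall back to the external backport package on Python 2
    from monotonic import monotonic


def start_time_measurement():
    """Return a clock value unaffected by system clock adjustments."""
    return monotonic()


def elapsed_since(start):
    """Seconds elapsed since a start_time_measurement() value."""
    return monotonic() - start
```

A monotonic clock is preferable to time.time() here because NTP adjustments or manual clock changes can make wall-clock deltas negative or wildly wrong mid-benchmark.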

Collaborator

@KevinHock KevinHock left a comment


lgtm

What are your thoughts, @domanchi? I figured you might not like the extra dependency (monotonic), but we can't drop Python 2 support until some future point in time.

@KevinHock KevinHock requested a review from domanchi May 21, 2019 23:49
@domanchi
Contributor

@dgzlopes Is there a reason why this needs to be integrated into the main package?

I'm thinking an ad-hoc script should suffice, because this is more for package testing than for end-user consumption.

@KevinHock
Collaborator

KevinHock commented May 21, 2019

an adhoc script should suffice

Makes sense, I'd agree with that.

@dgzlopes
Contributor Author

dgzlopes commented May 22, 2019

@domanchi I thought that integrating the base benchmarking functionality might make it easier to add the per-plugin/per-file benchmarks proposed in #130 (comment) later on. It may also be interesting for users to know the elapsed time when they run detect-secrets in an automated way.

On the other hand, following the ad-hoc route, I understand it would be a small script that starts a timer, runs detect-secrets, and then reports the elapsed time. But if that's all we want, maybe it's better to follow the Unix philosophy and rely on the time command [0] (or Measure-Command on Windows), so there's no need for a project-specific script.

[0] https://linux.die.net/man/1/time

@domanchi
Contributor

We could do that too.

When I think of performance benchmarking, I think of the following questions:

  • Which file takes the longest to run?
  • Which plugin is the slowest?
  • Have these timings changed drastically after my changes?
  • What does a "good" timing look like?

There's no caching, so speed should remain relatively consistent between consecutive runs. Furthermore, because this tool is reasonably fast, I doubt a complete per-file speed breakdown would yield much information (the majority of files are small, and therefore won't produce any meaningful timings to work with).

With these questions in mind, these are the objectives I envision for minimal performance benchmarking:

  • Able to measure how long a single run takes, per plugin
  • Able to customize which files to test (so we can easily test for known edge cases)
  • Able to integrate in "manual" testing flow for other developers (automated tests won't work due to its heuristic nature)

The scripts/ directory would be a good place to put this.

With these objectives, we could use the time command (and write a shell script that iterates over all plugins, disabling each of them in turn), or we could still use this library in a more ad-hoc manner (without introducing it into the core code).
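The iterate-and-disable approach could also be sketched in Python rather than shell. A minimal illustration, timing one full run plus one run per plugin with the others disabled; the disable-flag names and the command invoked are assumptions for illustration, not the PR's actual code:

```python
import subprocess
import sys
from timeit import default_timer as timer  # monotonic on Python 3

# Hypothetical disable flags; the real list would come from
# detect_secrets.core.usage's PluginOptions, as discussed below.
DISABLE_FLAGS = ['--no-hex-string-scan', '--no-base64-string-scan']


def time_command(cmd):
    """Run cmd once and return the elapsed wall-clock time in seconds."""
    start = timer()
    subprocess.call(cmd)
    return timer() - start


def benchmark(base_cmd, disable_flags):
    """Time a full run, then one run per plugin (others excluded)."""
    results = {'all-plugins': time_command(base_cmd)}
    for flag in disable_flags:
        others = [f for f in disable_flags if f != flag]
        name = flag[len('--no-'):]  # e.g. 'hex-string-scan'
        results[name] = time_command(base_cmd + others)
    return results
```

For example, `benchmark(['detect-secrets', 'scan', '--all-files'], DISABLE_FLAGS)` would yield one timing per benchmark name, which matches the shape of the output table shown later in this thread.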

Apologies for not being clearer in the original issue. Hopefully this provides more direction for this added testing capability. Thanks for helping empower the community to make better changes by having a scientific way to measure their impact on performance!

@dgzlopes
Contributor Author

dgzlopes commented May 24, 2019

First, thanks for your time and the nice writeup, @domanchi! Now I see I was missing the point with the first PR. I have updated the PR today, adding a new benchmark script (and cleaning up the past commits), and would love it if you have some time to give it a spin. Note that this is just an update and it's still a WIP (missing some tests, comments, and I'm sure more things!); for the moment only basic functionality is included:

  • Running just python bench.py runs the benchmark inside the detect-secrets project root. Any detect-secrets-compatible argument (from files to flags) is accepted too, e.g. python bench.py ../../powerfulseal --all-files or python bench.py myfilename.js.
  • It's able to benchmark with all plugins at the same time, and also one plugin at a time (with the other ones excluded).
  • It takes the PluginOptions disable_flag_text values from core.usage, so we don't have to maintain another list of plugins.
  • It uses monotonic for the elapsed time.
  • It should work with all the Python versions included in tox.ini, using requirements-dev! As a note, I don't really like the subprocess32 dependency (even if it's only in the developer environment), so this might change.
  • As an example, this is the output for python bench.py:
vagrant@ubuntu-xenial:~/detect-secrets/scripts$ python bench.py
Scanning: ['../.']
------------------------------------------
benchmark                           time
------------------------------------------
all-plugins                  3.779828449s
hex-string-scan              1.293300869s
base64-string-scan           1.832755461s
private-key-scan          0.987097253001s
basic-auth-scan              0.772286352s
keyword-scan                 1.026593331s
aws-key-scan                 0.800705507s
slack-scan                   0.721169404s
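A fixed-width table like the output above can be produced with plain string formatting; a minimal sketch of the idea (the function names and the 42-character width are assumptions, not the actual bench.py code):

```python
def format_row(name, value, width=42):
    """Left-align the name, right-align the value, in a fixed width."""
    return name + value.rjust(width - len(name))


def render_table(timings, width=42):
    """Render (name, seconds) pairs as a dashed, two-column table."""
    sep = '-' * width
    lines = [sep, format_row('benchmark', 'time', width), sep]
    for name, seconds in timings:
        lines.append(format_row(name, '{}s'.format(seconds), width))
    return '\n'.join(lines)
```

Right-aligning the timing column keeps the decimal points roughly lined up, which makes it easy to eyeball which plugin dominates the total.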

Contributor

@domanchi domanchi left a comment


Looking a lot better!

I like your output style and general output. A couple of bugs and stylistic changes, but it's almost good for a v1! We'd be happy to run this on our internal data to get a current average performance baseline once this is merged.

Signed-off-by: Daniel González Lopes <danielgonzalezlopes@gmail.com>

Fix Spacing

Add benchmark script

Fix performance benchmark

Fix val len
@dgzlopes dgzlopes force-pushed the add-130-performance-benchmark branch from d5a54c2 to 6ed2b1d on May 28, 2019 13:10
@dgzlopes
Contributor Author

dgzlopes commented May 28, 2019

Squashed all the commits for a cleaner git log! 😊

@domanchi domanchi merged commit effc21f into Yelp:master May 28, 2019
@dgzlopes dgzlopes deleted the add-130-performance-benchmark branch May 29, 2019 08:59
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020
* Use GHDetectorV2

Supports git-defenders/detect-secrets-discuss#166

* Fix pre-commit (hopefully)
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020