
Add performance benchmark #181

Merged
merged 1 commit into from
May 28, 2019

Conversation

dgzlopes
Contributor

Signed-off-by: Daniel González Lopes danielgonzalezlopes@gmail.com

Added a performance benchmark for the scan stage, as discussed in #130. Is the startTimeMeasurement() call in the proper place?

On another note, I had to use an external monotonic [0] package, since time.monotonic() is only available on Python 3.3+ [1].

[0] https://github.com/atdt/monotonic
[1] https://docs.python.org/3/library/time.html
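The usual way to handle this Python 2/3 split is a try/except import shim; a minimal sketch of the idea (not the exact code from this PR, and the helper names are illustrative):

```python
try:
    # time.monotonic() is in the standard library on Python 3.3+
    from time import monotonic
except ImportError:
    # fall back to the external backport package on Python 2
    from monotonic import monotonic


def start_time_measurement():
    """Return a clock value unaffected by system clock adjustments."""
    return monotonic()


def elapsed_since(start):
    """Seconds elapsed since a start_time_measurement() value."""
    return monotonic() - start
```

A monotonic clock is preferable to time.time() here because NTP adjustments or manual clock changes can make wall-clock deltas negative or wildly wrong mid-benchmark.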

Collaborator

@KevinHock KevinHock left a comment


lgtm

What are your thoughts, @domanchi? I figured you might not like the extra dependency (monotonic), but we can't drop Python 2 support until some future point in time.

@KevinHock KevinHock requested a review from domanchi May 21, 2019 23:49
@domanchi
Contributor

@dgzlopes Is there a reason why this needs to be integrated into the main package?

I'm thinking an ad-hoc script should suffice, because this is more for package testing than for end-user consumption.

@KevinHock
Collaborator

KevinHock commented May 21, 2019

an adhoc script should suffice

Makes sense, I'd agree with that.

@dgzlopes
Contributor Author

dgzlopes commented May 22, 2019

@domanchi I thought that integrating the base benchmarking functionality might make it easier to add the per-plugin/per-file benchmarks proposed in #130 (comment) later on. It may also be interesting for users to know the elapsed time when they run detect-secrets in an automated way.

On the other hand, following the ad-hoc route, I understand it would be a small script that starts a timer, runs detect-secrets, and then reports the elapsed time. But if that's all we want, maybe it's better to follow the Unix philosophy and rely on the time command [0] (or Measure-Command on Windows), so there's no need for a project-specific script.

[0] https://linux.die.net/man/1/time

@domanchi
Contributor

We could do that too.

When I think of performance benchmarking, I think of the following questions:

  • Which file takes the longest to run?
  • Which plugin is the slowest?
  • Have these timings changed drastically after my changes?
  • What does a "good" timing look like?

There's no caching, so speed should remain relatively consistent between consecutive runs. Furthermore, because this tool is reasonably fast, I doubt a complete per-file speed breakdown would yield much information (the majority of files are small, and therefore won't produce any meaningful timings to work with).

With these questions in mind, these are the objectives I envision for minimal performance benchmarking:

  • Able to measure how long a single run takes, per plugin
  • Able to customize which files to test (so we can easily test for known edge cases)
  • Able to integrate in "manual" testing flow for other developers (automated tests won't work due to its heuristic nature)

The scripts/ directory would be a good place to put this.

With these objectives, we could use the time command (and write a shell script that iterates over all plugins, disabling each of them in turn), or we could still use this library in a more ad-hoc manner (without introducing it into the core code).
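The iterate-and-disable approach could also be sketched in Python rather than shell. A minimal illustration, timing one full run plus one run per plugin with the others disabled; the disable-flag names and the command invoked are assumptions for illustration, not the PR's actual code:

```python
import subprocess
import sys
from timeit import default_timer as timer  # monotonic on Python 3

# Hypothetical disable flags; the real list would come from
# detect_secrets.core.usage's PluginOptions, as discussed below.
DISABLE_FLAGS = ['--no-hex-string-scan', '--no-base64-string-scan']


def time_command(cmd):
    """Run cmd once and return the elapsed wall-clock time in seconds."""
    start = timer()
    subprocess.call(cmd)
    return timer() - start


def benchmark(base_cmd, disable_flags):
    """Time a full run, then one run per plugin (others excluded)."""
    results = {'all-plugins': time_command(base_cmd)}
    for flag in disable_flags:
        others = [f for f in disable_flags if f != flag]
        name = flag[len('--no-'):]  # e.g. 'hex-string-scan'
        results[name] = time_command(base_cmd + others)
    return results
```

For example, `benchmark(['detect-secrets', 'scan', '--all-files'], DISABLE_FLAGS)` would yield one timing per benchmark name, which matches the shape of the output table shown later in this thread.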

Apologies for not being clearer in the original issue. Hopefully this provides more direction for this added testing capability. Thanks for helping empower the community to make better changes by having a scientific way to measure their impact on performance!

@dgzlopes
Contributor Author

dgzlopes commented May 24, 2019

First, thanks for your time and the nice writeup, @domanchi! Now I see I was missing the point with the first PR. I have updated the PR today, adding a new benchmark script (and cleaning up the past commits), and would love it if you have some time to give it a spin. Note that this is just an update and it's still a WIP (missing some tests, comments, and I'm sure more things!); for the moment only basic functionality is included:

  • Running just python bench.py runs the benchmark inside the detect-secrets project root. Any detect-secrets-compatible argument (from files to flags) is accepted too, e.g. python bench.py ../../powerfulseal --all-files or python bench.py myfilename.js.
  • It's able to benchmark with all plugins at the same time, and also one plugin at a time (with the other ones excluded).
  • It takes the PluginOptions disable_flag_text values from core.usage, so we don't have to maintain another list of plugins.
  • It uses monotonic for the elapsed time.
  • It should work with all the Python versions included in tox.ini, using requirements-dev! As a note, I don't really like the subprocess32 dependency (even if it's only in the developer environment), so this might change.
  • As an example, this is the output for python bench.py:
vagrant@ubuntu-xenial:~/detect-secrets/scripts$ python bench.py
Scanning: ['../.']
------------------------------------------
benchmark                           time
------------------------------------------
all-plugins                  3.779828449s
hex-string-scan              1.293300869s
base64-string-scan           1.832755461s
private-key-scan          0.987097253001s
basic-auth-scan              0.772286352s
keyword-scan                 1.026593331s
aws-key-scan                 0.800705507s
slack-scan                   0.721169404s
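A fixed-width table like the output above can be produced with plain string formatting; a minimal sketch of the idea (the function names and the 42-character width are assumptions, not the actual bench.py code):

```python
def format_row(name, value, width=42):
    """Left-align the name, right-align the value, in a fixed width."""
    return name + value.rjust(width - len(name))


def render_table(timings, width=42):
    """Render (name, seconds) pairs as a dashed, two-column table."""
    sep = '-' * width
    lines = [sep, format_row('benchmark', 'time', width), sep]
    for name, seconds in timings:
        lines.append(format_row(name, '{}s'.format(seconds), width))
    return '\n'.join(lines)
```

Right-aligning the timing column keeps the decimal points roughly lined up, which makes it easy to eyeball which plugin dominates the total.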

Contributor

@domanchi domanchi left a comment


Looking a lot better!

I like your output style and general output. A couple of bugs and stylistic changes, but it's almost good for a v1! We'd be happy to run this on our internal data to get a current average performance baseline once this is merged.

Signed-off-by: Daniel González Lopes <danielgonzalezlopes@gmail.com>

Fix Spacing

Add benchmark script

Fix performance benchmark

Fix val len
@dgzlopes dgzlopes force-pushed the add-130-performance-benchmark branch from d5a54c2 to 6ed2b1d on May 28, 2019 13:10
@dgzlopes
Contributor Author

dgzlopes commented May 28, 2019

Squashed all the commits for a cleaner git log! 😊

@domanchi domanchi merged commit effc21f into Yelp:master May 28, 2019
@dgzlopes dgzlopes deleted the add-130-performance-benchmark branch May 29, 2019 08:59
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request May 28, 2020
* Use GHDetectorV2

Supports git-defenders/detect-secrets-discuss#166

* Fix pre-commit (hopefully)
killuazhu pushed a commit to IBM/detect-secrets that referenced this pull request Jul 9, 2020