Improving speed by changing wildcard search process from regular expression match to starts_with/ends_with match #890

Merged
merged 9 commits into from
Jan 23, 2023

@fukusuket fukusuket commented Jan 22, 2023

What Changed

Changed the wildcard matching process from a regular expression match to a starts_with/ends_with style match.

  • A wildcard pattern whose first (or last) character is an asterisk can be replaced with a check equivalent to ends_with (or starts_with) combined with to_lowercase (or to_uppercase).
  • In this PR, the matcher switches to a starts_with (or ends_with) style match with to_lowercase when all of the following conditions are met:
    • The first (or last) character of the pattern is an asterisk (*).
    • The match string contains exactly one asterisk (*).
    • All target strings are in the ASCII range.

If the pattern contains multiple asterisks, or an asterisk in the middle, the regular expression search is performed as before.
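The conditions above can be sketched roughly as follows. This is only an illustration, not the code in src/detections/rule/matchers.rs: the names `FastMatch`, `to_fast_match`, and `fast_is_match` are hypothetical, and the handling of `?` and of a backslash-escaped asterisk (falling back to the regex path) is an assumption made to keep the sketch conservative.

```rust
/// Hypothetical fast-path form of a wildcard pattern (not the PR's actual type).
#[derive(Debug, PartialEq)]
enum FastMatch {
    /// Pattern "foo*": the value must start with "foo".
    StartsWith(String),
    /// Pattern "*foo": the value must end with "foo".
    EndsWith(String),
}

/// Returns Some(..) when the pattern qualifies for the fast path,
/// or None when the caller should fall back to a regular expression.
fn to_fast_match(pattern: &str) -> Option<FastMatch> {
    // Only ASCII patterns with exactly one asterisk qualify.
    if !pattern.is_ascii() || pattern.matches('*').count() != 1 {
        return None;
    }
    // Assumption: any single-character wildcard goes to the regex path.
    if pattern.contains('?') {
        return None;
    }
    if let Some(rest) = pattern.strip_prefix('*') {
        // Leading asterisk: compare the end of the value.
        Some(FastMatch::EndsWith(rest.to_ascii_lowercase()))
    } else if let Some(rest) = pattern.strip_suffix('*') {
        // Assumption: a trailing "\*" may be an escaped literal, so bail out.
        if rest.ends_with('\\') {
            return None;
        }
        // Trailing asterisk: compare the start of the value.
        Some(FastMatch::StartsWith(rest.to_ascii_lowercase()))
    } else {
        // Asterisk in the middle of the pattern.
        None
    }
}

/// Case-insensitive check. Returns None for non-ASCII values so the
/// caller can use the regular expression instead.
fn fast_is_match(matcher: &FastMatch, value: &str) -> Option<bool> {
    if !value.is_ascii() {
        return None;
    }
    let lowered = value.to_ascii_lowercase();
    Some(match matcher {
        FastMatch::StartsWith(prefix) => lowered.starts_with(prefix.as_str()),
        FastMatch::EndsWith(suffix) => lowered.ends_with(suffix.as_str()),
    })
}
```

In this sketch the pattern is lowercased once when it is classified, so each event only pays for lowercasing the target value and one prefix or suffix comparison.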

Evidence

Environment

  • OS: Windows 10 Home edition
  • Hardware: 16 GB memory, 8 cores, SSD, laptop

Benchmark1

I ran a benchmark using this procedure (6.1 GB of evtx files), with the following results.

| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---|---|---|---|---|
| before | 00:11:38.499 | 4.8 GiB | 1,593,715 / 4,817,181 | 575085389 |
| This PR | 00:10:44.026 | 4.7 GiB | 1,593,715 / 4,817,181 | 575085389 |
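For reference, the two elapsed times in this benchmark work out to a reduction of roughly 7.8%. A small sketch of that arithmetic is below; `parse_elapsed` and `reduction_percent` are hypothetical helper names, and the HH:MM:SS.mmm format is assumed from the console output rather than taken from any hayabusa API.

```rust
/// Parses an "HH:MM:SS.mmm" string into seconds.
/// Returns None if the string does not have exactly three numeric fields.
fn parse_elapsed(s: &str) -> Option<f64> {
    let mut parts = s.split(':');
    let hours: f64 = parts.next()?.parse().ok()?;
    let minutes: f64 = parts.next()?.parse().ok()?;
    let seconds: f64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some(hours * 3600.0 + minutes * 60.0 + seconds)
}

/// Percentage by which `after` is shorter than `before`.
fn reduction_percent(before: &str, after: &str) -> Option<f64> {
    let b = parse_elapsed(before)?;
    let a = parse_elapsed(after)?;
    if b <= 0.0 {
        return None;
    }
    Some((b - a) / b * 100.0)
}
```

Calling `reduction_percent("00:11:38.499", "00:10:44.026")` compares 698.499 s against 644.026 s.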

Console output

before

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa.exe csv-timeline -d ..\hayabusa-big-evtx\ -o 1.csv --debug
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)
...
Saved file: 1.csv (575.1 MB)
Elapsed time: 00:11:38.499
Rule Parse Processing Time: 00:00:21.788
Analysis Processing Time: 00:10:55.151
Output Processing Time: 00:00:21.559

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    4.8 GiB    4.8 GiB   83.0 MiB    4.7 GiB
 committed:    4.4 GiB   54.0 GiB   49.7 GiB    4.2 GiB

This PR

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa.exe csv-timeline -d ..\hayabusa-big-evtx\ -o 1.csv --debug
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)

...

Saved file: 1.csv (575.1 MB)
Elapsed time: 00:10:44.026
Rule Parse Processing Time: 00:00:20.191
Analysis Processing Time: 00:10:02.174
Output Processing Time: 00:00:21.658

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    4.7 GiB    4.7 GiB   83.0 MiB    4.6 GiB
 committed:    4.3 GiB   57.6 GiB   53.4 GiB    4.1 GiB

Benchmark2

I ran a benchmark using hayabusa-sample-evtx and the results were as follows.

| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---|---|---|---|---|
| before | 00:00:16.653 | 1.9 GiB | 19,606 / 47,458 | 16448088 |
| This PR | 00:00:16.158 | 1.8 GiB | 19,606 / 47,458 | 16448088 |

I would appreciate your review 🙏

@fukusuket fukusuket changed the title Imporving speed by changing wildcard search process from regular expression match to starts_with/ends_with match` Imporving speed by changing wildcard search process from regular expression match to starts_with/ends_with match Jan 22, 2023
@fukusuket fukusuket self-assigned this Jan 22, 2023
@fukusuket fukusuket added the enhancement New feature or request label Jan 22, 2023
codecov bot commented Jan 22, 2023

Codecov Report

Base: 71.07% // Head: 71.48% // Increases project coverage by +0.41% 🎉

Coverage data is based on head (1e30acf) compared to base (67244df).
Patch coverage: 98.61% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #890      +/-   ##
==========================================
+ Coverage   71.07%   71.48%   +0.41%     
==========================================
  Files          23       23              
  Lines       14056    14270     +214     
==========================================
+ Hits         9990    10201     +211     
- Misses       4066     4069       +3     
| Impacted Files | Coverage Δ |
|---|---|
| src/detections/rule/matchers.rs | 96.59% <98.61%> (+0.26%) ⬆️ |

@hitenkoku hitenkoku left a comment

@fukusuket thank you for your PR.
LGTM

@hitenkoku hitenkoku left a comment

Thank you for adding the tests.

I have a few comments. Could you take a look?

@fukusuket

@hitenkoku
Thank you for your review 🙇 I fixed the test code you commented on :)

@hitenkoku hitenkoku left a comment

Thanks for your response.
LGTM.

@YamatoSecurity YamatoSecurity left a comment

Thank you!
Looks good. In my benchmark this is about 5% faster.
26:30 -> 25:12
Memory usage: 7.7GB -> 7.4GB

@YamatoSecurity YamatoSecurity merged commit 8826515 into main Jan 23, 2023
@fukusuket fukusuket deleted the imporve-speed-by-using-starts-ends-with branch January 23, 2023 03:14
@fukusuket

Thank you for running the benchmark 🙇 I will keep investigating other places where regular expression matching can be reduced!
