Improving speed by adding string length match before regular expression match #883

Merged on Jan 18, 2023 (5 commits)

@fukusuket (Collaborator) commented Jan 17, 2023

What Changed

Added a string length check before the regular expression match (regex matching is skipped when the lengths differ).

  • Running a regular expression against a string that does not match is especially slow.
  • However, case-insensitive matching is the default behavior in Sigma, so an exact string comparison cannot be used as the pre-check.
    • Therefore, this PR compares string lengths instead of comparing the strings exactly.
    • (For fields with pipes or wildcards, only the regular expression match is applied, as before.)
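
For readers skimming the thread, here is a minimal sketch of the idea in Rust. It is not the code in `src/detections/rule/matchers.rs`. The names `ValueMatcher`, `length_check`, and `slow_calls` are hypothetical, and `eq_ignore_ascii_case` stands in for the regular expression match so that the sketch runs with the standard library alone.

```rust
use std::cell::Cell;

/// Hypothetical matcher for a single rule value (illustration only).
struct ValueMatcher {
    pattern: String,
    /// Assumed flag: true for plain values, false for fields with pipes or
    /// wildcards, where a matching value may have a different length.
    length_check: bool,
    /// Counts how often the expensive matcher actually runs.
    slow_calls: Cell<usize>,
}

impl ValueMatcher {
    fn new(pattern: &str, length_check: bool) -> Self {
        ValueMatcher {
            pattern: pattern.to_string(),
            length_check,
            slow_calls: Cell::new(0),
        }
    }

    /// Stand-in for the case-insensitive regex match described in the PR.
    fn slow_match(&self, value: &str) -> bool {
        self.slow_calls.set(self.slow_calls.get() + 1);
        self.pattern.eq_ignore_ascii_case(value)
    }

    /// Cheap length comparison first; the expensive matcher runs only
    /// when the lengths are equal (or when the check is disabled).
    fn is_match(&self, value: &str) -> bool {
        if self.length_check && self.pattern.len() != value.len() {
            return false;
        }
        self.slow_match(value)
    }
}

fn main() {
    let matcher = ValueMatcher::new("powershell.exe", true);
    for value in ["PowerShell.EXE", "cmd.exe", "POWERSHELL.EXE"] {
        println!("{}: {}", value, matcher.is_match(value));
    }
    println!("expensive matcher calls: {}", matcher.slow_calls.get());
}
```

The sketch compares byte lengths, which is safe for ASCII because changing ASCII case never changes the encoded length. Whether the same holds for every non-ASCII value depends on the case-folding rules of the matcher in use, so treat that as an assumption of this example.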

Evidence

Environment

  • OS: Windows 10 Home edition
  • Hardware: 16 GB memory, 8 cores, SSD, laptop

Benchmark1

I ran a benchmark using this procedure (6.1 GB of evtx files), and the results were as follows.

| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---|---|---|---|---|
| before | 00:13:27.170 | 5.0 GiB | 1,593,715 / 4,817,181 | 575085389 |
| This PR | 00:12:08.121 | 5.0 GiB | 1,593,715 / 4,817,181 | 575085389 |
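
As a rough check on the Benchmark1 figures, the two elapsed times can be converted to seconds and compared. The helper names `parse_elapsed` and `time_saved_percent` are made up for this sketch; the only inputs are the two elapsed times reported above, which work out to roughly a 9.8% reduction in wall-clock time.

```rust
/// Parses an "HH:MM:SS.mmm" elapsed-time string into seconds.
/// Returns None if the string does not have exactly three numeric fields.
fn parse_elapsed(s: &str) -> Option<f64> {
    let mut parts = s.split(':');
    let hours: f64 = parts.next()?.parse().ok()?;
    let minutes: f64 = parts.next()?.parse().ok()?;
    let seconds: f64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some(hours * 3600.0 + minutes * 60.0 + seconds)
}

/// Percentage of wall-clock time saved relative to the baseline run.
fn time_saved_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    let before = parse_elapsed("00:13:27.170").expect("valid elapsed time");
    let after = parse_elapsed("00:12:08.121").expect("valid elapsed time");
    println!(
        "before: {:.3} s, after: {:.3} s, saved: {:.1}%",
        before,
        after,
        time_saved_percent(before, after)
    );
}
```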

Console output

before

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa.exe csv-timeline -d ..\hayabusa-big-evtx\ -o 1.csv --debug
...
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)
...
Saved file: 1.csv (575.1 MB)
Elapsed time: 00:13:27.170
Rule Parse Processing Time: 00:00:20.221
Analysis Processing Time: 00:12:43.367
Output Processing Time: 00:00:23.581

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    5.0 GiB    5.0 GiB   83.0 MiB    4.9 GiB
 committed:    4.6 GiB   56.5 GiB   52.0 GiB    4.5 GiB

This PR

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa.exe csv-timeline -d ..\hayabusa-big-evtx\ -o 1.csv --debug
...
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)
...
Saved file: 1.csv (575.1 MB)
Elapsed time: 00:12:08.121
Rule Parse Processing Time: 00:00:20.157
Analysis Processing Time: 00:11:26.545
Output Processing Time: 00:00:21.417

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    5.0 GiB    5.0 GiB   83.0 MiB    4.9 GiB
 committed:    4.6 GiB   56.7 GiB   52.2 GiB    4.5 GiB

Benchmark2

I ran a benchmark using hayabusa-sample-evtx and the results were as follows.

| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---|---|---|---|---|
| before | 00:00:17.496 | 2.1 GiB | 19,606 / 47,458 | 16448088 |
| This PR | 00:00:17.135 | 2.1 GiB | 19,606 / 47,458 | 16448088 |

I would appreciate it if you could review this 🙏

@codecov bot commented Jan 17, 2023

Codecov Report

Base: 69.59% // Head: 69.67% // Increases project coverage by +0.07% 🎉

Coverage data is based on head (0cbd4a2) compared to base (fa4dbeb).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #883      +/-   ##
==========================================
+ Coverage   69.59%   69.67%   +0.07%     
==========================================
  Files          23       23              
  Lines       13719    13716       -3     
==========================================
+ Hits         9548     9556       +8     
+ Misses       4171     4160      -11     
| Impacted Files | Coverage Δ |
|---|---|
| src/detections/configs.rs | 49.92% <ø> (+0.57%) ⬆️ |
| src/detections/rule/matchers.rs | 96.33% <100.00%> (+0.02%) ⬆️ |


@hitenkoku (Collaborator)

@fukusuket Thank you for your PR.
I have some suggestions. I will review it, so please wait.

@hitenkoku (Collaborator) left a comment

@fukusuket Thank you very much for your patience.
I have sent one suggestion.
I would appreciate it if you could check it out.

src/detections/rule/matchers.rs (outdated, resolved)
@hitenkoku hitenkoku added the enhancement New feature or request label Jan 18, 2023
fukusuket and others added 2 commits January 18, 2023 09:29
Co-authored-by: DustInDark <2350416+hitenkoku@users.noreply.github.com>
@hitenkoku hitenkoku self-requested a review January 18, 2023 00:35
@hitenkoku (Collaborator) left a comment

Thank you for the fix.
LGTM

@YamatoSecurity YamatoSecurity added this to the v2.2.0 milestone Jan 18, 2023
@YamatoSecurity (Collaborator) left a comment

I verified that it detects all of the same alerts.
Benchmark against 14 GB data:
current main branch: 32 minutes (peak memory 7.9GB)
PR: 27 minutes 53 seconds (peak memory 7.7 GB)

13.7% speed increase! Great performance optimization!

@fukusuket (Collaborator, Author)

Thank you for your review :)
