Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving speed by changing EventID matching process from regular expression to exact string matching #882

Merged
merged 6 commits into from
Jan 17, 2023

Conversation

fukusuket
Copy link
Collaborator

@fukusuket fukusuket commented Jan 16, 2023

Currently all rule string matching processes use regular expression match.
However, I think that exact string match is sufficient for matching process of EventID, except for rare cases like * or ? 🤔 (Just to be sure, I checked that there were no rules in the Sigma repository with non-numeric valueEventID fields)

What Changed

  • For speed, the EventID matching process has been changed to use exact string matching in this PR.
    • Except when the rule EventID field can't be converted to a number.
    • If the EventID number conversion fails, use the regular expression match, as before.

Evidence

Environment

  • OS: Windows 10 Home edition
  • Hard: Memory 16GB , Core 8, SSD, laptop

Benchmark

I ran a benchmark using this procedure(6.1GB evtx) and the results were as follows.

Version Elapsed time Memory(peak) Events with hits / Total events Output file size(bytes)
main 00:13:51.598 5.1 GiB 1,593,715 / 4,817,181 575085389
This PR 00:13:27.170 5.0 GiB 1,593,715 / 4,817,181 575085389

Console output

main

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa-main.exe csv-timeline -d C:\tmp\hayabusa-big-evtx\ -o 1.csv --debug
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)
...
Saved file: 1.csv (575.1 MB)
Elapsed time: 00:13:51.598
Rule Parse Processing Time: 00:00:21.475
Analysis Processing Time: 00:13:08.496
Output Processing Time: 00:00:21.624

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    5.1 GiB    5.1 GiB   83.0 MiB    5.0 GiB
 committed:    4.7 GiB   56.7 GiB   52.2 GiB    4.5 GiB

This PR

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa-new.exe csv-timeline -d C:\tmp\hayabusa-big-evtx\ -o 1.csv --debug
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)
...
Saved file: 1.csv (575.1 MB)
Elapsed time: 00:13:27.170
Rule Parse Processing Time: 00:00:20.221
Analysis Processing Time: 00:12:43.367
Output Processing Time: 00:00:23.581

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    5.0 GiB    5.0 GiB   83.0 MiB    4.9 GiB
 committed:    4.6 GiB   56.5 GiB   52.0 GiB    4.5 GiB

I would appreciate it if you could review🙏

Copy link
Collaborator

@hitenkoku hitenkoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your pull request.
LGTM

I'm sorry. I checked the test section and had some questions, so I had to comment.

I would appreciate your response to my question.

@fukusuket fukusuket changed the title Improving speed by changing EventID matching processing from regular expression to exact string matching Improving speed by changing EventID matching process from regular expression to exact string matching Jan 16, 2023
Copy link
Collaborator

@hitenkoku hitenkoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry. I checked the test section and had some questions, so I had to comment.

I would appreciate your response to my question.

src/detections/rule/selectionnodes.rs Show resolved Hide resolved
@hitenkoku hitenkoku added the enhancement New feature or request label Jan 16, 2023
Copy link
Collaborator

@YamatoSecurity YamatoSecurity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fukusuket Great find! I think it is better to do exact string matches by default and fall back to regular expression support if needed like in your PR. (Although it is probably not needed we might as well support it just in case.)

@hitenkoku hitenkoku self-requested a review January 17, 2023 00:16
@codecov
Copy link

codecov bot commented Jan 17, 2023

Codecov Report

Base: 69.49% // Head: 69.59% // Increases project coverage by +0.10% 🎉

Coverage data is based on head (1da20a2) compared to base (a4f4adc).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #882      +/-   ##
==========================================
+ Coverage   69.49%   69.59%   +0.10%     
==========================================
  Files          23       23              
  Lines       13674    13719      +45     
==========================================
+ Hits         9503     9548      +45     
  Misses       4171     4171              
Impacted Files Coverage Δ
src/detections/rule/selectionnodes.rs 90.49% <100.00%> (+0.92%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@YamatoSecurity
Copy link
Collaborator

@fukusuket Sorry, i initially just want to change JQ to jq in the readme but then after checking changed a few things more in the JQ doc. as I want to fix it soon. I'll merge this PR if it is ready now?

@fukusuket
Copy link
Collaborator Author

fukusuket commented Jan 17, 2023

There are currently no rules in the Sigma repository with the following conditions:

  • a non-numeric value in the EventID field
  • EvendID key with pipes(startswith/endswith/contains/re)

and it seems that unlikely to be created(at this time).

Therefore, in this PR, performance is prioritized and the following specifications are used.

  • numeric value in the EventID field use exact string match
  • * and ? in the EventID field use regular expression match

@fukusuket
Copy link
Collaborator Author

Thank you for your review🙇 Yes, it's okay to merge this PR :)

@YamatoSecurity YamatoSecurity merged commit fa4dbeb into main Jan 17, 2023
@fukusuket fukusuket deleted the improve-detection-speed-by-exact-eventid-match branch January 17, 2023 01:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants