perf: use slice instead of replacen in ConditionCompiler.tokenize() #1287

Merged: YamatoSecurity merged 2 commits into main from 1277-not-to-use-replacen on Feb 28, 2024

@fukusuket (Collaborator) commented Feb 27, 2024

What Changed

Evidence

Environment

  • OS: macOS Sonoma version 14.2.1
  • Hayabusa v2.14.0-dev
  • rustc 1.76.0

I confirmed that the resulting CSV is identical before and after the fix, as shown below.

% ./hayabusa-new csv-timeline -d ../hayabusa-sample-evtx -o new.csv -w --debug -C -q
% ./hayabusa-main csv-timeline -d ../hayabusa-sample-evtx -o main.csv -w --debug -C -q
% diff main.csv new.csv
%

I also confirmed that there were no rule parsing errors and that the number of loaded rules (4077) was the same.

main

% ./hayabusa-main csv-timeline -d ../hayabusa-sample-evtx -o main.csv -w --debug -C -q
Start time: 2024/02/27 23:41

Total event log files: 583
Total file size: 137.1 MB

Loading detection rules. Please wait.

Excluded rules: 26
Noisy rules: 12 (Disabled)

Deprecated rules: 202 (4.95%) (Disabled)
Experimental rules: 1091 (26.76%)
Stable rules: 240 (5.89%)
Test rules: 2746 (67.35%)
Unsupported rules: 45 (1.10%) (Disabled)

Hayabusa rules: 162
Sigma rules: 3915
Total enabled detection rules: 4077

This PR

% ./hayabusa-new csv-timeline -d ../hayabusa-sample-evtx -o new.csv -w --debug -C -q
Start time: 2024/02/27 23:41

Total event log files: 583
Total file size: 137.1 MB

Loading detection rules. Please wait.

Excluded rules: 26
Noisy rules: 12 (Disabled)

Deprecated rules: 202 (4.95%) (Disabled)
Experimental rules: 1091 (26.76%)
Stable rules: 240 (5.89%)
Test rules: 2746 (67.35%)
Unsupported rules: 45 (1.10%) (Disabled)

Hayabusa rules: 162
Sigma rules: 3915
Total enabled detection rules: 4077

I would appreciate it if you could check it out when you have time🙏

@fukusuket fukusuket self-assigned this Feb 27, 2024
@fukusuket fukusuket added the enhancement New feature or request label Feb 27, 2024
@fukusuket fukusuket added this to the v2.14.0 milestone Feb 27, 2024
@fukusuket fukusuket changed the title perf: use slice instead of replacen in ConditionCompiler.tokenize() perf: use slice instead of replacen in ConditionCompiler.tokenize() Feb 27, 2024

codecov bot commented Feb 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.23%. Comparing base (f1b3cd6) to head (a7a6332).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1287   +/-   ##
=======================================
  Coverage   81.23%   81.23%           
=======================================
  Files          27       27           
  Lines       24407    24407           
=======================================
  Hits        19828    19828           
  Misses       4579     4579           


@hitenkoku (Collaborator) left a comment

LGTM

@YamatoSecurity (Collaborator) commented

@fukusuket I am running benchmarks, but on my Mac it seems like this PR may use more memory:
./target/release/hayabusa csv-timeline -d ../hayabusa-sample-evtx --debug -o test.csv -D -u -w -C

baseline (hayabusa main):
941.7 MiB
941.4 MiB
936.2 MiB
918.6 MiB
919.2 MiB

1277 PR
943.1 MiB
941.9 MiB
940.1 MiB
940.7 MiB
945.1 MiB
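As a quick arithmetic check on the five runs above, the sketch below computes the mean and range of each set. The values are copied from this comment; the `summarize` helper is our own, not part of Hayabusa. The PR runs average about 10.8 MiB higher (942.18 vs 931.42 MiB), but the baseline runs alone span 918.6 to 941.7 MiB, so the two ranges overlap and five runs may not be enough to separate a real difference from run-to-run variation.

```rust
/// Returns (mean, min, max) of a non-empty slice of samples.
fn summarize(samples: &[f64]) -> (f64, f64, f64) {
    let mean = samples.iter().sum::<f64>() / samples.len() as f64;
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    (mean, min, max)
}

fn main() {
    // RSS measurements (MiB) as reported in the benchmark comment.
    let baseline = [941.7, 941.4, 936.2, 918.6, 919.2];
    let this_pr = [943.1, 941.9, 940.1, 940.7, 945.1];

    let (b_mean, b_min, b_max) = summarize(&baseline);
    let (p_mean, p_min, p_max) = summarize(&this_pr);
    println!("baseline: mean {b_mean:.2}, range {b_min}..{b_max}");
    println!("this PR:  mean {p_mean:.2}, range {p_min}..{p_max}");
    println!("difference of means: {:.2} MiB", p_mean - b_mean);
}
```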

I am measuring RSS usage. I will also test a bigger data sample on Windows and compare the difference.

@fukusuket (Collaborator, Author) commented Feb 28, 2024

@YamatoSecurity
Thank you so much for benchmarking :) I see...
The memory reduction from this PR may be very small, so it may be difficult to measure with --debug 🤔

This PR only reduces memory usage while loading rules and tokenizing condition strings, for example when tokenizing the condition string of the following rule:
https://github.com/Yamato-Security/hayabusa-rules/blob/0a254ccacaa108a4f634a32f10e0aebe6ce9e3b2/sigma/builtin/application/Other/win_av_relevant_match.yml#L102
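The kind of change described here can be illustrated with a minimal sketch. It assumes a deliberately simplified tokenizer (tokens are `(`, `)`, or runs of other non-whitespace characters) and hypothetical function names (`next_token`, `tokenize_replacen`, `tokenize_slice`); it is not the actual `ConditionCompiler.tokenize()` code, which works with Hayabusa's own token patterns. The `replacen` loop builds a new `String` for the rest of the condition after every token, while the slice loop only advances a `&str` view into the original string.

```rust
/// Hypothetical, simplified token matcher (NOT Hayabusa's implementation):
/// returns the token at the start of `s`, or None if `s` is empty.
fn next_token(s: &str) -> Option<&str> {
    let first = s.chars().next()?;
    if first == '(' || first == ')' {
        return Some(&s[..first.len_utf8()]);
    }
    let end = s
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(s.len());
    Some(&s[..end])
}

/// replacen style: every iteration allocates new Strings for the remainder.
fn tokenize_replacen(condition: &str) -> Vec<String> {
    let mut rest = condition.to_string();
    let mut tokens = Vec::new();
    loop {
        rest = rest.trim_start().to_string(); // allocation
        let token = match next_token(&rest) {
            Some(t) => t.to_string(),
            None => break,
        };
        // The token sits at index 0, so this removes exactly that prefix,
        // but it still copies the whole remainder into a new String.
        rest = rest.replacen(token.as_str(), "", 1);
        tokens.push(token);
    }
    tokens
}

/// Slice style: the remainder is a &str into `condition`; advancing it
/// only adjusts a pointer and length, so no String is created.
fn tokenize_slice(condition: &str) -> Vec<&str> {
    let mut rest = condition.trim_start();
    let mut tokens = Vec::new();
    while let Some(token) = next_token(rest) {
        tokens.push(token);
        rest = rest[token.len()..].trim_start();
    }
    tokens
}

fn main() {
    let condition = "selection and not (filter1 or filter2)";
    let owned = tokenize_replacen(condition);
    let borrowed = tokenize_slice(condition);
    assert_eq!(owned, borrowed); // same tokens either way
    println!("{:?}", borrowed);
}
```

In this sketch the slice version also returns borrowed `&str` tokens; real code that needs owned tokens would still convert them, but it would no longer copy the remaining condition on every step.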

This memory reduction does not depend on the amount of logs, because it only affects processing that happens before the logs are scanned (that is, rule compilation).

If you increase the number of rules (so that many condition strings are tokenized) and minimize the logs to scan, you may be able to see a slightly larger memory reduction...?

@fukusuket (Collaborator, Author) commented

@hach1yon @hitenkoku
I'm sorry if I've misunderstood anything or if the implementation is poor; I'd appreciate it if you could point it out🙏

@YamatoSecurity (Collaborator) left a comment

@fukusuket Do you think this would affect processing speed? I tested on Windows with bigger logs and the processing time went from 40:37 to 34:57, so it seems to be faster, and there is no difference in the results, so LGTM!

@YamatoSecurity (Collaborator) commented

I just retested the 2.14.0 main baseline and it scanned in 35:10, so the speed difference may not be that large after all, but there is no regression, so LGTM.
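Converting the reported wall-clock times to seconds shows why the retest changes the picture. This is a small arithmetic sketch; the `to_secs` helper is our own, and the `mm:ss` inputs are the times quoted in these two comments. Against the first baseline (40:37) the PR run (34:57) is 340 s, or about 14%, faster; against the retested baseline (35:10) the gap is only 13 s, or about 0.6%.

```rust
/// Parses an "mm:ss" duration string into whole seconds.
fn to_secs(mm_ss: &str) -> u32 {
    let (m, s) = mm_ss.split_once(':').expect("expected mm:ss");
    m.parse::<u32>().unwrap() * 60 + s.parse::<u32>().unwrap()
}

fn main() {
    let this_pr = to_secs("34:57");
    let baselines = [("first baseline", to_secs("40:37")), ("retested baseline", to_secs("35:10"))];
    for (label, base) in baselines {
        // Both baselines are slower than the PR run, so this cannot underflow.
        let saved = base - this_pr;
        let pct = saved as f64 / base as f64 * 100.0;
        println!("vs {label}: {saved} s faster ({pct:.1}%)");
    }
}
```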

@fukusuket (Collaborator, Author) commented Feb 28, 2024

@YamatoSecurity Thank you so much for retesting 🙇

Since rule compilation runs less frequently than the scanning process (which runs for each log), I think the speedup is relatively small. However, if similar changes (avoiding unnecessarily creating new String instances) can be applied to the per-log scanning process, I think the effect will be greater.
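To illustrate the general idea of avoiding unnecessary `String` allocations in per-log work, here is a hypothetical sketch; the functions, the field values, and the case-insensitive comparison are assumptions for illustration, not Hayabusa's matcher. The first version allocates two new `String`s per call via `to_lowercase()`; the second compares a trimmed sub-slice in place with `str::eq_ignore_ascii_case`, which allocates nothing.

```rust
/// Hypothetical per-event check (NOT Hayabusa code).
/// Allocating version: two new Strings are built on every call.
fn matches_alloc(field_value: &str, expected: &str) -> bool {
    field_value.trim().to_lowercase() == expected.to_lowercase()
}

/// Borrowing version: trim() returns a sub-slice and eq_ignore_ascii_case
/// compares in place, so no String is created.
/// Caveat: only ASCII letters are case-folded, unlike to_lowercase().
fn matches_borrow(field_value: &str, expected: &str) -> bool {
    field_value.trim().eq_ignore_ascii_case(expected)
}

fn main() {
    // Hypothetical field values from three log events.
    let events = ["  Powershell.EXE ", "cmd.exe", "POWERSHELL.exe"];
    let alloc_hits = events.iter().filter(|v| matches_alloc(v, "powershell.exe")).count();
    let borrow_hits = events.iter().filter(|v| matches_borrow(v, "powershell.exe")).count();
    assert_eq!(alloc_hits, borrow_hits);
    println!("{borrow_hits} of {} events match", events.len());
}
```

Because `eq_ignore_ascii_case` folds only ASCII, it is a safe replacement only where values are known to be ASCII or where non-ASCII case folding is not required; otherwise the allocating comparison (or a dedicated Unicode-aware approach) would still be needed.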

@YamatoSecurity (Collaborator) commented

I see. Please let us know if you have any ideas for optimization. We can discuss tomorrow at the meeting.

@hach1yon (Collaborator) left a comment

LGTM

@YamatoSecurity YamatoSecurity merged commit 61befea into main Feb 28, 2024
7 checks passed
@fukusuket fukusuket deleted the 1277-not-to-use-replacen branch February 28, 2024 07:54
Development: successfully merging this pull request may close the issue "not to use replacen() in ConditionCompiler.tokenize() function".
4 participants