perf: use slice instead of `replacen` in ConditionCompiler.tokenize() #1287

fukusuket · 2024-02-27T14:46:34Z

What Changed

Closes #1277 (do not use replacen() in the ConditionCompiler.tokenize() function).
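As a rough illustration of the idea (not the actual Hayabusa code), the sketch below contrasts a tokenizer loop that drops the leading token with `replacen` against one that advances a `&str` slice. The functions `leading_token_len`, `tokenize_with_replacen`, and `tokenize_with_slice`, and the simplified token rules, are assumptions made up for this example.

```rust
// Hypothetical tokenizer sketch; names and token rules are illustrative only.

/// Returns the byte length of the token at the start of `s` (assumed helper).
/// A token is a single '(' / ')' / '|', a run of whitespace, or a run of other characters.
fn leading_token_len(s: &str) -> usize {
    let first = match s.chars().next() {
        Some(c) => c,
        None => return 0,
    };
    if first == '(' || first == ')' || first == '|' {
        return first.len_utf8();
    }
    if first.is_whitespace() {
        return s.find(|c: char| !c.is_whitespace()).unwrap_or(s.len());
    }
    s.find(|c: char| c.is_whitespace() || c == '(' || c == ')' || c == '|')
        .unwrap_or(s.len())
}

/// Allocating variant: `replacen` builds a new String for the remainder on every token.
fn tokenize_with_replacen(condition: &str) -> Vec<String> {
    let mut rest = condition.to_string();
    let mut tokens = Vec::new();
    while !rest.is_empty() {
        let len = leading_token_len(&rest);
        let token = rest[..len].to_string();
        // The token is a prefix of `rest`, so the first match is at index 0.
        rest = rest.replacen(token.as_str(), "", 1);
        if !token.trim().is_empty() {
            tokens.push(token);
        }
    }
    tokens
}

/// Slice variant: the remainder is just a shorter view into the same buffer.
fn tokenize_with_slice(condition: &str) -> Vec<String> {
    let mut rest = condition;
    let mut tokens = Vec::new();
    while !rest.is_empty() {
        let len = leading_token_len(rest);
        let (token, tail) = rest.split_at(len);
        rest = tail; // no allocation: only the slice start moves forward
        if !token.trim().is_empty() {
            tokens.push(token.to_string());
        }
    }
    tokens
}
```

Both variants should produce the same tokens for an input such as `(selection1 or selection2) and not filter`; the slice variant simply avoids allocating one intermediate String per token for the remainder.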

Evidence

Environment

OS: macOS Sonoma version 14.2.1
Hayabusa v2.14.0-dev
rustc 1.76.0

I confirmed that the resulting CSV is identical before and after the fix, as shown below.

% ./hayabusa-new csv-timeline -d ../hayabusa-sample-evtx -o new.csv -w --debug -C -q
% ./hayabusa-main csv-timeline -d ../hayabusa-sample-evtx -o main.csv -w --debug -C -q
% diff main.csv new.csv
%

I also confirmed that there were no rule parsing errors and that the number of rules loaded (4077) was the same.

main

% ./hayabusa-main csv-timeline -d ../hayabusa-sample-evtx -o main.csv -w --debug -C -q
Start time: 2024/02/27 23:41

Total event log files: 583
Total file size: 137.1 MB

Loading detection rules. Please wait.

Excluded rules: 26
Noisy rules: 12 (Disabled)

Deprecated rules: 202 (4.95%) (Disabled)
Experimental rules: 1091 (26.76%)
Stable rules: 240 (5.89%)
Test rules: 2746 (67.35%)
Unsupported rules: 45 (1.10%) (Disabled)

Hayabusa rules: 162
Sigma rules: 3915
Total enabled detection rules: 4077

This PR

% ./hayabusa-new csv-timeline -d ../hayabusa-sample-evtx -o new.csv -w --debug -C -q
Start time: 2024/02/27 23:41

Total event log files: 583
Total file size: 137.1 MB

Loading detection rules. Please wait.

Excluded rules: 26
Noisy rules: 12 (Disabled)

Deprecated rules: 202 (4.95%) (Disabled)
Experimental rules: 1091 (26.76%)
Stable rules: 240 (5.89%)
Test rules: 2746 (67.35%)
Unsupported rules: 45 (1.10%) (Disabled)

Hayabusa rules: 162
Sigma rules: 3915
Total enabled detection rules: 4077

I would appreciate it if you could check it out when you have time🙏

codecov · 2024-02-27T14:54:42Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.23%. Comparing base (f1b3cd6) to head (a7a6332).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1287   +/-   ##
=======================================
  Coverage   81.23%   81.23%           
=======================================
  Files          27       27           
  Lines       24407    24407           
=======================================
  Hits        19828    19828           
  Misses       4579     4579

hitenkoku

LGTM

YamatoSecurity · 2024-02-27T23:32:01Z

@fukusuket I am taking benchmarks, but on my Mac it seems like it may use more memory:
./target/release/hayabusa csv-timeline -d ../hayabusa-sample-evtx --debug -o test.csv -D -u -w -C

baseline (hayabusa main):
941.7 MiB
941.4 MiB
936.2 MiB
918.6 MiB
919.2 MiB

1277 PR
943.1 MiB
941.9 MiB
940.1 MiB
940.7 MiB
945.1 MiB

I am checking RSS usage. I will test a bigger data sample on Windows and check the difference.

fukusuket · 2024-02-28T00:32:04Z

@YamatoSecurity
Thank you so much for benchmarking :) I see...
The memory reduction from this PR may be very small, so it may be difficult to measure in the --debug output 🤔

This PR only improves memory usage while loading rules and tokenizing condition strings.
For example, it applies when reading the condition string in the following rule:
https://github.com/Yamato-Security/hayabusa-rules/blob/0a254ccacaa108a4f634a32f10e0aebe6ce9e3b2/sigma/builtin/application/Other/win_av_relevant_match.yml#L102

The memory reduction does not depend on the amount of logs, because it only affects processing that happens before the logs are scanned (that is, rule compilation).

If you increase the number of rules (so that many condition strings are tokenized) and minimize the logs being scanned, you may be able to see a slightly larger memory reduction...?

fukusuket · 2024-02-28T00:37:20Z

@hach1yon @hitenkoku
I'm sorry if I've misunderstood something or if the implementation is bad. I'd appreciate it if you could point it out🙏

YamatoSecurity

@fukusuket Do you think this would affect processing speed? I tested on Windows with bigger logs and the processing time went from 40:37 to 34:57, so it seems to be faster, and there is no difference in the results, so LGTM!

YamatoSecurity · 2024-02-28T02:41:42Z

I just retested the 2.14.0 main baseline and it scanned in 35:10, so maybe there is not much difference in speed either.. but there is no regression, so LGTM.

fukusuket · 2024-02-28T04:00:48Z

@YamatoSecurity Thank you so much for retesting 🙇

Since rule compilation runs far less often than the per-log scanning process, I think the speedup is relatively small. However, if similar modifications (avoiding the unnecessary creation of new String instances) can be applied to the per-log scanning process, I think the effect will be greater.
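One general way to avoid creating String instances unnecessarily in a hot path is `std::borrow::Cow`, which borrows the input when nothing needs to change and allocates only when it does. The sketch below is a minimal illustration under that assumption; `normalize_field` and its newline-replacement rule are hypothetical and are not taken from the Hayabusa scanning code.

```rust
use std::borrow::Cow;

/// Hypothetical per-record helper: replaces CR/LF with spaces,
/// but only allocates a new String when the value actually contains one.
fn normalize_field(value: &str) -> Cow<'_, str> {
    let is_newline = |c: char| c == '\r' || c == '\n';
    if value.contains(is_newline) {
        // Slow path: a new String is built only for values that need rewriting.
        Cow::Owned(value.replace(is_newline, " "))
    } else {
        // Fast path: hand back the original slice with no allocation.
        Cow::Borrowed(value)
    }
}
```

If most values take the fast path, a helper shaped like this performs no allocation for them, which is the same kind of saving as replacing `replacen` with a slice, applied per log record instead of per rule.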

YamatoSecurity · 2024-02-28T04:42:53Z

I see. Please let us know if you have any ideas for optimization. We can discuss tomorrow at the meeting.

hach1yon

LGTM

perf: use slice instead of replacen

063c251

fukusuket self-assigned this Feb 27, 2024

fukusuket added the enhancement New feature or request label Feb 27, 2024

fukusuket added this to the v2.14.0 milestone Feb 27, 2024

fix: cargo fmt error

a7a6332

fukusuket changed the title ~~perf: use slice instead of replacen in ConditionCompiler.tokenize()~~ perf: use slice instead of replacen in ConditionCompiler.tokenize() Feb 27, 2024

fukusuket requested review from hitenkoku, hach1yon and YamatoSecurity February 27, 2024 15:03

hitenkoku approved these changes Feb 27, 2024

YamatoSecurity approved these changes Feb 28, 2024

hach1yon approved these changes Feb 28, 2024

YamatoSecurity merged commit 61befea into main Feb 28, 2024
7 checks passed

fukusuket deleted the 1277-not-to-use-replacen branch February 28, 2024 07:54
