perf: use slice instead of replacen in ConditionCompiler.tokenize() #1287
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1287   +/-   ##
=======================================
  Coverage   81.23%   81.23%
=======================================
  Files          27       27
  Lines       24407    24407
=======================================
  Hits        19828    19828
  Misses       4579     4579

☔ View full report in Codecov by Sentry.
LGTM
@fukusuket I am taking benchmarks, but on my Mac it seems like it may use more memory: baseline (hayabusa main): 1277. I am still checking this PR.
@YamatoSecurity This PR only improves memory usage when loading rules and tokenizing condition strings. The reduction in memory usage does not depend on the amount of logs, because the change only affects processing done before the logs are scanned (that is, only the rule-compilation step). If you increase the number of rules (execute many rules), the effect of reducing memory usage should be larger.
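To make the idea concrete, here is a minimal Rust sketch of the pattern (not Hayabusa's actual ConditionCompiler code; the Token type and helper functions are hypothetical). It contrasts consuming a matched token with replacen, which allocates a new String on every step, with consuming it by re-slicing the borrowed &str, which allocates nothing:

```rust
// Minimal sketch, not Hayabusa's actual implementation: the Token type and
// helper functions are hypothetical. It contrasts two ways of consuming a
// matched prefix while tokenizing a Sigma-style condition string.

#[derive(Debug, PartialEq)]
enum Token<'a> {
    LeftParen,
    RightParen,
    And,
    Or,
    Not,
    Selection(&'a str), // borrows from the condition string, no allocation
}

/// replacen-style consumption: builds a brand-new String on every call.
#[allow(dead_code)]
fn consume_with_replacen(remaining: &str, matched: &str) -> String {
    remaining.replacen(matched, "", 1)
}

/// slice-style consumption: just advance the start of the borrowed slice.
fn consume_with_slice(remaining: &str, matched_len: usize) -> &str {
    &remaining[matched_len..]
}

/// A tiny tokenizer using the slice-based approach.
fn tokenize(condition: &str) -> Vec<Token<'_>> {
    let mut rest = condition;
    let mut tokens = Vec::new();
    loop {
        rest = rest.trim_start();
        if rest.is_empty() {
            break;
        }
        let (token, matched_len) = if rest.starts_with('(') {
            (Token::LeftParen, 1)
        } else if rest.starts_with(')') {
            (Token::RightParen, 1)
        } else {
            // A "word" ends at whitespace or a parenthesis.
            let word_len = rest
                .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
                .unwrap_or(rest.len());
            let token = match &rest[..word_len] {
                "and" => Token::And,
                "or" => Token::Or,
                "not" => Token::Not,
                other => Token::Selection(other),
            };
            (token, word_len)
        };
        tokens.push(token);
        // Previously: rest = consume_with_replacen(rest, matched) -> one new String per token.
        rest = consume_with_slice(rest, matched_len);
    }
    tokens
}

fn main() {
    let condition = "selection1 and (selection2 or not selection3)";
    println!("{:?}", tokenize(condition));
}
```

In the slice-based version every token stays a &str borrowed from the original condition string, so the number of allocations during rule compilation no longer grows with the number of tokens.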
@hach1yon @hitenkoku
@fukusuket Do you think this would affect processing speed? I tested on Windows with bigger logs and the processing time went from 40:37 to 34:57, so it seems to be faster, and there is no difference in results, so LGTM!
I just retested the 2.14.0 main baseline and was able to scan in 35:10, so maybe there is not such a difference in speed after all, but there is no regression, so LGTM.
@YamatoSecurity Thank you so much for retesting 🙇 Since the rule-compilation process runs far less often than the per-log scanning process, I think the speedup is relatively small. However, if a similar modification (avoiding the creation of unnecessary new String instances) can be applied to the per-log scanning process, I think the effect would be greater.
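For reference, here is a hedged sketch of what "avoid creating new String instances" could look like on the per-log path (again hypothetical, not Hayabusa's actual scanning code): returning Cow<str> lets a field-normalization step borrow the original value when nothing needs to change and allocate only when it actually rewrites the value.

```rust
use std::borrow::Cow;

// Hypothetical illustration, not Hayabusa's scanning code: normalize a field
// value to lowercase, but allocate a new String only when the value actually
// contains uppercase ASCII. The common already-lowercase case stays borrowed.
fn normalize_field(value: &str) -> Cow<'_, str> {
    if value.bytes().any(|b| b.is_ascii_uppercase()) {
        Cow::Owned(value.to_ascii_lowercase()) // allocation only when needed
    } else {
        Cow::Borrowed(value) // zero-allocation hot path
    }
}

fn main() {
    for value in ["powershell.exe", "PowerShell.EXE"] {
        let normalized = normalize_field(value);
        let allocated = matches!(normalized, Cow::Owned(_));
        println!("{} -> {} (allocated: {})", value, normalized, allocated);
    }
}
```

Whether this pays off in the scanner would of course need to be confirmed with the same kind of benchmark as above.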
I see. Please let us know if you have any ideas for optimization. We can discuss tomorrow at the meeting.
LGTM
What Changed
Evidence
Environment
I confirmed that there are no differences in the resulting CSV, as shown below, before and after the fix.
I also confirmed that there were no rule parsing errors and that the number of rules loaded (4,077) was the same.
main
This PR
I would appreciate it if you could check it out when you have time 🙏