Reduce memory usage by removing unnecessary regex #894

Merged: 11 commits into main on Jan 26, 2023

fukusuket
Collaborator

What Changed
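
As a rough illustration of the idea in the title, here is a minimal sketch that assumes the change amounts to using plain string comparisons for simple pipe modifiers instead of compiling a regex for each one. The `FastMatch` type, its variants, and `from_modifier` are hypothetical names made up for this example and are not taken from `src/detections/rule/matchers.rs`. The sketch also assumes case-insensitive matching and ignores wildcards.

```rust
/// Hypothetical matcher: each variant stores only the literal it needs,
/// so no regex has to be compiled or kept in memory for simple modifiers.
#[derive(Debug, PartialEq)]
enum FastMatch {
    Exact(String),
    StartsWith(String),
    EndsWith(String),
    Contains(String),
}

impl FastMatch {
    /// Build a matcher from an optional pipe modifier and a rule value.
    /// Returns None for modifiers this sketch does not cover, which a
    /// caller could treat as "fall back to a regex".
    fn from_modifier(modifier: Option<&str>, value: &str) -> Option<FastMatch> {
        // Lowercase once at rule-parse time (assumption: case-insensitive rules).
        let v = value.to_lowercase();
        match modifier {
            None => Some(FastMatch::Exact(v)),
            Some("startswith") => Some(FastMatch::StartsWith(v)),
            Some("endswith") => Some(FastMatch::EndsWith(v)),
            Some("contains") => Some(FastMatch::Contains(v)),
            Some(_) => None,
        }
    }

    /// Compare an event field value against the stored literal.
    fn is_match(&self, field: &str) -> bool {
        let f = field.to_lowercase();
        match self {
            FastMatch::Exact(v) => f == *v,
            FastMatch::StartsWith(v) => f.starts_with(v.as_str()),
            FastMatch::EndsWith(v) => f.ends_with(v.as_str()),
            FastMatch::Contains(v) => f.contains(v.as_str()),
        }
    }
}

fn main() {
    let m = FastMatch::from_modifier(Some("endswith"), "\\powershell.exe")
        .expect("modifier is covered by this sketch");
    println!("{}", m.is_match("C:\\Windows\\System32\\PowerShell.exe"));
    println!("{}", m.is_match("C:\\Windows\\System32\\cmd.exe"));
}
```

In this sketch each rule keeps a single lowercased `String` rather than a compiled regex object, which is the kind of saving the title refers to. Whether the merged code is structured this way is not confirmed by this thread.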

Evidence

Environment

  • OS: Windows 10 Home edition
  • Hardware: laptop, 16 GB memory, 8 cores, SSD

Benchmark1

I ran a benchmark using this procedure (6.1 GB evtx), and the results were as follows.

| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---|---|---|---|---|
| before | 00:10:44.026 | 4.7 GiB | 1,593,715 / 4,817,181 | 575085389 |
| This PR | 00:10:42.195 | 4.4 GiB | 1,593,715 / 4,817,181 | 575085385 |
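
The before/after difference can be worked out from the figures in this benchmark. The helper below is a small sketch for that arithmetic only: `parse_elapsed_ms` and `reduction_percent` are hypothetical names and are not part of hayabusa. It assumes the elapsed time is always printed as `HH:MM:SS.mmm` with exactly three millisecond digits.

```rust
/// Parse an elapsed time in the `HH:MM:SS.mmm` form into milliseconds.
/// Returns None if the text is not in that form.
fn parse_elapsed_ms(text: &str) -> Option<u64> {
    let (hms, millis) = text.split_once('.')?;
    let parts: Vec<&str> = hms.split(':').collect();
    if parts.len() != 3 {
        return None;
    }
    let h: u64 = parts[0].parse().ok()?;
    let m: u64 = parts[1].parse().ok()?;
    let s: u64 = parts[2].parse().ok()?;
    let ms: u64 = millis.parse().ok()?;
    Some(((h * 60 + m) * 60 + s) * 1000 + ms)
}

/// Relative reduction from `before` to `after`, in percent.
fn reduction_percent(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // Values copied from the Benchmark1 results.
    let before = parse_elapsed_ms("00:10:44.026").expect("valid elapsed time");
    let after = parse_elapsed_ms("00:10:42.195").expect("valid elapsed time");
    println!("elapsed: {} ms faster", before - after);
    println!("peak memory: {:.1}% lower", reduction_percent(4.7, 4.4));
}
```

For these inputs the elapsed time differs by 1831 ms (about 0.3% of the run), while peak memory drops by about 6.4%.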

Console output

before

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa.exe csv-timeline -d ..\hayabusa-big-evtx\ -o 1.csv --debug
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)

...

Saved file: 1.csv (575.1 MB)
Elapsed time: 00:10:44.026
Rule Parse Processing Time: 00:00:20.191
Analysis Processing Time: 00:10:02.174
Output Processing Time: 00:00:21.658

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    4.7 GiB    4.7 GiB   83.0 MiB    4.6 GiB
 committed:    4.3 GiB   57.6 GiB   53.4 GiB    4.1 GiB

This PR

PS C:\tmp\hayabusa-2.1.0-win-64-bit> .\hayabusa.exe csv-timeline -d ..\hayabusa-big-evtx\ -o 1.csv --debug
Results Summary:

Events with hits / Total events: 1,593,715 / 4,817,181 (Data reduction: 3,223,466 events (66.92%))

Total | Unique detections: 1,627,284 | 150
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 12,044 (0.74%) | 20 (13.33%)
Total | Unique medium detections: 11,118 (0.68%) | 38 (25.33%)
Total | Unique low detections: 1,053,623 (64.75%) | 42 (28.00%)
Total | Unique informational detections: 550,499 (33.83%) | 50 (33.33%)
...

Saved file: 1.csv (575.1 MB)
Elapsed time: 00:10:42.195
Rule Parse Processing Time: 00:00:20.234
Analysis Processing Time: 00:10:00.446
Output Processing Time: 00:00:21.513

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    4.4 GiB    4.4 GiB   83.0 MiB    4.4 GiB
 committed:    4.0 GiB   65.0 GiB   61.2 GiB    3.8 GiB

Benchmark2

I ran a benchmark using hayabusa-sample-evtx and the results were as follows.

| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---|---|---|---|---|
| before | 00:00:16.158 | 1.9 GiB | 19,606 / 47,458 | 16447790 |
| This PR | 00:00:15.422 | 1.6 GiB | 19,606 / 47,458 | 16447790 |

I would appreciate it if you could review this 🙏

@fukusuket fukusuket self-assigned this Jan 25, 2023
@fukusuket fukusuket added the enhancement New feature or request label Jan 25, 2023
@codecov

codecov bot commented Jan 25, 2023

Codecov Report

Base: 73.84% // Head: 73.92% // Increases project coverage by +0.08% 🎉

Coverage data is based on head (47af259) compared to base (6cb871b).
Patch coverage: 96.12% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #894      +/-   ##
==========================================
+ Coverage   73.84%   73.92%   +0.08%     
==========================================
  Files          23       23              
  Lines       14576    14643      +67     
==========================================
+ Hits        10763    10825      +62     
- Misses       3813     3818       +5     
| Impacted Files | Coverage Δ |
|---|---|
| src/detections/rule/matchers.rs | 96.45% <96.12%> (-0.15%) ⬇️ |

Collaborator

@hitenkoku hitenkoku left a comment

@fukusuket Thanks for your pull request.

LGTM

@fukusuket
Collaborator Author

@hitenkoku
Thank you so much for the quick review 🙇 I fixed a typo :) 30d78d6

@YamatoSecurity
Collaborator

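@fukusuket Thank you for the PR!
In my Mac benchmark, memory usage dropped by 300 MB, but the processing time increased slightly.
21:54.523 -> 22:26.437
On the first run memory usage went 7.5GB -> 7.2GB, and on the second run 7.5GB -> 7.4GB; the processing time on the second run was again 22:26.
I will also try it on Windows tomorrow, so please wait a little.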

@fukusuket
Collaborator Author

@YamatoSecurity
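Thank you for running the benchmark so quickly 🙇
I did not expect this change to affect matching speed, but, sorry, it looks like I introduced a regression somewhere. I will look into it!
(If it turns out that some data patterns make it slower, I will withdraw this PR for now!)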

@fukusuket fukusuket marked this pull request as draft January 25, 2023 14:34
@fukusuket
Collaborator Author

fukusuket commented Jan 25, 2023

I tried it on my Mac against the 6.1GB evtx (rebooting before each measurement, two measurements each) and got the following results 🤔
So it may be that the speed degrades for certain data patterns.... 😭

Environment

  • OS: macOS Monterey version 13.1
  • Hardware: MacBook Air (M1, 2020), 8 GB memory, 8 cores

before

Elapsed time: 00:06:22.257
Rule Parse Processing Time: 00:00:01.861
Analysis Processing Time: 00:06:15.218
Output Processing Time: 00:00:05.178

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    4.9 GiB    4.9 GiB   83.0 MiB    4.8 GiB
 committed:    4.5 GiB   49.8 GiB   45.5 GiB    4.3 GiB

This PR

Elapsed time: 00:06:12.900
Rule Parse Processing Time: 00:00:01.310
Analysis Processing Time: 00:06:06.634
Output Processing Time: 00:00:04.955

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    4.6 GiB    4.6 GiB   83.0 MiB    4.5 GiB
 committed:    4.1 GiB   49.6 GiB   45.7 GiB    3.9 GiB

@YamatoSecurity
Collaborator

@fukusuket Hmm, yeah, it might just be my environment, so I will check tonight on a different computer.

@hitenkoku hitenkoku marked this pull request as ready for review January 26, 2023 11:05
Collaborator

@YamatoSecurity YamatoSecurity left a comment

LGTM!
Results from my Windows environment:
Speed improvement: 25:25 -> 24:45
Memory reduction: 7.6GB -> 7.2GB

@YamatoSecurity YamatoSecurity merged commit b97b224 into main Jan 26, 2023
@fukusuket
Collaborator Author

Thank you for your review and benchmark :)

@fukusuket fukusuket deleted the improve-speed-by-reduce-regex-match-when-pipe branch January 26, 2023 12:23