Reduce memory usage when reading JSONL file #921

Merged: 3 commits, Feb 14, 2023


fukusuket
Collaborator

What Changed

  • Changed the JSON conversion to return an Iterator instead of a Vec.
    • An Iterator reduces memory usage because memory is allocated for only one record at a time.
    • The return type is now Box<dyn Iterator>, so that the function can return an Iterator.
      • Iterator is a trait (not a struct), so it needs to be wrapped in a Box to be returned from a function.
  • Refactored the JSON conversion function.
    • Separated the logic for handling the JSONL format.
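
As a rough illustration of the idea above (not the actual hayabusa code), here is a minimal std-only sketch. The names `Record`, `read_all`, and `read_lazy` are hypothetical, and a plain `String` stands in for a parsed JSON value. It contrasts an eager function that collects into a Vec with a lazy one that returns `Box<dyn Iterator>`. In this sketch the two branches produce different concrete iterator types, which is one situation where boxing is needed.

```rust
use std::io::{BufRead, Cursor, Read};

/// Hypothetical stand-in for one parsed JSON record (assumption for this sketch).
#[derive(Debug, PartialEq)]
struct Record(String);

/// Eager version: every record is allocated before the caller sees the first one.
fn read_all<R: BufRead>(reader: R) -> Vec<Record> {
    reader
        .lines()
        .map_while(Result::ok) // stop at the first I/O error
        .filter(|line| !line.trim().is_empty())
        .map(Record)
        .collect()
}

/// Lazy version: records are produced one at a time as the caller iterates.
/// The two branches have different concrete iterator types, so the result
/// is returned as a boxed trait object.
fn read_lazy<'a, R: BufRead + 'a>(
    mut reader: R,
    is_jsonl: bool,
) -> Box<dyn Iterator<Item = Record> + 'a> {
    if is_jsonl {
        // JSONL: one record per non-empty line, read on demand.
        Box::new(
            reader
                .lines()
                .map_while(Result::ok)
                .filter(|line| !line.trim().is_empty())
                .map(Record),
        )
    } else {
        // Not JSONL: treat the whole input as a single record (simplification).
        let mut buf = String::new();
        let _ = reader.read_to_string(&mut buf);
        Box::new(std::iter::once(Record(buf)))
    }
}

fn main() {
    let data = "{\"id\":1}\n{\"id\":2}\n\n{\"id\":3}\n";

    // Lazy: nothing is read until `next()` is called.
    let mut records = read_lazy(Cursor::new(data), true);
    println!("first record: {:?}", records.next());
    println!("remaining records: {}", records.count());

    // Eager: all records are held in memory at once.
    println!("eager length: {}", read_all(Cursor::new(data)).len());
}
```

With the lazy version, the caller can process and drop each record before the next one is read, so peak memory depends on the size of a single record rather than on the whole file.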

Evidence

Environment

  • OS: macOS Monterey version 13.1
  • Hardware: MacBook Air (M1, 2020), 8 GB memory, 8 cores

Benchmark 1

Data: OTRF/Security-Datasets apt29/day2 json.
Command: ./hayabusa json-timeline -J -f ../apt29_evals_day2_manual_2020-05-02035409.json -o new2.json --debug

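| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---------|--------------|---------------|---------------------------------|--------------------------|
| v2.2.0  | 00:02:32.057 | 5.4 GiB       | 22,305 / 587,286                | 248364107                |
| This PR | 00:02:31.172 | 1.2 GiB       | 22,305 / 587,286                | 248364107                |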

I also verified that there are no diffs between the 2 result files.

Console output

v2.2.0

Results Summary:

Events with hits / Total events: 22,305 / 587,286 (Data reduction: 564,981 events (96.20%))

Total | Unique detections: 176,937 | 168
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 411 (0.23%) | 43 (25.60%)
Total | Unique medium detections: 4,683 (2.65%) | 47 (27.98%)
Total | Unique low detections: 100,107 (56.58%) | 35 (20.83%)
Total | Unique informational detections: 71,736 (40.54%) | 43 (25.60%)
...
Saved file: main2.json (248.4 MB)
Elapsed time: 00:02:32.057
Rule Parse Processing Time: 00:00:00.860
Analysis Processing Time: 00:02:28.933
Output Processing Time: 00:00:02.263

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    5.4 GiB    5.4 GiB   36.0 MiB    5.4 GiB
 committed:    5.2 GiB   11.5 GiB   10.5 GiB    1.0 GiB

This PR

Results Summary:

Events with hits / Total events: 22,305 / 587,286 (Data reduction: 564,981 events (96.20%))

Total | Unique detections: 176,937 | 168
Total | Unique critical detections: 0 (0.00%) | 0 (0.00%)
Total | Unique high detections: 411 (0.23%) | 43 (25.60%)
Total | Unique medium detections: 4,683 (2.65%) | 47 (27.98%)
Total | Unique low detections: 100,107 (56.58%) | 35 (20.83%)
Total | Unique informational detections: 71,736 (40.54%) | 43 (25.60%)
...
Saved file: new2.json (248.4 MB)
Elapsed time: 00:02:31.172
Rule Parse Processing Time: 00:00:00.885
Analysis Processing Time: 00:02:28.203
Output Processing Time: 00:00:02.083

Memory usage stats:
heap stats:    peak      total      freed    current       unit      count
  reserved:    1.2 GiB    1.2 GiB      0        1.2 GiB
 committed:    1.1 GiB   10.0 GiB    9.1 GiB  895.8 MiB

Benchmark 2

Data: OTRF/Security-Datasets apt29/day1 json.
Command: ./hayabusa json-timeline -J -f ../apt29_evals_day1_manual_2020-05-01225525.json -o out.json --debug

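| Version | Elapsed time | Memory (peak) | Events with hits / Total events | Output file size (bytes) |
|---------|--------------|---------------|---------------------------------|--------------------------|
| v2.2.0  | 00:00:36.544 | 2.1 GiB       | 26,034 / 196,081                | 32589352                 |
| This PR | 00:00:36.623 | 832.1 MiB     | 26,034 / 196,081                | 32589352                 |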

I also verified that there are no diffs between the 2 result files.

I would appreciate it if you could review🙏

@fukusuket fukusuket self-assigned this Feb 14, 2023
@fukusuket fukusuket added the enhancement New feature or request label Feb 14, 2023
Collaborator

@hitenkoku hitenkoku left a comment


Thanks for your pull request,
LGTM.

@hitenkoku
Collaborator

I added this pull request's changes to the CHANGELOG (5ace0a7).

@hitenkoku hitenkoku added this to the v2.2.2 milestone Feb 14, 2023
@hitenkoku hitenkoku added this to In progress in hayabusa development board Feb 14, 2023
@hitenkoku hitenkoku moved this from In progress to In Review in hayabusa development board Feb 14, 2023
@codecov

codecov bot commented Feb 14, 2023

Codecov Report

Base: 74.77% // Head: 74.79% // Increases project coverage by +0.01% 🎉

Coverage data is based on head (5ace0a7) compared to base (40252dd).
Patch coverage: 81.66% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #921      +/-   ##
==========================================
+ Coverage   74.77%   74.79%   +0.01%     
==========================================
  Files          24       24              
  Lines       15758    15776      +18     
==========================================
+ Hits        11783    11799      +16     
- Misses       3975     3977       +2     
| Impacted Files          | Coverage Δ                    |
|-------------------------|-------------------------------|
| src/main.rs             | 27.12% <35.29%> (-0.13%) ⬇️   |
| src/detections/utils.rs | 83.63% <100.00%> (+1.27%) ⬆️  |


Collaborator

@YamatoSecurity YamatoSecurity left a comment


LGTM! Memory usage decreased significantly. Thank you for the PR!

@YamatoSecurity YamatoSecurity merged commit 9dd45c1 into main Feb 14, 2023
@fukusuket fukusuket deleted the improve-memory-usage-when-read-jsonl branch February 14, 2023 23:26
@fukusuket
Collaborator Author

Thank you so much for updating the changelog and for the quick review :)

@hitenkoku hitenkoku moved this from In Review to Done & Praise🎉 in hayabusa development board Feb 17, 2023