Optimize regexp memory usage #204

Merged
merged 4 commits on Mar 30, 2022

@atuchin-m atuchin-m commented Mar 28, 2022

Resolves #203
Currently, the regexp field consumes a lot of memory because of the size of Arc<RwLock<Option<Arc<CompiledRegex>>>>.
It takes 40 bytes per item even when None is stored.

Most of the rules (~90%) are non-regex.
The new approach takes 8 bytes per item for such rules.
It saves about 7 MB of memory in total.
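
To make the per-item cost concrete, here is a minimal sketch that compares the two layouts with `std::mem::size_of`. `CompiledRegex` is a hypothetical stand-in, and `Option<Rc<CompiledRegex>>` is only an assumed shape for a pointer-sized single-threaded slot, not necessarily the type used in this PR. The "sync" figure adds the inline handle to the heap payload it points to and ignores the two reference counters that `Arc` stores next to the payload, so the exact number depends on platform and Rust version.

```rust
use std::mem::size_of;
use std::rc::Rc;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the crate's compiled regex type.
#[allow(dead_code)]
struct CompiledRegex(String);

/// Heap payload behind the thread-safe handle named in the description.
type SyncInner = RwLock<Option<Arc<CompiledRegex>>>;
/// Thread-safe slot: one pointer inline, plus a heap block even when it holds `None`.
type SyncSlot = Arc<SyncInner>;
/// Assumed single-threaded slot: a nullable pointer, nothing allocated for `None`.
type UnsyncSlot = Option<Rc<CompiledRegex>>;

/// Inline handle plus heap payload (the `Arc` counters are not included).
fn sync_footprint() -> usize {
    size_of::<SyncSlot>() + size_of::<SyncInner>()
}

/// Inline size only; an empty slot owns no heap memory.
fn unsync_footprint() -> usize {
    size_of::<UnsyncSlot>()
}

fn main() {
    println!("sync slot:   {} bytes (handle + payload)", sync_footprint());
    println!("unsync slot: {} bytes", unsync_footprint());
}
```

The difference per filter is small, but it is paid for every rule, including the roughly 90% that never compile a regex.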

Important note: this makes the engine non-thread-safe, so thread safety is wrapped under a new feature.
In fact, the browser uses adblock from a single sequence, so we don't need to synchronize threads. The thread-safety feature should be disabled in the browser to benefit from the optimization.
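
A rough sketch of how such a feature gate could look, following the description above where thread safety is the opt-in. The feature name `thread-safe`, the `storage` module, `RegexStorage::get_or_compile`, and the `CompiledRegex` stand-in are all hypothetical names chosen for illustration and are not taken from the crate.

```rust
/// Hypothetical stand-in for the crate's compiled regex type.
pub struct CompiledRegex(pub String);

// Thread-safe variant, compiled only when the (hypothetically named) feature is on.
#[cfg(feature = "thread-safe")]
mod storage {
    use super::CompiledRegex;
    use std::sync::{Arc, RwLock};

    #[derive(Default)]
    pub struct RegexStorage(RwLock<Option<Arc<CompiledRegex>>>);

    impl RegexStorage {
        /// Returns the cached regex, "compiling" it on first use.
        pub fn get_or_compile(&self, pattern: &str) -> Arc<CompiledRegex> {
            if let Some(regex) = self.0.read().unwrap().as_ref() {
                return Arc::clone(regex);
            }
            let mut slot = self.0.write().unwrap();
            slot.get_or_insert_with(|| Arc::new(CompiledRegex(pattern.to_string())))
                .clone()
        }
    }
}

// Single-threaded variant: no locks and no atomics, but not `Send` or `Sync`.
#[cfg(not(feature = "thread-safe"))]
mod storage {
    use super::CompiledRegex;
    use std::cell::RefCell;
    use std::rc::Rc;

    #[derive(Default)]
    pub struct RegexStorage(RefCell<Option<Rc<CompiledRegex>>>);

    impl RegexStorage {
        /// Returns the cached regex, "compiling" it on first use.
        pub fn get_or_compile(&self, pattern: &str) -> Rc<CompiledRegex> {
            self.0
                .borrow_mut()
                .get_or_insert_with(|| Rc::new(CompiledRegex(pattern.to_string())))
                .clone()
        }
    }
}

fn main() {
    let storage = storage::RegexStorage::default();
    let first = storage.get_or_compile("ad[0-9]+");
    // The second call hits the cache, so its pattern argument is ignored.
    let second = storage.get_or_compile("unused");
    println!("{} / {}", first.0, second.0);
}
```

Both variants expose the same method, so calling code does not change when the feature is toggled; only the `Send`/`Sync` properties of the containing type do.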

Memory usage after loading data/rs-ABPFilterParserData.dat in tests/deserialization.rs
(release build, Ubuntu x64, measured with jemallocator added locally):
before: 42094712 bytes allocated / 53563392 bytes resident
after: 34332128 bytes allocated / 48881664 bytes resident
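
For readers who want a comparable "bytes allocated" figure without extra dependencies, the sketch below swaps in a different technique: a std-only counting wrapper around the system allocator instead of jemallocator. It tracks live allocated bytes only (no resident-memory figure), and `CountingAlloc` and `allocated_bytes` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper that counts live bytes handed out by the system allocator.
struct CountingAlloc;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `alloc_zeroed` and `realloc` use the trait defaults, which go through
    // `alloc`/`dealloc` above, so the counter stays consistent.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Bytes currently allocated through the global allocator.
fn allocated_bytes() -> usize {
    ALLOCATED.load(Ordering::Relaxed)
}

fn main() {
    let before = allocated_bytes();
    // Stand-in for "load the serialized engine data".
    let data: Vec<u64> = vec![0; 1024];
    black_box(&data); // keep the allocation from being optimized away
    let after = allocated_bytes();
    println!("allocated by the workload: {} bytes", after - before);
}
```

Taking the reading once on each build, after the data is loaded, gives a before/after pair like the one reported above.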

@atuchin-m atuchin-m self-assigned this Mar 28, 2022
@atuchin-m
Collaborator Author

@antonok-edm I'm not sure about enabling the feature in native/Cargo.toml.
It looks like it's only used in Node.js, and multi-threading isn't needed there.

Collaborator

@antonok-edm antonok-edm left a comment

I pushed a few minor fixes, including:

  • 4 space indentation as per standard Rust formatting
  • removed manual default() implementation on RegexStorage
  • related the feature to object-pooling as an opt-out optimization
  • added static assertions for Send and Sync implementations on Engine
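
The static assertions mentioned in the last item can be sketched as follows. This is one common std-only pattern, not necessarily the form used in the commit; `Engine` here is a hypothetical, heavily simplified stand-in, and `assert_send_sync` and `engine_is_thread_safe` are illustrative names.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical, heavily simplified stand-in for the real `Engine`.
#[allow(dead_code)]
struct Engine {
    regex: Arc<RwLock<Option<Arc<String>>>>,
}

/// Compiles only when `T` is both `Send` and `Sync`.
fn assert_send_sync<T: Send + Sync>() {}

/// The check happens at compile time: if `Engine` stops being `Send + Sync`
/// (for example after swapping `Arc` for `Rc`), this function no longer builds.
fn engine_is_thread_safe() -> bool {
    assert_send_sync::<Engine>();
    true
}

fn main() {
    println!("Engine is Send + Sync: {}", engine_is_thread_safe());
}
```

In a crate where thread safety depends on a feature, a check like this would presumably be compiled only for the thread-safe configuration, so a regression shows up as a build error rather than at run time.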

This should be all set now. I will merge it and create a new release.

@antonok-edm antonok-edm merged commit 4230014 into master Mar 30, 2022
@antonok-edm antonok-edm deleted the optimize-regexp-memory-usage branch March 30, 2022 01:02
@antonok-edm
Collaborator

Published as v0.5.2 on crates.io
