Skip to content

Conversation

@atuchin-m
Copy link
Collaborator

No description provided.

@atuchin-m atuchin-m self-assigned this Nov 20, 2025
@atuchin-m atuchin-m requested a review from a team as a code owner November 20, 2025 07:43
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust Benchmark

Benchmark suite Current: 29cf12d Previous: 3f09734 Ratio
rule-match-browserlike/brave-list 1991942926 ns/iter (± 7538002) 2062822761 ns/iter (± 6207839) 0.97
rule-match-first-request/brave-list 1091120 ns/iter (± 7198) 1126665 ns/iter (± 9322) 0.97
blocker_new/brave-list 132721976 ns/iter (± 1001032) 146929925 ns/iter (± 463260) 0.90
blocker_new/brave-list-deserialize 24291322 ns/iter (± 243802) 23264715 ns/iter (± 60876) 1.04
memory-usage/brave-list-initial 10213344 ns/iter (± 3) 10213344 ns/iter (± 3) 1
memory-usage/brave-list-initial/max 60612235 ns/iter (± 3) 60612235 ns/iter (± 3) 1
memory-usage/brave-list-initial/alloc-count 996170 ns/iter (± 3) 1231711 ns/iter (± 3) 0.81
memory-usage/brave-list-1000-requests 2692712 ns/iter (± 3) 2692712 ns/iter (± 3) 1
memory-usage/brave-list-1000-requests/alloc-count 69464 ns/iter (± 3) 71607 ns/iter (± 3) 0.97
url_cosmetic_resources/brave-list 190467 ns/iter (± 798) 192394 ns/iter (± 2074) 0.99
cosmetic-class-id-match/brave-list 3383031 ns/iter (± 938484) 3365434 ns/iter (± 902553) 1.01

This comment was automatically generated by workflow using github-action-benchmark.

github-actions[bot]

This comment was marked as resolved.

Comment on lines 884 to 894
#[deprecated(since = "0.11.1", note = "use get_tokens_optimized instead")]
pub fn get_tokens(&self) -> Vec<Vec<Hash>> {
match self.get_tokens_optimized() {
let mut tokens_buffer = TokensBuffer::default();
match self.get_tokens_optimized(&mut tokens_buffer) {
FilterTokens::OptDomains(domains) => {
domains.into_iter().map(|domain| vec![domain]).collect()
domains.iter().map(|domain| vec![*domain]).collect()
}
FilterTokens::Other(tokens) => vec![tokens],
FilterTokens::Other(tokens) => vec![tokens.to_vec()],
FilterTokens::Empty => vec![],
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we're making a breaking version change, shall we remove this one also?

we can rename get_tokens_optimized back to get_tokens in the process

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I am open to making these pub(crate) to prevent needing similar breaking changes in the future

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also made pub(crate) related TokensBuffer and FilterTokens

fn matches_test(&self, request: &request::Request) -> bool;
}

impl NetworkMatchable for NetworkFilter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can also make NetworkMatchable into a pub(crate) trait since it's only implemented for FlatNetworkFilter now

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I also removed unused matches_test.
Maybe it makes sense to completely remove trait NetworkMatchable, but since it non-public, we can do it later.

@atuchin-m atuchin-m force-pushed the pre-0.12.x branch 2 times, most recently from 458febb to e56ec36 Compare November 21, 2025 08:23
The PR reduces the amount of allocations by using arrayvec crate that allocates memory on a stack instead of a heap.
* the number of allocations is reduced (-19%, memory-usage/brave-list-initial/alloc-count);
* building time is improved (about -10%, memory-usage/brave-list-initial);
* the token limit is increased up to 256;
* the tests expectations were updated (one case hit the old token limit).
The PR removes impl NetworkMatchable for NetworkFilter and port the related tests to the flatbuffer impl (that we actually use in production)
Added:
- `arrayvec` dependency

Changed:
- Improved performance and memory usage when parsing filters

Removed:
- `FilterTokens`, `get_tokens`, `get_tokens_optimized`, and
  `NetworkMatchable` are now crate-internal only
@github-actions
Copy link

[puLL-Merge] - brave/adblock-rust@574

Description

This PR upgrades the adblock-rust crate from version 0.11.1 to 0.12.0. The main changes focus on performance optimization by:

  1. Replacing heap-allocated Vec<Hash> with stack-allocated ArrayVec for token storage to avoid allocations during filter tokenization
  2. Refactoring the NetworkMatchable trait to be internal-only and changing test infrastructure to use full Engine instances instead of directly testing filters
  3. Improving regex manager ergonomics by making the borrow method public within the crate and unifying the API across single-threaded and multi-threaded configurations
  4. Updating test infrastructure to use more realistic testing scenarios with complete engines rather than isolated filter matching

Possible Issues

  1. Breaking API changes: The get_tokens() method on NetworkFilter has been deprecated and removed from public API. The get_tokens_optimized() method has been changed to get_tokens() with a different signature that requires a mutable TokensBuffer parameter. This is a breaking change for any external code that directly calls these methods.

  2. ArrayVec capacity limit: The new TokensBuffer type has a fixed capacity of 256 elements. If tokenization ever produces more than 256 tokens, they will be silently dropped. The old code used Vec which could grow unbounded. This could potentially affect filter matching accuracy for extremely complex filters.

  3. Test timing changes: The regex manager tests now use mock_instant::thread_local::MockClock instead of mock_instant::global::MockClock, which could affect test isolation in multi-threaded test runs.

  4. Test expectation change: In tests/matching.rs, the expected values changed from (106860, 136085) to (106861, 136084), indicating one request changed from "passed" to "blocked". This suggests a subtle behavior change in filter matching logic, though the cause isn't obvious from the diff.

Security Hotspots

  1. Silent token truncation (Low-Medium risk): In fast_tokenizer_no_regex in src/utils.rs (lines 49-51), when the token buffer reaches capacity minus 1, the function returns early without processing remaining tokens:
    if tokens_buffer.capacity() - tokens_buffer.len() <= 1 {
        return; // reserve one free slot for the zero token
    }
    This could theoretically allow an attacker to craft filters that exceed the 256-token limit, causing important tokens to be dropped and potentially bypassing filter rules. However, the practical risk is low as legitimate filters are unlikely to generate this many tokens.

Privacy Hotspots

No significant privacy issues identified in this PR. The changes are primarily performance optimizations and internal refactoring that don't affect data collection, storage, or transmission.

Changes

Changes

Cargo.toml / Cargo.lock / js/Cargo.toml / package.json

  • Version bump from 0.11.1 to 0.12.0
  • Added arrayvec = "0.7" dependency

src/utils.rs

  • Added TokensBuffer type alias: ArrayVec<Hash, 256> (fixed-size array-vector)
  • Removed TOKENS_BUFFER_SIZE and TOKENS_MAX constants
  • Modified tokenization functions to accept &mut TokensBuffer instead of &mut Vec<Hash>
  • Changed early return condition to prevent buffer overflow while reserving space for zero token
  • Updated tokenize() to convert TokensBuffer to Vec<Hash> for return

src/filters/network.rs

  • Changed FilterTokens enum to use borrowed slices (&'a [Hash]) instead of owned vectors
  • Modified get_tokens_optimized() to get_tokens() with new signature requiring &'a mut TokensBuffer parameter
  • Made get_tokens() internal (pub(crate))
  • Removed deprecated get_tokens() method
  • Added matches_test() method for testing that creates a full engine
  • Made NetworkMatchable trait internal (pub(crate))
  • Removed NetworkMatchable implementation for NetworkFilter

src/filters/fb_network.rs

  • Removed matches_test() method from NetworkMatchable implementation for FlatNetworkFilter

src/filters/fb_network_builder.rs

  • Added TokensBuffer to serialization logic
  • Modified to use borrowed token slices from get_tokens()

src/filters/network_matchers.rs

  • Removed check_included_domains() and check_excluded_domains() functions
  • Only retained the _mapped variants of these functions

src/blocker.rs

  • Added RegexManagerRef type alias to abstract over single-threaded and multi-threaded implementations
  • Made borrow_regex_manager() method pub(crate)
  • Unified regex manager borrowing logic for both feature configurations
  • Added check_exceptions() test helper method

src/engine.rs

  • Added check_network_request_exceptions() test helper method
  • Added borrow_regex_manager() test helper method

src/regex_manager.rs

  • Changed mock clock import from mock_instant::global::Instant to mock_instant::thread_local::Instant for tests

src/request.rs

  • Modified calculate_tokens() to use TokensBuffer and convert to Vec<Hash>

tests/ (multiple files)

  • Refactored tests to use full Engine instances instead of directly testing NetworkFilter.matches()
  • Removed direct RegexManager usage in tests
  • Updated test assertions to check engine results rather than filter matches
  • Modified domain checking tests to use mapped domain functions
sequenceDiagram
    participant Client
    participant Engine
    participant Blocker
    participant FilterList
    participant TokensBuffer
    participant RegexManager
    
    Client->>Engine: check_network_request(request)
    Engine->>Blocker: check(request, resources)
    Blocker->>Blocker: borrow_regex_manager()
    Blocker->>RegexManager: get mutable reference
    RegexManager-->>Blocker: RegexManagerRef
    
    Blocker->>FilterList: check(request, regex_manager)
    
    loop For each filter in list
        FilterList->>TokensBuffer: create/clear buffer
        FilterList->>NetworkFilter: get_tokens(&mut buffer)
        NetworkFilter->>TokensBuffer: tokenize into buffer (stack allocated)
        TokensBuffer-->>NetworkFilter: token references
        NetworkFilter-->>FilterList: FilterTokens (borrowed slices)
        
        FilterList->>NetworkFilter: matches(request, regex_manager)
        NetworkFilter->>RegexManager: compile_if_needed(pattern)
        RegexManager-->>NetworkFilter: compiled regex
        NetworkFilter-->>FilterList: match result
    end
    
    FilterList-->>Blocker: BlockerResult
    Blocker-->>Engine: BlockerResult
    Engine-->>Client: BlockerResult
Loading

@antonok-edm antonok-edm merged commit c69021f into master Nov 24, 2025
9 checks passed
@antonok-edm antonok-edm deleted the pre-0.12.x branch November 24, 2025 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants