@atuchin-m atuchin-m commented Oct 17, 2025

The commits were rebased onto a new branch, 0.11.x-merge, to preserve the original commits.

atuchin-m and others added 22 commits October 16, 2025 23:32
The PR moves from per-NetworkList flatbuffers to a single per-Engine flatbuffer.
It doesn't affect the performance metrics, but it opens up
the possibility of putting cosmetic filters in the same flatbuffer.
It also simplifies the serialization/deserialization code.
* Unsafe tools moved to flatbuffers.
* Added FlatMultiMapView, FlatFilterMap replaced.
* Added FlatSetView.
* Removed unused utils.
* Temporarily marked FlatSet as dead_code.
* Added test for FlatSetView.
* Added FlatMultiMapView tests.
* Clean up.
* Ignore live test.
* Clippy fixes.
* Review issues are addressed.
* Optional iterator for multimap (improve perf).
* Removed unused file.
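
To illustrate the idea behind a multimap view over flat storage, here is a minimal sketch. It is not the crate's actual `FlatMultiMapView`: the name `SortedMultiMapView`, its fields, and the `get` method are hypothetical. The sketch assumes keys are kept sorted in one slice with values in a parallel slice, so all values for a key can be found by binary search without building a `HashMap`.

```rust
/// Hypothetical read-only multimap over two parallel slices.
/// Assumption: `keys` is sorted ascending and `values[i]` belongs to `keys[i]`.
pub struct SortedMultiMapView<'a, K, V> {
    keys: &'a [K],
    values: &'a [V],
}

impl<'a, K: Ord, V> SortedMultiMapView<'a, K, V> {
    pub fn new(keys: &'a [K], values: &'a [V]) -> Self {
        assert_eq!(keys.len(), values.len());
        // Only checked in debug builds; the lookup below relies on it.
        debug_assert!(keys.windows(2).all(|w| w[0] <= w[1]));
        SortedMultiMapView { keys, values }
    }

    /// Returns every value stored under `key`, or `None` if the key is absent.
    pub fn get(&self, key: &K) -> Option<&'a [V]> {
        // First index whose key is >= `key`.
        let start = self.keys.partition_point(|k| k < key);
        // First index whose key is > `key`.
        let end = self.keys.partition_point(|k| k <= key);
        if start == end {
            None
        } else {
            Some(&self.values[start..end])
        }
    }
}
```

Returning `Option` lets a caller skip iteration entirely when a key is missing, which is one plausible reading of the "optional iterator" item above.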
The PR introduces new structures to be used in cosmetic_filters:

FlatSerialize trait for things that can be serialized to flatbuffer;
Builder structure to help construct a serializable representation + unit tests;
Migrate network filter to the new API;
Increase MIN_ALIGNMENT to 8 to support using u64 as a key.
The PR introduces new structures to store cosmetic filters in flatbuffer.

* The algorithms to sort and apply rules shouldn't be touched; only the storage level is changed.
* CosmeticFilterCache is now a view over flatbuffer data.
* The old storage layer (via serde) is removed; the version is now stored in the flatbuffer.
* Another container, 'FlatMap', is introduced, with tests.
* Most host-specific rules are stored in a single FlatMap (domain_hash => HostnameSpecificRules), although the most common rule kinds are stored as dedicated multi-maps to save memory.
* The code to build the flatbuffer structure is moved to dedicated files.
unfortunately this requires duplicating some definitions to support
additional `Send + Sync` trait bounds, since Rust does not natively
support conditional supertraits.
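
The duplication described above can be sketched as follows. This is an illustration only: the trait name `Backend`, the type `InMemory`, and the choice of a `single-thread` Cargo feature as the switch are assumptions, not the definitions used in the PR. Because a supertrait list cannot itself be made conditional, the whole trait is written twice behind opposite `cfg` attributes.

```rust
// Assumption: a Cargo feature named `single-thread` selects the variant
// without thread-safety bounds. All names below are hypothetical.

#[cfg(feature = "single-thread")]
pub trait Backend {
    fn lookup(&self, name: &str) -> Option<String>;
}

// Same definition again, differing only in the `Send + Sync` supertraits.
#[cfg(not(feature = "single-thread"))]
pub trait Backend: Send + Sync {
    fn lookup(&self, name: &str) -> Option<String>;
}

/// A trivial implementor that satisfies either variant of the trait.
pub struct InMemory {
    entries: Vec<(String, String)>,
}

impl Backend for InMemory {
    fn lookup(&self, name: &str) -> Option<String> {
        self.entries
            .iter()
            .find(|(k, _)| k == name)
            .map(|(_, v)| v.clone())
    }
}
```

Implementors are written once; only the trait declaration is duplicated, so the cost is limited to keeping the two method lists in sync.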
@atuchin-m atuchin-m self-assigned this Oct 17, 2025
@atuchin-m atuchin-m requested a review from a team as a code owner October 17, 2025 13:38
@atuchin-m atuchin-m changed the title 0.11.x merge 0.11.x merge to master Oct 17, 2025
@github-actions
[puLL-Merge] - brave/adblock-rust@530

Description

This PR significantly refactors the internal storage and serialization architecture of the adblock-rust crate. The main changes include:

  1. Serialization format overhaul: Migrates from MessagePack (rmp-serde) to FlatBuffers for engine serialization, with a new .dat file format that includes integrity checking via seahash
  2. Storage refactoring: Replaces in-memory HashMap/HashSet storage with FlatBuffer-based containers, introducing new container types like FlatMapView, FlatMultiMapView, HashMapView, etc.
  3. Feature flag changes: Renames unsync-regex-caching to single-thread and regex-debug-info to debug-info
  4. Cosmetic filter storage redesign: Moves cosmetic filter caching from runtime HashMap structures to pre-serialized FlatBuffer containers
  5. Resource storage abstraction: Introduces ResourceStorageBackend trait to allow custom resource storage implementations
  6. Dependency updates: Pins rustc-hash to v1.1.0 for better performance
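
As a rough illustration of item 1, the sketch below frames a payload with a version byte and a checksum, then rejects data whose version or checksum does not match. Everything here is an assumption made for the example: the byte layout, the names `frame`, `unframe`, and `FrameError`, and the use of FNV-1a as a dependency-free stand-in for seahash. It does not describe the crate's real .dat layout.

```rust
/// Stand-in checksum (64-bit FNV-1a). The description above mentions seahash;
/// FNV-1a is used here only so the sketch needs no external crate.
fn checksum(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for &byte in data {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }
    hash
}

/// Hypothetical layout: 1 version byte, 8-byte little-endian checksum, payload.
const VERSION: u8 = 2;
const HEADER_LEN: usize = 9;

#[derive(Debug, PartialEq)]
pub enum FrameError {
    TooShort,
    UnsupportedVersion(u8),
    ChecksumMismatch,
}

pub fn frame(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + payload.len());
    out.push(VERSION);
    out.extend_from_slice(&checksum(payload).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

pub fn unframe(data: &[u8]) -> Result<&[u8], FrameError> {
    if data.len() < HEADER_LEN {
        return Err(FrameError::TooShort);
    }
    if data[0] != VERSION {
        return Err(FrameError::UnsupportedVersion(data[0]));
    }
    let mut stored = [0u8; 8];
    stored.copy_from_slice(&data[1..HEADER_LEN]);
    let payload = &data[HEADER_LEN..];
    // Verify before handing the bytes to any parser.
    if checksum(payload) != u64::from_le_bytes(stored) {
        return Err(FrameError::ChecksumMismatch);
    }
    Ok(payload)
}
```

Checking the version before the checksum means files written in an older format fail with a specific error rather than a generic corruption error.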

Possible Issues

  1. Breaking changes: The serialization format version bump from 1 to 2 means existing .dat files will be incompatible and need regeneration
  2. Memory usage patterns: The new FlatBuffer-based approach may have different memory usage characteristics that could affect performance in memory-constrained environments
  3. Complex migration path: The extensive refactoring makes it difficult to incrementally adopt changes, potentially causing issues for downstream consumers
  4. Hash function stability: The code relies on rustc_hash::FxHasher being stable across runs, which is critical for deterministic serialization

Security Hotspots

  1. Unsafe operations in flatbuffer handling (lines around fb_vector_to_slice, VerifiedFlatbufferMemory): Uses unsafe code to convert FlatBuffer vectors to slices, with alignment assumptions that could cause memory safety issues if violated
  2. Integrity checking bypass: While seahash verification is added, the from_builder path skips verification entirely, potentially allowing corrupted data to be processed
  3. Permission encoding in cosmetic filters: The encode_script_with_permission function encodes permission bits as the last character of a string, which could be vulnerable to string manipulation attacks
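
The alignment concern in item 1 can be made concrete with a small sketch. The helper below is hypothetical (it is not `fb_vector_to_slice`): it reinterprets a byte slice as `u64` values only after checking length and pointer alignment, which is the kind of precondition an unchecked conversion silently assumes.

```rust
/// Hypothetical helper: view `bytes` as `&[u64]` only when that is sound.
pub fn bytes_as_u64s(bytes: &[u8]) -> Option<&[u64]> {
    let size = std::mem::size_of::<u64>();
    let align = std::mem::align_of::<u64>();
    // Reject buffers that are not a whole number of u64 values
    // or whose start address is not aligned for u64.
    if bytes.len() % size != 0 || (bytes.as_ptr() as usize) % align != 0 {
        return None;
    }
    // SAFETY: the pointer comes from a valid slice, is aligned for u64
    // (checked above), covers exactly `len / size` elements, every bit
    // pattern is a valid u64, and the result borrows from `bytes`.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const u64, bytes.len() / size)
    })
}
```

If the buffer is always allocated with a known minimum alignment, as the MIN_ALIGNMENT change in the commit list suggests, the runtime check could be reduced to a debug assertion. That trade-off is a design choice rather than something this sketch settles.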

Changes

  • Cargo.toml/Cargo.lock: Updates dependencies, pins rustc-hash to v1.1.0, renames feature flags
  • src/data_format/: Complete rewrite of serialization logic, removes MessagePack dependency
  • src/filters/: Major refactoring of network filter storage, introduces FlatBuffer-based implementations
  • src/flatbuffers/: New module with container abstractions for FlatBuffer data structures
  • src/cosmetic_filter_cache.rs: Converts from runtime HashMap storage to FlatBuffer-based storage
  • src/cosmetic_filter_cache_builder.rs: New builder pattern for cosmetic filter serialization
  • src/resources/: Introduces ResourceStorageBackend trait for pluggable storage backends
  • src/engine.rs: Updates to use new storage architecture, changes serialization API
  • benches/: Updates benchmarks to work with new API
  • tests/: Extensive test updates and new container-specific tests
sequenceDiagram
    participant User
    participant Engine
    participant FilterDataContext
    participant FlatBufferMemory
    participant CosmeticCache
    participant ResourceStorage

    User->>Engine: from_rules(rules)
    Engine->>FilterDataContext: new(flatbuffer_memory)
    FilterDataContext->>FlatBufferMemory: create verified memory
    Engine->>CosmeticCache: from_context(filter_data_context)
    Engine->>ResourceStorage: default (in-memory backend)

    User->>Engine: check_network_request(request)
    Engine->>FilterDataContext: access network filters
    FilterDataContext->>FlatBufferMemory: query flatbuffer data
    FlatBufferMemory-->>Engine: filter results

    User->>Engine: serialize()
    Engine->>FilterDataContext: get flatbuffer data
    FilterDataContext-->>Engine: raw flatbuffer bytes
    Engine-->>User: serialized .dat file with integrity hash

    User->>Engine: deserialize(data)
    Engine->>FlatBufferMemory: verify integrity hash
    FlatBufferMemory->>FilterDataContext: create new context
    Engine->>CosmeticCache: rebuild from new context

flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(StringVector::VT_DATA, None)
.unwrap()
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(CosmeticFilters::VT_SIMPLE_CLASS_RULES, None)
.unwrap()
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(CosmeticFilters::VT_SIMPLE_ID_RULES, None)
.unwrap()
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(CosmeticFilters::VT_MISC_GENERIC_SELECTORS, None)
.unwrap()
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(CosmeticFilters::VT_COMPLEX_CLASS_RULES_INDEX, None)
.unwrap()
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

self._tab.get::<flatbuffers::ForwardsUOffset<
flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(HostnameSpecificRules::VT_UNHIDE, None)
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

self._tab.get::<flatbuffers::ForwardsUOffset<
flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(HostnameSpecificRules::VT_UNINJECT_SCRIPT, None)
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

self._tab.get::<flatbuffers::ForwardsUOffset<
flatbuffers::Vector<'a, flatbuffers::ForwardsUOffset<&'a str>>,
>>(HostnameSpecificRules::VT_PROCEDURAL_ACTION, None)
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

>>(
HostnameSpecificRules::VT_PROCEDURAL_ACTION_EXCEPTION, None
)
}

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

-    pub(crate) fn filter_list(&self) -> fb::NetworkFilterList<'_> {
-        unsafe { fb::root_as_network_filter_list_unchecked(self.data()) }
+    pub(crate) fn root(&self) -> fb::Engine<'_> {
+        unsafe { fb::root_as_engine_unchecked(self.data()) }

reported by reviewdog 🐶
[semgrep] Detected 'unsafe' usage, please audit for secure usage

Source: https://semgrep.dev/r/rust.lang.security.unsafe-usage.unsafe-usage


Cc @thypon @kdenhartog

@github-actions github-actions bot left a comment

Rust Benchmark

| Benchmark suite | Current: dad4934 | Previous: 1bfadd6 | Ratio |
| --- | --- | --- | --- |
| rule-match-browserlike/brave-list | 2298565192 ns/iter (± 13731034) | 2157327098 ns/iter (± 15833464) | 1.07 |
| rule-match-first-request/brave-list | 1142959 ns/iter (± 9598) | 1067208 ns/iter (± 8765) | 1.07 |
| blocker_new/brave-list | 157723755 ns/iter (± 1127633) | 175128732 ns/iter (± 2762769) | 0.90 |
| blocker_new/brave-list-deserialize | 24367977 ns/iter (± 501562) | 73769476 ns/iter (± 854420) | 0.33 |
| memory-usage/brave-list-initial | 11287711 ns/iter (± 3) | 18344503 ns/iter (± 3) | 0.62 |
| memory-usage/brave-list-initial/max | 66961277 ns/iter (± 3) | 66961277 ns/iter (± 3) | 1 |
| memory-usage/brave-list-initial/alloc-count | 1635258 ns/iter (± 3) | 1616082 ns/iter (± 3) | 1.01 |
| memory-usage/brave-list-1000-requests | 2562777 ns/iter (± 3) | 2551890 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-1000-requests/alloc-count | 69244 ns/iter (± 3) | 68816 ns/iter (± 3) | 1.01 |
| url_cosmetic_resources/brave-list | 197905 ns/iter (± 578) | 189708 ns/iter (± 2713) | 1.04 |
| cosmetic-class-id-match/brave-list | 3340829 ns/iter (± 968415) | 5166097 ns/iter (± 1523809) | 0.65 |

This comment was automatically generated by workflow using github-action-benchmark.

@antonok-edm antonok-edm left a comment

thanks!

@antonok-edm antonok-edm merged commit 71a1a48 into master Oct 19, 2025
9 checks passed
@antonok-edm antonok-edm deleted the 0.11.x-merge branch October 19, 2025 01:17
6 participants