Skip to content

Conversation

@atuchin-m
Copy link
Collaborator

@atuchin-m atuchin-m commented Oct 14, 2025

The PR fixes cosmetic filtering performance that regressed after refactoring cosmetic filters to flatbuffers (binary search is slower than hash set/map)
That's a dealbreaker for rolling 0.11.x.

The PR:

  • Introduce new structures FlatHash{Set,Map} that uses open hashing approach. That provide a good balance between performance and memory usage.
  • Hasher is used rustc-hash (the same version used in Chromium)
  • Add new traits to reuse the current code.
  • Memory usage is increased (for 0.11.x: 10 MB => 10.5MB, looks acceptable)

TODO: add unit tests (if the rest looks fine). Done

@atuchin-m atuchin-m self-assigned this Oct 14, 2025
@atuchin-m atuchin-m force-pushed the flat-hash-structures-4 branch 2 times, most recently from 89727d6 to 2778c65 Compare October 14, 2025 22:06
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust Benchmark

Benchmark suite Current: 388c759 Previous: 1bfadd6 Ratio
rule-match-browserlike/brave-list 2236581787 ns/iter (± 11322539) 2157327098 ns/iter (± 15833464) 1.04
rule-match-first-request/brave-list 1133621 ns/iter (± 8618) 1067208 ns/iter (± 8765) 1.06
blocker_new/brave-list 152545779 ns/iter (± 632857) 175128732 ns/iter (± 2762769) 0.87
blocker_new/brave-list-deserialize 21994108 ns/iter (± 64316) 73769476 ns/iter (± 854420) 0.30
memory-usage/brave-list-initial 10508747 ns/iter (± 3) 18344503 ns/iter (± 3) 0.57
memory-usage/brave-list-initial/max 64817658 ns/iter (± 3) 66961277 ns/iter (± 3) 0.97
memory-usage/brave-list-initial/alloc-count 1534842 ns/iter (± 3) 1616082 ns/iter (± 3) 0.95
memory-usage/brave-list-1000-requests 2516487 ns/iter (± 3) 2551890 ns/iter (± 3) 0.99
memory-usage/brave-list-1000-requests/alloc-count 66788 ns/iter (± 3) 68816 ns/iter (± 3) 0.97
url_cosmetic_resources/brave-list 202280 ns/iter (± 2208) 189708 ns/iter (± 2713) 1.07
cosmetic-class-id-match/brave-list 3438051 ns/iter (± 949610) 5166097 ns/iter (± 1523809) 0.67

This comment was automatically generated by workflow using github-action-benchmark.

@atuchin-m atuchin-m requested a review from boocmp October 14, 2025 22:10
@atuchin-m atuchin-m force-pushed the flat-hash-structures-4 branch from 2778c65 to 12f07cb Compare October 14, 2025 23:10
@atuchin-m atuchin-m marked this pull request as ready for review October 14, 2025 23:16
@atuchin-m atuchin-m requested a review from Copilot October 15, 2025 19:58
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a performance regression in cosmetic filtering that occurred after refactoring to use flatbuffers. The change replaces binary search-based lookups with hash-based lookups to restore performance while maintaining acceptable memory usage (increasing from 10 MB to 10.5 MB).

Key Changes:

  • Introduces FlatHashSet and FlatHashMap structures using open hashing with rustc-hash (FxHasher) for improved lookup performance
  • Replaces binary search containers with hash-based containers for cosmetic filter rules
  • Updates data format version and serialization hash expectations
  • Adds comprehensive unit tests for the new hash-based containers

Reviewed Changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/flatbuffers/containers/hash_index.rs Core hash table implementation with open addressing and 25-50% load factor
src/flatbuffers/containers/hash_set.rs HashSet wrapper around HashIndex for set operations
src/flatbuffers/containers/hash_map.rs HashMap wrapper around HashIndex for key-value storage
src/flatbuffers/containers/fb_index.rs New trait FbIndex for unified indexed data access
src/flatbuffers/containers/sorted_index.rs Refactored to use FbIndex trait instead of duplicating methods
src/cosmetic_filter_cache_builder.rs Updated to use hash-based containers and StringVector wrapper
src/cosmetic_filter_cache.rs Updated to use HashSetView and HashMapStringView for lookups
src/flatbuffers/fb_network_filter.fbs Added StringVector table definition
src/flatbuffers/fb_network_filter_generated.rs Generated code for StringVector support
src/data_format/mod.rs Renamed constant for clarity (ADBLOCK_FLATBUFFER_VERSION → ADBLOCK_RUST_DAT_VERSION)
tests/unit/engine.rs Updated expected serialization hashes for new format
tests/unit/flatbuffers/containers/hash_set.rs Unit tests for HashSet functionality
tests/unit/flatbuffers/containers/hash_map.rs Unit tests for HashMap functionality
Cargo.toml Added rustc-hash dependency

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Collaborator

@antonok-edm antonok-edm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the version documentation I mentioned above

@atuchin-m atuchin-m force-pushed the flat-hash-structures-4 branch from f8e360f to a567b94 Compare October 16, 2025 19:19
@atuchin-m atuchin-m merged commit 2842d68 into 0.11.x Oct 16, 2025
7 checks passed
@atuchin-m atuchin-m deleted the flat-hash-structures-4 branch October 16, 2025 20:01
@atuchin-m atuchin-m added the perf label Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants