
Conversation

@atuchin-m
Collaborator

@atuchin-m atuchin-m commented Nov 6, 2025

The PR tokenizes some common regex filters such as ||dlscord.*$all (with the token dlscord). This reduces the number of filters that are always checked (those stored with token = 0), which improves filter matching performance.
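To illustrate why moving a filter out of the always-checked bucket helps, here is a minimal sketch of a token-bucketed index. It is not the adblock-rust implementation: `FilterIndex`, `leading_token`, `url_tokens`, and the empty-string `NO_TOKEN` key (standing in for token = 0) are hypothetical names, and the tokenization rule is deliberately simplified.

```rust
use std::collections::HashMap;

/// Bucket key for filters with no usable token (plays the role of `token = 0`).
const NO_TOKEN: &str = "";

/// Toy extraction: for a hostname-anchored pattern such as `||dlscord.*`,
/// return the leading alphanumeric run, provided a non-wildcard separator ends it.
fn leading_token(pattern: &str) -> Option<&str> {
    let rest = pattern.strip_prefix("||")?;
    let end = rest
        .find(|c: char| !c.is_ascii_alphanumeric())
        .unwrap_or(rest.len());
    let terminated = matches!(rest[end..].chars().next(), Some(c) if c != '*');
    if end > 0 && terminated {
        Some(&rest[..end])
    } else {
        None
    }
}

/// Split a URL into alphanumeric tokens.
fn url_tokens(url: &str) -> Vec<&str> {
    url.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| !t.is_empty())
        .collect()
}

#[derive(Default)]
struct FilterIndex {
    buckets: HashMap<String, Vec<String>>,
}

impl FilterIndex {
    fn add(&mut self, filter: &str) {
        // Drop the options part (everything from `$`) before looking for a token.
        let pattern = filter.split('$').next().unwrap_or(filter);
        let key = leading_token(pattern).unwrap_or(NO_TOKEN);
        self.buckets
            .entry(key.to_string())
            .or_default()
            .push(filter.to_string());
    }

    /// Filters that have to be evaluated for `url`: the always-checked bucket
    /// plus the bucket of every token that occurs in the URL.
    fn candidates(&self, url: &str) -> Vec<&str> {
        let mut keys = url_tokens(url);
        keys.sort_unstable();
        keys.dedup();
        keys.push(NO_TOKEN);
        keys.iter()
            .filter_map(|k| self.buckets.get(*k))
            .flatten()
            .map(String::as_str)
            .collect()
    }
}

fn main() {
    let mut index = FilterIndex::default();
    index.add("||dlscord.*$all"); // bucketed under "dlscord"
    index.add("/ads/*banner*"); // no token found here: always checked
    println!("{:?}", index.candidates("https://example.com/page"));
    println!("{:?}", index.candidates("https://dlscord.gg/login"));
}
```

In this sketch, a request whose URL does not contain `dlscord` never evaluates the tokenized filter, while the untokenized one is evaluated for every request.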

-13% on my machine:

rule-match-browserlike/brave-list
master:  time:    [1.5573 s 1.5734 s 1.5927 s]
this PR:  time:   [1.3547 s 1.3618 s 1.3694 s]
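
For reference, the -13% figure follows from the middle values of the two intervals above (Criterion's point estimates). A small sketch of the arithmetic, where `relative_change` is a hypothetical helper name:

```rust
/// Relative change of `candidate` with respect to `baseline`.
fn relative_change(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline
}

fn main() {
    // Point estimates in seconds, copied from the benchmark output above.
    let master = 1.5734;
    let this_pr = 1.3618;
    // Prints "-13.4%".
    println!("{:.1}%", relative_change(master, this_pr) * 100.0);
}
```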

@atuchin-m atuchin-m self-assigned this Nov 6, 2025

@github-actions github-actions bot left a comment

Rust Benchmark

| Benchmark suite | Current: 8684dbc | Previous: 33c03d9 | Ratio |
|---|---|---|---|
| rule-match-browserlike/brave-list | 1963619323 ns/iter (± 12330527) | 2083435514 ns/iter (± 5325633) | 0.94 |
| rule-match-first-request/brave-list | 1118875 ns/iter (± 10421) | 1125686 ns/iter (± 7090) | 0.99 |
| blocker_new/brave-list | 174628523 ns/iter (± 1212720) | 182730408 ns/iter (± 2195358) | 0.96 |
| blocker_new/brave-list-deserialize | 24469477 ns/iter (± 1527515) | 22093509 ns/iter (± 288739) | 1.11 |
| memory-usage/brave-list-initial | 10212216 ns/iter (± 3) | 10212224 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-initial/max | 62258783 ns/iter (± 3) | 62256231 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-initial/alloc-count | 1362674 ns/iter (± 3) | 1362293 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-1000-requests | 2666974 ns/iter (± 3) | 2666926 ns/iter (± 3) | 1.00 |
| memory-usage/brave-list-1000-requests/alloc-count | 71393 ns/iter (± 3) | 71337 ns/iter (± 3) | 1.00 |
| url_cosmetic_resources/brave-list | 190111 ns/iter (± 716) | 197629 ns/iter (± 523) | 0.96 |
| cosmetic-class-id-match/brave-list | 3333567 ns/iter (± 920199) | 3419655 ns/iter (± 911093) | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.

@atuchin-m atuchin-m force-pushed the tokenize-some-regexp-patterns branch 3 times, most recently from 3e88a7d to d7ef08d Compare November 10, 2025 10:05
github-actions[bot]

This comment was marked as resolved.

@atuchin-m atuchin-m force-pushed the tokenize-some-regexp-patterns branch from d7ef08d to 8684dbc Compare November 10, 2025 10:11
@atuchin-m atuchin-m marked this pull request as ready for review November 10, 2025 10:14
@atuchin-m atuchin-m requested a review from a team as a code owner November 10, 2025 10:14
@atuchin-m atuchin-m changed the title from "tokenize some regexp patterns" to "[perf] tokenize some regexp patterns" Nov 10, 2025
 }
 let expected_hash: u64 = if cfg!(feature = "css-validation") {
-    9439492009815519037
+    15545091389304905433
Collaborator Author

We don't need to update ADBLOCK_RUST_DAT_VERSION, because the matching logic hasn't changed. It's only an optimization of how we store rules. The current .dat files can be used as is.

@atuchin-m atuchin-m added the perf label Nov 10, 2025
@atuchin-m atuchin-m merged commit 04a095e into master Nov 10, 2025
8 of 9 checks passed
@atuchin-m atuchin-m deleted the tokenize-some-regexp-patterns branch November 10, 2025 21:19

4 participants