Optimize isFakeDomain(): Map-based lookup & suffix matching for 1,000x faster validations #130

Open
guplem wants to merge 6 commits into 7c:main from guplem:copilot/improve-filtering-performance

guplem commented Jan 23, 2026

The isFakeDomain() function iterated over all 4,500+ domains and built a new regex for each one on every lookup, which cost ~3.2 ms per lookup.

Changes

Core optimization (index.js):

  • Replaced linear iteration with Map for O(1) exact domain matching
  • Implemented suffix-based subdomain detection by iterating domain parts instead of regex
  • Added WeakMap cache to avoid rebuilding lookup structures
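The PR refers to getLookupStructure() but its body is not shown here. A minimal sketch of how a WeakMap-backed cache of this kind could work is given below, assuming json.domains is a plain object keyed by domain name. The names lookupCache and sampleJson, and the sample data, are made up for illustration.

```javascript
// Hypothetical sketch: one possible shape for getLookupStructure().
// The cache is keyed by the dataset object itself, so a custom JSON object
// gets its own Map and the entry can be garbage-collected with the dataset.
const lookupCache = new WeakMap()

function getLookupStructure(json) {
    let lookupMap = lookupCache.get(json)
    if (lookupMap === undefined) {
        // Built only on the first call for this dataset
        lookupMap = new Map(Object.entries(json.domains))
        lookupCache.set(json, lookupMap)
    }
    return lookupMap
}

// Sample data, invented for this example
const sampleJson = { domains: { 'mailinator.com': true, 'temp.example': true } }

const first = getLookupStructure(sampleJson)
const second = getLookupStructure(sampleJson)
console.log(first === second)            // true: the second call reuses the cached Map
console.log(first.has('mailinator.com')) // true
```

Keying the cache by object identity means the Map is rebuilt only when a different dataset object is passed in; mutating json.domains after the first call would not be reflected in this sketch.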

Before:

for (let dom of Object.keys(json.domains)) {
    if (dom === domain.toLowerCase().trim()) return dom
    if (domain.search(new RegExp(`.+\\.${dom}`)) === 0) return dom
}

After:

const lookupMap = getLookupStructure(json)  // Cached Map
const normalizedDomain = domain.toLowerCase().trim()

// O(1) exact match
if (lookupMap.has(normalizedDomain)) return normalizedDomain

// O(k) suffix match where k = domain parts
const parts = normalizedDomain.split('.')
for (let i = 1; i < parts.length; i++) {
    const suffix = parts.slice(i).join('.')
    if (lookupMap.has(suffix)) return suffix
}
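To make the suffix walk concrete, the sketch below lists the candidates that the loop above would try for a given input. suffixCandidates and matchFakeDomain are hypothetical names, the one-entry Map is sample data, and returning false on a miss is an assumption since the snippet above does not show the fall-through case.

```javascript
// Hypothetical helper: the suffixes the loop checks, in order.
function suffixCandidates(domain) {
    const parts = domain.split('.')
    const candidates = []
    for (let i = 1; i < parts.length; i++) {
        candidates.push(parts.slice(i).join('.'))
    }
    return candidates
}

// Hypothetical wrapper around the logic shown in the "After" snippet.
function matchFakeDomain(domain, lookupMap) {
    const normalizedDomain = domain.toLowerCase().trim()
    if (lookupMap.has(normalizedDomain)) return normalizedDomain
    for (const suffix of suffixCandidates(normalizedDomain)) {
        if (lookupMap.has(suffix)) return suffix
    }
    return false // assumption: a miss returns false
}

const sampleMap = new Map([['mailinator.com', true]])

console.log(suffixCandidates('a.b.mailinator.com'))
// [ 'b.mailinator.com', 'mailinator.com', 'com' ]
console.log(matchFakeDomain(' A.B.Mailinator.com ', sampleMap)) // 'mailinator.com'
console.log(matchFakeDomain('notmailinator.com', sampleMap))    // false
```

Because candidates are built from whole labels, a name that merely ends with the same characters (notmailinator.com) is not treated as a subdomain.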

Performance

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| Algorithm | O(n) regex | O(1) exact, O(k) suffix | |
| Lookups/sec | ~309 | ~350,000 | 1,132x |
| 10k validations | 32s | 0.03s | 1,067x |
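A rough way to reproduce the shape of this comparison is sketched below. It is not the removed benchmark.js: the dataset is synthetic (4,500 made-up fakeN.test entries), the helper names are hypothetical, and the absolute numbers will vary with hardware and Node.js version.

```javascript
// Synthetic dataset, invented for this sketch
const syntheticJson = { domains: {} }
for (let i = 0; i < 4500; i++) syntheticJson.domains[`fake${i}.test`] = true

// Linear scan modelled on the "Before" snippet
function linearLookup(domain, json) {
    for (const dom of Object.keys(json.domains)) {
        if (dom === domain.toLowerCase().trim()) return dom
        if (domain.search(new RegExp(`.+\\.${dom}`)) === 0) return dom
    }
    return false
}

// Map lookup modelled on the "After" snippet
const syntheticMap = new Map(Object.entries(syntheticJson.domains))
function mapLookup(domain, lookupMap) {
    const normalizedDomain = domain.toLowerCase().trim()
    if (lookupMap.has(normalizedDomain)) return normalizedDomain
    const parts = normalizedDomain.split('.')
    for (let i = 1; i < parts.length; i++) {
        const suffix = parts.slice(i).join('.')
        if (lookupMap.has(suffix)) return suffix
    }
    return false
}

// Runs fn() `iterations` times and converts the elapsed time to calls per second
function opsPerSecond(fn, iterations) {
    const start = performance.now()
    for (let i = 0; i < iterations; i++) fn()
    const elapsedMs = performance.now() - start
    return iterations / (elapsedMs / 1000)
}

// Subdomain of the last entry: the linear scan has to walk the whole list
const probe = 'mail.fake4499.test'
console.log('linear:', Math.round(opsPerSecond(() => linearLookup(probe, syntheticJson), 50)), 'lookups/sec')
console.log('map:   ', Math.round(opsPerSecond(() => mapLookup(probe, syntheticMap), 50000)), 'lookups/sec')
```

The probe is a worst case for the linear scan; inputs that match an early entry would narrow the gap.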

Testing

Performance test files have been removed, but they can be found here.

Added comprehensive test suites (test-performance.js, test-offline.js, benchmark.js) validating:

  • 700+ assertions covering exact/subdomain/edge cases
  • Backward compatibility (custom JSON support, same API)
  • Dataset validation with all 4,500+ domains
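The removed test files are not reproduced here. As an illustration of the kinds of cases listed above, a small table-driven check might look like the following. checkDomain is a hypothetical stand-in for the function under test, its (domain, json) signature is an assumption, and customJson is invented sample data.

```javascript
// Hypothetical stand-in for the function under test
function checkDomain(domain, json) {
    const lookupMap = new Map(Object.entries(json.domains))
    const normalizedDomain = domain.toLowerCase().trim()
    if (lookupMap.has(normalizedDomain)) return normalizedDomain
    const parts = normalizedDomain.split('.')
    for (let i = 1; i < parts.length; i++) {
        const suffix = parts.slice(i).join('.')
        if (lookupMap.has(suffix)) return suffix
    }
    return false
}

// Custom dataset passed in by the caller (sample data)
const customJson = { domains: { 'mailinator.com': true, 'temp.example': true } }

// [input, expected result]
const cases = [
    ['mailinator.com', 'mailinator.com'],       // exact match
    ['  MAILINATOR.COM ', 'mailinator.com'],    // case and whitespace normalization
    ['inbox.mailinator.com', 'mailinator.com'], // single-level subdomain
    ['a.b.temp.example', 'temp.example'],       // multi-level subdomain
    ['notmailinator.com', false],               // lookalike, not a subdomain
    ['mailinator.com.example', false],          // listed domain used as a prefix
    ['', false],                                // empty input
]

let failures = 0
for (const [input, expected] of cases) {
    const actual = checkDomain(input, customJson)
    if (actual !== expected) {
        failures++
        console.error(`FAIL ${JSON.stringify(input)}: expected ${expected}, got ${actual}`)
    }
}
console.log(`${cases.length - failures}/${cases.length} cases passed`)
```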

Copilot AI and others added 5 commits November 12, 2025 14:30
Co-authored-by: guplem <11029629+guplem@users.noreply.github.com>
Co-authored-by: guplem <11029629+guplem@users.noreply.github.com>
Co-authored-by: guplem <11029629+guplem@users.noreply.github.com>
Deleted PERFORMANCE.md, benchmark.js, test-offline.js, and test-performance.js. This removes performance documentation and related test/benchmark scripts from the repository.

guplem commented Jan 23, 2026

This tackles the same problem as this other PR.
@7c, since at least two people have already raised this concern, it may be worth considering.

guplem changed the title from "Copilot/improve filtering performance" to "Optimize isFakeDomain(): Map-based lookup & suffix matching for 1,000x faster validations" Jan 23, 2026
@GalacticHypernova

@guplem you may optimize it further like I did by first making an exact match check:

const normalizedDomain = domain.trim().toLowerCase()
if (json.domains[normalizedDomain]) return normalizedDomain

That way you won't even spend CPU cycles on structure setup if there's an exact match :)
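One thing worth checking with a plain property lookup is inherited keys. The sketch below assumes json.domains is an ordinary object literal (not confirmed by this thread) and compares the suggested truthiness check with an own-property check. Both function names and the sample data are hypothetical.

```javascript
// Sample data, invented for this example
const sampleJson = { domains: { 'mailinator.com': true } }

// Fast path as suggested: plain property access and a truthiness test
function exactMatchLoose(domain, json) {
    const normalizedDomain = domain.trim().toLowerCase()
    if (json.domains[normalizedDomain]) return normalizedDomain
    return false
}

// Variant that only accepts keys defined on the object itself
function exactMatchOwn(domain, json) {
    const normalizedDomain = domain.trim().toLowerCase()
    if (Object.prototype.hasOwnProperty.call(json.domains, normalizedDomain)) {
        return normalizedDomain
    }
    return false
}

console.log(exactMatchLoose('Mailinator.com', sampleJson)) // 'mailinator.com'
console.log(exactMatchLoose('constructor', sampleJson))    // 'constructor' (inherited from Object.prototype)
console.log(exactMatchOwn('constructor', sampleJson))      // false
console.log(exactMatchOwn('Mailinator.com', sampleJson))   // 'mailinator.com'
```

Whether this matters in practice depends on how the dataset object is created and on whether inputs such as "constructor" can ever reach the function; a Map lookup does not have this behavior.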

Skip Map/WeakMap cache overhead for exact matches by checking json.domains directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

guplem commented Mar 27, 2026

Hey @GalacticHypernova, good call! You're right that a direct property lookup on json.domains can skip the Map/WeakMap machinery entirely for exact matches.

That said, after looking into it a bit more, the practical benefit turns out to be pretty small. Since the WeakMap caches the Map after the first call, subsequent lookups are already just a WeakMap.get() + Map.has() -- both O(1) hash table lookups, just like plain object property access. So we're saving nanoseconds per call on top of the ~1,000x improvement already in place.

Still, it's a clean and harmless optimization, so I went ahead and pushed it. Thanks for the suggestion!

