This Python module aggregates several ads/tracking/malware lists and merges them into a single unified list with duplicates removed. Use it to create your own list from several sources.
See the blocklist-domains repository for an implementation.
Default sources are defined in the configuration file.
To generate your own unified blocklist, install this module with pip:
pip install blocklist_aggregator
This basic example fetches a unified list of domains. You can save it to a file or process it however you like.
import blocklist_aggregator
unified = blocklist_aggregator.fetch()
print(unified)
[ "doubleclick.net", ..., "telemetry.dropbox.com" ]
print(len(unified))
152978
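Since the result printed above is a plain list of strings, saving it needs nothing beyond the standard library. The following is a minimal sketch; `write_domains` is a hypothetical helper written for illustration, not part of the module.

```python
from pathlib import Path


def write_domains(domains, path):
    """Write one domain per line, sorted and de-duplicated.

    Hypothetical helper: `domains` is any iterable of strings, such as
    the list returned by blocklist_aggregator.fetch().
    Returns the number of lines written.
    """
    lines = sorted(set(domains))
    Path(path).write_text("".join(f"{d}\n" for d in lines), encoding="utf-8")
    return len(lines)
```

With the module installed, `write_domains(unified, "unified_list.txt")` would write the fetched list to a file in the current directory.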
See the default configuration file
The configuration contains:
- the URLs of the ads/tracking/malware lists, each with the pattern (regex) to use
- the list of domains to exclude (whitelist)
- an additional list of domains to block (blacklist)
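To show how these three configuration elements can fit together, here is a rough sketch of the aggregation idea. It is not the module's actual implementation: the `aggregate` function, the sample texts, and the regex patterns are all assumptions made for illustration, and the choice to apply the whitelist after the blacklist is also an assumption.

```python
import re


def aggregate(sources, whitelist=(), blacklist=()):
    """Merge domains from several sources into one sorted, unique list.

    Hypothetical sketch: `sources` is an iterable of (text, pattern)
    pairs, where `pattern` is a regex with exactly one capture group
    that matches the domain on a line.
    """
    domains = set()
    for text, pattern in sources:
        regex = re.compile(pattern, re.MULTILINE)
        # With a single capture group, findall returns the captured strings.
        domains.update(match.lower() for match in regex.findall(text))
    # Add extra domains to block, then drop excluded ones (assumed order).
    domains.update(d.lower() for d in blacklist)
    domains.difference_update(d.lower() for d in whitelist)
    return sorted(domains)


# Sample inputs standing in for downloaded lists (made up for this example).
hosts_text = "# comment\n0.0.0.0 ads.example.com\n0.0.0.0 tracker.example.net\n"
plain_text = "ads.example.com\nmetrics.example.org\n"

sources = [
    (hosts_text, r"^0\.0\.0\.0[ \t]+(\S+)$"),  # hosts-style lines
    (plain_text, r"^([a-z0-9.-]+)$"),          # one domain per line
]

result = aggregate(
    sources,
    whitelist=["metrics.example.org"],
    blacklist=["extra.example.io"],
)
print(result)
# ['ads.example.com', 'extra.example.io', 'tracker.example.net']
```

A set handles de-duplication across sources, which is why `ads.example.com` appears only once even though both sample lists contain it.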
The configuration can be overridden at runtime by passing YAML as a string:
cfg_yaml = "verbose: true"
unified = blocklist_aggregator.fetch(cfg_update=cfg_yaml)
It can also be loaded from an external configuration file:
unified = blocklist_aggregator.fetch(cfg_filename="/home/custom-blocklist.conf")
This module can also export the list in several formats:
- text
- hosts
- CDB (key/value database)
import blocklist_aggregator
# fetch domains
unified = blocklist_aggregator.fetch()
# save to a text file
blocklist_aggregator.save_raw(filename="/tmp/unified_list.txt")
# save to hosts file
blocklist_aggregator.save_hosts(filename="/tmp/unified_hosts.txt", ip="0.0.0.0")
# save to CDB
blocklist_aggregator.save_cdb(filename="/tmp/unified_domains.cdb")
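For readers unfamiliar with the hosts format, the sketch below shows the conventional layout, where each line maps a domain to the given IP address. The exact output of `save_hosts` is not shown here, so treat this as an illustration of the format only; `to_hosts_lines` is a hypothetical helper, not part of the module.

```python
def to_hosts_lines(domains, ip="0.0.0.0"):
    """Render domains as conventional hosts-file lines: "<ip> <domain>".

    Hypothetical helper for illustration; the `ip` default mirrors the
    value used in the save_hosts example.
    """
    return [f"{ip} {domain}" for domain in domains]


for line in to_hosts_lines(["doubleclick.net", "telemetry.dropbox.com"]):
    print(line)
# 0.0.0.0 doubleclick.net
# 0.0.0.0 telemetry.dropbox.com
```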
Run the unit tests:
python3 -m unittest discover tests/ -v