Skip to content

dmachard/python-blocklist-aggregator

Repository files navigation

Testing Build Publish

Blocklist aggregator

This python module does the aggregation of several ads/tracking/malware lists, and merges them into a unified list with duplicates removed. Create your own list from several sources.

See the blocklist-domains repository for an implementation.

Default sources are defined on the configuration file

Table of contents

Installation

python 3.12.x python 3.11.x python 3.10.x python 3.9.x python 3.8.x

If you want to generate your own unified blocklist, install this module with the pip command.

pip install blocklist_aggregator

Get started

This basic example enable to get a unified list of domains. You can save-it in a file or do what you want.

import blocklist_aggregator

unified = blocklist_aggregator.fetch()
print(unified)
[ "doubleclick.net", ..., "telemetry.dropbox.com" ]

print(len(unified))
152978

Custom configuration

See the default configuration file

The configuration contains:

  • the ads/tracking/malware URL lists with the pattern (regex) to use
  • the domains list to exclude (whitelist)
  • additionnal domains list to block (blacklist)

The configuration can be overwritten at runtime.

cfg_yaml = "verbose: true"
unified = blocklist_aggregator.fetch(cfg_update=cfg_yaml)

or loaded from external config file

unified = blocklist_aggregator.fetch(cfg_filename="/home/custom-blocklist.conf")

Fetch and save-it to files

This module can be used to export the list in several format:

  • text
  • hosts
  • CDB (key/value database)
import blocklist_aggregator

# fetch domains
unified = blocklist_aggregator.fetch()

# save to a text file
blocklist_aggregator.save_raw(filename="/tmp/unified_list.txt")

# save to hosts file
blocklist_aggregator.save_hosts(filename="/tmp/unified_hosts.txt", ip="0.0.0.0")

# save to CDB
blocklist_aggregator.save_cdb(filename="/tmp/unified_domains.cdb")

For developpers

Run test units

python3 -m unittest discover tests/ -v