asteroid-99942/apophis-blocklist

📛 Apophis Blocklist Blaster
A fast, resilient, and highly accurate blocklist aggregation engine designed for Pi‑hole, AdGuard Home, DNS sinkholes, and security tooling. It merges dozens of third‑party blocklists, extracts valid domains from mixed formats, applies allowlists, and produces clean, deduplicated output.

While the focus is malware, scams/fraud, and phishing, third‑party blocklists may also block some advertising hosts. These are generally ad programs or hosts that have previously been abused to deliver malware. They can be legitimate services that list managers block as a precaution because of past abuse and the potential for future abuse. There may also be instances of advertising or telemetry blocking where a list manager feels the platform is overly intrusive on a user's privacy, is not transparent enough, or collects data without consent. General ad blocking is not supported, since controlling ad blocking locally is often better for ensuring websites function correctly.

At the time of writing the list has over 3 million entries compiled from third party blocklists. A big thank you to all those contributors who share their lists with the community.

In Pi‑hole, add the following URL to your list manager:

https://raw.githubusercontent.com/asteroid-99942/apophis-blocklist/main/output/hosts.txt

Note: This is an AI‑written project, developed with human input on functions and script behaviour. The intention was to deduplicate and consolidate malware blocklists into one efficient list.

✨ Features

-Hybrid TLD Validation
Accepts domains if:

  • The Public Suffix List (PSL) recognises the TLD, or
  • The TLD is alphabetic and 2–10 characters long

This captures malware domains using fake or newly‑created TLDs while still rejecting garbage.
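
The two acceptance rules above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: `SAMPLE_PSL_TLDS` is a hypothetical hard-coded stand-in for the real Public Suffix List data, and `tld_is_acceptable` is an assumed name.

```python
import re

# Hypothetical stand-in for the Public Suffix List: a tiny hard-coded sample.
# A real run would consult data/public_suffix_list.dat instead.
SAMPLE_PSL_TLDS = {"com", "net", "org", "de", "uk"}

# Fallback rule from the feature list: alphabetic, 2-10 characters.
ALPHA_TLD = re.compile(r"[a-z]{2,10}")


def tld_is_acceptable(domain: str, psl_tlds: set[str] = SAMPLE_PSL_TLDS) -> bool:
    """Return True if the domain's last label passes the hybrid check."""
    labels = domain.lower().rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        return False  # need at least "name.tld" and no empty labels
    tld = labels[-1]
    # Rule 1: known to the PSL. Rule 2: plausible alphabetic TLD.
    return tld in psl_tlds or ALPHA_TLD.fullmatch(tld) is not None
```

With this sketch, `tld_is_acceptable("c2.botnetx")` passes through the alphabetic fallback even though `botnetx` is not in the sample set, while `tld_is_acceptable("host.123")` is rejected.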

-Advanced domain extraction
Extracts domains from:

  • Bare domain lists
  • Hosts files (0.0.0.0 example.com)
  • URLs anywhere in a line
  • Adblock syntax (||domain^)
  • Wildcards (*.example.com)
  • Multi‑domain lines
  • IDNA / punycode

-Resilient downloading

  • ETag + Last‑Modified caching
  • Exponential backoff
  • Automatic fallback to cached content
  • HTML/JSON error‑page detection
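
As a rough sketch of how these download safeguards might fit together, the helpers below build conditional request headers from cached metadata, compute exponential backoff delays, and flag responses that look like error pages. All function names, the metadata keys (`etag`, `last_modified`), and the default delays are assumptions for illustration; the project's own logic may differ.

```python
def conditional_headers(meta: dict) -> dict:
    """Build If-None-Match / If-Modified-Since headers from cached metadata."""
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential delays: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


def looks_like_error_page(body: str) -> bool:
    """Heuristic: a blocklist should not start like an HTML or JSON document."""
    head = body.lstrip()[:256].lower()
    # "[" is deliberately not checked: Adblock lists often begin with a
    # "[Adblock Plus ...]" header line.
    return head.startswith(("<!doctype", "<html", "{"))
```

A caller would send `conditional_headers(...)` with each request, reuse the cached copy when the server answers 304 Not Modified, sleep for each value in `backoff_delays(...)` between failed attempts, and fall back to the cache if every attempt fails or `looks_like_error_page(...)` returns True.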

-Parallel processing

  • Downloads and parses lists concurrently for speed.
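
Because downloading is I/O‑bound, a thread pool is one plausible way to get this concurrency. The sketch below assumes a caller‑supplied `fetch_and_parse` function (hypothetical) that returns the set of domains for one URL; `process_all` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def process_all(urls: Iterable[str],
                fetch_and_parse: Callable[[str], set[str]],
                max_workers: int = 8) -> set[str]:
    """Run fetch_and_parse on every URL in a thread pool and union the results."""
    merged: set[str] = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_and_parse, url): url for url in urls}
        for future in as_completed(futures):
            try:
                merged |= future.result()  # set union also deduplicates
            except Exception as exc:
                # One failing source should not abort the whole run.
                print(f"skipped {futures[future]}: {exc}")
    return merged
```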

-Diff reporting
Generates a daily report showing:

  • Added domains
  • Removed domains
  • Total domain count
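
The three figures in the report fall out of plain set arithmetic between the previous and current blocklists. The report layout below is invented for illustration and is not necessarily what diff_report.txt looks like.

```python
def diff_report(previous: set[str], current: set[str]) -> str:
    """Summarise what changed between two blocklist snapshots."""
    added = sorted(current - previous)    # in the new list only
    removed = sorted(previous - current)  # in the old list only
    lines = [
        f"Total domains: {len(current)}",
        f"Added: {len(added)}",
        f"Removed: {len(removed)}",
    ]
    lines += [f"+ {domain}" for domain in added]
    lines += [f"- {domain}" for domain in removed]
    return "\n".join(lines) + "\n"
```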

-Clean output
Produces:

  • blocklist.txt
  • allowlist.txt
  • regexlist.txt
  • diff_report.txt
  • blocklist_previous.txt

📦 Requirements

-Python 3.10+
-Dependencies:

  • requests
  • idna
  • publicsuffix2
  • tomli (Python <3.11)

Install dependencies:

bash
pip install requests idna publicsuffix2 tomli

📁 Directory Structure

.
├── blocklistblaster.py
├── blocklistblaster.toml
├── data/
│   └── public_suffix_list.dat
├── cache/
│   └── metadata.json (auto‑generated)
└── lists/
    ├── blocklist.txt
    ├── allowlist.txt
    ├── regexlist.txt
    ├── diff_report.txt
    └── blocklist_previous.txt

⚙️ Configuration (TOML)

Your configuration file (blocklistblaster.toml) controls which lists are merged.

Example:

toml
[lists]
block = [
    "https://example.com/block1.txt",
    "https://example.com/block2.txt"
]

allow = [
    "https://example.com/allowlist.txt"
]

regex = [
    "https://example.com/regexlist.txt"
]

[output]
block = "lists/blocklist.txt"
allow = "lists/allowlist.txt"
regex = "lists/regexlist.txt"
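
A configuration in this shape could be read as sketched below. The import fallback mirrors the "tomli (Python <3.11)" requirement; `load_config` and its return shape are assumptions for illustration, not the script's real interface.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for Python 3.10


def load_config(text: str) -> tuple[dict, dict]:
    """Parse TOML text and return (source URL lists, output paths)."""
    config = tomllib.loads(text)
    lists = config.get("lists", {})
    # Missing keys default to an empty list so callers can iterate safely.
    sources = {key: list(lists.get(key, [])) for key in ("block", "allow", "regex")}
    outputs = dict(config.get("output", {}))
    return sources, outputs
```

When reading from disk, note that `tomllib.load` expects a file opened in binary mode, for example `open("blocklistblaster.toml", "rb")`.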

🧠 Hybrid TLD Validation (Why It Matters)

Traditional PSL‑only validation rejects:

  • Malware domains using fake TLDs
  • Newly created TLDs not yet in the PSL
  • Internal botnet C2 domains
  • Typosquatted TLDs

Hybrid mode fixes this by allowing:

  • Any PSL‑valid domain
  • Any domain whose TLD is alphabetic and 2–10 characters long

This means domains like:

||0-4-zoll-in-cm.klimafuechse.de^

are correctly extracted as:

0-4-zoll-in-cm.klimafuechse.de

🔍 How Domain Extraction Works

The extractor handles:

URLs
https://malicious.com/path → malicious.com

Hosts format
0.0.0.0 badsite.net → badsite.net

Adblock syntax
||phishingsite.org^ → phishingsite.org

Wildcards
*.tracker.com → tracker.com

Multiple domains per line
0.0.0.0 a.com b.net c.org → all three extracted

IDNA / punycode
xn--bcher-kva.example → normalised
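
The cases above can be approximated with a token‑by‑token cleaner, sketched here under several assumptions: `extract_domains` and the regular expressions are illustrative, comment handling (`#` and `!`) is a guess at typical list formats, and Python's built‑in `idna` codec (IDNA 2003) is swapped in for the `idna` package the project lists as a dependency.

```python
import re

# Hostname label: letters, digits, hyphens; no leading or trailing hyphen.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
# One or more labels, then a final label that starts with a letter.
# This rejects IP addresses such as 0.0.0.0.
DOMAIN = re.compile(rf"(?:{LABEL}\.)+[a-z][a-z0-9-]*[a-z0-9]")


def extract_domains(line: str) -> list[str]:
    """Return the unique domains found on one line, in order of appearance."""
    if line.lstrip().startswith("!"):      # Adblock-style comment line
        return []
    line = line.split("#", 1)[0]           # drop hosts-style comments
    found: list[str] = []
    for token in line.split():
        token = token.lower().lstrip("|")                      # adblock "||" anchor
        token = re.sub(r"^[a-z][a-z0-9+.-]*://", "", token)    # URL scheme
        token = re.split(r"[/^?:$]", token, maxsplit=1)[0]     # path, port, options
        if token.startswith("*."):
            token = token[2:]                                  # wildcard prefix
        token = token.rstrip(".")
        if not token.isascii():
            try:
                token = token.encode("idna").decode("ascii")   # to punycode
            except UnicodeError:
                continue
        if DOMAIN.fullmatch(token) and token not in found:
            found.append(token)
    return found
```

For example, `extract_domains("0.0.0.0 a.com b.net c.org")` yields all three domains and skips the leading IP address, and `extract_domains("||phishingsite.org^")` yields `phishingsite.org`.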

🚀 Running the Script

bash
python3 blocklistblaster.py -c blocklistblaster.toml

Optional:

bash
python3 blocklistblaster.py --max-workers 16

📤 Output Files

File | Description
lists/blocklist.txt | Final merged blocklist
lists/allowlist.txt | Combined allowlist
lists/regexlist.txt | Regex rules from sources
lists/diff_report.txt | Added/removed domains since last run
lists/blocklist_previous.txt | Snapshot of previous blocklist

🤖 GitHub Actions Integration

This script is CI‑friendly:

  • Automatically creates missing directories
  • Uses safe atomic writes
  • Handles cold‑start cache states
  • Works reliably on ephemeral runners
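
One common way to get both directory creation and atomic writes is to write to a temporary file in the target directory and then rename it over the destination. The sketch below shows that pattern; `atomic_write_lines` is a hypothetical helper, not a function confirmed to exist in blocklistblaster.py.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_lines(path: str, lines: list[str]) -> None:
    """Write lines to path so readers never see a half-written file."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)  # create missing directories
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
            handle.writelines(f"{line}\n" for line in lines)
        os.replace(tmp_name, target)  # replaces the destination in one step
    except BaseException:
        os.unlink(tmp_name)  # clean up the partial temp file
        raise
```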

🛠️ Troubleshooting

Cache errors
If you ever see:

cache/metadata.json is corrupted

Just delete the file. It will be regenerated automatically.
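
For context, a loader that tolerates a missing or unreadable metadata file might look like the sketch below. It illustrates the general idea of treating a bad cache as a cold start; `load_metadata` is an assumed name, and the real script may instead report the error shown above.

```python
import json
from pathlib import Path


def load_metadata(path: str = "cache/metadata.json") -> dict:
    """Return cached metadata, or an empty dict if the file is missing or corrupt."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}  # cold start: nothing cached yet
    except (json.JSONDecodeError, UnicodeDecodeError):
        return {}  # corrupted: behave as if the cache were empty
    # Valid JSON of the wrong shape (e.g. a list) is also treated as empty.
    return data if isinstance(data, dict) else {}
```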

PSL missing
Ensure:

data/public_suffix_list.dat

exists and is up to date.

About

Apophis Blocklist is a fast, standards‑driven domain filtering engine for Pi‑hole. It merges multiple threat‑intelligence feeds, normalises domains using IDNA and the Public Suffix List, applies allowlists, and outputs a clean, deduplicated blocklist. The focus is malware and other nefarious threats.
