Simple multi-threaded tool to extract domain-related data from

Usage [-h] -d domain -o path [-t THREADS] [-f index1] [-f index2]

necessary arguments:
  -d, --domain   The domain you want to search for in CC data.
  -o, --outfile  The path and filename where you want the results to be saved to.

optional arguments:
  -h, --help     Show help message and exit
  -f, --filter   Use only indices which contain this string
  -t, --threads  Threads for requests
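
The option list above maps naturally onto `argparse`. The following is only a minimal sketch of such a parser, not the tool's actual source: the function name `build_parser`, the default thread count, and the sample domain `example.com` are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Extract domain related data (sketch of the documented CLI)."
    )
    # Group title copied from the help text above.
    required = parser.add_argument_group("necessary arguments")
    required.add_argument("-d", "--domain", required=True,
                          help="The domain you want to search for in CC data.")
    required.add_argument("-o", "--outfile", required=True,
                          help="The path and filename where the results are saved.")
    # action="append" lets -f be given several times, as in the usage line.
    parser.add_argument("-f", "--filter", action="append", default=[],
                        help="Use only indices which contain this string")
    # The default of 4 is an assumption; the original does not state one.
    parser.add_argument("-t", "--threads", type=int, default=4,
                        help="Threads for requests")
    return parser


if __name__ == "__main__":
    demo = build_parser().parse_args(
        ["-d", "example.com", "-o", "./data.txt", "-f", "2014", "-f", "2013"]
    )
    print(demo.domain, demo.outfile, demo.filter, demo.threads)
```

Using `action="append"` is one way to collect repeated `-f` values into a list; the real tool may handle this differently.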


Search for and save to /home/folder/cc/data.txt

python3 -d -o /home/folder/cc/data.txt

Search for in indices which contain "CC-MAIN-2017-09", save to data.txt

python3 -d -o ./data.txt -f CC-MAIN-2017-09

Search for in indices which contain "2013" or "2014", save to data.txt

python3 -d -o ./data.txt -f 2014 -f 2013
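
When several `-f` values are given, an index presumably qualifies if its name contains any one of them. A minimal sketch of that selection logic follows, assuming plain substring matching; `select_indices` and the sample index list are hypothetical and not taken from the tool.

```python
def select_indices(index_ids, filters):
    """Return the index ids that contain at least one filter string.

    An empty filter list selects every index (assumed behaviour).
    """
    if not filters:
        return list(index_ids)
    return [idx for idx in index_ids if any(f in idx for f in filters)]


# Illustrative index names only.
available = ["CC-MAIN-2017-09", "CC-MAIN-2014-52", "CC-MAIN-2013-20"]

print(select_indices(available, ["2014", "2013"]))
print(select_indices(available, ["CC-MAIN-2017-09"]))
```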

Search for using 10 threads, save to data.txt

python3 -d -o ./data.txt -t 10
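
One plausible way a `-t` value could be used is as the size of a worker pool that queries each index independently. The sketch below assumes exactly that; `fetch_index` is a hypothetical stand-in that performs no network I/O, and `collect` is not a function of the real tool.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_index(index_id, domain):
    """Hypothetical stand-in for one index query; returns fake URLs."""
    return [f"http://{domain}/found-in/{index_id}"]


def collect(domain, index_ids, threads):
    """Query all indices with a pool of `threads` workers and merge the results."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map keeps input order and yields one URL list per index.
        batches = pool.map(lambda idx: fetch_index(idx, domain), index_ids)
        # Deduplicate and sort so the output is stable regardless of timing.
        return sorted({url for batch in batches for url in batch})


if __name__ == "__main__":
    for url in collect("example.com", ["CC-MAIN-2017-09", "CC-MAIN-2014-52"], threads=10):
        print(url)
```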

grep tips

I am no grep expert, but these commands get the data out. If you have better solutions for the existing commands or additional ideas for what to search for, please open a PR.

  1. Find entries which end with popular file extensions indicating dynamic pages etc.:
grep -i -E '\.(php|asp|dev|jsp|wsdl|xml|cgi|json|html)$' /home/folder/cc/data.txt
  2. Find interesting files like backups, archives, log files...
grep -i -E '\.(zip|rar|tar|bkp|sql|bz2|gz|txt|bak|conf|log|error|debug|yml|lock|template|tpl)$' /home/folder/cc/data.txt
  3. Find entries which end with popular strings like "admin" etc.:
grep -i -E '(admin|account|debug|control|config|upload|system|secret|environment|dashboard)$' /home/folder/cc/data.txt
  4. Find files which begin with "." (htaccess, ...):
grep -E '/\.' /home/folder/cc/data.txt
  5. Find obvious backup files:
grep -i -E '(\.bkp|\.bak|backup|\.dump|\.sql)' /home/folder/cc/data.txt
  6. Extract subdomains:
sed -e 's|^[^/]*//||' -e 's|^www\.||' -e 's|/.*$||' /home/folder/cc/data.txt | grep -v ":" | grep -v "@" | grep -v "?" | grep -v "/" | sort -u
  7. Find URLs with parameters in them:
grep -E '[?&][^=]*=' /home/folder/cc/data.txt | sort -u
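
For readers who would rather post-process the results in Python, the subdomain and parameter searches above can be approximated with `urllib.parse`. This is a rough sketch under the assumption that the output file holds one URL per line; `unique_hosts` and `has_parameters` are names made up for this example. Unlike the `sed` pipeline, which drops lines containing ":" or "@", this version strips ports and credentials and keeps the host.

```python
from urllib.parse import parse_qsl, urlsplit


def unique_hosts(urls):
    """Return the sorted set of host names, with a leading "www." removed."""
    hosts = set()
    for url in urls:
        url = url.strip()
        if not url:
            continue
        # Scheme-less entries need a leading "//" to be parsed as a host.
        target = url if "//" in url else "//" + url
        try:
            host = urlsplit(target).hostname
        except ValueError:
            continue  # skip malformed entries
        if host:
            hosts.add(host[4:] if host.startswith("www.") else host)
    return sorted(hosts)


def has_parameters(url):
    """True if the URL carries at least one key=value query parameter."""
    return bool(parse_qsl(urlsplit(url).query, keep_blank_values=True))


if __name__ == "__main__":
    sample = [
        "http://www.example.com/index.php",
        "https://shop.example.com:8080/cart?item=1&qty=2",
        "http://user@api.example.com/",
    ]
    print(unique_hosts(sample))
    print([u for u in sample if has_parameters(u)])
```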


Requirements

  • python3
  • requests
  • argparse (part of the Python 3 standard library)
  • json (part of the Python 3 standard library)


This project was initially forked from but, since I refactored it completely and si9int took another path, I decided to create a standalone project.
