Part of my MCN (make clean no) project.
Scripts for downloading data from the commoncrawl.org project and extracting .no domains from it.
How to use:
- git submodule init
- git submodule update
- sudo apt install python-bs4 parallel
- ./get-indexes.sh
- ./verify-indexes.sh
- ./list_domains.sh
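
To give a rough idea of what "extracting .no domains" means, here is a minimal sketch. It is not the code in list_domains.sh: the function name `extract_no_domains` is hypothetical, and it assumes the input is one plain URL per line on stdin. It ignores URLs with user info (`user@host`) and non-ASCII host names.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository.
# Reads URLs on stdin and prints the unique host names ending in .no.
extract_no_domains() {
    # 1. Drop the scheme (http://, https://, ...).
    # 2. Drop everything from the first /, :, ? or # onward (path, port, query).
    # 3. Lowercase the host name.
    # 4. Keep only host names whose last label is "no".
    # 5. Sort and remove duplicates.
    sed -E 's,^[a-zA-Z][a-zA-Z0-9+.-]*://,,; s,[/:?#].*$,,' \
        | tr '[:upper:]' '[:lower:]' \
        | grep -E '^([a-z0-9-]+\.)+no$' \
        | sort -u
}
```

Usage, assuming a file urls.txt (also hypothetical) with one URL per line:

```shell
extract_no_domains < urls.txt > no-domains.txt
```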