mcn-source-ct

Part of my MCN (make clean no) project.

Scripts for downloading data from the commoncrawl.org project and extracting .no domains from it.

Howto:

  • git submodule init
  • git submodule update
  • sudo apt install python-bs4 parallel
  • ./get-indexes.sh
  • ./verify-indexes.sh
  • ./list_domains.sh
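
The scripts themselves are not shown here, so the following is only a rough sketch of what the final "list domains" step might boil down to: keeping the unique hostnames under .no from a stream of URLs. The function name `extract_no_domains` and the assumption that the input is one absolute URL per line are hypothetical and not taken from this repository.

```python
from urllib.parse import urlsplit


def extract_no_domains(urls):
    """Return the sorted, de-duplicated .no hostnames found in an iterable of URLs.

    Hypothetical helper: assumes each item is an absolute URL (with a scheme).
    """
    domains = set()
    for url in urls:
        try:
            # .hostname is lower-cased by urlsplit; it is None when no host is present
            host = urlsplit(url.strip()).hostname
        except ValueError:
            # Malformed URL (e.g. unbalanced IPv6 brackets): skip it
            continue
        if not host:
            continue
        host = host.rstrip(".")  # tolerate a trailing root dot
        if host.endswith(".no"):
            domains.add(host)
    return sorted(domains)


if __name__ == "__main__":
    import sys

    # Read one URL per line from stdin and print each .no hostname once
    for domain in extract_no_domains(sys.stdin):
        print(domain)
```

Saved under a made-up name such as `no_domains.py`, it could be fed a URL list on stdin, for example `python3 no_domains.py < urls.txt`. Matching on the `.no` suffix including the dot avoids false hits on hostnames that merely end in the letters "no".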