A crawler for a specific website. It shows the site map as a directory tree in the terminal and in an HTML file, and also saves the URLs to a CSV file.
First, clone the project:
git clone https://github.com/berkayberkman/crawler.git
Create and open a virtual environment:
python3 -m venv scraper
source scraper/bin/activate
Install the dependencies:
pip install -r requirements.txt
sudo apt-get install tree
Go to the script directory:
cd script
Finally run the script via:
python multithread_url_scraper.py
Crawling usually takes less than a minute, depending on your internet connection.
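To give a rough idea of the output described at the top (a directory-style site map plus a CSV of URLs), here is a minimal sketch. It is not the code in multithread_url_scraper.py: the function names build_tree, render_tree and write_csv and the example.com URLs are made up for illustration, and the tree is drawn in pure Python rather than with the tree command installed above.

```python
import csv
import io
from urllib.parse import urlparse


def build_tree(urls):
    """Nest URL path segments into a dict of dicts, like folders on disk."""
    tree = {}
    for url in urls:
        node = tree
        # Skip empty segments so "/blog/" and "/blog" land on the same node.
        for part in filter(None, urlparse(url).path.split("/")):
            node = node.setdefault(part, {})
    return tree


def render_tree(tree, prefix=""):
    """Return lines drawn in the style of the `tree` command."""
    lines = []
    names = sorted(tree)
    for i, name in enumerate(names):
        last = i == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of the last entry get blank padding, others a vertical bar.
        lines.extend(render_tree(tree[name], prefix + ("    " if last else "│   ")))
    return lines


def write_csv(urls, handle):
    """Write the unique URLs, sorted, one per row under a 'url' header."""
    writer = csv.writer(handle)
    writer.writerow(["url"])
    for url in sorted(set(urls)):
        writer.writerow([url])


if __name__ == "__main__":
    # Hypothetical crawl result; a real run would collect these from the site.
    sample = [
        "https://example.com/blog/post-1",
        "https://example.com/blog/post-2",
        "https://example.com/about",
    ]
    print("\n".join(render_tree(build_tree(sample))))

    buffer = io.StringIO()  # stands in for an open CSV file
    write_csv(sample, buffer)
    print(buffer.getvalue())
```

Running the sketch prints the tree first and the CSV rows after it. To write a real file, pass open("urls.csv", "w", newline="") instead of the in-memory buffer; the file name here is only an example.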