A web crawler is a program or automated script that browses the Web in a methodical, automated manner. Many legitimate sites, search engines in particular, use crawling as a means of keeping their data up to date. Web crawlers are mainly used to create a copy of all visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches. Crawlers can also be used to automate maintenance tasks on a website, such as checking links or validating HTML code.
How to run:
- Run 'crawler.py' with Python 3 (no compilation step is needed)
- Enter the full site address when prompted, or choose from the offered options
- The program adds each URL to a set (to avoid duplicates) and recursively crawls same-domain URLs extracted by parsing the HTML for anchor tags and their href attributes (see the sketch below)
*Tested on Windows 10 and Ubuntu with Python 3
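
The approach in outline, as a minimal sketch using only the Python standard library; the names (`LinkParser`, `crawl`, `seen`) are illustrative and do not come from crawler.py itself:

```python
# Minimal sketch of the crawl described above: a set for deduplication,
# HTML parsing for <a href="..."> links, recursion over same-domain URLs.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(url, seen, domain):
    """Visit url, then recursively visit unseen same-domain links."""
    if url in seen:
        return                      # already crawled; the set prevents loops
    seen.add(url)
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    except Exception:
        return                      # unreachable or unreadable page; skip it
    parser = LinkParser()
    parser.feed(html)
    for href in parser.links:
        link = urljoin(url, href)   # resolve relative hrefs against the page
        if urlparse(link).netloc == domain:
            crawl(link, seen, domain)

if __name__ == "__main__":
    start = input("Enter full site address: ")
    visited = set()
    crawl(start, visited, urlparse(start).netloc)
    print(f"Visited {len(visited)} pages")
```

Note that a purely recursive crawl can hit Python's recursion limit on large sites; an explicit queue of pending URLs (breadth-first) avoids that.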