A web crawler is a program or automated script that browses the Web in a methodical, automated manner. Many legitimate sites, search engines in particular, use crawling as a means of keeping their data up to date. Web crawlers are mainly used to create a copy of all visited pages for later processing by a search engine, which indexes the downloaded pages to provide fast searches. Crawlers can also be used to automate maintenance tasks on a website, such as checking links or validating HTML code.
How to run:
- Run 'crawler.py' with Python 3 (no compilation step is needed)
- Enter the full site address when prompted, or choose from the offered options
- The program adds each URL to a set (to avoid duplicates) and recursively crawls same-domain URLs extracted by parsing the HTML for anchor tags and their href attributes (see the sketch below)
*Tested on Windows 10 and Ubuntu with Python 3
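
The approach in outline, as a minimal sketch using only the Python standard library; the names (`LinkParser`, `crawl`, `seen`) are illustrative and do not come from crawler.py itself:

```python
# Minimal sketch of the crawl described above: a set for deduplication,
# HTML parsing for <a href="..."> links, recursion over same-domain URLs.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkParser(HTMLParser):
    """Collects href values from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(url, seen, domain):
    """Visit url, then recursively visit unseen same-domain links."""
    if url in seen:
        return                      # already crawled; the set prevents loops
    seen.add(url)
    try:
        html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    except Exception:
        return                      # unreachable or unreadable page; skip it
    parser = LinkParser()
    parser.feed(html)
    for href in parser.links:
        link = urljoin(url, href)   # resolve relative hrefs against the page
        if urlparse(link).netloc == domain:
            crawl(link, seen, domain)

if __name__ == "__main__":
    start = input("Enter full site address: ")
    visited = set()
    crawl(start, visited, urlparse(start).netloc)
    print(f"Visited {len(visited)} pages")
```

Note that a purely recursive crawl can hit Python's recursion limit on large sites; an explicit queue of pending URLs (breadth-first) avoids that.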