GitHub - aishwarya4444/WebCrawler: Python script for implementing a Web Crawler.

PROBLEM STATEMENT

Write a web crawler.

A crawler is a program that starts with a URL on the web (e.g. http://python.org), fetches the web page at that URL, and parses all the links on that page into a repository of links. Next, it fetches the contents of one of the URLs in the repository just created, parses the links from this new content into the repository, and continues this process for all links in the repository until it is stopped or a given number of links have been fetched.
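The loop described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code in webCrawler.py; the names `LinkParser`, `fetch_page`, and `crawl`, the breadth-first visiting order, and the 10-second timeout are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages, fetch=fetch_page):
    """Fetch up to max_pages pages, starting at start_url.

    Returns the list of URLs that were fetched, in order.
    """
    repository = deque([start_url])  # links waiting to be fetched
    seen = {start_url}               # every link ever added to the repository
    fetched = []
    while repository and len(fetched) < max_pages:
        url = repository.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be downloaded
        fetched.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                repository.append(absolute)
    return fetched
```

The `fetch` parameter is there only so the loop can be exercised without network access; calling `crawl("http://python.org/pypi", 4)` would use the real downloader.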

RUNNING THE SCRIPT

Usage:

progName URL numberOfUrlsToBeCrawled

Example:

./webCrawler.py http://python.org/pypi 4
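One way the two command-line arguments could be read is sketched below. The function name `parse_args` and its error messages are hypothetical; the actual script may handle its arguments differently.

```python
def parse_args(argv):
    """Return (start_url, max_pages) from an argv-style list.

    Exits with a usage message if the arguments are missing or invalid.
    """
    if len(argv) != 3:
        raise SystemExit("usage: progName URL numberOfUrlsToBeCrawled")
    url = argv[1]
    try:
        count = int(argv[2])
    except ValueError:
        raise SystemExit("numberOfUrlsToBeCrawled must be an integer")
    if count < 1:
        raise SystemExit("numberOfUrlsToBeCrawled must be at least 1")
    return url, count


# Example: the invocation shown above, passed as a list.
print(parse_args(["./webCrawler.py", "http://python.org/pypi", "4"]))
```

In a real script the list would normally come from `sys.argv`.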

ERRORS

The script is unable to crawl "http://stackoverflow.com" and "http://python.org".
