Leakycom/Sitemap-Crawler


	AUTHOR: Darren Nix
	Version: 0.1
	Date:	2011-9-7
	Site: www.darrennix.com
	License: Apache 2.0

	Crawls a site to find unique page URLs and returns them as a list.
	Ignores query strings, badly formed URLs, and links to domains
	outside of the starting domain.
	
	Inspired by sitemap_gen from Vladimir Toncar

	DEPENDENCIES:
	BeautifulSoup HTML parsing library
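
The crawl described above (collect unique page URLs, drop query strings, skip badly formed links and links to other domains) can be sketched roughly as follows. This is not the author's code, only a minimal illustration under stated assumptions: it uses the standard library's `html.parser` as a stand-in for BeautifulSoup so that it runs without extra installs, and the names `LinkParser`, `normalize`, `crawl`, `fetch`, and `max_pages` are hypothetical. The `fetch` argument is any callable that takes a URL and returns the page HTML (or `None` on failure), so the network layer is left to the caller.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class LinkParser(HTMLParser):
    """Collects href values from <a> tags (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def normalize(base_url, href, domain):
    """Return a cleaned absolute URL, or None if the link should be ignored."""
    try:
        parts = urlsplit(urljoin(base_url, href))
    except ValueError:
        return None  # badly formed URL
    if parts.scheme not in ("http", "https"):
        return None  # mailto:, javascript:, and similar
    if parts.hostname != domain:
        return None  # link leaves the starting domain
    # Rebuild the URL without its query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; returns a sorted list of unique page URLs."""
    domain = urlsplit(start_url).hostname
    first = normalize(start_url, start_url, domain)
    seen = {first}
    queue = deque([first])
    while queue:
        url = queue.popleft()
        html = fetch(url)  # assumed to return None when the page is unavailable
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = normalize(url, href, domain)
            if link and link not in seen and len(seen) < max_pages:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Passing `fetch` in as a parameter keeps the sketch testable against an in-memory dictionary of pages; a real crawler would supply a function that performs the HTTP request and would likely also want rate limiting and robots.txt handling.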

About

Crawls a site to find every unique page URL. In Python & Django.
