Crawler is a bare-bones spider designed to quickly and effectively build an index of all files and pages on a given Web site, as well as the link relationships (both incoming and outgoing) between pages.
PHP
includes
README.txt Add ability to store html files in database Jun 3, 2012
TODO.txt Add TODO task. Add crawl_tag to export.php. Make link of last crawled… Jun 3, 2012
browse.php initial commit Feb 22, 2011
config.php Add exclude functionality to filter links based on array of patterns Jun 4, 2012
crawl.php
create-tables.sql Add index on field crawl_tag. Jun 3, 2012
export.php Add TODO task. Add crawl_tag to export.php. Make link of last crawled… Jun 4, 2012
query.php Add crawl_tag to query results. Jun 4, 2012
sitemap.php Put <?xml tag into echo statement to avoid error when using php short… Jun 3, 2012
stats.php

README.txt

TO USE:

1. Edit config.php with the appropriate database and domain information
2. (for now) In phpMyAdmin, insert the seed URL into the urls table.
	* URL should be something like: www.fcc.gov/
	* URL should have a trailing slash
	* (for now) May also want to set clicks to '0' to avoid problems
3. Open crawl.php
4. (optional) Open stats.php to watch progress
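
Step 2 could also be scripted instead of done by hand in phpMyAdmin. The following is only a minimal sketch, not part of the crawler: the helper names normalize_seed_url and insert_seed_url are hypothetical, and the column names url and clicks are assumptions, so check create-tables.sql for the real schema before using it.

```php
<?php
// Hypothetical helper: make sure the seed URL ends with exactly one
// trailing slash, as step 2 asks.
function normalize_seed_url(string $url): string
{
    $url = trim($url);
    return rtrim($url, '/') . '/';
}

// Hypothetical helper: insert the seed URL with clicks set to 0.
// ASSUMPTION: the urls table has columns named `url` and `clicks`;
// verify against create-tables.sql.
function insert_seed_url(PDO $db, string $url): void
{
    $stmt = $db->prepare('INSERT INTO urls (url, clicks) VALUES (:url, 0)');
    $stmt->execute([':url' => normalize_seed_url($url)]);
}

echo normalize_seed_url('www.fcc.gov'), "\n"; // prints "www.fcc.gov/"
```

To run the insert, pass insert_seed_url a PDO connection built from the same database settings entered in config.php.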

TIPS:
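	Changes to php.ini
		1. Increase the memory limit (memory_limit = 1G)
		2. Remove the execution time limit (max_execution_time = 0)
	Changes to the MySQL config file (my.ini or my.cnf)
		* Increase the max query size (max_allowed_packet) to avoid the "MySQL server has gone away" error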
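
If editing php.ini is not an option, the two PHP limits can usually be raised at runtime from the top of a script instead. This is a sketch under that assumption; the README does not say whether crawl.php already does this, and some hosts restrict these calls. The MySQL packet size is a server-side setting and is not covered here.

```php
<?php
// Raise the memory limit to 1 GB for this script only.
ini_set('memory_limit', '1G');

// Passing 0 removes the execution time limit for this script.
set_time_limit(0);

// Show the value now in effect.
echo ini_get('memory_limit'), "\n";
```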

Additional documentation (source code) in (/source)