Crawler is a bare-bones spider designed to quickly and effectively build an index of all files and pages on a given Web site as well as the link relationship (both incoming and outgoing) between each page.
PHP
includes
README.txt
TODO.txt Add TODO task. Add crawl_tag to export.php. Make link of last crawled… Jun 3, 2012
browse.php
config.php
crawl.php
create-tables.sql
export.php
query.php
sitemap.php
stats.php

README.txt

TO USE:

1. Edit config.php with the appropriate database and domain information
2. (for now) In phpMyAdmin, insert the seed URL into the urls table.
	* URL should be something like: www.fcc.gov/
	* URL should have a trailing slash
	* (for now) You may also want to set clicks to '0' to avoid problems
3. Open crawl.php
4. (optional) open stats.php to watch progress
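
As an alternative to phpMyAdmin, step 2 could be scripted. The following is only a sketch: the column name `url` is an assumption (check create-tables.sql for the real schema), `normalizeSeedUrl()` and `seedUrl()` are hypothetical helpers that are not part of Crawler, and the typed signatures assume PHP 7 or later. Only the `urls` table and the `clicks` column come from the steps above.

```php
<?php
// Hypothetical helper: make sure a site-root seed such as "www.fcc.gov"
// ends in exactly one trailing slash, as step 2 requires.
function normalizeSeedUrl(string $url): string
{
    return rtrim(trim($url), '/') . '/';
}

// Hypothetical helper: insert the seed into the urls table with clicks = 0.
// ASSUMPTION: the URL column is named `url`; adjust to match create-tables.sql.
function seedUrl(PDO $pdo, string $seed): void
{
    $stmt = $pdo->prepare('INSERT INTO urls (url, clicks) VALUES (:url, 0)');
    $stmt->execute([':url' => normalizeSeedUrl($seed)]);
}

// Example usage (placeholder credentials, not run here):
// $pdo = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'password');
// $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// seedUrl($pdo, 'www.fcc.gov');
```

The normalizer is meant for a bare host or directory seed; a seed that points at a file (for example a .html page) should not be given a trailing slash this way.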

TIPS:
	Changes to php.ini
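		1. Increase memory limit (memory_limit, e.g. 1GB)
		2. Remove execution time limit (max_execution_time = 0)
	Changes to the MySQL configuration (my.ini or my.cnf)
		* Increase max query size (max_allowed_packet) to avoid the "MySQL server has gone away" error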
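
If editing php.ini is not an option, the two PHP limits can usually be raised per script instead. This is a sketch of that alternative, not something Crawler is known to do. It assumes the host allows these settings to be overridden at runtime, and it would go at the top of the crawl script.

```php
<?php
// Raise the memory ceiling to 1 GB for this script only
// (runtime equivalent of memory_limit in php.ini).
ini_set('memory_limit', '1G');

// 0 removes the execution time limit for this script
// (runtime equivalent of max_execution_time = 0 in php.ini).
set_time_limit(0);
```

The MySQL query size limit is a server-side setting and still has to be changed in the MySQL configuration.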

Additional documentation (source code) is in /source