Crawler is a bare-bones spider designed to quickly build an index of all files and pages on a given website, along with the incoming and outgoing links between pages.

FILES:
	includes
	README.txt
	TODO.txt
	browse.php
	config.php
	crawl.php
	create-tables.sql
	export.php
	query.php
	sitemap.php
	stats.php

README.txt
TO USE:

1. Edit config.php with the appropriate database and domain information
2. (for now) in phpMyAdmin insert the seed URL into the urls table.
	* URL should be something like: www.fcc.gov/
	* URL must have a trailing slash
	* (for now) May also want to set clicks to '0' to avoid problems 
3. Open crawl.php
4. (optional) open stats.php to watch progress
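
Step 2 can also be done with a plain SQL statement instead of the phpMyAdmin insert form. This is only a sketch: the column names url and clicks are assumptions based on the wording above, so check create-tables.sql for the actual schema before running it.

```sql
-- Seed the crawler with a starting URL.
-- ASSUMPTION: the urls table has columns named `url` and `clicks`;
-- the real names are defined in create-tables.sql.
INSERT INTO urls (url, clicks)
VALUES ('www.fcc.gov/', 0);  -- note the trailing slash; clicks starts at 0
```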

TIPS:
	Changes to php.ini
		1. Increase memory limit (1GB)
		2. Remove execution time limit
	Changes to my.ini (my.cnf)
		* Increase max query size (max_allowed_packet) to avoid the "MySQL server has gone away" error
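
If editing the MySQL config file is not convenient, the query size limit can probably be checked and raised from a SQL prompt instead. The sketch below assumes the limit in question is MySQL's max_allowed_packet variable; the 64 MB value is only an example, not a figure from this project.

```sql
-- Show the current limit (reported in bytes).
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise the limit to 64 MB (example value).
-- Needs an account with admin privileges, applies to new connections only,
-- and is reset when the MySQL server restarts.
SET GLOBAL max_allowed_packet = 67108864;
```

To make the change permanent, set max_allowed_packet under the [mysqld] section of the MySQL config file.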

Additional documentation (source code) in (/source)