A web crawler in PHP
networks
unit_test
.gitignore
LIB_db_functions.php
LIB_encoding.php
LIB_exclusion_list.php
LIB_http.php
LIB_parse.php
LIB_resolve_addresses.php
LIB_simple_spider.php
LICENCE.txt
README.md
dumpfiles.php
example_CONFIG_db.php
example_db.sql
example_db_alt.sql
getExternalLinkedHosts.php
getNetworksPerDomain.php
listNodes_graphml.php
listNodes_graphml_04_byDomain02_centralGov.php
listNodes_graphml_forDomain.php
set_status.php
spider.php

README.md

phpWebCralwer

A web crawler in PHP.

An additional file, CONFIG_db.php, is required. It sets the database server, name and password, as well as various other global options. An example file (example_CONFIG_db.php) is included.
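As a rough illustration of what such a configuration file might contain, here is a minimal sketch. The constant names, the `FETCH_DELAY_SECONDS` option and the `buildDsn()` helper are all hypothetical; the names actually expected by the crawler are the ones in example_CONFIG_db.php, which should be treated as the authoritative template.

```php
<?php
// Hypothetical sketch of a CONFIG_db.php file.
// All names below are assumptions, not taken from example_CONFIG_db.php.

// Database connection settings (placeholder values).
define('DB_HOST', 'localhost');
define('DB_NAME', 'crawler');
define('DB_USER', 'crawler_user');
define('DB_PASSWORD', 'change-me');

// Example of a global option (assumed): pause between requests, in seconds.
define('FETCH_DELAY_SECONDS', 1);

// Builds a PDO DSN string for MySQL from the settings above.
function buildDsn(string $host, string $name): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $name);
}

// A script could then connect with:
//   $pdo = new PDO(buildDsn(DB_HOST, DB_NAME), DB_USER, DB_PASSWORD);
```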

TODO:

  • Interface with the Public Suffix List, so that domains are parsed correctly for the domains table
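To make the TODO item concrete, the following is a simplified sketch of how a registrable domain could be derived from a host name once the Public Suffix List is available. It is not part of the crawler: the `registrableDomain()` function and the tiny hard-coded `$suffixes` set are hypothetical, and the sketch deliberately ignores the list's wildcard rules, exception rules and the default rule for unlisted suffixes.

```php
<?php
// Simplified sketch: find the registrable domain of a host using a set of
// known public suffixes. Wildcard (*.) and exception (!) rules are NOT handled.

// $suffixes is an associative array whose keys are public suffixes.
function registrableDomain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);

    // Walk from the longest candidate suffix to the shortest,
    // so the first hit is the longest matching suffix.
    for ($i = 0; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($suffixes[$candidate])) {
            if ($i === 0) {
                return null; // the host is itself a public suffix
            }
            // Registrable domain = one label plus the public suffix.
            return $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null; // no known suffix matched
}

// Tiny illustrative subset; a real implementation would load the full list.
$suffixes = array_flip(['com', 'uk', 'co.uk', 'gov.uk']);

$domain = registrableDomain('www.example.co.uk', $suffixes);
```

Without such a lookup, a naive "last two labels" rule would treat `co.uk` as the domain of `www.example.co.uk`, which is the kind of misparse the TODO item aims to avoid.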

Prerequisites

  • PHP
  • MySQL
  • TidyHTML (php5-tidy)
  • CURL (php5-curl)
  • PDO (php5-mysql)
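One possible way to confirm that the PHP extensions listed above are available is a small check script like the sketch below. The `missingExtensions()` helper is hypothetical and not part of the repository; `extension_loaded()` is the standard PHP function, and `tidy`, `curl` and `pdo_mysql` are the usual extension names for the packages listed.

```php
<?php
// Returns the names from $required that are not loaded in this PHP runtime.
function missingExtensions(array $required): array
{
    return array_values(array_filter($required, function ($ext) {
        return !extension_loaded($ext);
    }));
}

// Extension names assumed to correspond to php5-tidy, php5-curl and php5-mysql.
$missing = missingExtensions(['tidy', 'curl', 'pdo_mysql']);

if ($missing === []) {
    echo "All required extensions are loaded.\n";
} else {
    echo "Missing extensions: " . implode(', ', $missing) . "\n";
}
```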