> **Note:** This repository was archived by the owner on Apr 19, 2023. It is now read-only.


# crawlr

A web spider, written in Go, that scans your site looking for pages that are missing particular string patterns.

## Configuration

The configuration file is passed to the spider as its first command-line argument. The file should contain a JSON object with the following keys:

- `startUrl`
- `dropHttps`
- `allowedDomains`
- `rewriteDomains`
- `filteredUrls`
- `droppedParameters`
- `requiredPatterns`

An example JSON configuration file is included in the repo.
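Based on the key list above, a configuration might look like the following sketch. All values, and the assumption of which keys take arrays versus single values, are hypothetical; consult the example file in the repo for the actual format:

```json
{
  "startUrl": "https://example.com/",
  "dropHttps": false,
  "allowedDomains": ["example.com", "www.example.com"],
  "rewriteDomains": ["old.example.com"],
  "filteredUrls": ["/logout", "/admin"],
  "droppedParameters": ["utm_source", "utm_medium"],
  "requiredPatterns": ["analytics.js", "gtag("]
}
```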

## Options

You can also adjust certain operational details of the spider with command-line flags:

- `-h` : print the list of accepted flags
- `-d` : how deep to spider the site (default: 2)
- `-n` : maximum concurrent requests (default: 1)
- `-s` : show live stats (default: false)
- `-v` : verbose output (default: false)

## Output

As the spider progresses, it prints the URLs of pages that didn't contain the required patterns, along with the list of patterns that weren't found on each page.
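The core of this check can be sketched as a small Go function. This is a minimal illustration, not the spider's actual implementation: the function name `missingPatterns` and the use of plain substring matching are assumptions (the real tool might use regular expressions instead).

```go
package main

import (
	"fmt"
	"strings"
)

// missingPatterns returns the required patterns that do not appear in
// the page body. (Hypothetical helper; uses simple substring search.)
func missingPatterns(body string, required []string) []string {
	var missing []string
	for _, p := range required {
		if !strings.Contains(body, p) {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	body := `<html><head><script src="analytics.js"></script></head></html>`
	required := []string{"analytics.js", "gtag("}
	if m := missingPatterns(body, required); len(m) > 0 {
		// Report the page URL and which patterns were absent.
		fmt.Printf("https://example.com/page: missing %v\n", m)
	}
}
```

A page that contains every required pattern produces no output, so only problem pages appear in the report.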