Distributed, asynchronous web crawler


Widow - the extensible crawler for your website

Widow is a crawler that indexes only the domains you specify. Instead of crawling the entire web, Widow crawls your own website and builds your own search metadata. From that metadata you can see statistics such as average page load time and asset size.

Widow has several parts:

  • The Core, which contains machinery to pull messages and process them in a multi-threaded environment
  • The Fetcher, which pulls pages down from the internet
  • The Parser, which parses pages in an extensible way
  • The Indexer, which pushes metadata into a search index
  • The Analyzer, which gives interesting data about the pages fetched
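
To make the flow between these parts concrete, here is a minimal sketch of a fetch, parse, index, analyze pipeline in which worker threads pull messages from queues. It is an illustration only and is not taken from Widow's code: the language (Python), the names `FAKE_SITE`, `Page`, `run_stage`, `stop_stage`, `crawl`, and `analyze`, the in-memory queues, and the stubbed fetcher (no network access) are all assumptions made for this example.

```python
import queue
import threading
from dataclasses import dataclass

# Hypothetical in-memory "site" standing in for real HTTP fetches.
FAKE_SITE = {
    "https://example.com/": "<title>Home</title><p>Welcome</p>",
    "https://example.com/about": "<title>About</title><p>About us</p>",
}


@dataclass
class Page:
    url: str
    body: str


def fetch(url):
    """Fetcher stage (stubbed): look the page up instead of downloading it."""
    return Page(url, FAKE_SITE[url])


def parse(page):
    """Parser stage: extract a few metadata fields from the page body."""
    start = page.body.find("<title>") + len("<title>")
    end = page.body.find("</title>")
    return {"url": page.url, "title": page.body[start:end], "size": len(page.body)}


def run_stage(handler, inbox, outbox, workers=2):
    """Core-style loop: worker threads pull messages and process them."""
    def work():
        while True:
            message = inbox.get()
            if message is None:  # sentinel: this worker should stop
                break
            outbox.put(handler(message))

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    return threads


def stop_stage(threads, inbox):
    """Send one sentinel per worker, then wait for all workers to finish."""
    for _ in threads:
        inbox.put(None)
    for thread in threads:
        thread.join()


def crawl(urls):
    """Run the fetch and parse stages, then index the results by URL."""
    to_fetch, to_parse, to_index = queue.Queue(), queue.Queue(), queue.Queue()
    fetchers = run_stage(fetch, to_fetch, to_parse)
    parsers = run_stage(parse, to_parse, to_index)
    for url in urls:
        to_fetch.put(url)
    stop_stage(fetchers, to_fetch)  # every page is now queued for parsing
    stop_stage(parsers, to_parse)   # every metadata record is now queued

    # Indexer stage (stubbed): a dict stands in for a search index.
    index = {}
    while not to_index.empty():
        doc = to_index.get()
        index[doc["url"]] = doc
    return index


def analyze(index):
    """Analyzer stage: summarize the indexed metadata."""
    sizes = [doc["size"] for doc in index.values()]
    average = sum(sizes) / len(sizes) if sizes else 0.0
    return {"pages": len(sizes), "average_size": average}


if __name__ == "__main__":
    print(analyze(crawl(list(FAKE_SITE))))
```

Each stage communicates only through a queue, so in this sketch a stage can be swapped out or given more workers without touching the others. A distributed deployment would presumably replace the in-memory queues with a message broker, but that is also an assumption.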