Welcome to the storm-crawler wiki!
- Configuration: how to configure storm-crawler
- Registering Metadata for Serialization: if your topology doesn't extend ConfigurableTopology, you will need to manually register storm-crawler's Metadata class for serialization in Storm
- Debug with Eclipse
- Protocols: network protocols usable in storm-crawler
- JSoupParserBolt: parse HTML documents
- SiteMapParserBolt: how to handle sitemaps
- URLFilters: how to filter or normalise outlinks
- ParseFilters: extract metadata from documents