Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.

Norconex/crawlers

Norconex Crawlers

Norconex web and filesystem crawlers are full-featured crawlers (or spiders) that can manipulate and store collected data in a repository of your choice (e.g., a search engine). They are very flexible, powerful, easy to extend, and portable. They can be run from the command line with file-based configuration on any OS, or embedded into Java applications using well-documented APIs.

Visit the website for binary downloads and documentation: https://opensource.norconex.com/crawlers/

Are you on the right branch?

This branch holds version 4 code, which is still in development.

For the latest stable release of Norconex Web Crawler, use the version 3 branch.

UPCOMING: Crawler V4 Stack

As of Feb 24, 2024, the default main branch holds code for the upcoming version 4 crawler stack. It is now a mono-repo containing all Norconex crawler-related projects previously maintained in their own repos. All projects in this mono-repo will now be released simultaneously and share the same version number.

Until v4 is officially released, this branch should not be considered stable.

Projects

| Folder | Artifact Id | Build |
|---|---|---|
| crawler/core/ | nx-crawler-core | test; Quality Gate Status |
| crawler/fs/ | nx-crawler-fs | Quality Gate Status |
| crawler/web/ | nx-crawler-web | Quality Gate Status |
| importer/ | nx-importer | Quality Gate Status |
| committer/amazoncloudsearch/ | nx-committer-amazoncloudsearch | Quality Gate Status |
| committer/apachekafka/ | nx-committer-apachekafka | Quality Gate Status |
| committer/azurecognitivesearch/ | nx-committer-azurecognitivesearch | Quality Gate Status |
| committer/core/ | nx-committer-core | Quality Gate Status |
| committer/idol/ | nx-committer-idol | Quality Gate Status |
| committer/elasticsearch/ | nx-committer-elasticsearch | Quality Gate Status |
| committer/neo4j/ | nx-committer-neo4j | Quality Gate Status |
| committer/solr/ | nx-committer-solr | Quality Gate Status |
| committer/sql/ | nx-committer-sql | Quality Gate Status |

All projects in this repository share the same Maven group id:

com.norconex.crawler
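
As an illustration of how the shared group id combines with an artifact id from the table above, a Maven dependency on the web crawler might look like the sketch below. The group id and artifact id come from this README. The `nx.crawler.version` property name and its placeholder value are assumptions made for the example, since no released v4 version number is stated here. The sketch also assumes the artifact is available in a repository your build can reach.

```xml
<!-- Sketch of a pom.xml fragment; not taken from the project documentation. -->
<properties>
  <!-- Hypothetical property name and placeholder value:
       replace with a version you have verified is published. -->
  <nx.crawler.version>REPLACE_WITH_VERSION</nx.crawler.version>
</properties>

<dependencies>
  <dependency>
    <!-- Group id shared by all projects in this repository. -->
    <groupId>com.norconex.crawler</groupId>
    <!-- Artifact id of the web crawler (crawler/web/). -->
    <artifactId>nx-crawler-web</artifactId>
    <version>${nx.crawler.version}</version>
  </dependency>
</dependencies>
```

Because all projects are released together under one version number, a single version property like this could be reused for any other artifact from the table, such as a committer.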
