A powerful and efficient web crawling library written in Rust that allows you to crawl web pages, extract URLs, and traverse the web with ease.
The main purpose of HyperSeek is to enable fast and efficient search engine indexing. Other features include:
- Crawling Metadata: Collect and store additional metadata about crawled pages, such as HTTP response status codes, content types (mostly HTML for now, with support for other content types planned), last-modified timestamps, and page sizes. This information is useful for analyzing and understanding the crawled websites (see the metadata sketch after this list).
- Politeness and Respectful Crawling: Implement mechanisms to respect website policies such as robots.txt files and crawl delays. This ensures that your crawler behaves respectfully and avoids overloading servers with excessive requests (see the robots.txt sketch after this list).
- Parallel Processing: Enable concurrent processing of multiple URLs to improve crawling speed and efficiency. Utilize Rust's concurrency primitives to distribute crawling tasks across multiple threads or even across multiple machines (see the threading sketch after this list).
- Customizable Crawling Rules: Allow users to define custom rules for crawling, such as specifying which domains to crawl, setting depth limits, or filtering URLs based on patterns or criteria (see the rule-trait sketch after this list).
- Implement synchronous web crawling
- Implement asynchronous web crawling
- Implement robust page metadata gathering
- Implement an interface for user-extensible rules
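
As a rough sketch of what per-page crawl metadata could look like, the struct below records the fields mentioned in the feature list. The struct and field names are illustrative assumptions, not HyperSeek's actual API.

```rust
use std::time::SystemTime;

/// Hypothetical record of per-page crawl metadata; names are
/// illustrative, not HyperSeek's actual API.
#[derive(Debug, Clone)]
pub struct PageMetadata {
    /// Final URL after any redirects.
    pub url: String,
    /// HTTP response status code (e.g. 200, 404).
    pub status: u16,
    /// Value of the Content-Type header, if present.
    pub content_type: Option<String>,
    /// Value of the Last-Modified header, if present.
    pub last_modified: Option<String>,
    /// Response body size in bytes.
    pub size_bytes: u64,
    /// When the page was fetched.
    pub fetched_at: SystemTime,
}

fn main() {
    let meta = PageMetadata {
        url: "https://example.com/".into(),
        status: 200,
        content_type: Some("text/html; charset=utf-8".into()),
        last_modified: None,
        size_bytes: 12_345,
        fetched_at: SystemTime::now(),
    };
    println!("{meta:?}");
}
```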
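
Below is a minimal, standard-library-only sketch of robots.txt politeness, assuming a simplified parser that honors only the `User-agent: *` group, treats `Disallow` values as path prefixes, and falls back to a one-second crawl delay. A production crawler would also handle per-agent groups, `Allow` rules, and wildcards; all names here are hypothetical.

```rust
use std::time::Duration;

/// Minimal robots.txt policy for the `User-agent: *` group only.
struct RobotsPolicy {
    disallow: Vec<String>,
    crawl_delay: Duration,
}

impl RobotsPolicy {
    fn parse(robots_txt: &str) -> Self {
        let mut in_star_group = false;
        let mut disallow = Vec::new();
        let mut crawl_delay = Duration::from_secs(1); // assumed default
        for line in robots_txt.lines() {
            // Strip comments and whitespace before parsing `key: value`.
            let line = line.split('#').next().unwrap_or("").trim();
            if let Some((key, value)) = line.split_once(':') {
                let (key, value) = (key.trim().to_ascii_lowercase(), value.trim());
                match key.as_str() {
                    "user-agent" => in_star_group = value == "*",
                    "disallow" if in_star_group && !value.is_empty() => {
                        disallow.push(value.to_string());
                    }
                    "crawl-delay" if in_star_group => {
                        if let Ok(secs) = value.parse::<u64>() {
                            crawl_delay = Duration::from_secs(secs);
                        }
                    }
                    _ => {}
                }
            }
        }
        RobotsPolicy { disallow, crawl_delay }
    }

    /// A path is allowed if no Disallow entry is a prefix of it.
    fn allows(&self, path: &str) -> bool {
        !self.disallow.iter().any(|prefix| path.starts_with(prefix))
    }
}

fn main() {
    let policy = RobotsPolicy::parse("User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n");
    assert!(policy.allows("/public/page.html"));
    assert!(!policy.allows("/private/secret.html"));
    println!("waiting {:?} between requests", policy.crawl_delay);
}
```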
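
The sketch below shows one way parallel crawling could be structured with Rust's standard concurrency primitives: the URL frontier is split into chunks, each chunk is handled by a scoped worker thread, and results flow back over a channel. The `fetch` function is a placeholder for a real HTTP request (for example via a crate such as `reqwest`), and the chunked split is an assumption, not HyperSeek's actual scheduler.

```rust
use std::sync::mpsc;
use std::thread;

/// Placeholder for a real fetch; a production crawler would issue an
/// HTTP request here and return the response body.
fn fetch(url: &str) -> String {
    format!("<html><!-- body of {url} --></html>")
}

fn main() {
    let urls = vec![
        "https://example.com/a".to_string(),
        "https://example.com/b".to_string(),
        "https://example.com/c".to_string(),
        "https://example.com/d".to_string(),
    ];
    let workers = 2;
    let (tx, rx) = mpsc::channel();

    thread::scope(|scope| {
        // Split the frontier so each worker gets roughly equal work.
        let chunk_size = (urls.len() + workers - 1) / workers;
        for chunk in urls.chunks(chunk_size) {
            let tx = tx.clone();
            scope.spawn(move || {
                for url in chunk {
                    let body = fetch(url);
                    tx.send((url.clone(), body.len())).unwrap();
                }
            });
        }
        drop(tx); // close the channel so the receiver loop terminates
        for (url, bytes) in rx {
            println!("{url}: {bytes} bytes");
        }
    });
}
```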
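
One plausible shape for user-extensible rules is a small trait that every rule implements, with a URL crawled only when all configured rules accept it. The `CrawlRule` trait and the two example rules below are assumptions for illustration, not HyperSeek's real interface.

```rust
/// Hypothetical rule interface: each rule votes on whether a
/// candidate URL at a given link depth should be crawled.
trait CrawlRule {
    fn should_crawl(&self, url: &str, depth: usize) -> bool;
}

/// Only follow URLs on an allow-listed set of domains.
struct DomainAllowlist(Vec<String>);

impl CrawlRule for DomainAllowlist {
    fn should_crawl(&self, url: &str, _depth: usize) -> bool {
        self.0.iter().any(|domain| {
            url.strip_prefix("https://")
                .or_else(|| url.strip_prefix("http://"))
                .map_or(false, |rest| rest.starts_with(domain.as_str()))
        })
    }
}

/// Stop descending past a fixed link depth.
struct MaxDepth(usize);

impl CrawlRule for MaxDepth {
    fn should_crawl(&self, _url: &str, depth: usize) -> bool {
        depth <= self.0
    }
}

/// A URL is crawled only if every configured rule accepts it.
fn accepts(rules: &[Box<dyn CrawlRule>], url: &str, depth: usize) -> bool {
    rules.iter().all(|rule| rule.should_crawl(url, depth))
}

fn main() {
    let rules: Vec<Box<dyn CrawlRule>> = vec![
        Box::new(DomainAllowlist(vec!["example.com".into()])),
        Box::new(MaxDepth(3)),
    ];
    assert!(accepts(&rules, "https://example.com/docs", 2));
    assert!(!accepts(&rules, "https://other.org/", 1));
    assert!(!accepts(&rules, "https://example.com/deep", 5));
    println!("rules behave as expected");
}
```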