
Infinity Crawler

A simple but powerful web crawler library in C#


Features

  • Obeys robots.txt (crawl delay & allow/disallow)
  • Uses sitemap.xml to seed the initial crawl of the site
  • Built around a parallel, task-based async/await system
  • Auto-throttling (see below)
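
This README does not show how InfinityCrawler handles robots.txt internally. As a rough illustration of what obeying allow/disallow rules and a crawl delay involves, here is a minimal C# sketch. The `ParseRobots` and `IsAllowed` helpers are hypothetical, not part of the library's API, and the parser is deliberately simplified: it reads only the `User-agent: *` group, treats rules as plain path prefixes (no `*` or `$` wildcards), and lets the longest matching rule win, with `Allow` winning ties.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical sketch, not InfinityCrawler's API. Simplifications (assumptions):
// only the "User-agent: *" group is read, rules are plain path prefixes
// (no "*" or "$" wildcards), and the longest matching rule wins.
(List<(bool Allow, string Path)> Rules, TimeSpan? CrawlDelay) ParseRobots(string robotsTxt)
{
    var rules = new List<(bool Allow, string Path)>();
    TimeSpan? crawlDelay = null;
    var inStarGroup = false;
    var lastWasAgent = false;

    foreach (var rawLine in robotsTxt.Split('\n'))
    {
        // Drop comments and surrounding whitespace (including a trailing '\r').
        var line = rawLine.Split('#')[0].Trim();
        var colon = line.IndexOf(':');
        if (colon < 0) continue;

        var field = line[..colon].Trim().ToLowerInvariant();
        var value = line[(colon + 1)..].Trim();

        if (field == "user-agent")
        {
            // Consecutive User-agent lines share one group of rules.
            inStarGroup = (lastWasAgent && inStarGroup) || value == "*";
            lastWasAgent = true;
            continue;
        }
        lastWasAgent = false;
        if (!inStarGroup) continue;

        // An empty Allow/Disallow value matches nothing, so it is skipped.
        if (field == "allow" && value.Length > 0) rules.Add((true, value));
        else if (field == "disallow" && value.Length > 0) rules.Add((false, value));
        else if (field == "crawl-delay" &&
                 double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out var seconds))
            crawlDelay = TimeSpan.FromSeconds(seconds);
    }
    return (rules, crawlDelay);
}

bool IsAllowed(List<(bool Allow, string Path)> rules, string path)
{
    var allowed = true;   // no matching rule means the path may be crawled
    var bestLength = -1;
    foreach (var (allow, rulePath) in rules)
    {
        if (!path.StartsWith(rulePath, StringComparison.Ordinal)) continue;
        // Longer (more specific) rules win; on a tie, Allow wins.
        if (rulePath.Length > bestLength || (rulePath.Length == bestLength && allow))
        {
            bestLength = rulePath.Length;
            allowed = allow;
        }
    }
    return allowed;
}

var robots = "User-agent: *\nDisallow: /private/\nAllow: /private/public-page\nCrawl-delay: 2";
var (parsedRules, parsedDelay) = ParseRobots(robots);
Console.WriteLine(IsAllowed(parsedRules, "/private/secret"));      // False
Console.WriteLine(IsAllowed(parsedRules, "/private/public-page")); // True
Console.WriteLine(parsedDelay);                                    // 00:00:02
```

Note that `Crawl-delay` is a widely used extension rather than part of the core robots.txt standard, so real crawlers typically treat it as optional input.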

Polite Crawling

The crawler is built for fast but "polite" crawling of websites. This is achieved through a number of settings that adjust delays and throttling.

You can control:

  • Number of simultaneous requests
  • The delay between the start of consecutive requests (note: if robots.txt defines a crawl-delay for the user agent, that value is the minimum)
  • Artificial "jitter" in request delays (so requests appear less "robotic")
  • Throttle timeout: how long a request may take before throttling is applied to new requests
  • Throttling request backoff: the amount of time added to the delay between requests when throttling (this is cumulative)
  • Minimum number of requests completing within the throttle timeout before the throttle is gradually removed
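
The README does not spell out the exact throttling algorithm. The following minimal C# sketch shows one way the settings above could interact, assuming that each slow response (one exceeding the throttle timeout) adds one backoff step, and that each run of the minimum number of fast responses removes one step again. `OnRequestCompleted` and `NextDelay` are hypothetical names, not InfinityCrawler's API.

```csharp
using System;

// Hypothetical sketch of auto-throttling, not InfinityCrawler's API.
// Assumptions: every slow response adds one backoff step (cumulative), and
// every run of `minFastRequests` fast responses removes one step again.
(TimeSpan Backoff, int FastCount) OnRequestCompleted(
    (TimeSpan Backoff, int FastCount) state, TimeSpan elapsed,
    TimeSpan throttleTimeout, TimeSpan backoffStep, int minFastRequests)
{
    if (elapsed > throttleTimeout)
    {
        // Slow response: throttle harder and restart the fast-response count.
        return (state.Backoff + backoffStep, 0);
    }

    var fast = state.FastCount + 1;
    if (fast >= minFastRequests && state.Backoff > TimeSpan.Zero)
    {
        // Enough fast responses in a row: ease off by one step, not all at once.
        var reduced = state.Backoff - backoffStep;
        return (reduced < TimeSpan.Zero ? TimeSpan.Zero : reduced, 0);
    }
    return (state.Backoff, fast);
}

TimeSpan NextDelay(TimeSpan configuredDelay, TimeSpan? crawlDelay, TimeSpan backoff,
                   TimeSpan maxJitter, Random rng)
{
    // A robots.txt crawl-delay, when present, acts as the minimum base delay.
    var baseDelay = crawlDelay.HasValue && crawlDelay.Value > configuredDelay
        ? crawlDelay.Value
        : configuredDelay;
    // Random jitter between zero and maxJitter makes request timing less regular.
    var jitter = TimeSpan.FromMilliseconds(rng.NextDouble() * maxJitter.TotalMilliseconds);
    return baseDelay + backoff + jitter;
}

// Simulated response times (seconds): two slow responses, then three fast ones.
var responseTimes = new[] { 3.0, 3.0, 0.5, 0.5, 0.5 };
var throttleState = (Backoff: TimeSpan.Zero, FastCount: 0);
foreach (var responseTime in responseTimes)
{
    throttleState = OnRequestCompleted(throttleState, TimeSpan.FromSeconds(responseTime),
        throttleTimeout: TimeSpan.FromSeconds(2), backoffStep: TimeSpan.FromMilliseconds(500),
        minFastRequests: 3);
}
Console.WriteLine(throttleState.Backoff); // 00:00:00.5000000

var random = new Random();
Console.WriteLine(NextDelay(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2),
    throttleState.Backoff, TimeSpan.FromMilliseconds(250), random));
```

Removing the backoff one step at a time, rather than all at once, keeps a crawler from jumping straight back to full speed against a server that has only briefly recovered.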