Cryptoc1/earl

earl

Looking for URLs in your area.

Earl is a suite of APIs for developing URL crawlers and web scrapers, driven by a middleware pattern similar to, and strongly influenced by, ASP.NET Core.

Basic Usage

var services = new ServiceCollection()
    .AddEarlCrawler()
    .AddEarlJsonPersistence()
    .BuildServiceProvider();

var crawler = services.GetService<IEarlCrawler>();
var options = CrawlerOptionsBuilder.CreateDefault()
    .BatchSize( 50 )
    .MaxRequestCount( 500 )
    .On<CrawlUrlResultEvent>( 
        ( CrawlUrlResultEvent e, CancellationToken cancellation ) =>
        {
            Console.WriteLine( $"Crawled {e.Result.Url}" );
            return default;
        }
    )
    .Timeout( TimeSpan.FromMinutes( 30 ) )
    .Use(
        ( CrawlUrlContext context, CrawlUrlDelegate next ) =>
        {
            Console.WriteLine( $"Executing delegate middleware while crawling {context.Url}" );
            return next( context );
        }
    )
    .PersistTo( persist => persist.ToJson( json => json.Destination(...) ) )
    .Build();

await crawler.CrawlAsync( new Uri(...), options );
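
The `Use` call in the example above takes a delegate that receives the current context and the `next` delegate in the chain. To illustrate the general idea, here is a minimal, self-contained sketch of how an ASP.NET Core-style middleware pipeline can be composed. `SketchContext`, `SketchDelegate`, and `SketchPipelineBuilder` are hypothetical stand-ins invented for this illustration; they are not Earl's types, and Earl's real implementation may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-ins for illustration only; these are NOT Earl's types.
public sealed record SketchContext( Uri Url, List<string> Log );
public delegate Task SketchDelegate( SketchContext context );

public sealed class SketchPipelineBuilder
{
    // Each component wraps the "next" delegate and returns a new delegate.
    private readonly List<Func<SketchDelegate, SketchDelegate>> components = new();

    public SketchPipelineBuilder Use( Func<SketchContext, SketchDelegate, Task> middleware )
    {
        components.Add( next => context => middleware( context, next ) );
        return this;
    }

    public SketchDelegate Build()
    {
        // Terminal delegate: does nothing once every middleware has run.
        SketchDelegate pipeline = _ => Task.CompletedTask;

        // Wrap in reverse so the first registered middleware runs first.
        for( var i = components.Count - 1; i >= 0; i-- )
        {
            pipeline = components[ i ]( pipeline );
        }

        return pipeline;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var log = new List<string>();
        var pipeline = new SketchPipelineBuilder()
            .Use( async ( context, next ) =>
            {
                context.Log.Add( "first: before" );
                await next( context );
                context.Log.Add( "first: after" );
            } )
            .Use( ( context, next ) =>
            {
                context.Log.Add( $"second: {context.Url}" );
                return next( context );
            } )
            .Build();

        await pipeline( new SketchContext( new Uri( "https://example.com/" ), log ) );
        Console.WriteLine( string.Join( Environment.NewLine, log ) );
    }
}
```

In this sketch, middleware runs in registration order on the way in and in reverse order on the way out, so the first middleware's "before" entry is logged first and its "after" entry last. A middleware can also short-circuit the chain by not calling `next`.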

Documentation

Documentation can be found in the READMEs of the sub-directories representing the conceptual components of Earl.

All public APIs should contain thorough XML (triple slash) comments.

Something missing, or still have questions? Please open an issue or submit a PR!
