Any::Moose wrapper for queued downloads via Net::Curl & AnyEvent



AnyEvent::Net::Curl::Queued - Any::Moose wrapper for queued downloads via Net::Curl & AnyEvent


version 0.025


    #!/usr/bin/env perl

    package CrawlApache;
    use feature qw(say);
    use strict;
    use utf8;
    use warnings qw(all);

    use HTML::LinkExtor;
    use Any::Moose;

    extends 'AnyEvent::Net::Curl::Queued::Easy';

    after finish => sub {
        my ($self, $result) = @_;

        say $result . "\t" . $self->final_url;

        if (
            not $self->has_error
            and $self->getinfo('content_type') =~ m{^text/html}
        ) {
            my @links;

            HTML::LinkExtor->new(sub {
                my ($tag, %links) = @_;
                push @links,
                    grep { $_->scheme eq 'http' and $_->host eq 'localhost' }
                    values %links;
            }, $self->final_url)->parse(${$self->data});

            for my $link (@links) {
                $self->queue->prepend(sub {
                    CrawlApache->new({ initial_url => $link });
                });
            }
        }
    };

    no Any::Moose;


    package main;
    use strict;
    use utf8;
    use warnings qw(all);

    use AnyEvent::Net::Curl::Queued;

    my $q = AnyEvent::Net::Curl::Queued->new;
    $q->append(sub {
        CrawlApache->new({ initial_url => 'http://localhost/manual/' })
    });
    $q->wait;


Efficient and flexible batch downloader with a straightforward interface:

  • create a queue;
  • append/prepend URLs;
  • wait for downloads to end (retry on errors).

Download init/finish/error handling is defined through Moose's method modifiers.


I am very unhappy with the performance of LWP. It's almost perfect for properly handling HTTP headers, cookies & stuff, but it comes at the cost of speed. While this doesn't matter when you make single downloads, batch downloading becomes a real pain.

When I download a large batch of documents, I don't care about cookies or headers; only content and proper redirection matter. And, as it is clearly an I/O-bound operation, I want to make as many parallel requests as possible.

So, this is what CPAN offers to fulfill my needs:

AnyEvent::Net::Curl::Queued is a glue module to wrap it all together. It offers no callbacks and (almost) no default handlers. It's up to you to extend the base class AnyEvent::Net::Curl::Queued::Easy so it will actually download something and store it somewhere.


As there's more than one way to do it, I'll list the alternatives which can be used to implement batch downloads:

  • WWW::Mechanize: no (builtin) parallelism, no (builtin) queueing. Slow, but very powerful for site traversal;
  • LWP::UserAgent: no parallelism, no queueing. WWW::Mechanize is built on top of LWP, by the way;
  • LWP::Curl: LWP::UserAgent-alike interface for WWW::Curl. No parallelism, no queueing. Fast and simple to use;
  • HTTP::Tiny: no parallelism, no queueing. Fast and part of CORE since Perl v5.13.9;
  • HTTP::Lite: no parallelism, no queueing. Also fast;
  • Furl: no parallelism, no queueing. Very fast;
  • AnyEvent::Curl::Multi: queued parallel downloads via WWW::Curl. Queues are non-lazy, thus large ones can use a lot of RAM;
  • Parallel::Downloader: queued parallel downloads via AnyEvent::HTTP. Very fast and pure-Perl (compiling the event driver is optional). You can only access results once the whole batch is done, so huge batches require lots of RAM to hold the contents.


Obviously, the bottleneck of any kind of download agent is the connection itself. However, socket handling and header parsing add a lot of overhead.

The script eg/ compares AnyEvent::Net::Curl::Queued against several other download agents. Only AnyEvent::Net::Curl::Queued itself, AnyEvent::Curl::Multi, Parallel::Downloader and lftp support parallel connections natively; thus, Parallel::ForkManager is used to reproduce the same behaviour for the remaining agents. Both AnyEvent::Curl::Multi and LWP::Curl are frontends for WWW::Curl. Parallel::Downloader uses AnyEvent::HTTP as its backend.

The download target is a copy of the Apache documentation on a local Apache server. The test platform configuration:

  • Intel® Core™ i7-2600 CPU @ 3.40GHz with 8 GB RAM;
  • Ubuntu 11.10 (64-bit);
  • Perl v5.16.2 (installed via perlbrew);
  • libcurl 7.26.0 (without AsynchDNS, which slows down curl_easy_init()).

                              Request rate   W::M    LWP  AE::C::M  H::Lite  H::Tiny  P::D  YADA  lftp  Furl  wget  curl  L::Curl
    WWW::Mechanize v1.72             265/s     --   -61%      -86%     -86%     -87%  -90%  -91%  -91%  -95%  -96%  -97%     -97%
    LWP::UserAgent v6.04             674/s   154%     --      -63%     -64%     -67%  -75%  -77%  -78%  -88%  -89%  -91%     -91%
    AnyEvent::Curl::Multi v1.1      1850/s   596%   174%        --      -1%     -10%  -31%  -38%  -39%  -66%  -71%  -76%     -77%
    HTTP::Lite v2.4                 1860/s   601%   176%        1%       --      -9%  -31%  -38%  -39%  -66%  -71%  -76%     -77%
    HTTP::Tiny v0.017               2040/s   670%   203%       11%      10%       --  -24%  -31%  -33%  -63%  -68%  -74%     -74%
    Parallel::Downloader v0.121560  2680/s   909%   297%       45%      44%      31%    --  -10%  -12%  -51%  -58%  -65%     -66%
    YADA v0.025                     2980/s  1023%   342%       61%      60%      46%   11%    --   -2%  -45%  -53%  -61%     -62%
    lftp v4.3.1                     3030/s  1041%   349%       64%      63%      48%   13%    2%    --  -45%  -53%  -61%     -62%
    Furl v0.40                      5460/s  1959%   710%      196%     194%     168%  104%   83%   80%    --  -15%  -29%     -31%
    wget v1.12                      6400/s  2312%   849%      247%     244%     213%  139%  115%  111%   17%    --  -17%     -19%
    curl v7.26.0                    7720/s  2809%  1044%      318%     315%     278%  188%  159%  155%   41%   21%    --      -3%
    LWP::Curl v0.12                 7930/s  2890%  1076%      330%     327%     288%  196%  166%  162%   45%   24%    3%       --
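
Each percentage cell in the table appears to be the relative difference between the row agent's request rate and the column agent's. The sketch below recomputes a few cells under that assumption, using rates copied from the table. Because the printed rates are rounded, other recomputed cells may differ from the table by a few points.

```perl
use strict;
use warnings;

# Request rates copied from the table above (requests per second).
my %rate = (
    'WWW::Mechanize'       => 265,
    'LWP::UserAgent'       => 674,
    'Parallel::Downloader' => 2680,
    'YADA'                 => 2980,
);

# Relative difference of the row agent against the column agent,
# as a rounded percentage string.
sub relative_diff {
    my ($row, $col) = @_;
    return sprintf '%.0f', ($rate{$row} / $rate{$col} - 1) * 100;
}

printf "LWP::UserAgent vs WWW::Mechanize: %s%%\n",
    relative_diff('LWP::UserAgent', 'WWW::Mechanize');    # 154%
printf "WWW::Mechanize vs LWP::UserAgent: %s%%\n",
    relative_diff('WWW::Mechanize', 'LWP::UserAgent');    # -61%
printf "YADA vs Parallel::Downloader: %s%%\n",
    relative_diff('YADA', 'Parallel::Downloader');        # 11%
```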



Allow duplicate requests (default: false). By default, requests to the same URL (more precisely, requests with the same signature) are issued only once. To seed POST parameters, you must extend the AnyEvent::Net::Curl::Queued::Easy class. Setting allow_dups to a true value disables request checks.
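
The duplicate check can be pictured as a lookup in a hash of signatures that have already been seen. This is only an illustrative sketch in plain Perl, not the module's implementation: the `signature` and `should_issue` helpers are hypothetical, and the signature is assumed to be the URL string itself, since how the module actually computes it is not described here.

```perl
use strict;
use warnings;

my %seen;    # signature cache: signature => number of times seen

# Hypothetical signature: simply the URL string.
sub signature {
    my ($url) = @_;
    return $url;
}

# True if the request should be issued, false if it is a duplicate.
# A true $allow_dups bypasses the check entirely.
sub should_issue {
    my ($url, $allow_dups) = @_;
    return 1 if $allow_dups;
    return !$seen{ signature($url) }++;
}

for my $url (qw(http://localhost/a http://localhost/b http://localhost/a)) {
    printf "%-20s %s\n", $url, should_issue($url) ? 'issue' : 'skip';
}
```

With the check enabled, the second request for `http://localhost/a` is skipped; passing a true second argument issues it again.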


Count completed requests.


AnyEvent condition variable. Initialized automatically, unless you specify your own. Also reset automatically after "wait", so keep your own reference if you really need it!


Maximum number of parallel connections (default: 4; minimum value: 1).


Net::Curl::Multi instance.


ArrayRef to the queue. Has the following helper methods:

  • queue_push: append item at the end of the queue;
  • queue_unshift: prepend item at the top of the queue;
  • dequeue: shift item from the top of the queue;
  • count: number of items in queue.
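
In plain Perl terms, the four helpers correspond to the usual array operations. The sketch below mirrors them on an ordinary array to show the ordering they imply. It is an analogy only: the helper names come from the list above, while their implementation inside the module is not shown here.

```perl
use strict;
use warnings;

my @queue;

sub queue_push    { push @queue, @_ }         # append at the end
sub queue_unshift { unshift @queue, @_ }      # prepend at the top
sub dequeue       { return shift @queue }     # take from the top
sub count         { return scalar @queue }    # items still waiting

queue_push('b');
queue_unshift('a');
queue_push('c');

printf "%d items queued\n", count();    # 3 items queued
printf "first out: %s\n", dequeue();    # first out: a
```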


Net::Curl::Share instance.


AnyEvent::Net::Curl::Queued::Stats instance.


Timeout (default: 60 seconds).
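
A queue could be configured at construction time roughly as follows. `allow_dups` is named in the text above. The attribute names `max` and `timeout` are assumptions here, since the headings naming these attributes are missing from this text, and the values are arbitrary examples.

```perl
use strict;
use warnings;

use AnyEvent::Net::Curl::Queued;

my $q = AnyEvent::Net::Curl::Queued->new({
    max        => 8,     # assumed name: parallel connections (default: 4)
    timeout    => 30,    # assumed name: seconds (default: 60)
    allow_dups => 0,     # issue each signature only once (the default)
});
```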


Signature cache.


The last resort against the non-deterministic chaos of evil lurking sockets.



Populate empty request slots with workers from the queue.
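
The slot-filling idea can be sketched without any networking: keep moving items from the top of the queue into the active set until the connection limit is reached or the queue runs dry. The names `fill_slots` and `is_empty` are hypothetical, and the job strings stand in for real workers.

```perl
use strict;
use warnings;

my $max    = 4;                             # maximum parallel connections
my @queue  = map { "job$_" } 1 .. 10;       # waiting workers (stand-ins)
my @active;                                 # workers currently running

# Move workers from the top of the queue into free slots.
sub fill_slots {
    while (@active < $max and @queue) {
        push @active, shift @queue;
    }
    return scalar @active;
}

# True when nothing is running and nothing is waiting.
sub is_empty {
    return !(@active or @queue);
}

fill_slots();
printf "%d active, %d queued\n", scalar @active, scalar @queue;
```

Calling `fill_slots` again after a worker finishes (that is, after it is removed from `@active`) tops the active set back up to the limit.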


Check if there are active requests or requests in queue.


Activate a worker.


Put the worker (instance of AnyEvent::Net::Curl::Queued::Easy) at the end of the queue. For lazy initialization, wrap the worker in a sub { ... }, the same way you do with the Moose default => sub { ... }:

    $queue->append(sub {
        AnyEvent::Net::Curl::Queued::Easy->new({ initial_url => 'http://.../' })
    });


Put the worker (instance of AnyEvent::Net::Curl::Queued::Easy) at the beginning of the queue. For lazy initialization, wrap the worker in a sub { ... }, the same way you do with the Moose default => sub { ... }:

    $queue->prepend(sub {
        AnyEvent::Net::Curl::Queued::Easy->new({ initial_url => 'http://.../' })
    });
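
Why lazy initialization matters for large queues can be seen in a small, module-free sketch: the queue holds cheap closures, and a worker object is built only when its closure is taken from the queue. `make_worker` and `next_worker` are hypothetical stand-ins, not part of the module.

```perl
use strict;
use warnings;

my $built = 0;

# Stand-in for an expensive worker constructor.
sub make_worker {
    my ($url) = @_;
    $built++;
    return { url => $url };
}

my @urls = map { "http://localhost/page$_" } 1 .. 1000;

# Queue closures instead of workers: nothing is constructed yet.
my @queue = map { my $url = $_; sub { make_worker($url) } } @urls;

# Resolve an item only at the moment it leaves the queue.
sub next_worker {
    my $item = shift @queue;
    return ref($item) eq 'CODE' ? $item->() : $item;
}

my $worker = next_worker();
printf "built %d of %d workers\n", $built, scalar @urls;
```

Only one worker has been constructed at this point; the remaining 999 queue entries are still plain code references.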


Process queue.


The "Attempt to free unreferenced scalar: SV 0xdeadbeef during global destruction." message on finalization is mostly harmless.



Stanislaw Pusep


This software is copyright (c) 2012 by Stanislaw Pusep.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
