Skip to content
AngryCurl - Anonymized Rolling Curl class, used for parsing information from remote resourse using user-predefined amount of simultaneous connections over proxies-list.
PHP
Branch: master
Clone or download
Pull request Compare This branch is 1 commit ahead of 2naive:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
classes
demo
import
.htaccess
README.md
example.php

README.md

AngryCurl

  • used for parsing information from remote resourse using user-predefined amount of simultaneous connections over proxies-list.

Basic information

Depencies:

  • PHP 5 >= 5.1.0
  • RollingCurl
  • cURL

Use cases:

  • multi-threaded parsing over proxy
  • overcoming simple parsing protection by using User-Agent header and proxy-lists
  • proxy list checking
  • validating proxies' response

Main features

  • loading proxy-list from file or array
  • removing duplicates
  • filtering alive proxies
  • checking if proxy given response content is correct
  • loading useragent-list from file or array
  • changing proxy/useragent "on the fly"
  • preventing direct connections without any proxy/useragent if such options are set
  • multi-thread connections
  • callback functions
  • working with chains of requests
  • web-console mode
  • logging

Documentation

Preferred environment configuration

  • PHP as Apache module
  • safe_mode Off
  • open_basedir is NOT set
  • PHP cURL installed
  • gzip Off

Basic usage

require("RollingCurl.class.php");
require("AngryCurl.class.php");

function my_callback($response, $info, $request)
{
    // callback function here
}

// sending callback function name as param
$AC = new AngryCurl('my_callback');
// initializing console-style output
$AC->init_console();


// Importing proxy and useragent lists, setting regexp, proxy type and target url for proxy check
// You may also import proxy from an array as simple as $AC->load_proxy_list($proxy array);
$AC->load_proxy_list(
    // path to proxy-list file
    'proxy_list.txt',
    // optional: number of threads
    200,
    // optional: proxy type
    'http',
    // optional: target url to check
    'http://google.com',
    // optional: target regexp to check
    'title>G[o]{2}gle'
);
// You may also import useragents from an array as simple as $AC->load_useragent_list($proxy array);
$AC->load_useragent_list('useragent_list.txt');

while(/* */)
{
    $url = /**/;
    // adding URL to queue
    $AC->get($url);
    
    // you may also use 
    // $AC->post($url, $post_data = null, $headers = null, $options = null);
    // $AC->get($url, $headers = null, $options = null);
    // $AC->request($url, $method = "GET", $post_data = null, $headers = null, $options = null);
    // as well
}

// setting amount of threads and starting connections
$AC->execute(200);

// if console_mode is off
//AngryCurl::print_debug(); 

unset($AC);

cURL options

You may also pass cURL options for each url before adding to queue like here:

// Define HTTP headers (CURLOPT_HTTPHEADER) if needed, or just set to NULL
$headers = array('Content-type: text/plain', 'Content-length: 100');
// Define cURL options (will be passed through curl_setopt_array) if needed, or just set to NULL
$options = array(CURLOPT_HEADER => true, CURLOPT_NOBODY => true);
// Define post-data array to send in case of POST method, or just set to NULL
$post_data = array('param' => 'value');

// Add request
$AC->get($url, $headers, $options);
// or
$AC->post($url, $post_data, $headers, $options) ;
// or
$AC->request($url, $method = "GET", $post_data, $headers, $options);

// ATTENTION: temporary "on-the-fly" proxy/useragents lists are not
// working with AngryCurlRequest. Keep it in mind if you will use code below
// as alternative to written above.

$request = new AngryCurlRequest($url);
// $url, $method, $post_data, $headers, $options - public properties of AngryCurlRequest
$request->options = array(CURLOPT_HEADER => true, CURLOPT_NOBODY => true);
$AC->add($request);

Because this class is kind of extension of RollingCurl class you may use any constructions RollingCurl has. For other information read here: http://code.google.com/p/rolling-curl/source/browse/trunk/

TODO

  • chains of requests
  • stop on error_limit exceed
  • better documentation and examples

Credits

You may join this class discussion here: http://stupid.su/php-curl_multi/ Any questions, change requests and other things you may send to my email written in class comments.

Thank you for reading.

  • naive
You can’t perform that action at this time.