Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.

Requirements

Goutte depends on PHP 5.4+ and Guzzle 4+.

Tip

If you need support for PHP 5.3 or Guzzle 3, use Goutte 1.x.

Installation

Add fabpot/goutte as a require dependency in your composer.json file:

composer require fabpot/goutte

Tip

You can also download the Goutte.phar file and include it in your script:

require_once '/path/to/goutte.phar';

The phars for Goutte 1.x are also available for download: http://get.sensiolabs.org/goutte-v1.0.7.phar

Usage

Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\Client):

use Goutte\Client;

$client = new Client();

Make requests with the request() method:

// Go to the symfony.com website
$crawler = $client->request('GET', 'http://www.symfony.com/blog/');

The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).

Fine-tune cURL options:

$client->getClient()->setDefaultOption('config/curl/'.CURLOPT_TIMEOUT, 60);

Click on links:

// Click on the "Security Advisories" link
$link = $crawler->selectLink('Security Advisories')->link();
$crawler = $client->click($link);
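
To see what click() receives, the following sketch builds a Crawler from an inline HTML string instead of a live response. The markup, the example.com base URI, and the vendor/autoload.php path (a Composer install of Goutte, which pulls in symfony/dom-crawler) are assumptions made for illustration:

```php
<?php
require 'vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for a fetched page
$html = '<a href="/blog/category/security-advisories">Security Advisories</a>';

// The second argument is the URI of the page; relative links resolve against it
$crawler = new Crawler($html, 'http://example.com/blog/');

// selectLink() matches on the link text; link() wraps the first match in a Link object
$link = $crawler->selectLink('Security Advisories')->link();

// getUri() returns the absolute URI that a click would request
print $link->getUri()."\n";
```

The Link object only describes the target; no request is made until it is passed to $client->click().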

Extract data:

// Get the latest posts in this category and display their titles
$crawler->filter('h2 > a')->each(function ($node) {
    print $node->text()."\n";
});
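
The same extraction can be tried offline. This sketch assumes a Composer install (vendor/autoload.php) with symfony/dom-crawler and symfony/css-selector available, and uses made-up HTML in place of a real page; it collects values into arrays rather than printing them:

```php
<?php
require 'vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical markup standing in for a blog listing
$html = <<<'HTML'
<html><body>
  <h2><a href="/blog/first">First post</a></h2>
  <h2><a href="/blog/second">Second post</a></h2>
</body></html>
HTML;

$crawler = new Crawler($html, 'http://example.com/blog/');

// each() passes every matched node to the closure as a Crawler
// and returns an array of the closure's return values
$titles = $crawler->filter('h2 > a')->each(function ($node) {
    return $node->text();
});

// extract() reads the listed attributes from every matched node
$hrefs = $crawler->filter('h2 > a')->extract(array('href'));

print_r($titles);
print_r($hrefs);
```

Returning values from the closure keeps the scraped data available for further processing instead of writing it straight to the output.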

Submit forms:

$crawler = $client->request('GET', 'http://github.com/');
$crawler = $client->click($crawler->selectLink('Sign in')->link());
$form = $crawler->selectButton('Sign in')->form();
$crawler = $client->submit($form, array('login' => 'fabpot', 'password' => 'xxxxxx'));
$crawler->filter('.flash-error')->each(function ($node) {
    print $node->text()."\n";
});
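
Before submitting, it can help to inspect what the form would send. The sketch below is not the GitHub sign-in page: the markup, the action URL, and the example.com URI are invented for illustration, and the Composer autoloader path is assumed:

```php
<?php
require 'vendor/autoload.php'; // assumed Composer autoloader location

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical login form; the field names mirror the example above
$html = <<<'HTML'
<form action="/session" method="post">
  <input type="text" name="login" />
  <input type="password" name="password" />
  <input type="submit" value="Sign in" />
</form>
HTML;

// A page URI is required so that the form action can be resolved
$crawler = new Crawler($html, 'http://example.com/login');

// form() accepts the field values to fill in, keyed by field name
$form = $crawler->selectButton('Sign in')->form(array(
    'login' => 'fabpot',
    'password' => 'xxxxxx',
));

// The HTTP method, target URI, and values that a submit would use
print $form->getMethod().' '.$form->getUri()."\n";
print_r($form->getValues());
```

Passing the same Form object to $client->submit() sends these values with the method and URI shown.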

More Information

Read the documentation of the BrowserKit and DomCrawler Symfony Components for more information about what you can do with Goutte.

Pronunciation

Goutte is pronounced "goot": it rhymes with "boot", not "out".

Technical Information

Goutte is a thin wrapper around the following fine PHP libraries:

  • Symfony Components: BrowserKit, CssSelector and DomCrawler;
  • Guzzle HTTP Component.

License

Goutte is licensed under the MIT license.
