Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.

Requirements

Goutte requires PHP 7.1 or later.

Installation

Add fabpot/goutte to the require section of your composer.json file by running:

composer require fabpot/goutte

Usage

Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser):

use Goutte\Client;

$client = new Client();

Make requests with the request() method:

// Go to the symfony.com website
$crawler = $client->request('GET', 'https://www.symfony.com/blog/');

The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).
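
Besides the Crawler, the client keeps the response to the most recent request, which is handy for checking the HTTP status before parsing. The following is a minimal sketch, not part of the original README: it assumes a Composer install in the current directory and uses Symfony's MockHttpClient with a canned 404 response (and a hypothetical URL) so that it runs without network access. With a real site you would simply use `new Client()`.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumes a Composer install in this directory

use Goutte\Client;
use Symfony\Component\HttpClient\MockHttpClient;
use Symfony\Component\HttpClient\Response\MockResponse;

// Canned response standing in for a real server (illustration only).
$mock = new MockHttpClient(new MockResponse(
    '<html><head><title>Not Found</title></head><body></body></html>',
    ['http_code' => 404, 'response_headers' => ['Content-Type' => 'text/html']]
));

$client = new Client($mock);
$crawler = $client->request('GET', 'https://example.com/missing'); // hypothetical URL

// The response to the most recent request stays available on the client.
$status = $client->getInternalResponse()->getStatusCode();

if ($status >= 400) {
    // Check the status explicitly instead of assuming the page loaded.
    echo "Request failed with HTTP status $status\n";
} else {
    echo $crawler->filter('title')->text()."\n";
}
```

Checking the status first avoids scraping an error page as if it were the expected content.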

To use your own HTTP settings, you may create and pass an HttpClient instance to Goutte. For example, to add a 60-second request timeout:

use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;

$client = new Client(HttpClient::create(['timeout' => 60]));
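
Other settings can be combined in the same way. The sketch below is an assumption-laden illustration rather than a recommended configuration: the `Accept-Language` header, the user agent string, and the redirect limit are made-up values. It sets transport options on the HttpClient and browser-level options on the Goutte client, since BrowserKit appears to send its own `User-Agent` and to handle redirects itself.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumes a Composer install in this directory

use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;

// Transport-level settings go to the HttpClient instance.
$httpClient = HttpClient::create([
    'timeout' => 60,
    'headers' => ['Accept-Language' => 'en'], // example header, not required
]);

$client = new Client($httpClient);

// Browser-level settings live on the client itself.
// Server parameters prefixed with HTTP_ are sent as request headers.
$client->setServerParameter('HTTP_USER_AGENT', 'ExampleScraper/1.0'); // hypothetical name
$client->setMaxRedirects(5); // example limit
```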

Click on links:

// Click on the "Security Advisories" link
$link = $crawler->selectLink('Security Advisories')->link();
$crawler = $client->click($link);
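
To see what `click()` will request, you can inspect the Link object first. This is a small offline sketch that assumes a Composer install in the current directory; the inline markup stands in for a fetched page and the href is invented for illustration.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumes a Composer install in this directory

use Symfony\Component\DomCrawler\Crawler;

// Inline markup standing in for a fetched page (illustration only).
$html = '<html><body><a href="/blog/category/security-advisories">Security Advisories</a></body></html>';

// The second argument is the URI the page was loaded from;
// relative hrefs are resolved against it.
$crawler = new Crawler($html, 'https://symfony.com/blog/');

$link = $crawler->selectLink('Security Advisories')->link();

echo $link->getMethod()."\n"; // HTTP method used when following the link
echo $link->getUri()."\n";    // absolute URI the client would request
```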

Extract data:

// Get the latest posts in this category and display their titles
$crawler->filter('h2 > a')->each(function ($node) {
    print $node->text()."\n";
});
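
When you need the data rather than printed output, `each()` can also collect values, because it returns an array of whatever the callback returns. The sketch below runs offline against inline markup; the HTML, the base URL, and the `title`/`url` keys are made up for illustration, and a Composer install in the current directory is assumed.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumes a Composer install in this directory

use Symfony\Component\DomCrawler\Crawler;

// Inline markup standing in for a fetched page (illustration only).
$html = <<<'HTML'
<html><body>
    <h2><a href="/blog/first-post">First post</a></h2>
    <h2><a href="/blog/second-post">Second post</a></h2>
</body></html>
HTML;

$crawler = new Crawler($html, 'https://example.com/blog/');

// each() returns an array holding the callback's return value for every node.
$posts = $crawler->filter('h2 > a')->each(function (Crawler $node) {
    return [
        'title' => $node->text(),
        'url'   => $node->link()->getUri(), // href resolved to an absolute URL
    ];
});

print_r($posts);
```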

Submit forms:

$crawler = $client->request('GET', 'https://github.com/');
$crawler = $client->click($crawler->selectLink('Sign in')->link());
$form = $crawler->selectButton('Sign in')->form();
$crawler = $client->submit($form, ['login' => 'fabpot', 'password' => 'xxxxxx']);
$crawler->filter('.flash-error')->each(function ($node) {
    print $node->text()."\n";
});
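
A Form object carries the fields already present in the page, including hidden ones, which is why the form is read from the page before it is submitted. The following offline sketch shows what would be sent. The markup, the field names, and the `token` value are hypothetical, and a Composer install in the current directory is assumed.

```php
<?php
require __DIR__.'/vendor/autoload.php'; // assumes a Composer install in this directory

use Symfony\Component\DomCrawler\Crawler;

// Inline markup standing in for a login page (illustration only).
$html = <<<'HTML'
<html><body>
<form action="/session" method="post">
    <input type="text" name="login">
    <input type="password" name="password">
    <input type="hidden" name="token" value="abc123">
    <input type="submit" value="Sign in">
</form>
</body></html>
HTML;

$crawler = new Crawler($html, 'https://example.com/login');

// Passing values to form() fills the fields, like the second argument of submit().
$form = $crawler->selectButton('Sign in')->form([
    'login'    => 'fabpot',
    'password' => 'xxxxxx',
]);

echo $form->getMethod()."\n"; // HTTP method taken from the form
echo $form->getUri()."\n";    // action resolved against the page URI
print_r($form->getValues());  // submitted fields, including the hidden token
```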

More Information

Read the documentation of the BrowserKit, DomCrawler, and HttpClient Symfony Components for more information about what you can do with Goutte.

Pronunciation

Goutte is pronounced "goot": it rhymes with "boot", not "out".

Technical Information

Goutte is a thin wrapper around the following Symfony Components: BrowserKit, CssSelector, DomCrawler, and HttpClient.

License

Goutte is licensed under the MIT license.
