Skip to content
This repository has been archived by the owner on Apr 1, 2023. It is now read-only.

Commit

Permalink
merged branch mtdowling/guzzle (PR #29)
Browse files Browse the repository at this point in the history
Commits
-------

b218d17 Replacing Zend HTTP client with Guzzle

Discussion
----------

Replacing Zend HTTP client with Guzzle

Here's how Goutte would look using Guzzle.  What do you think?

---------------------------------------------------------------------------

by stof at 2011-11-12T20:00:55Z

There was a discussion on Twitter yesterday to replace Zend Http Client with [Buzz](https://github.com/kriswallsmith/Buzz) and @fabpot said he already started the implementation locally

---------------------------------------------------------------------------

by stof at 2011-11-12T20:26:06Z

Btw, a good point about both ZHC and Buzz is that they support different ways to do the request whereas Guzzle requires having the curl extension AFAICS (correct me if I missed something there).

And another point in favor of Buzz is that the library is more lightweight as it focuses on being an HTTP client whereas Guzzle has more features (which is great when dealing with a webservice client but not needed for Goutte).

However, nothing forbids to have 2 different BrowserKit implementations, one based on Buzz and the other on Guzzle and letting the user choose which one he wants to use (for instance someone using Guzzle in its project may prefer a version based on Guzzle).

@fabpot what do you think about this ?

---------------------------------------------------------------------------

by mtdowling at 2011-11-12T22:19:35Z

You're right -- Guzzle requires curl.  Every major linux distro includes curl and a version of PHP with the libcurl bindings enabled.  I'm not sure about Windows, but I imagine it's trivial to install if it is not installed already.  This is not an issue.

Guzzle is primarily an HTTP client and provides a very small namespace dedicated to building web service clients.  There are currently only 14 classes in the ``Service`` layer that would not normally be used by Goutte.  This again, in my opinion, is not an issue.

Guzzle has 100% code coverage and is a perfect match for Goutte.  From what fabpot said on the Buzz implementation was that Buzz was not ready.  One of the things that Guzzle can handle that I don't think is yet implemented in Buzz, is properly handling multiple Cookies of the same name (correct me if I'm wrong).

If it is decided to use multiple BrowserKit implementations, then let me know.

---------------------------------------------------------------------------

by stof at 2011-11-12T22:23:45Z

Buzz was not ready when he started to try it. There were many improvements in the previous weeks with the 0.3 release.

Buzz supports multiple cookies properly. The place where they were not supported properly (and may still not be as I'm not totally sure if a fix has been merged) is BrowserKit.

---------------------------------------------------------------------------

by mtdowling at 2011-11-12T23:11:44Z

You're right.  Looks like it was [fixed](kriswallsmith/Buzz#11) yesterday.  It also looks like [Buzz currently supports only two kinds of clients](https://github.com/kriswallsmith/Buzz/tree/master/lib/Buzz/Client) -- file_get_contents and curl based clients.  file_get_contents requires ``allow_url_fopen = On`` and the curl implementation, of course, requires curl.

Another reason I think Guzzle would be a smart move for Goutte is because [Guzzle manages persistent connections](http://guzzlephp.org/docs/tour/http/#managed-persistent-http-connections).  Goutte, being tool for scraping websites, is going to send a large number of requests in succession to the same server.  Buzz creates a new curl handle for each request (per client), whereas Guzzle implicitly manages a pool of persistent connections that are reused when possible (file_get_contents does not reuse connections either).  Using persistent connections will make Goutte more efficient, faster, cause less strain on a network (fewer TCP connections), and reduced load on the remote server.

---------------------------------------------------------------------------

by mtdowling at 2011-11-14T18:23:08Z

I think there could be some value in being able to choose which http client you want to use with Goutte (buzz, Zend, guzzle, etc).  I think most people will use the generated phar file, so adding these other clients would only slow down cloning the repo and running the tests (which didn't exist before this PR).  The generated phar could use whichever client is deemed as the default so that it doesn't bloat the end product.

I'm not sure what the problem is with the Zend implementation.  If it's because we need to include the full ZF project, then couldn't we maintain a subtree split or use the one that's maintained by knplabs (iirc)?

I could probably submit a pull request with guzzle, buzz, and Zend browser kit adapters within a day or two if that's something of interest.

-Michael

On Nov 12, 2011, at 4:23 PM, Christophe Coevoet<reply@reply.github.com> wrote:

> Buzz was not ready when he started to try it. There were many improvements in the previous weeks with the 0.3 release.
>
> Buzz supports multiple cookies properly. The place where they were not supported properly (and may still not be as I'm not totally sure if a fix has been merged) is BrowserKit.
>
> ---
> Reply to this email directly or view it on GitHub:
> #29 (comment)

---------------------------------------------------------------------------

by stof at 2011-11-14T18:32:06Z

The issue is that ZF components are heavily coupled: we need to use 6 or 7 subtrees.

---------------------------------------------------------------------------

by jc- at 2012-01-17T12:34:21Z

Another issue also is that the Zend HTTP client doesn't support multiple cookie headers.

---------------------------------------------------------------------------

by cystbear at 2012-02-22T10:37:45Z

:+1:
  • Loading branch information
fabpot committed May 15, 2012
2 parents 5ecceb7 + b218d17 commit a7ed527
Show file tree
Hide file tree
Showing 12 changed files with 268 additions and 102 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
phpunit.xml
6 changes: 3 additions & 3 deletions .gitmodules
Expand Up @@ -10,12 +10,12 @@
[submodule "vendor/Symfony/Component/Process"]
path = vendor/Symfony/Component/Process
url = https://github.com/symfony/Process
[submodule "vendor/zend"]
path = vendor/zend
url = https://github.com/zendframework/zf2
[submodule "vendor/Symfony/Component/ClassLoader"]
path = vendor/Symfony/Component/ClassLoader
url = https://github.com/symfony/ClassLoader
[submodule "vendor/Symfony/Component/Finder"]
path = vendor/Symfony/Component/Finder
url = https://github.com/symfony/Finder
[submodule "vendor/Guzzle"]
path = vendor/Guzzle
url = git://github.com/guzzle/guzzle.git
3 changes: 2 additions & 1 deletion README.md
Expand Up @@ -71,11 +71,12 @@ Goutte is a thin wrapper around the following fine PHP libraries:

* Symfony Components: BrowserKit, ClassLoader, CssSelector, DomCrawler, Finder, and Process

* Zend libraries: Date, Uri, Http, and Validator
* [Guzzle](http://www.guzzlephp.org)

License
-------

Goutte is licensed under the MIT license.

[1]: https://raw.github.com/fabpot/Goutte/master/goutte.phar

2 changes: 1 addition & 1 deletion autoload.php
Expand Up @@ -9,7 +9,7 @@
$loader = new UniversalClassLoader();
$loader->registerNamespaces(array(
'Symfony' => __DIR__.'/vendor',
'Zend' => __DIR__.'/vendor/zend/library',
'Guzzle' => __DIR__.'/vendor/Guzzle/src',
'Goutte' => __DIR__.'/src',
));
$loader->register();
18 changes: 18 additions & 0 deletions phpunit.xml.dst
@@ -0,0 +1,18 @@
<?xml version="1.0" encoding="UTF-8"?>

<phpunit backupGlobals="false"
backupStaticAttributes="false"
colors="true"
convertErrorsToExceptions="true"
convertNoticesToExceptions="true"
convertWarningsToExceptions="true"
processIsolation="false"
stopOnFailure="false"
syntaxCheck="false"
bootstrap="./tests/bootstrap.php">
<testsuites>
<testsuite name="Goutte Test Suite">
<directory>./tests/Goutte/Tests</directory>
</testsuite>
</testsuites>
</phpunit>
131 changes: 64 additions & 67 deletions src/Goutte/Client.php
Expand Up @@ -8,8 +8,11 @@
use Symfony\Component\BrowserKit\Request;
use Symfony\Component\BrowserKit\Response;

use Zend\Http\Client as ZendClient;
use Zend\Http\Response as ZendResponse;
use Guzzle\Http\Curl\CurlException;
use Guzzle\Http\Message\RequestInterface as GuzzleRequestInterface;
use Guzzle\Http\Message\Response as GuzzleResponse;
use Guzzle\Service\ClientInterface as GuzzleClientInterface;
use Guzzle\Service\Client as GuzzleClient;

/*
* This file is part of the Goutte package.
Expand All @@ -25,115 +28,109 @@
*
* @package Goutte
* @author Fabien Potencier <fabien.potencier@symfony-project.com>
* @author Michael Dowling <michael@guzzlephp.org>
*/
class Client extends BaseClient
{
const VERSION = '0.1';

protected $zendConfig;
protected $headers = array();
protected $auth = null;
protected $client;

public function __construct(array $zendConfig = array(), array $server = array(), History $history = null, CookieJar $cookieJar = null)
public function setClient(GuzzleClientInterface $client)
{
$this->zendConfig = $zendConfig;
$this->client = $client;

parent::__construct($server, $history, $cookieJar);
return $this;
}

public function getClient()
{
if (!$this->client) {
$this->client = new GuzzleClient();
}

return $this->client;
}

public function setHeader($name, $value)
{
$this->headers[$name] = $value;

return $this;
}

public function setAuth($user, $password = '', $type = ZendClient::AUTH_BASIC)
public function setAuth($user, $password = '', $type = GuzzleRequestInterface::AUTH_BASIC)
{
$this->auth = array(
'user' => $user,
'user' => $user,
'password' => $password,
'type' => $type
);
}

protected function doRequest($request)
{
$client = $this->createClient($request);

$response = $client->send($client->getRequest());

return $this->createResponse($response);
return $this;
}

protected function createClient(Request $request)
protected function doRequest($request)
{
$client = $this->createZendClient();
$client->setUri($request->getUri());
$client->setConfig(array_merge(array(
'maxredirects' => 0,
'timeout' => 30,
'useragent' => $this->server['HTTP_USER_AGENT'],
'adapter' => 'Zend\\Http\\Client\\Adapter\\Socket',
), $this->zendConfig));
$client->setMethod(strtoupper($request->getMethod()));

if ($request->getContent() !== null) {
$client->setRawBody($request->getContent());
}

if ('POST' == $request->getMethod()) {
$client->setParameterPost($request->getParameters());
}
$client->setHeaders($this->headers);
$guzzleRequest = $this->getClient()->createRequest(
strtoupper($request->getMethod()),
$request->getUri(),
$this->headers,
$request->getParameters()
);

if ($this->auth !== null) {
$client->setAuth(
$guzzleRequest->setAuth(
$this->auth['user'],
$this->auth['password'],
$this->auth['type']
);
}

foreach ($this->getCookieJar()->allValues($request->getUri()) as $name => $value) {
$client->addCookie($name, $value);
$guzzleRequest->addCookie($name, $value);
}

$this->addFileUploadsRecursively($client, $request->getFiles());

return $client;
}

/**
* Goes recursively through the files array and adds uploads to the ZendClient
*/
protected function addFileUploadsRecursively(ZendClient $client, array $files, $arrayName = '')
{
foreach ($files as $name => $info) {
if (!empty($arrayName)) {
$name = $arrayName . '[' . $name . ']';
if ($request->getMethod() == 'POST') {
foreach ($request->getFiles() as $name => $info) {
if (isset($info['tmp_name']) && '' !== $info['tmp_name']) {
$guzzleRequest->addPostFiles(array(
$name => $info['tmp_name']
));
}
}
if (isset($info['tmp_name']) && isset($info['name'])) {
if ('' !== $info['tmp_name'] && '' !== $info['name']) {
$filename = $info['name'];

if (false === ($data = @file_get_contents($info['tmp_name']))) {
throw new \RuntimeException("Unable to read file '{$filename}' for upload");
}
}

$client->setFileUpload($filename, $name, $data);
}
} elseif (is_array($info)) {
$this->addFileUploadsRecursively($client, $info, $name);
$guzzleRequest->setHeader('User-Agent', $this->server['HTTP_USER_AGENT']);

$guzzleRequest->getCurlOptions()->merge(array(
CURLOPT_MAXREDIRS => 0,
CURLOPT_TIMEOUT => 30
));

// Let BrowserKit handle redirects
try {
$response = $guzzleRequest->send();
} catch (CurlException $e) {
if (strpos($e->getMessage(), 'redirects')) {
$response = $e->getResponse();
} else {
throw $e;
}
}
}

protected function createResponse(ZendResponse $response)
{
return new Response($response->getBody(), $response->getStatusCode(), $response->headers()->toArray());
return $this->createResponse($response);
}

protected function createZendClient()
protected function createResponse(GuzzleResponse $response)
{
return new ZendClient(null, array('encodecookies' => false));
return new Response(
$response->getBody(true),
$response->getStatusCode(),
$response->getHeaders()->getAll()
);
}
}

32 changes: 4 additions & 28 deletions src/Goutte/Compiler.php
Expand Up @@ -37,9 +37,7 @@ public function compile($pharFile = 'goutte.phar')
foreach ($this->getFiles() as $file)
{
$path = str_replace(__DIR__.'/', '', $file);
$content = preg_replace("#require_once 'Zend/.*?';#", '', php_strip_whitespace($file));

$phar->addFromString($path, $content);
$phar->addFromString($path, file_get_contents($file));
}

// Stubs
Expand Down Expand Up @@ -82,28 +80,7 @@ protected function getFiles()
$files = array(
'LICENSE',
'autoload.php',
'vendor/Symfony/Component/ClassLoader/UniversalClassLoader.php',
'vendor/zend/library/Zend/Registry.php',
//'vendor/zend/library/Zend/Date.php',
'vendor/zend/library/Zend/Uri/Uri.php',
'vendor/zend/library/Zend/Validator/Validator.php',
'vendor/zend/library/Zend/Validator/AbstractValidator.php',
'vendor/zend/library/Zend/Validator/Hostname.php',
'vendor/zend/library/Zend/Validator/Ip.php',
//'vendor/zend/library/Zend/Validator/Hostname/Biz.php',
//'vendor/zend/library/Zend/Validator/Hostname/Cn.php',
'vendor/zend/library/Zend/Validator/Hostname/Com.php',
'vendor/zend/library/Zend/Validator/Hostname/Jp.php',
'vendor/zend/library/Zend/Stdlib/Dispatchable.php',
'vendor/zend/library/Zend/Stdlib/Message.php',
'vendor/zend/library/Zend/Stdlib/MessageDescription.php',
'vendor/zend/library/Zend/Stdlib/RequestDescription.php',
'vendor/zend/library/Zend/Stdlib/Parameters.php',
'vendor/zend/library/Zend/Stdlib/ParametersDescription.php',
'vendor/zend/library/Zend/Stdlib/ResponseDescription.php',
'vendor/zend/library/Zend/Loader/PluginClassLoader.php',
'vendor/zend/library/Zend/Loader/PluginClassLocator.php',
'vendor/zend/library/Zend/Loader/ShortNameLocator.php',
'vendor/Symfony/Component/ClassLoader/UniversalClassLoader.php'
);

$dirs = array(
Expand All @@ -112,9 +89,7 @@ protected function getFiles()
'vendor/Symfony/Component/DomCrawler',
'vendor/Symfony/Component/CssSelector',
'vendor/Symfony/Component/Process',
//'vendor/zend/library/Zend/Date',
'vendor/zend/library/Zend/Uri',
'vendor/zend/library/Zend/Http',
'vendor/Guzzle/src/Guzzle'
);

$finder = new Finder();
Expand All @@ -123,3 +98,4 @@ protected function getFiles()
return array_merge($files, iterator_to_array($iterator));
}
}

0 comments on commit a7ed527

Please sign in to comment.