Almost every aspect of the client is configurable. Most users will only need to configure a few parameters to suit their needs, but it is possible to completely replace much of the internals if required.
Custom configuration is accomplished before the client is instantiated, through the ClientBuilder helper object. We’ll walk through all the configuration options and show sample code to replace the various components.
The most common configuration is telling the client about your cluster: how many nodes, their addresses and ports. If no hosts are specified, the client will attempt to connect to localhost:9200.
This behavior can be changed by using the setHosts() method on ClientBuilder. The method accepts an array of values, each entry corresponding to one node in your cluster. The format of the host can vary, depending on your needs (IP vs hostname, port, SSL, etc.):
$hosts = [
'192.168.1.1:9200', // IP + Port
'192.168.1.2', // Just IP
'mydomain.server.com:9201', // Domain + Port
'mydomain2.server.com', // Just Domain
'https://localhost', // SSL to localhost
'https://192.168.1.3:9200' // SSL to IP + Port
];
$client = ClientBuilder::create() // Instantiate a new ClientBuilder
->setHosts($hosts) // Set the hosts
->build(); // Build the client object
Notice that the ClientBuilder object allows chaining method calls for brevity. It is also possible to call the methods individually:
$hosts = [
'192.168.1.1:9200', // IP + Port
'192.168.1.2', // Just IP
'mydomain.server.com:9201', // Domain + Port
'mydomain2.server.com', // Just Domain
'https://localhost', // SSL to localhost
'https://192.168.1.3:9200' // SSL to IP + Port
];
$clientBuilder = ClientBuilder::create(); // Instantiate a new ClientBuilder
$clientBuilder->setHosts($hosts); // Set the hosts
$client = $clientBuilder->build(); // Build the client object
The client also supports an extended host configuration syntax. The inline configuration method relies on PHP’s filter_var() and parse_url() functions to validate and extract the components of a URL. Unfortunately, these built-in functions run into problems with certain edge cases. For example, filter_var() will not accept URLs that have underscores in the hostname (which are questionably legal, depending on how you interpret the RFCs). Similarly, parse_url() will choke if a Basic Auth password contains special characters such as a pound sign (#) or question mark (?).
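The two edge cases can be reproduced with PHP's built-in functions alone. The following is a minimal sketch; the URLs and credentials are made-up examples, and the exact values printed may differ between PHP versions, so no output is shown.

```php
<?php
// Illustrative only: the hostnames and credentials below are invented.

// Case 1: a hostname containing an underscore, passed through URL validation.
$underscoreUrl = 'http://my_node.example.com:9200';
$validated = filter_var($underscoreUrl, FILTER_VALIDATE_URL);
var_dump($validated);

// Case 2: Basic Auth credentials whose password contains '#' and '?'.
$password = 'password!#$?*abc';
$authUrl  = 'https://username:' . $password . '@foo.com:9200/';
$parts    = parse_url($authUrl);
var_dump($parts);

// Check whether the password survived the trip through parse_url().
$recovered = is_array($parts) ? ($parts['pass'] ?? null) : null;
echo $recovered === $password
    ? "password preserved\n"
    : "password lost or mangled\n";
```

Running this against the PHP version you deploy on is a quick way to see whether your own host strings are affected.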
For this reason, the client supports an extended host syntax which provides greater control over host initialization. None of the components are validated, so edge cases like underscores in domain names will not cause problems.
The extended syntax is an array of parameters for each host:
$hosts = [
// This is effectively equal to: "https://username:password!#$?*abc@foo.com:9200/"
[
'host' => 'foo.com',
'port' => '9200',
'scheme' => 'https',
'user' => 'username',
'pass' => 'password!#$?*abc'
],
// This is equal to "http://localhost:9200/"
[
'host' => 'localhost', // Only host is required
]
];
$client = ClientBuilder::create() // Instantiate a new ClientBuilder
->setHosts($hosts) // Set the hosts
->build(); // Build the client object
Only the host parameter is required for each configured host. If not provided, the default port is 9200. The default scheme is http.
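To make the defaults concrete, here is a small sketch of how a host array could be completed with them. The applyHostDefaults() function is hypothetical and written only for this illustration; it is not part of the client and does not claim to mirror its internals.

```php
<?php
// Hypothetical helper, not part of the client: fills in the documented defaults.
function applyHostDefaults(array $host): array
{
    if (!isset($host['host'])) {
        throw new InvalidArgumentException("The 'host' parameter is required.");
    }
    // Array union keeps the caller's values and only adds the missing keys.
    return $host + ['port' => 9200, 'scheme' => 'http'];
}

$full = applyHostDefaults(['host' => 'localhost']);
echo $full['scheme'] . '://' . $full['host'] . ':' . $full['port'] . "/\n";
// prints "http://localhost:9200/"
```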
For details about HTTP Authorization and SSL encryption, please see Authorization and SSL.
By default, the client will retry n times, where n = number of nodes in your cluster. A retry is only performed if the operation results in a "hard" exception: connection refusal, connection timeout, DNS lookup timeout, etc. 4xx and 5xx errors are not considered retryable events, since the node returns an operational response.
If you would like to disable retries, or change the number, you can do so with the setRetries() method:
$client = ClientBuilder::create()
->setRetries(2)
->build();
When the client runs out of retries, it will throw the last exception that it received. For example, if you have ten alive nodes, and setRetries(5), the client will attempt to execute the command up to five times. If all five nodes result in a connection timeout (for example), the client will throw an OperationTimeoutException. Depending on the Connection Pool being used, these nodes may also be marked dead.
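The general idea of retrying only on hard failures and surfacing the last one can be sketched in plain PHP. Everything here is hypothetical: HardFailure and executeWithRetries() are stand-ins invented for the illustration, the round-robin choice of node is an assumption (in the client that decision belongs to the connection pool and selector), and the sketch counts total attempts rather than modelling how the setRetries() value is interpreted.

```php
<?php
// Hypothetical stand-in for a "hard" failure such as a refused connection or a timeout.
class HardFailure extends RuntimeException {}

// Tries $request against successive nodes until one succeeds or $maxAttempts is used up.
// Only HardFailure triggers another attempt; any other exception propagates immediately.
// Assumes $nodes is a non-empty list.
function executeWithRetries(callable $request, array $nodes, int $maxAttempts)
{
    $last = null;
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $node = $nodes[$attempt % count($nodes)]; // assumed round-robin order
        try {
            return $request($node);
        } catch (HardFailure $e) {
            $last = $e; // remember the failure and move on to the next node
        }
    }
    // Out of attempts: surface the last failure that was seen.
    throw $last ?? new LogicException('maxAttempts must be at least 1');
}

// Simulated cluster: the first two nodes time out, the third answers.
$nodes = ['node-a', 'node-b', 'node-c'];
$request = function (string $node) {
    if ($node !== 'node-c') {
        throw new HardFailure("timeout talking to $node");
    }
    return "response from $node";
};

echo executeWithRetries($request, $nodes, 3), "\n"; // prints "response from node-c"
```

With only two attempts allowed, the same call would throw the HardFailure raised for node-b, which is the "last exception received" behavior described above.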
To help in identification, exceptions that are thrown due to max retries will wrap a MaxRetriesException. For example, you can catch a specific curl exception then check if it wraps a MaxRetriesException using getPrevious():
$client = Elasticsearch\ClientBuilder::create()
->setHosts(["localhost:1"])
->setRetries(0)
->build();
try {
$client->search($searchParams);
} catch (Elasticsearch\Common\Exceptions\Curl\CouldNotConnectToHost $e) {
$previous = $e->getPrevious();
if ($previous instanceof \Elasticsearch\Common\Exceptions\MaxRetriesException) {
echo "Max retries!";
}
}
Alternatively, all "hard" curl exceptions (CouldNotConnectToHost, CouldNotResolveHostException, OperationTimeoutException) extend the more general TransportException. So you could instead catch the general TransportException and then check its previous value:
$client = Elasticsearch\ClientBuilder::create()
->setHosts(["localhost:1"])
->setRetries(0)
->build();
try {
$client->search($searchParams);
} catch (Elasticsearch\Common\Exceptions\TransportException $e) {
$previous = $e->getPrevious();
if ($previous instanceof \Elasticsearch\Common\Exceptions\MaxRetriesException) {
echo "Max retries!";
}
}
Elasticsearch-PHP supports logging, but it is not enabled by default for performance reasons. If you wish to enable logging, you need to select a logging implementation, install it, then enable the logger in the Client. The recommended logger is Monolog, but any logger that implements the PSR/Log interface will work.
You might have noticed that Monolog was suggested during installation. To begin using Monolog, add it to your composer.json:
{
"require": {
...
"elasticsearch/elasticsearch" : "~2.0",
"monolog/monolog": "~1.0"
}
}
And then update your composer installation:
php composer.phar update
Once Monolog (or another logger) is installed, you need to create a log object and inject it into the client. The ClientBuilder object has a helper static function that will generate a common Monolog-based logger for you. All you need to do is provide the path to your desired logging location:
$logger = ClientBuilder::defaultLogger('path/to/your.log');
$client = ClientBuilder::create() // Instantiate a new ClientBuilder
->setLogger($logger) // Set the logger with a default logger
->build(); // Build the client object
You can also specify the severity of log messages that you wish to log:
use Monolog\Logger; // needed for the Logger::INFO constant
// set severity with second parameter
$logger = ClientBuilder::defaultLogger('/path/to/logs/', Logger::INFO);
$client = ClientBuilder::create() // Instantiate a new ClientBuilder
->setLogger($logger) // Set the logger with a default logger
->build(); // Build the client object
The defaultLogger() method is just a helper; you are not required to use it. You can create your own logger and inject that instead:
use Monolog\Logger;
use Monolog\Handler\StreamHandler;
$logger = new Logger('name');
$logger->pushHandler(new StreamHandler('path/to/your.log', Logger::WARNING));
$client = ClientBuilder::create() // Instantiate a new ClientBuilder
->setLogger($logger) // Set your custom logger
->build(); // Build the client object
Elasticsearch-PHP uses an interchangeable HTTP transport layer called RingPHP. This allows the client to construct a generic HTTP request, then pass it to the transport layer to execute. The actual execution details are hidden from the client and modular, so that you can choose from several HTTP handlers depending on your needs.
The default handler that the client uses is a combination handler. When executing in synchronous mode, the handler uses CurlHandler, which executes single curl calls. These are very fast for single requests. When asynchronous (future) mode is enabled, the handler switches to CurlMultiHandler, which uses the curl_multi interface. This involves a bit more overhead, but allows batches of HTTP requests to be processed in parallel.
You can configure the HTTP handler with one of several helper functions, or provide your own custom handler:
$defaultHandler = ClientBuilder::defaultHandler();
$singleHandler = ClientBuilder::singleHandler();
$multiHandler = ClientBuilder::multiHandler();
$customHandler = new MyCustomHandler();
$client = ClientBuilder::create()
->setHandler($defaultHandler)
->build();
For details on creating your own custom Ring handler, please see the RingPHP Documentation.
The default handler is recommended in almost all cases. This allows fast synchronous execution, while retaining flexibility to invoke parallel batches with async future mode. You may consider using just the singleHandler if you know you will never need async capabilities, since it will save a small amount of overhead by reducing indirection.
The client maintains a pool of connections, with each connection representing a node in your cluster. There are several connection pool implementations available, and each has slightly different behavior (pinging vs no pinging, etc). Connection pools are configured via the setConnectionPool() method:
$connectionPool = '\Elasticsearch\ConnectionPool\StaticNoPingConnectionPool';
$client = ClientBuilder::create()
->setConnectionPool($connectionPool)
->build();
For more details, please see the dedicated page on configuring connection pools.
The connection pool manages the connections to your cluster, but the Selector is the logic that decides which connection should be used for the next API request. There are several selectors that you can choose from. Selectors can be changed via the setSelector() method:
$selector = '\Elasticsearch\ConnectionPool\Selectors\StickyRoundRobinSelector';
$client = ClientBuilder::create()
->setSelector($selector)
->build();
For more details, please see the dedicated page on configuring selectors.
Requests are given to the client in the form of associative arrays, but Elasticsearch expects JSON. The Serializer’s job is to serialize PHP objects into JSON. It also de-serializes JSON back into PHP arrays. This seems trivial, but there are a few edge cases which make it useful for the serializer to remain modular.
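One edge case of this kind, offered here as an assumption about what the text has in mind, is that an empty PHP array is ambiguous when encoded: it can become a JSON list or a JSON object. The sketch below uses only PHP's built-in json_encode() and json_decode(), not the client's serializer classes.

```php
<?php
// An empty PHP array is encoded as a JSON list, which is not what an
// object-valued field such as match_all expects.
$asArray  = json_encode(['match_all' => []]);
// An empty stdClass instance is encoded as a JSON object instead.
$asObject = json_encode(['match_all' => new stdClass()]);

echo $asArray, "\n";  // {"match_all":[]}
echo $asObject, "\n"; // {"match_all":{}}

// Decoding with the second argument set to true yields associative arrays.
$decoded = json_decode('{"hits":{"total":0}}', true);
var_dump($decoded['hits']['total']);
```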
The majority of people will never need to change the default serializer (SmartSerializer), but if you need to, it can be done via the setSerializer() method:
$serializer = '\Elasticsearch\Serializers\SmartSerializer';
$client = ClientBuilder::create()
->setSerializer($serializer)
->build();
For more details, please see the dedicated page on configuring serializers.
The ConnectionFactory instantiates new Connection objects when requested by the ConnectionPool. A single Connection represents a single node. Since the client hands actual networking work over to RingPHP, the Connection’s main job is book-keeping: Is this node alive? Did it fail a ping request? What is the host and port?
There is little reason to provide your own ConnectionFactory, but if you need to do so, you need to supply an intact ConnectionFactory object to the setConnectionFactory() method. The object should implement the ConnectionFactoryInterface interface.
class MyConnectionFactory implements ConnectionFactoryInterface
{
public function __construct($handler, array $connectionParams,
SerializerInterface $serializer,
LoggerInterface $logger,
LoggerInterface $tracer)
{
// Code here
}
/**
* @param $hostDetails
*
* @return ConnectionInterface
*/
public function create($hostDetails)
{
// Code here...must return a Connection object
}
}
$connectionFactory = new MyConnectionFactory(
$handler,
$connectionParams,
$serializer,
$logger,
$tracer
);
$client = ClientBuilder::create()
->setConnectionFactory($connectionFactory)
->build();
As you can see, if you decide to inject your own ConnectionFactory, you take over the responsibility of wiring it correctly. The ConnectionFactory requires a working HTTP handler, serializer, logger and tracer.
The client uses an Endpoint closure to dispatch API requests to the correct Endpoint object. A namespace object will construct a new Endpoint via this closure, which makes it a handy place to extend the set of available API endpoints.
For example, we could add a new endpoint like so:
$transport = $this->transport;
$serializer = $this->serializer;
$newEndpoint = function ($class) use ($transport, $serializer) {
if ($class == 'SuperSearch') {
return new MyProject\SuperSearch($transport);
} else {
// Default handler
$fullPath = '\\Elasticsearch\\Endpoints\\' . $class;
if ($class === 'Bulk' || $class === 'Msearch' || $class === 'MPercolate') {
return new $fullPath($transport, $serializer);
} else {
return new $fullPath($transport);
}
}
};
$client = ClientBuilder::create()
->setEndpoint($newEndpoint)
->build();
Obviously, by doing this you take responsibility that all existing endpoints still function correctly. And you also assume the responsibility of correctly wiring the Transport and Serializer into each endpoint.
To help ease automated building of the client, all configurations can be provided in a settings hash instead of calling the individual methods directly. This functionality is exposed through the ClientBuilder::fromConfig() static method, which accepts an array of configurations and returns a fully built client.
Array keys correspond to the method name, e.g. the retries key corresponds to the setRetries() method.
$params = [
'hosts' => [
'localhost:9200'
],
'retries' => 2,
'handler' => ClientBuilder::singleHandler()
];
$client = ClientBuilder::fromConfig($params);
Unknown parameters will throw an exception, to help the user find potential problems. If this behavior is not desired (e.g. you are using the hash for other purposes, and may have keys unrelated to the Elasticsearch client), you can set $quiet = true in fromConfig() to silence the exceptions.
$params = [
'hosts' => [
'localhost:9200'
],
'retries' => 2,
'imNotReal' => 5
];
// Set $quiet to true to ignore the unknown `imNotReal` key
$client = ClientBuilder::fromConfig($params, true);
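The key-to-method mapping and the quiet flag can be illustrated with a self-contained sketch. SketchBuilder is a hypothetical class written only for this example; it is not the client's ClientBuilder, and the way it derives method names is an assumption based on the description above rather than the library's actual implementation.

```php
<?php
// Hypothetical builder used only for this illustration.
class SketchBuilder
{
    public $settings = [];

    public function setHosts(array $hosts)
    {
        $this->settings['hosts'] = $hosts;
        return $this;
    }

    public function setRetries(int $retries)
    {
        $this->settings['retries'] = $retries;
        return $this;
    }

    // Maps each key to a "set" . ucfirst($key) method.
    // Unknown keys throw unless $quiet is true, in which case they are skipped.
    public static function fromConfig(array $config, bool $quiet = false): self
    {
        $builder = new self();
        foreach ($config as $key => $value) {
            $method = 'set' . ucfirst($key);
            if (method_exists($builder, $method)) {
                $builder->$method($value);
            } elseif (!$quiet) {
                throw new RuntimeException("Unknown parameter provided: $key");
            }
        }
        return $builder;
    }
}

$builder = SketchBuilder::fromConfig(
    ['hosts' => ['localhost:9200'], 'retries' => 2, 'imNotReal' => 5],
    true // quiet mode: the unknown key is skipped instead of throwing
);
print_r($builder->settings);
```

Failing loudly by default catches misspelled keys early, while the quiet flag lets a shared configuration array carry unrelated entries.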