Skip to content

Latest commit

 

History

History
442 lines (339 loc) · 16.3 KB

configuration.asciidoc

File metadata and controls

442 lines (339 loc) · 16.3 KB

Configuration

Almost every aspect of the client is configurable. Most users will only need to configure a few parameters to suit their needs, but it is possible to completely replace much of the internals if required.

Custom configuration is accomplished before the client is instantiated, through the ClientBuilder helper object. We’ll walk through all the configuration options and show sample code to replace the various components.

Inline Host Configuration

The most common configuration is telling the client about your cluster: how many nodes, their addresses and ports. If no hosts are specified, the client will attempt to connect to localhost:9200.

This behavior can be changed by using the setHosts() method on ClientBuilder. The method accepts an array of values, each entry corresponding to one node in your cluster. The format of the host can vary, depending on your needs (ip vs hostname, port, ssl, etc)

$hosts = [
    '192.168.1.1:9200',         // IP + Port
    '192.168.1.2',              // Just IP
    'mydomain.server.com:9201', // Domain + Port
    'mydomain2.server.com',     // Just Domain
    'https://localhost',        // SSL to localhost
    'https://192.168.1.3:9200'  // SSL to IP + Port
];
$client = ClientBuilder::create()           // Instantiate a new ClientBuilder
                    ->setHosts($hosts)      // Set the hosts
                    ->build();              // Build the client object

Notice that the ClientBuilder object allows chaining method calls for brevity. It is also possible to call the methods individually:

$hosts = [
    '192.168.1.1:9200',         // IP + Port
    '192.168.1.2',              // Just IP
    'mydomain.server.com:9201', // Domain + Port
    'mydomain2.server.com',     // Just Domain
    'https://localhost',        // SSL to localhost
    'https://192.168.1.3:9200'  // SSL to IP + Port
];
$clientBuilder = ClientBuilder::create();   // Instantiate a new ClientBuilder
$clientBuilder->setHosts($hosts);           // Set the hosts
$client = $clientBuilder->build();          // Build the client object

Extended Host Configuration

The client also supports an extended host configuration syntax. The inline configuration method relies on PHP’s filter_var() and parse_url() methods to validate and extract the components of a URL. Unfortunately, these built-in methods run into problems with certain edge-cases. For example, filter_var() will not accept URL’s that have underscores (which are questionably legal, depending on how you interpret the RFCs). Similarly, parse_url() will choke if a Basic Auth’s password contains special characters such as a pound sign (#) or question-marks (?).

For this reason, the client supports an extended host syntax which provides greater control over host initialization. None of the components are validated, so edge-cases like underscores domain names will not cause problems.

The extended syntax is an array of parameters for each host:

$hosts = [
    // This is effectively equal to: "https://username:password!#$?*abc@foo.com:9200/"
    [
        'host' => 'foo.com',
        'port' => '9200',
        'scheme' => 'https',
        'user' => 'username',
        'pass' => 'password!#$?*abc'
    ],

    // This is equal to "http://localhost:9200/"
    [
        'host' => 'localhost',    // Only host is required
    ]
];
$client = ClientBuilder::create()           // Instantiate a new ClientBuilder
                    ->setHosts($hosts)      // Set the hosts
                    ->build();              // Build the client object

Only the host parameter is required for each configured host. If not provided, the default port is 9200. The default scheme is http.

Authorization and Encryption

For details about HTTP Authorization and SSL encryption, please see Authorization and SSL.

Set retries

By default, the client will retry n times, where n = number of nodes in your cluster. A retry is only performed if the operation results in a "hard" exception: connection refusal, connection timeout, DNS lookup timeout, etc. 4xx and 5xx errors are not considered retry’able events, since the node returns an operational response.

If you would like to disable retries, or change the number, you can do so with the setRetries() method:

$client = ClientBuilder::create()
                    ->setRetries(2)
                    ->build();

When the client runs out of retries, it will throw the last exception that it received. For example, if you have ten alive nodes, and setRetries(5), the client will attempt to execute the command up to five times. If all five nodes result in a connection timeout (for example), the client will throw an OperationTimeoutException. Depending on the Connection Pool being used, these nodes may also be marked dead.

To help in identification, exceptions that are thrown due to max retries will wrap a MaxRetriesException. For example, you can catch a specific curl exception then check if it wraps a MaxRetriesException using getPrevious():

$client = Elasticsearch\ClientBuilder::create()
    ->setHosts(["localhost:1"])
    ->setRetries(0)
    ->build();

try {
    $client->search($searchParams);
} catch (Elasticsearch\Common\Exceptions\Curl\CouldNotConnectToHost $e) {
    $previous = $e->getPrevious();
    if ($previous instanceof 'Elasticsearch\Common\Exceptions\MaxRetriesException') {
        echo "Max retries!";
    }
}

Alternatively, all "hard" curl exceptions (CouldNotConnectToHost, CouldNotResolveHostException, OperationTimeoutException) extend the more general TransportException. So you could instead catch the general TransportException and then check it’s previous value:

$client = Elasticsearch\ClientBuilder::create()
    ->setHosts(["localhost:1"])
    ->setRetries(0)
    ->build();

try {
    $client->search($searchParams);
} catch (Elasticsearch\Common\Exceptions\TransportException $e) {
    $previous = $e->getPrevious();
    if ($previous instanceof 'Elasticsearch\Common\Exceptions\MaxRetriesException') {
        echo "Max retries!";
    }
}

Enabling the Logger

Elasticsearch-PHP supports logging, but it is not enabled by default for performance reasons. If you wish to enable logging, you need to select a logging implementation, install it, then enable the logger in the Client. The recommended logger is Monolog, but any logger that implements the PSR/Log interface will work.

You might have noticed that Monolog was suggested during installation. To begin using Monolog, add it to your composer.json:

{
    "require": {
        ...
        "elasticsearch/elasticsearch" : "~2.0",
        "monolog/monolog": "~1.0"
    }
}

And then update your composer installation:

php composer.phar update

Once Monolog (or another logger) is installed, you need to create a log object and inject it into the client. The ClientBuilder object has a helper static function that will generate a common Monolog-based logger for you. All you need to do is provide the path to your desired logging location:

$logger = ClientBuilder::defaultLogger('path/to/your.log');

$client = ClientBuilder::create()       // Instantiate a new ClientBuilder
            ->setLogger($logger)        // Set the logger with a default logger
            ->build();                  // Build the client object

You can also specify the severity of log messages that you wish to log:

// set severity with second parameter
$logger = ClientBuilder::defaultLogger('/path/to/logs/', Logger::INFO);

$client = ClientBuilder::create()       // Instantiate a new ClientBuilder
            ->setLogger($logger)        // Set the logger with a default logger
            ->build();                  // Build the client object

The defaultLogger() method is just a helper, you are not required to use it. You can create your own logger and inject that instead:

use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$logger = new Logger('name');
$logger->pushHandler(new StreamHandler('path/to/your.log', Logger::WARNING));

$client = ClientBuilder::create()       // Instantiate a new ClientBuilder
            ->setLogger($logger)        // Set your custom logger
            ->build();                  // Build the client object

Configure the HTTP Handler

Elasticsearch-PHP uses an interchangeable HTTP transport layer called RingPHP. This allows the client to construct a generic HTTP request, then pass it to the transport layer to execute. The actual execution details are hidden from the client and modular, so that you can choose from several HTTP handlers depending on your needs.

The default handler that the client uses is a combination handler. When executing in synchronous mode, the handler uses CurlHandler, which executes single curl calls. These are very fast for single requests. When asynchronous (future) mode is enabled, the handler switches to CurlMultiHandler, which uses the curl_multi interface. This involves a bit more overhead, but allows batches of HTTP requests to be processed in parallel.

You can configure the HTTP handler with one of several helper functions, or provide your own custom handler:

$defaultHandler = ClientBuilder::defaultHandler();
$singleHandler  = ClientBuilder::singleHandler();
$multiHandler   = ClientBuilder::multiHandler();
$customHandler  = new MyCustomHandler();

$client = ClientBuilder::create()
            ->setHandler($defaultHandler)
            ->build();

For details on creating your own custom Ring handler, please see the RingPHP Documentation

The default handler is recommended in almost all cases. This allows fast synchronous execution, while retaining flexibility to invoke parallel batches with async future mode. You may consider using just the singleHandler if you know you will never need async capabilities, since it will save a small amount of overhead by reducing indirection.

Setting the Connection Pool

The client maintains a pool of connections, with each connection representing a node in your cluster. There are several connection pool implementations available, and each has slightly different behavior (pinging vs no pinging, etc). Connection pools are configured via the setConnectionPool() method:

$connectionPool = '\Elasticsearch\ConnectionPool\StaticNoPingConnectionPool';
$client = ClientBuilder::create()
            ->setConnectionPool($connectionPool)
            ->build();

For more details, please see the dedicated page on configuring connection pools.

Setting the Connection Selector

The connection pool manages the connections to your cluster, but the Selector is the logic that decides which connection should be used for the next API request. There are several selectors that you can choose from. Selectors can be changed via the setSelector() method:

$selector = '\Elasticsearch\ConnectionPool\Selectors\StickyRoundRobinSelector';
$client = ClientBuilder::create()
            ->setSelector($selector)
            ->build();

For more details, please see the dedicated page on configuring selectors.

Setting the Serializer

Requests are given to the client in the form of associative arrays, but Elasticsearch expects JSON. The Serializer’s job is to serialize PHP objects into JSON. It also de-serializes JSON back into PHP arrays. This seems trivial, but there are a few edgecases which make it useful for the serializer to remain modular.

The majority of people will never need to change the default serializer (SmartSerializer), but if you need to, it can be done via the setSerializer() method:

$serializer = '\Elasticsearch\Serializers\SmartSerializer';
$client = ClientBuilder::create()
            ->setSerializer($serializer)
            ->build();

For more details, please see the dedicated page on configuring serializers.

Setting a custom ConnectionFactory

The ConnectionFactory instantiates new Connection objects when requested by the ConnectionPool. A single Connection represents a single node. Since the client hands actual networking work over to RingPHP, the Connection’s main job is book-keeping: Is this node alive? Did it fail a ping request? What is the host and port?

There is little reason to provide your own ConnectionFactory, but if you need to do so, you need to supply an intact ConnectionFactory object to the setConnectionFactory() method. The object should implement the ConnectionFactoryInterface interface.

class MyConnectionFactory implements ConnectionFactoryInterface
{

    public function __construct($handler, array $connectionParams,
                                SerializerInterface $serializer,
                                LoggerInterface $logger,
                                LoggerInterface $tracer)
    {
       // Code here
    }


    /**
     * @param $hostDetails
     *
     * @return ConnectionInterface
     */
    public function create($hostDetails)
    {
        // Code here...must return a Connection object
    }
}


$connectionFactory = new MyConnectionFactory(
    $handler,
    $connectionParams,
    $serializer,
    $logger,
    $tracer
);

$client = ClientBuilder::create()
            ->setConnectionFactory($connectionFactory);
            ->build();

As you can see, if you decide to inject your own ConnectionFactory, you take over the responsibiltiy of wiring it correctly. The ConnectionFactory requires a working HTTP handler, serializer, logger and tracer.

Set the Endpoint closure

The client uses an Endpoint closure to dispatch API requests to the correct Endpoint object. A namespace object will construct a new Endpoint via this closure, which means this is a handy location if you wish to extend the available set of API endpoints available

For example, we could add a new endpoint like so:

$transport = $this->transport;
$serializer = $this->serializer;

$newEndpoint = function ($class) use ($transport, $serializer) {
    if ($class == 'SuperSearch') {
        return new MyProject\SuperSearch($transport);
    } else {
        // Default handler
        $fullPath = '\\Elasticsearch\\Endpoints\\' . $class;
        if ($class === 'Bulk' || $class === 'Msearch' || $class === 'MPercolate') {
            return new $fullPath($transport, $serializer);
        } else {
            return new $fullPath($transport);
        }
    }
};

$client = ClientBuilder::create()
            ->setEndpoint($newEndpoint)
            ->build();

Obviously, by doing this you take responsibility that all existing endpoints still function correctly. And you also assume the responsibility of correctly wiring the Transport and Serializer into each endpoint.

Building the client from a configuration hash

To help ease automated building of the client, all configurations can be provided in a setting hash instead of calling the individual methods directly. This functionality is exposed through the ClientBuilder::FromConfig() static method, which accepts an array of configurations and returns a fully built client.

Array keys correspond to the method name, e.g. retries key corresponds to setRetries() method.

$params = [
    'hosts' => [
        'localhost:9200'
    ],
    'retries' => 2,
    'handler' => ClientBuilder::singleHandler()
];
$client = ClientBuilder::fromConfig($params);

Unknown parameters will throw an exception, to help the user find potential problems. If this behavior is not desired (e.g. you are using the hash for other purposes, and may have keys unrelated to the Elasticsearch client), you can set $quiet = true in fromConfig() to silence the exceptions.

$params = [
    'hosts' => [
        'localhost:9200'
    ],
    'retries' => 2,
    'imNotReal' => 5
];

// Set $quiet to true to ignore the unknown `imNotReal` key
$client = ClientBuilder::fromConfig($params, true);