The percolator can sometimes fail to match queries right after shard recovery. Observed in version 1.5.1.
The percolator keeps all queries in an in-memory collection (per shard), which it reads from the index at startup. This is done by registering a listener on the IndicesLifecycle; the listener loads the queries when afterIndexShardPostRecovery is called.
This does not seem to block the shard from being reported as initialised, so loading is sometimes not complete before the cluster reaches yellow status. Thus, if a request comes in before all queries have been loaded into the in-memory structure, the response erroneously reports that there were no matches.
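The race can be sketched with a small stand-alone Java example (hypothetical names, not actual Elasticsearch code): a background thread stands in for the afterIndexShardPostRecovery listener filling the per-shard query registry, while a percolate request that arrives before it finishes sees an empty registry and reports zero matches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the suspected race; names do not come from
// the Elasticsearch code base.
public class PercolatorRaceSketch {

    // Stand-in for the per-shard in-memory query collection.
    static final Map<String, String> registeredQueries = new ConcurrentHashMap<>();

    // Stand-in for the listener work done after shard recovery: it loads
    // queries from the index asynchronously, after the shard has already
    // been reported as started.
    static void loadQueriesAsync(CountDownLatch loaded) {
        Thread loader = new Thread(() -> {
            try {
                Thread.sleep(50); // simulate reading queries from the shard
            } catch (InterruptedException ignored) {
            }
            registeredQueries.put("1", "match(field, b)");
            loaded.countDown();
        });
        loader.start();
    }

    // Stand-in for handling a percolate request: only queries already
    // present in the in-memory registry can match.
    static int percolateMatchCount() {
        return registeredQueries.size();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch loaded = new CountDownLatch(1);
        loadQueriesAsync(loaded);

        // A request arriving before loading finishes sees zero matches,
        // even though a matching query is registered in the index.
        System.out.println("early matches = " + percolateMatchCount());

        loaded.await();
        // The same request after loading completes finds the query.
        System.out.println("late matches = " + percolateMatchCount());
    }
}
```

If the cluster-health check returned yellow only after the registry was populated, the early request in this sketch could not observe the empty map.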
I've been unable to create a test that fails deterministically. The test below sometimes exposes the error (by failing); for me, it fails roughly every fifth or sixth run:
package org.elasticsearch.test.integration;

import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.action.percolate.PercolateRequestBuilder;
import org.elasticsearch.action.percolate.PercolateResponse;
import org.elasticsearch.action.percolate.PercolateSourceBuilder;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.percolator.PercolatorService;
import org.testng.annotations.AfterClass;
import org.testng.annotations.Test;

import java.io.IOException;
import java.util.concurrent.ExecutionException;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class RecoveryTests extends AbstractNodesTests {

    @AfterClass
    public void closeNodes() {
        closeAllNodesAndClear();
    }

    @Test(enabled = true)
    public void testRestartNode() throws IOException, ExecutionException, InterruptedException {
        Settings extraSettings = ImmutableSettings.settingsBuilder()
                .put("index.gateway.type", "local")
                .build();
        logger.info("--> Starting one node");
        startNode("node1", extraSettings);
        Client client = client("node1");

        logger.info("--> Add dummy doc");
        client.admin().indices().prepareDelete("_all").execute().actionGet();
        client.prepareIndex("test", "type", "1").setSource("field", "value").execute().actionGet();

        logger.info("--> Register query");
        client.prepareIndex("test", PercolatorService.TYPE_NAME, "1")
                .setSource(jsonBuilder()
                        .startObject()
                        .field("query", matchQuery("field", "b"))
                        .field("id", 1)
                        .field("group", "g1")
                        .field("query_hash", "hash1")
                        .endObject())
                .setRefresh(true)
                .execute().actionGet();

        logger.info("--> Restarting node");
        closeNode("node1");
        startNode("node1", extraSettings);
        client = client("node1");

        logger.info("--> Waiting for cluster health to be yellow");
        waitForYellowIndices(client);

        logger.info("--> Percolate doc with field=b");
        PercolateResponse response = new PercolateRequestBuilder(client)
                .setIndices("test")
                .setDocumentType("type")
                .setSource(new PercolateSourceBuilder().setDoc(PercolateSourceBuilder.docBuilder()
                        .setDoc(jsonBuilder().startObject().field("_id", "1").field("field", "b").endObject())))
                .execute().actionGet();
        assertThat(response.getCount(), is(1L));

        logger.info("--> Restarting node again (this triggers another code path, since the translog is flushed)");
        closeNode("node1");
        startNode("node1", extraSettings);
        client = client("node1");

        logger.info("--> Waiting for cluster health to be yellow");
        waitForYellowIndices(client);

        logger.info("--> Percolate doc with field=b");
        response = new PercolateRequestBuilder(client)
                .setIndices("test")
                .setDocumentType("type")
                .setSource(new PercolateSourceBuilder().setDoc(PercolateSourceBuilder.docBuilder()
                        .setDoc(jsonBuilder().startObject().field("_id", "1").field("field", "b").endObject())))
                .execute().actionGet();
        assertThat(response.getCount(), is(1L));
    }

    private void waitForYellowIndices(Client client) {
        ClusterHealthResponse health = client.admin().cluster()
                .health(Requests.clusterHealthRequest(new String[]{}).waitForYellowStatus().waitForActiveShards(5))
                .actionGet();
        assertThat(health.isTimedOut(), equalTo(false));
    }
}
There are existing percolator tests similar to this one which are supposed to cover the same scenario. From what I understand, though, they are very particular about the cluster setup; could that be why they don't catch this issue?