The percolator can sometimes fail to match queries right after shard recovery. Observed in version 1.5.1.
The percolator keeps all queries in an in-memory collection (per shard), which it reads from the index at startup. This is done by registering a listener on the IndicesLifecycle; the listener loads the queries when afterIndexShardPostRecovery is called.
This does not seem to block the shard from being reported as initialised, so loading is sometimes not complete before the cluster reaches yellow status. Thus, if a request comes in before all queries have been loaded into the in-memory structure, the response erroneously reports that there were no matches.
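The race can be sketched with a small stand-alone Java example (hypothetical names, not actual Elasticsearch code): a background thread stands in for the afterIndexShardPostRecovery listener filling the per-shard query registry, while a percolate request that arrives before it finishes sees an empty registry and reports zero matches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of the suspected race; names do not come from
// the Elasticsearch code base.
public class PercolatorRaceSketch {

    // Stand-in for the per-shard in-memory query collection.
    static final Map<String, String> registeredQueries = new ConcurrentHashMap<>();

    // Stand-in for the listener work done after shard recovery: it loads
    // queries from the index asynchronously, after the shard has already
    // been reported as started.
    static void loadQueriesAsync(CountDownLatch loaded) {
        Thread loader = new Thread(() -> {
            try {
                Thread.sleep(50); // simulate reading queries from the shard
            } catch (InterruptedException ignored) {
            }
            registeredQueries.put("1", "match(field, b)");
            loaded.countDown();
        });
        loader.start();
    }

    // Stand-in for handling a percolate request: only queries already
    // present in the in-memory registry can match.
    static int percolateMatchCount() {
        return registeredQueries.size();
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch loaded = new CountDownLatch(1);
        loadQueriesAsync(loaded);

        // A request arriving before loading finishes sees zero matches,
        // even though a matching query is registered in the index.
        System.out.println("early matches = " + percolateMatchCount());

        loaded.await();
        // The same request after loading completes finds the query.
        System.out.println("late matches = " + percolateMatchCount());
    }
}
```

If the cluster-health check returned yellow only after the registry was populated, the early request in this sketch could not observe the empty map.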
I've been unable to create a test that fails deterministically. The test below sometimes exposes the error (by failing); for me, it fails roughly every fifth or sixth run:
package org.elasticsearch.test.integration;

import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.action.percolate.PercolateRequestBuilder;
import org.elasticsearch.action.percolate.PercolateResponse;
import org.elasticsearch.action.percolate.PercolateSourceBuilder;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.percolator.PercolatorService;
import org.testng.annotations.AfterClass;
import org.testng.annotations.Test;

import java.io.IOException;
import java.util.concurrent.ExecutionException;

import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
import static org.elasticsearch.index.query.QueryBuilders.matchQuery;
import static org.hamcrest.CoreMatchers.equalTo;
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

public class RecoveryTests extends AbstractNodesTests {

    @AfterClass
    public void closeNodes() {
        closeAllNodesAndClear();
    }

    @Test(enabled = true)
    public void testRestartNode() throws IOException, ExecutionException, InterruptedException {
        Settings extraSettings = ImmutableSettings.settingsBuilder()
                .put("index.gateway.type", "local")
                .build();
        logger.info("--> Starting one node");
        startNode("node1", extraSettings);
        Client client = client("node1");

        logger.info("--> Add dummy doc");
        client.admin().indices().prepareDelete("_all").execute().actionGet();
        client.prepareIndex("test", "type", "1").setSource("field", "value").execute().actionGet();

        logger.info("--> Register query");
        client.prepareIndex("test", PercolatorService.TYPE_NAME, "1")
                .setSource(jsonBuilder()
                        .startObject()
                        .field("query", matchQuery("field", "b"))
                        .field("id", 1)
                        .field("group", "g1")
                        .field("query_hash", "hash1")
                        .endObject())
                .setRefresh(true)
                .execute().actionGet();

        logger.info("--> Restarting node");
        closeNode("node1");
        startNode("node1", extraSettings);
        client = client("node1");

        logger.info("--> Waiting for cluster health to be yellow");
        waitForYellowIndices(client);

        logger.info("--> Percolate doc with field=b");
        PercolateResponse response = new PercolateRequestBuilder(client)
                .setIndices("test")
                .setDocumentType("type")
                .setSource(new PercolateSourceBuilder().setDoc(PercolateSourceBuilder.docBuilder()
                        .setDoc(jsonBuilder().startObject().field("_id", "1").field("field", "b").endObject())))
                .execute().actionGet();
        assertThat(response.getCount(), is(1L));

        logger.info("--> Restarting node again (this triggers another code path, since the translog is flushed)");
        closeNode("node1");
        startNode("node1", extraSettings);
        client = client("node1");

        logger.info("--> Waiting for cluster health to be yellow");
        waitForYellowIndices(client);

        logger.info("--> Percolate doc with field=b");
        response = new PercolateRequestBuilder(client)
                .setIndices("test")
                .setDocumentType("type")
                .setSource(new PercolateSourceBuilder().setDoc(PercolateSourceBuilder.docBuilder()
                        .setDoc(jsonBuilder().startObject().field("_id", "1").field("field", "b").endObject())))
                .execute().actionGet();
        assertThat(response.getCount(), is(1L));
    }

    private void waitForYellowIndices(Client client) {
        ClusterHealthResponse health = client.admin().cluster()
                .health(Requests.clusterHealthRequest(new String[]{}).waitForYellowStatus().waitForActiveShards(5))
                .actionGet();
        assertThat(health.isTimedOut(), equalTo(false));
    }
}
There are existing percolator tests similar to this one which are supposed to cover the same scenario. From what I understand, though, they are very particular about the cluster setup; could that be why they don't catch this issue?