
Relocation of shards causes bulk indexing client to hang #1839

Closed
snazy opened this issue Apr 3, 2012 · 6 comments


snazy commented Apr 3, 2012

I have set up 4 big servers (lots of cores, lots of disks, lots of RAM), each running an Elasticsearch node.
One client reads rows from a database and continuously submits indexing requests to the cluster. The indexing requests are bundled into bulk requests of 2500 requests each.
The index has 32 shards.
My client uses the Java client API.

So far so good.

I just wanted to know what happens if I shut down a node and restart it.
Shutdown works fine (except: see below).
Restart works fine...
...until the cluster starts to relocate shards.

When a bulk request "hits" a shard that is being relocated, the client hangs forever.

I have tried several networking settings and both the transport client and the node client - nothing helped.

One thing worked around the issue for me.
Previously, the code was:

            Client client = (TransportClient) ...;
            BulkRequestBuilder bulk = client.prepareBulk();
            for (int i = 0; i < 2500; i++) {
                IndexRequestBuilder request = buildIndexingRequest();
                request = request.setReplicationType(ReplicationType.ASYNC); // no effect
                bulk.add(request);
            }
            BulkResponse response = bulk.execute().actionGet(); // <--- NEVER RETURNS IF A SHARD IS BEING RELOCATED
            if (response.hasFailures()) {
                // some error handling...
            }

When I use actionGet(timeout) instead, the method throws an ElasticSearchTimeoutException in this situation and I can submit the bulk request again:

                    while (true) {
                        try {
                            response = bulk.execute().actionGet(getRetryTimeout()); // <--- TIMES OUT IF A SHARD IS BEING RELOCATED
                            break;
                        }
                        catch (ElasticSearchTimeoutException timeout) {
                            warning("TIMEOUT", timeout, null, null);
                        }
                    }

In such a situation I see no activity in the Elasticsearch threads and none in "my" calling thread - it just waits forever in org.elasticsearch.common.util.concurrent.BaseFuture.Sync#acquireSharedInterruptibly.

None of the cluster log files indicate an error.

I do not know if this behaviour affects searches.
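
For completeness: to keep that retry loop from spinning forever if the cluster never recovers, I would cap the retries and back off between attempts. A minimal sketch - the cap and backoff values are arbitrary, and getRetryTimeout() and warning(...) are the same helpers as in the loop above:

                    int attempts = 0;
                    final int maxRetries = 10; // arbitrary cap - tune as needed
                    BulkResponse response = null;
                    while (true) {
                        try {
                            response = bulk.execute().actionGet(getRetryTimeout());
                            break;
                        }
                        catch (ElasticSearchTimeoutException timeout) {
                            warning("TIMEOUT", timeout, null, null);
                            if (++attempts >= maxRetries)
                                throw timeout; // give up and let the caller decide
                            try {
                                Thread.sleep(1000L * attempts); // linear backoff before retrying
                            }
                            catch (InterruptedException e) {
                                Thread.currentThread().interrupt(); // preserve the interrupt flag
                                throw timeout;
                            }
                        }
                    }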


snazy commented Apr 3, 2012

Verified with Elasticsearch 0.18.7 and 0.19.1.


kimchy commented Apr 4, 2012

Hey, can you write a standalone test case, along with the scenario that recreates it (i.e. start 4 nodes, restart one node while the test case is bulk indexing data)? It would help speed things up in finding where the problem is.


snazy commented Apr 5, 2012

OK


snazy commented Apr 5, 2012

OK - here it is.

Just edit and execute the class (it needs a dependency on the elasticsearch 0.19.1 jar).

  1. Set up a cluster with 4 nodes
  2. Start the class (it will automatically re-create the index and mapping)
  3. Wait until the index has about 1000000 docs (so that relocating the shards takes some time)
  4. Gracefully stop one node (I did it using elasticsearch head's SHUTDOWN functionality)
  5. The main class still runs
  6. Restart the stopped node
  7. The node appears (without any shards)
  8. Some shards get relocated (purple color in elasticsearch head; see the polling sketch after this list)
  9. Bang - the indexer hangs ... forever
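
By the way, to watch step 8 without elasticsearch head, a small polling loop against the cluster health API shows when relocation kicks in. Just a sketch - watchRelocation is a hypothetical helper, and the accessor names are how I remember the 0.19-era Java client, so treat them as assumptions:

    // Hypothetical helper: polls cluster health once a second and prints the
    // number of relocating shards. Assumes `client` is the TransportClient
    // built in doit() below; needs an import of
    // org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse.
    static void watchRelocation(TransportClient client) throws InterruptedException {
        while (true) {
            ClusterHealthResponse health =
                    client.admin().cluster().prepareHealth(NAME).execute().actionGet();
            System.out.println(new Date() + " : status=" + health.getStatus()
                    + ", relocating shards=" + health.getRelocatingShards());
            Thread.sleep(1000);
        }
    }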

Here's the code:

    package org.elasticsearch.issue1839;

    import org.elasticsearch.action.bulk.BulkRequestBuilder;
    import org.elasticsearch.action.index.IndexRequestBuilder;
    import org.elasticsearch.client.IndicesAdminClient;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;
    import org.elasticsearch.indices.IndexMissingException;
    import org.elasticsearch.node.NodeBuilder;

    import java.util.Date;
    import java.util.Random;

    public final class Main {

        public static void main(String[] args) {
            new Main().doit();
        }

        static final int THREADS = 4;
        static final String CLUSTER = "elasticsearch";
        static final String NAME = "issue1839";

        private void doit() {

            // NodeBuilder is used only to create a settings builder;
            // the actual connection goes through a TransportClient.
            NodeBuilder nodeBuilder = NodeBuilder.nodeBuilder().client(true);
            ImmutableSettings.Builder settings = nodeBuilder.settings();
            settings = settings.
                    put("cluster.name", CLUSTER).//
                    put("client.transport.sniff", "false").//
                    put("transport.tcp.compress", true).//
                    put("transport.tcp.connect_timeout", "10s").//
                    put("network.tcp.keep_alive", "true").//
                    put("network.tcp.send_buffer_size", "64k").//
                    put("network.tcp.receive_buffer_size", "64k");

            TransportClient client = new TransportClient(settings);

            client.addTransportAddress(new InetSocketTransportAddress("10.40.101.211", 9300));
            client.addTransportAddress(new InetSocketTransportAddress("10.40.101.212", 9300));
            client.addTransportAddress(new InetSocketTransportAddress("10.40.101.213", 9300));
            client.addTransportAddress(new InetSocketTransportAddress("10.40.101.214", 9300));

            IndicesAdminClient indicesAdmin = client.admin().indices();

            // Re-create the index from scratch.
            try {
                indicesAdmin.prepareDelete(NAME).execute().actionGet();
            }
            catch (IndexMissingException ignore) {
                //
            }

            String indexSettings = "{\"index\": {" +//
                    "    \"number_of_shards\" : \"32\"," +//
                    "    \"number_of_replicas\" : \"1\"" +//
                    "  }" +//
                    '}';
            indicesAdmin.prepareCreate(NAME).setSettings(indexSettings).execute().actionGet();

            String indexMapping = "{\"" + NAME + "\": {" +//
                    "  \"properties\": {" +//
                    "    \"my_text_1\" : {" +//
                    "      \"type\" : \"string\"," +//
                    "      \"store\" : \"yes\"," +//
                    "      \"index\" : \"analyzed\"," +//
                    "      \"include_in_all\" : \"true\"" +//
                    "    }," +//
                    "    \"my_text_2\" : {" +//
                    "      \"type\" : \"string\"," +//
                    "      \"store\" : \"yes\"," +//
                    "      \"index\" : \"analyzed\"," +//
                    "      \"include_in_all\" : \"true\"" +//
                    "    }," +//
                    "    \"my_text_3\" : {" +//
                    "      \"type\" : \"string\"," +//
                    "      \"store\" : \"yes\"," +//
                    "      \"index\" : \"analyzed\"," +//
                    "      \"include_in_all\" : \"true\"" +//
                    "    }," +//
                    "    \"when\" : {" +//
                    "      \"type\" : \"date\"," +//
                    "      \"store\" : \"yes\"," +//
                    "      \"include_in_all\" : \"true\"" +//
                    "    }" +//
                    "  }" +// properties
                    '}' +// issue1839
                    '}';
            indicesAdmin.preparePutMapping(NAME).setType(NAME).setSource(indexMapping).execute().actionGet();

            for (int n = 0; n < THREADS; n++)
                new Thread(new Indexer(client), "indexer#" + n).start();

            while (true)
                try {
                    Thread.sleep(500);
                }
                catch (InterruptedException e) {
                    break;
                }
        }

        // Each thread endlessly submits bulk requests of 2500 generated documents.
        @SuppressWarnings("UseOfSystemOutOrSystemErr")
        static final class Indexer implements Runnable {
            private final TransportClient client;
            private final Random rand = new Random(System.currentTimeMillis() + System.nanoTime());

            Indexer(TransportClient client) {
                this.client = client;
            }

            @Override
            public void run() {
                while (true) {
                    try {
                        Thread.sleep(10);
                    }
                    catch (InterruptedException e) {
                        break;
                    }

                    BulkRequestBuilder bulk = client.prepareBulk();
                    for (int n = 0; n < 2500; n++)
                        bulk.add(createIndexRequest());
                    System.out.println(new Date().toString() + " : " + Thread.currentThread().getName() + " indexing 2500 docs");
                    bulk.execute().actionGet();
                    System.out.println(new Date().toString() + " : " + Thread.currentThread().getName() + " indexed 2500 docs");
                }
            }

            private IndexRequestBuilder createIndexRequest() {
                StringBuilder document = new StringBuilder().//
                        append('{').//
                        append("  \"").append(NAME).append("\" : {").//
                        append("    \"my_text_1\" : \"").append(createSomeText()).append("\", ").//
                        append("    \"my_text_2\" : \"").append(createSomeText()).append("\", ").//
                        append("    \"my_text_3\" : \"").append(createSomeText()).append("\", ").//
                        append("    \"when\" : \"").append(System.currentTimeMillis()).append("\" ").//
                        append("  }").//
                        append('}');
                return client.prepareIndex(NAME, NAME).setSource(document.toString());
            }

            // Random uppercase "words" plus a trailing random number.
            private String createSomeText() {
                StringBuilder text = new StringBuilder();
                for (int n = 0; n < rand.nextInt(40) + 3; n++) {
                    for (int m = 0; m < rand.nextInt(15) + 3; m++)
                        text.append((char) (rand.nextInt(26) + 65));
                    text.append(' ');
                }
                return text.append(rand.nextInt(10000000)).toString();
            }
        }
    }


kimchy commented Apr 5, 2012

Hi, thanks for the recreation, I managed to recreate it locally as well. I found the problem: it revolves around not properly handling the relocation of a primary shard just when the shard it was relocated from gets closed. I will post a fix to both the 0.19 and master branches (closing this issue in the commit, so we can keep track of the change). If you can check it yourself as well, that would be great.

kimchy closed this as completed in 824b0bd on Apr 5, 2012

snazy commented Apr 5, 2012

Cool - that was fast :-)

I'll try it when 0.19.3 is released, so I can roll back my "timeout loop".
