ArrayIndexOutOfBoundsException in murmur hash with has_child #8396
This looks to me like field data for the `_parent` field is buggy and returns an invalid `BytesRef` that has an […]
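The failure mode described above is consistent with a hash routine trusting an offset/length pair that points past its backing array. A minimal, self-contained sketch of that mechanism (the `Slice` class is a hypothetical stand-in for Lucene's `BytesRef`, and a simple polynomial hash stands in for the real murmur implementation):

```java
// Illustration only -- not Elasticsearch's actual code. A hash that trusts
// offset/length from a BytesRef-like holder throws AIOOB when the slice
// extends past the backing array.
public class InvalidSliceHash {

    // Hypothetical stand-in for Lucene's BytesRef (bytes, offset, length).
    public static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        public Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    public static int hash(Slice s) {
        int h = 0;
        for (int i = s.offset; i < s.offset + s.length; i++) {
            h = 31 * h + s.bytes[i]; // AIOOB if offset + length > bytes.length
        }
        return h;
    }

    public static void main(String[] args) {
        byte[] data = new byte[4];
        hash(new Slice(data, 0, 4));         // valid slice: fine
        try {
            hash(new Slice(data, 2, 4));     // invalid: 2 + 4 > data.length
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("AIOOB, as in the report");
        }
    }
}
```

This is why an "invalid `BytesRef`" surfacing from field data would show up as an `ArrayIndexOutOfBoundsException` inside the hash rather than at the point where the bad slice was produced.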
I wrote a basic test hoping it could reproduce the issue, but had no luck. Here it is in case someone would like to iterate on it:

```java
public class PCTests extends ElasticsearchIntegrationTest {

    private static final int NUM_PARENTS = 50000;
    private static final int NUM_CHILDREN = 500000;

    public void addOrUpdate() {
        final boolean parent = randomBoolean();
        if (parent) {
            final String parentId = Integer.toString(randomInt(NUM_PARENTS));
            client().prepareIndex("test", "parent", parentId).setSource(Collections.<String, Object>emptyMap()).get();
        } else {
            final int id = randomInt(NUM_CHILDREN);
            final int parentId = MathUtils.mod(MurmurHash3.hash(id), NUM_PARENTS);
            client().prepareIndex("test", "child", Integer.toString(id)).setParent(Integer.toString(parentId)).setSource("i", randomInt(1000)).get();
        }
    }

    public void query() {
        SearchResponse resp = client().prepareSearch("test").setSize(1).setQuery(QueryBuilders.hasChildQuery("child", QueryBuilders.rangeQuery("i").to(randomInt(1000)))).get();
        System.out.println(resp);
    }

    public void test() throws Exception {
        createIndex("test");
        client().admin().indices().preparePutMapping("test").setType("parent").setSource(JsonXContent.contentBuilder().startObject().startObject("parent").endObject().endObject()).get();
        client().admin().indices().preparePutMapping("test").setType("child").setSource(JsonXContent.contentBuilder().startObject().startObject("child")
                .startObject("_parent")
                    .field("type", "parent")
                .endObject().endObject().endObject()).get();
        for (int i = 0; i <= NUM_PARENTS; ++i) {
            client().prepareIndex("test", "parent", Integer.toString(i)).setSource(Collections.<String, Object>emptyMap()).get();
        }
        final Thread[] indexingThreads = new Thread[3];
        final AtomicInteger running = new AtomicInteger(indexingThreads.length);
        for (int i = 0; i < indexingThreads.length; ++i) {
            indexingThreads[i] = new Thread() {
                public void run() {
                    for (int i = 0; i < 5 * (NUM_CHILDREN + NUM_PARENTS); ++i) {
                        addOrUpdate();
                    }
                    running.decrementAndGet();
                }
            };
        }
        for (Thread t : indexingThreads) {
            t.start();
        }
        while (running.get() > 0) {
            query();
            Thread.sleep(200);
        }
    }
}
```

Can you share more information about the setup that reproduces the issue:
If sharing the index that triggers this behaviour is possible and the issue is easy to reproduce, that would be great.
I'm working with the reporter to get working code to reproduce.
@aleph-zero any news on this?
@aleph-zero ping?
@clintongormley @s1monw I never got a working example from the reporter. I'll ping him back and see if I can at least get the general steps to reproduce.
@clintongormley @s1monw Still trying to get a proof-of-concept working
Original reporter here... @jpountz if you can provide some instructions on how to actually run your test above (I have the ES repo cloned locally), I can fix it for you. It looks like you're on the right track. I've provided Chris Earle @ ES a set of shell scripts that recreates the problem (internal support number 6313) by poking at ES with libcurl. Looking at your code above, you want one thread doing updates and two (or more) threads querying (the reverse of what you have now), and you'll want to run the query threads for a few minutes before the exceptions start to appear. To answer your questions from above:
The shell scripts I've provided to Chris Earle know how to create the exact index involved, with proper mappings, settings, and data.
So I figured out how to run the test. :) Looks like this re-creates it:

```java
import com.carrotsearch.hppc.hash.MurmurHash3;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.math.MathUtils;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.test.ElasticsearchIntegrationTest;
import org.junit.Test;

import java.util.Collections;
import java.util.concurrent.atomic.AtomicInteger;

public class PCTests extends ElasticsearchIntegrationTest {

    private static final int NUM_PARENTS = 50000;
    private static final int NUM_CHILDREN = 500000;
    private static final int MAX_QUERIES = 32768;

    public void addOrUpdate() {
        final boolean parent = randomBoolean();
        if (parent) {
            final String parentId = Integer.toString(randomInt(NUM_PARENTS));
            client().prepareIndex("test", "parent", parentId).setSource(Collections.<String, Object>emptyMap()).get();
        } else {
            final int id = randomInt(NUM_CHILDREN);
            final int parentId = MathUtils.mod(MurmurHash3.hash(id), NUM_PARENTS);
            client().prepareIndex("test", "child", Integer.toString(id)).setParent(Integer.toString(parentId)).setSource("i", randomInt(1000)).get();
        }
    }

    public void query() {
        SearchResponse resp = client().prepareSearch("test").setSize(1).setQuery(QueryBuilders.hasChildQuery("child", QueryBuilders.rangeQuery("i").to(randomInt(1000)))).get();
        System.out.println(resp);
    }

    @Test
    public void test() throws Exception {
        createIndex("test");
        client().admin().indices().preparePutMapping("test").setType("parent").setSource(JsonXContent.contentBuilder().startObject().startObject("parent").endObject().endObject()).get();
        client().admin().indices().preparePutMapping("test").setType("child").setSource(JsonXContent.contentBuilder().startObject().startObject("child")
                .startObject("_parent")
                    .field("type", "parent")
                .endObject().endObject().endObject()).get();
        for (int i = 0; i <= NUM_PARENTS; ++i) {
            client().prepareIndex("test", "parent", Integer.toString(i)).setSource(Collections.<String, Object>emptyMap()).get();
        }
        logger.info("Bootstrapping index...");
        // bootstrap index with some data
        for (int i = 0; i < NUM_PARENTS; ++i) {
            addOrUpdate();
        }
        final Thread[] indexingThreads = new Thread[3];
        final AtomicInteger running = new AtomicInteger(indexingThreads.length);
        for (int i = 0; i < indexingThreads.length; ++i) {
            indexingThreads[i] = new Thread() {
                public void run() {
                    for (int i = 0; i < MAX_QUERIES; ++i) {
                        query();
                    }
                    running.decrementAndGet();
                }
            };
        }
        logger.info("Starting test");
        for (Thread t : indexingThreads) {
            t.start();
        }
        while (running.get() > 0) {
            addOrUpdate();
            Thread.sleep(200);
        }
    }
}
```

I think IDEA's output window truncated the exception, but you can see that it's there:

```json
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 9,
    "failed" : 1,
    "failures" : [ {
      "index" : "test",
      "shard" : 9,
      "status" : 500,
      "reason" : "RemoteTransportException[[node_2][local[3]][indices:data/read/search[phase/query/id]]]; nested: QueryPhaseExecutionException[[test][9]: query[child_filter[child/parent](filtered(i:[* TO 204])->_type:child)],from[0],size[1]: Query Failed [Failed to execute main query]]; nested: RuntimeException[java.lang.AssertionError]; nested: AssertionError; "
    } ]
  },
  "hits" : {
    "total" : 4254,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "parent",
      "_id" : "9280",
      "_score" : 1.0,
      "_source" : { }
    } ]
  }
}
```
Oh, it's worth mentioning that ES does not hang. My original report to @aleph-zero said it did, but further diagnosis on my end found that it was my code that was looping indefinitely when it encountered the side effects of this exception (scan+scroll not returning the expected number of results, for example). The AIOOB is very real, however. :)
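For anyone who hits the same client-side symptom: looping until an expected hit count arrives never terminates when a shard silently fails, so it is safer to bound the loop and stop on an empty page. A generic sketch of that pattern (the `PageSource` interface here is hypothetical and stands in for a scan+scroll round trip; this is not the ES client API):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: drain a paged result set safely. Terminating on an empty page
// (rather than waiting for an expected total) avoids looping forever when
// the server returns fewer hits than anticipated, as happened here.
public class SafeScroll {

    // Hypothetical page source; one call = one scroll round trip.
    public interface PageSource {
        List<String> nextPage();
    }

    public static List<String> drain(PageSource source, int maxPages) {
        List<String> all = new ArrayList<>();
        for (int page = 0; page < maxPages; page++) { // hard upper bound
            List<String> hits = source.nextPage();
            if (hits.isEmpty()) {
                break;                               // empty page => done
            }
            all.addAll(hits);
        }
        return all;
    }
}
```

With this shape, a shard failure that drops hits ends the loop early instead of hanging the client.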
`_parent` field data mistakenly shared some stateful data-structures across threads. Close elastic#8396
Thanks for making the bug reproducible, it helped a lot!
I built your fix/ branch and can confirm it indeed fixes my standalone test cases. I'm glad you wrapped ParentChildIndexFieldData.java in a threaded test, as it definitely missed the concurrency boat. Makes me wonder how it ever worked?
This is a bug that I introduced in Elasticsearch 1.4, when we almost completely rewrote fielddata to be more in line with Lucene (this removed a lot of useless wrapping when using doc values, and helped improve the performance of doc values when used from Elasticsearch).
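The commit message above attributes the bug to stateful data structures shared across threads. As an illustration of the general fix pattern only (an assumption on my part, not the actual `ParentChildIndexFieldData` change): per-request scratch state moves into a `ThreadLocal`, or is allocated per call, so concurrent readers never mutate the same buffer:

```java
// Illustration of the fix pattern only -- not Elasticsearch's actual code.
public class PerThreadScratch {

    // Anti-pattern: one mutable scratch buffer shared by every thread,
    // which concurrent callers would reset and append to simultaneously:
    // private static final StringBuilder SHARED = new StringBuilder();

    // Fix pattern: each thread lazily gets its own buffer (Java 8+).
    private static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(StringBuilder::new);

    public static String render(String value) {
        StringBuilder sb = SCRATCH.get();
        sb.setLength(0); // safe: no other thread ever sees this instance
        return sb.append('[').append(value).append(']').toString();
    }
}
```

Reusing a buffer this way keeps the allocation savings of a shared scratch object while making each instance confined to a single thread.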
That makes sense. I didn't dig through the commit history, but I was thinking that the concurrency handling had to be happening elsewhere at some point in the past. Thanks for taking care of this so quickly. Happy Holidays!
You're welcome, happy holidays to you too!
Running 1.4.0, it is possible to hang ES.
Steps to reproduce:
Three processes run in a while(true) loop. One uses the _bulk API to insert (and update) a small random number of documents. The other two execute fairly complex filtered queries.
I can run any two of the three processes concurrently without problems, but when the third one starts, ES fails with the exception below.
Other times, the exception doesn't occur but instead ES just hangs and I have to kill -9 its JVM.