Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArrayIndexOutOfBoundsException in murmur hash with has_child #8396

Closed
aleph-zero opened this issue Nov 7, 2014 · 15 comments
Closed

ArrayIndexOutOfBoundsException in murmur hash with has_child #8396

aleph-zero opened this issue Nov 7, 2014 · 15 comments
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@aleph-zero
Copy link
Contributor

Running 1.4.0 it is possible to hang ES.

Steps to reproduce:
Three processes in a while(true) loop. One is using the _bulk API to insert (and update) a small random number of documents. The other two processes are executing fairly complex filteredQuery's.

I can run any of the two processes concurrently without problems, but when a the third one starts, ES fails with the exception below.

Other times, the exception doesn't occur but instead ES just hangs and I have to kill -9 its jvm.

[2014-11-07 16:46:36,175][DEBUG][action.search.type       ] [Hardcore] [2848] Failed to execute query phase
org.elasticsearch.search.query.QueryPhaseExecutionException: [xxx.public.test.idxtest][2]: query[filtered(ConstantScore(++cache(_xmin:[2882514 TO 2882514]) +cache(_cmin:[* TO 0}) +cache(_xmax:[0 TO 0]) +cache(_xmax:[2882514 TO 2882514]) +cache(_cmax:[0 TO *]) +cache(_xmin_is_committed:T) +cache(_xmax:[0 TO 0]) +cache(_xmax:[2882514 TO 2882514]) +cache(_cmax:[0 TO *]) +NotFilter(cache(_xmax:[2882514 TO 2882514])) +cache(_xmax_is_committed:F) +CustomQueryWrappingFilter(child_filter[data/xact](filtered(ConstantScore(cache(BooleanFilter(_field_names:id))))->cache(_type:data)))))->cache(+_type:xact +org.elasticsearch.index.search.nested.NonNestedDocsFilter@38f048bd)],from[0],size[32768]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:163)
    at org.elasticsearch.search.SearchService.executeScan(SearchService.java:245)
    at org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:520)
    at org.elasticsearch.search.action.SearchServiceTransportAction$21.call(SearchServiceTransportAction.java:517)
    at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:559)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 13417
    at org.elasticsearch.search.internal.ContextIndexSearcher.createNormalizedWeight(ContextIndexSearcher.java:136)
    at org.elasticsearch.index.search.child.CustomQueryWrappingFilter.getDocIdSet(CustomQueryWrappingFilter.java:72)
    at org.elasticsearch.common.lucene.search.AndFilter.getDocIdSet(AndFilter.java:54)
    at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:157)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$1.scorer(FilteredQuery.java:136)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
    at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
    at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:309)
    at org.elasticsearch.search.scan.ScanContext.execute(ScanContext.java:52)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:120)
    ... 7 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 13417
    at org.apache.lucene.util.StringHelper.murmurhash3_x86_32(StringHelper.java:205)
    at org.apache.lucene.util.StringHelper.murmurhash3_x86_32(StringHelper.java:229)
    at org.apache.lucene.util.BytesRef.hashCode(BytesRef.java:143)
    at org.elasticsearch.common.util.BytesRefHash.add(BytesRefHash.java:151)
    at org.elasticsearch.index.search.child.ParentIdsFilter.createShortCircuitFilter(ParentIdsFilter.java:67)
    at org.elasticsearch.index.search.child.ChildrenConstantScoreQuery.createWeight(ChildrenConstantScoreQuery.java:127)
    at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684)
    at org.elasticsearch.search.internal.ContextIndexSearcher.createNormalizedWeight(ContextIndexSearcher.java:133)
    ... 21 more
@jpountz
Copy link
Contributor

jpountz commented Nov 9, 2014

This looks to me like field data for the _parent field data is buggy and returns an invalid BytesRef that has an offset+end that is greater than the length of the wrapped array. I will look into it.

@jpountz
Copy link
Contributor

jpountz commented Nov 10, 2014

I wrote a basic test hoping it could reproduce the issue, but got no luck. Here it is in case someone would like to iterate on it:

public class PCTests extends ElasticsearchIntegrationTest {

    private static final int NUM_PARENTS = 50000;
    private static final int NUM_CHILDREN = 500000;

    public void addOrUpdate() {
        final boolean parent = randomBoolean();
        if (parent) {
            final String parentId = Integer.toString(randomInt(NUM_PARENTS));
            client().prepareIndex("test", "parent", parentId).setSource(Collections.<String, Object>emptyMap()).get();
        } else {
            final int id = randomInt(NUM_CHILDREN);
            final int parentId = MathUtils.mod(MurmurHash3.hash(id), NUM_PARENTS);
            client().prepareIndex("test", "child", Integer.toString(id)).setParent(Integer.toString(parentId)).setSource("i", randomInt(1000)).get();
        }
    }

    public void query() {
        SearchResponse resp = client().prepareSearch("test").setSize(1).setQuery(QueryBuilders.hasChildQuery("child", QueryBuilders.rangeQuery("i").to(randomInt(1000)))).get();
        System.out.println(resp);
    }

    public void test() throws Exception {
        createIndex("test");
        client().admin().indices().preparePutMapping("test").setType("parent").setSource(JsonXContent.contentBuilder().startObject().startObject("parent").endObject().endObject()).get();
        client().admin().indices().preparePutMapping("test").setType("child").setSource(JsonXContent.contentBuilder().startObject().startObject("child")
                .startObject("_parent")
                .field("type", "parent")
            .endObject().endObject().endObject()).get();
        for (int i = 0; i <= NUM_PARENTS; ++i) {
            client().prepareIndex("test", "parent", Integer.toString(i)).setSource(Collections.<String, Object>emptyMap()).get();
        }

        final Thread[] indexingThreads = new Thread[3];
        final AtomicInteger running = new AtomicInteger(indexingThreads.length);
        for (int i = 0; i < indexingThreads.length; ++i) {
            indexingThreads[i] = new Thread() {
                public void run() {
                    for (int i = 0; i < 5 * (NUM_CHILDREN + NUM_PARENTS); ++i) {
                        addOrUpdate();
                    }
                    running.decrementAndGet();
                }
            };
        }
        for (Thread t : indexingThreads) {
            t.start();
        }
        while (running.get() > 0) {
            query();
            Thread.sleep(200);
        }
    }

}

Can you share more information about the setup that reproduces the issue:

  • how many parent/child relations are configured in the mappings, are there some types that are both parent and child?
  • how many parents and children in the index?
  • what does the query look like? Given the stack trace, I guess it uses the has_child query?
  • are there custom plugins installed?
  • is the issue still reproducible after restarting elasticsearch?
  • can you try to reproduce the failure with assertions enabled? (the -ea JVM option)

If sharing the index that triggers this behaviour is possible and the issue is easy to reproduce, that would be great.

@aleph-zero
Copy link
Contributor Author

I'm working with the reporter to get working code to reproduce.

@clintongormley
Copy link

@aleph-zero any news on this?

@s1monw
Copy link
Contributor

s1monw commented Nov 23, 2014

@aleph-zero ping?

@aleph-zero
Copy link
Contributor Author

@clintongormley @s1monw I never got a working example from the reporter. I'll ping him back and see if I can at least get the general steps to reproduce.

@aleph-zero
Copy link
Contributor Author

@clintongormley @s1monw Still trying to get a proof-of-concept working

@eeeebbbbrrrr
Copy link

Original reporter here...

@jpountz if you can provide some instruction how to actually run your test above (I have the ES repo cloned locally) I can fix it for you. It looks like you're on the right track.

I've provided Chris Earle @ ES a set of shell scripts that recreates the problem (internal support number 6313) by poking at ES with libcurl.

Looking at your code above, you want 1 thread doing updates and two (or more) threads querying (the reverse of what you have now), and you'll want to run the query threads for a few minutes before the exceptions start to appear.

To answer your questions from above:

The shell scripts I've provided to Chris Earle know how to create the exact index involved, with proper mappings, settings, and data.

@eeeebbbbrrrr
Copy link

So I figured out how to run the test. :) Looks like this re-creates it:

import com.carrotsearch.hppc.hash.MurmurHash3;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.math.MathUtils;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.test.ElasticsearchIntegrationTest;
import org.junit.Test;

import java.util.Collections;
import java.util.concurrent.atomic.AtomicInteger;

public class PCTests extends ElasticsearchIntegrationTest {

    private static final int NUM_PARENTS = 50000;
    private static final int NUM_CHILDREN = 500000;
    private static final int MAX_QUERIES = 32768;

    public void addOrUpdate() {
        final boolean parent = randomBoolean();
        if (parent) {
            final String parentId = Integer.toString(randomInt(NUM_PARENTS));
            client().prepareIndex("test", "parent", parentId).setSource(Collections.<String, Object>emptyMap()).get();
        } else {
            final int id = randomInt(NUM_CHILDREN);
            final int parentId = MathUtils.mod(MurmurHash3.hash(id), NUM_PARENTS);
            client().prepareIndex("test", "child", Integer.toString(id)).setParent(Integer.toString(parentId)).setSource("i", randomInt(1000)).get();
        }
    }

    public void query() {
        SearchResponse resp = client().prepareSearch("test").setSize(1).setQuery(QueryBuilders.hasChildQuery("child", QueryBuilders.rangeQuery("i").to(randomInt(1000)))).get();
        System.out.println(resp);
    }

    @Test
    public void test() throws Exception {
        createIndex("test");
        client().admin().indices().preparePutMapping("test").setType("parent").setSource(JsonXContent.contentBuilder().startObject().startObject("parent").endObject().endObject()).get();
        client().admin().indices().preparePutMapping("test").setType("child").setSource(JsonXContent.contentBuilder().startObject().startObject("child")
                .startObject("_parent")
                .field("type", "parent")
                .endObject().endObject().endObject()).get();
        for (int i = 0; i <= NUM_PARENTS; ++i) {
            client().prepareIndex("test", "parent", Integer.toString(i)).setSource(Collections.<String, Object>emptyMap()).get();
        }

        logger.info("Bootstrapping index...");
        // bootstrap index with some data
        for (int i = 0; i <NUM_PARENTS; ++i) {
            addOrUpdate();
        }

        final Thread[] indexingThreads = new Thread[3];
        final AtomicInteger running = new AtomicInteger(indexingThreads.length);
        for (int i = 0; i < indexingThreads.length; ++i) {
            indexingThreads[i] = new Thread() {
                public void run() {
                    for (int i = 0; i < MAX_QUERIES; ++i) {
                        query();
                    }
                    running.decrementAndGet();
                }
            };
        }

        logger.info("Starting test");
        for (Thread t : indexingThreads) {
            t.start();
        }
        while (running.get() > 0) {
            addOrUpdate();
            Thread.sleep(200);
        }
    }

}

I think IDEA's output window truncated the exception, but you can see that it's there:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 9,
    "failed" : 1,
    "failures" : [ {
      "index" : "test",
      "shard" : 9,
      "status" : 500,
      "reason" : "RemoteTransportException[[node_2][local[3]][indices:data/read/search[phase/query/id]]]; nested: QueryPhaseExecutionException[[test][9]: query[child_filter[child/parent](filtered(i:[* TO 204])->_type:child)],from[0],size[1]: Query Failed [Failed to execute main query]]; nested: RuntimeException[java.lang.AssertionError]; nested: AssertionError; "
    } ]
  },
  "hits" : {
    "total" : 4254,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test",
      "_type" : "parent",
      "_id" : "9280",
      "_score" : 1.0,
      "_source" : { }
    } ]
  }
}

@eeeebbbbrrrr
Copy link

Oh, it's worth mentioning that ES does not hang. My original report to @aleph-zero said it did, but further diagnosis on my end found that it was my code that was looping indefinitely when it encountered the side effects of this exception (scan+scroll not returning the expected # of results, for example).

The AIOOB is very real, however. :)

jpountz added a commit to jpountz/elasticsearch that referenced this issue Dec 22, 2014
`_parent` field data mistakenly shared some stateful data-structures across
threads.

Close elastic#8396
@jpountz
Copy link
Contributor

jpountz commented Dec 22, 2014

Thanks for making the bug reproducible, it helped a lot!

@eeeebbbbrrrr
Copy link

I built your fix/ branch and can confirm this indeed fixes up my standalone test cases.

I'm glad you wrapped ParentChildIndexFieldData.java in a threaded test as it definitely missed the concurrency boat. Makes me wonder how it ever worked?

@jpountz
Copy link
Contributor

jpountz commented Dec 22, 2014

This is a bug that I introduced in elasticsearch 1.4 since we almost completely rewrote fielddata to be more in-line with Lucene (this removed a lot of useless wrapping when using doc values, and helped improve the performance of doc values when used from elasticsearch).

@eeeebbbbrrrr
Copy link

That makes sense. I didn't dig through the commit history but I was thinking that the concurrency responsibly had to be happening elsewhere at some point in the past.

Thanks for taking care of this so quickly. Happy Holidays!

@jpountz
Copy link
Contributor

jpountz commented Dec 24, 2014

You're welcome, happy holidays to you too!

jpountz added a commit that referenced this issue Dec 24, 2014
`_parent` field data mistakenly shared some stateful data-structures across
threads.

Close #8396
jpountz added a commit that referenced this issue Dec 24, 2014
`_parent` field data mistakenly shared some stateful data-structures across
threads.

Close #8396
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
`_parent` field data mistakenly shared some stateful data-structures across
threads.

Close elastic#8396
@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Parent/Child labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants