Scrolling with has_child filter returns no hits on 2nd request #4703

Closed
joedj opened this issue Jan 13, 2014 · 4 comments

joedj commented Jan 13, 2014

When using scroll with a has_child filter, the initial request returns the correct total number of hits, but subsequent requests return no hits.

It looks like this problem was introduced in 0.90.6, and still occurs in 0.90.10. 0.90.5 works as expected.

The number of documents seems to play a part - in my initial test cases with only 2 parent documents, I couldn't reproduce the issue. However, creating 100 parents does reliably reproduce it. In my testing, 8 parent documents worked fine, but 9 did not.
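
The cutoff described above (8 parents fine, 9 failing) could be located with a simple upward scan. The sketch below is only an illustration: `find_threshold` and `stub_check` are hypothetical names that do not appear in the test script, and `stub_check` is a stand-in that hard-codes the reported behaviour instead of talking to Elasticsearch. In a real run, the check would recreate the index with N parents and succeed when the second scroll request comes back empty.

```bash
#!/bin/bash

# find_threshold CHECK MAX
# Prints the smallest N in 1..MAX for which `CHECK N` succeeds.
# Returns 1 if no N in that range succeeds.
find_threshold() {
    local check=$1 max=$2 n
    for ((n = 1; n <= max; n++)); do
        if "$check" "$n"; then
            echo "$n"
            return 0
        fi
    done
    return 1
}

# Stand-in check (assumption): mirrors the numbers reported in this issue,
# it does NOT query a cluster. Replace with a function that indexes $1
# parent/child pairs and tests whether the second scroll page is empty.
stub_check() {
    [ "$1" -ge 9 ]
}

find_threshold stub_check 100   # prints 9 with the stub above
```

A linear scan is used rather than bisection because nothing in the report establishes that the failure is monotonic in the number of documents.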

It sounds very similar to the issue mentioned here: http://elasticsearch-users.115913.n3.nabble.com/No-hit-using-scan-scroll-with-has-parent-filter-td4047236.html

Here's a test script (requires jq(1) to grab the scroll ID from the first JSON result):

#!/bin/bash
# bash (not plain sh) is needed for the {1..100} brace expansion below

HOST='localhost:9200'
INDEX='test_scroll_jj'
CURL="curl -q --ipv4 --silent --show-error --fail"

$CURL -XDELETE "$HOST/${INDEX}?pretty=true" >/dev/null
$CURL -XPOST "$HOST/${INDEX}/?pretty=true" -d '
{
    "mappings": {
        "homes":{
            "_parent":{
                "type" : "person"
            }
        }
    }
}' >/dev/null

for x in {1..100}; do # in my testing, 8 docs works, 9 fails
    $CURL -XPUT "$HOST/${INDEX}/person/$x/?pretty=true" -d '{}' >/dev/null
    $CURL -XPOST "$HOST/${INDEX}/homes?parent=$x&pretty=true" -d '{}' >/dev/null
done

$CURL -XPOST "$HOST/${INDEX}/_refresh?pretty=true" >/dev/null

echo "REQUEST ONE:"
SCROLL_RESULT=$($CURL -v -XPOST "http://$HOST/${INDEX}/person/_search?pretty=true&scroll=30s" -d'
{
    "size" : 1,
    "fields" : ["_id"],
    "query" : {
        "filtered" : {
            "filter" : {
                "has_child" : {
                    "type" : "homes",
                    "query" : {
                        "match_all" : {}
                    }
                }
            }
        }
    }
}')
echo $SCROLL_RESULT

scroll_id=$(echo $SCROLL_RESULT | jq -r '.["_scroll_id"]')

echo
echo "REQUEST TWO:"
$CURL -v "http://$HOST/_search/scroll?scroll=30s&scroll_id=$scroll_id&pretty=true"

The failing output on 0.90.10:

/tmp|⇒  /tmp/scrollbug.sh
REQUEST ONE:
* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1...
* Adding handle: conn: 0x7fb832006e00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fb832006e00) send_pipe: 1, recv_pipe: 0
* Connected to localhost (127.0.0.1) port 9200 (#0)
> POST /test_scroll_jj/person/_search?pretty=true&scroll=30s HTTP/1.1
> User-Agent: curl/7.32.0
> Host: localhost:9200
> Accept: */*
> Content-Length: 321
> Content-Type: application/x-www-form-urlencoded
>
} [data not shown]
* upload completely sent off: 321 out of 321 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 520
<
{ [data not shown]
* Connection #0 to host localhost left intact
{ "_scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTs2OkM5SXlBenNyU0lXR21uX3JsN25XcHc7NzpDOUl5QXpzclNJV0dtbl9ybDduV3B3Ozg6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzs5OkM5SXlBenNyU0lXR21uX3JsN25XcHc7MTA6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzswOw==", "took" : 5, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 100, "max_score" : 1.0, "hits" : [ { "_index" : "test_scroll_jj", "_type" : "person", "_id" : "2", "_score" : 1.0 } ] } }

REQUEST TWO:
* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1...
* Adding handle: conn: 0x7f8589806e00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f8589806e00) send_pipe: 1, recv_pipe: 0
* Connected to localhost (127.0.0.1) port 9200 (#0)
> GET /_search/scroll?scroll=30s&scroll_id=cXVlcnlUaGVuRmV0Y2g7NTs2OkM5SXlBenNyU0lXR21uX3JsN25XcHc7NzpDOUl5QXpzclNJV0dtbl9ybDduV3B3Ozg6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzs5OkM5SXlBenNyU0lXR21uX3JsN25XcHc7MTA6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzswOw==&pretty=true HTTP/1.1
> User-Agent: curl/7.32.0
> Host: localhost:9200
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 410
<
{
  "_scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTs2OkM5SXlBenNyU0lXR21uX3JsN25XcHc7NzpDOUl5QXpzclNJV0dtbl9ybDduV3B3Ozg6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzs5OkM5SXlBenNyU0lXR21uX3JsN25XcHc7MTA6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzswOw==",
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
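
In the failing run above, `hits.total` drops from 100 on the first request to 0 on the second. A check along these lines could let the script flag that by itself. This is a rough sketch: `hits_total` and `scroll_regressed` are hypothetical helper names, and the `sed` pattern assumes the numeric `"total"` field comes first inside the outer `"hits"` object, as it does in the responses shown here. Where jq is installed, `jq -r '.hits.total'` would be the more robust choice.

```bash
#!/bin/bash

# Print hits.total from a search response passed as $1.
# Assumption: "total" is the first key of the outer "hits" object.
# The inner "hits" array is skipped because it is followed by "[" and not "{".
hits_total() {
    printf '%s' "$1" | tr -d '\n' |
        sed -n 's/.*"hits"[[:space:]]*:[[:space:]]*{[[:space:]]*"total"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Succeeds (exit status 0) when the two responses report different totals.
# Note: if neither total can be parsed, both are empty and this reports no change.
scroll_regressed() {
    local first_total second_total
    first_total=$(hits_total "$1")
    second_total=$(hits_total "$2")
    [ "$first_total" != "$second_total" ]
}

# Abbreviated versions of the two responses from the failing run.
first='{ "_shards" : { "total" : 5 }, "hits" : { "total" : 100, "hits" : [ ] } }'
second='{ "_shards" : { "total" : 5 }, "hits" : { "total" : 0, "hits" : [ ] } }'

if scroll_regressed "$first" "$second"; then
    echo "hits.total changed between scroll requests"
fi
```

In the test script, `$SCROLL_RESULT` and the captured output of the second request would take the place of the two sample strings.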

And the expected output as per 0.90.5:

/tmp|⇒  /tmp/scrollbug.sh
REQUEST ONE:
* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1...
* Adding handle: conn: 0x7fd04a006e00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7fd04a006e00) send_pipe: 1, recv_pipe: 0
* Connected to localhost (127.0.0.1) port 9200 (#0)
> POST /test_scroll_jj/person/_search?pretty=true&scroll=30s HTTP/1.1
> User-Agent: curl/7.32.0
> Host: localhost:9200
> Accept: */*
> Content-Length: 321
> Content-Type: application/x-www-form-urlencoded
>
} [data not shown]
* upload completely sent off: 321 out of 321 bytes
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 523
<
{ [data not shown]
* Connection #0 to host localhost left intact
{ "_scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTsyMTpILV9IUWU2MlRrUzQyd2JRYzZLS3dROzIzOkgtX0hRZTYyVGtTNDJ3YlFjNktLd1E7MjI6SC1fSFFlNjJUa1M0MndiUWM2S0t3UTsyNDpILV9IUWU2MlRrUzQyd2JRYzZLS3dROzI1OkgtX0hRZTYyVGtTNDJ3YlFjNktLd1E7MDs=", "took" : 9, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 100, "max_score" : 1.0, "hits" : [ { "_index" : "test_scroll_jj", "_type" : "person", "_id" : "2", "_score" : 1.0 } ] } }

REQUEST TWO:
* About to connect() to localhost port 9200 (#0)
*   Trying 127.0.0.1...
* Adding handle: conn: 0x7f8a92006e00
* Adding handle: send: 0
* Adding handle: recv: 0
* Curl_addHandleToPipeline: length: 1
* - Conn 0 (0x7f8a92006e00) send_pipe: 1, recv_pipe: 0
* Connected to localhost (127.0.0.1) port 9200 (#0)
> GET /_search/scroll?scroll=30s&scroll_id=cXVlcnlUaGVuRmV0Y2g7NTsyMTpILV9IUWU2MlRrUzQyd2JRYzZLS3dROzIzOkgtX0hRZTYyVGtTNDJ3YlFjNktLd1E7MjI6SC1fSFFlNjJUa1M0MndiUWM2S0t3UTsyNDpILV9IUWU2MlRrUzQyd2JRYzZLS3dROzI1OkgtX0hRZTYyVGtTNDJ3YlFjNktLd1E7MDs=&pretty=true HTTP/1.1
> User-Agent: curl/7.32.0
> Host: localhost:9200
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=UTF-8
< Content-Length: 523
<
{
  "_scroll_id" : "cXVlcnlUaGVuRmV0Y2g7NTsyMTpILV9IUWU2MlRrUzQyd2JRYzZLS3dROzIzOkgtX0hRZTYyVGtTNDJ3YlFjNktLd1E7MjI6SC1fSFFlNjJUa1M0MndiUWM2S0t3UTsyNDpILV9IUWU2MlRrUzQyd2JRYzZLS3dROzI1OkgtX0hRZTYyVGtTNDJ3YlFjNktLd1E7MDs=",
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 100,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "test_scroll_jj",
      "_type" : "person",
      "_id" : "7",
      "_score" : 1.0
    } ]
  }
}
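
In both runs the second response carries the same `_scroll_id` as the first, so the ID itself does not look like the problem. The ID is base64 text, and decoding it gives a semicolon-separated string. The sketch below decodes the ID from the failing run. `decode_scroll_id` is a hypothetical helper, and the reading of the fields (search type, number of shard contexts, then one `contextId:nodeId` pair per shard) is an assumption based on the decoded text and the 5 shards in the response, not something stated in this thread.

```bash
#!/bin/bash

# Decode a scroll ID and print one semicolon-separated field per line.
decode_scroll_id() {
    printf '%s' "$1" | base64 --decode | tr ';' '\n'
}

# Scroll ID copied from the failing 0.90.10 run above.
failing_id='cXVlcnlUaGVuRmV0Y2g7NTs2OkM5SXlBenNyU0lXR21uX3JsN25XcHc7NzpDOUl5QXpzclNJV0dtbl9ybDduV3B3Ozg6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzs5OkM5SXlBenNyU0lXR21uX3JsN25XcHc7MTA6QzlJeUF6c3JTSVdHbW5fcmw3bldwdzswOw=='

decode_scroll_id "$failing_id"
```

With the ID above, the first two lines are `queryThenFetch` and `5`, followed by five lines of the form `id:node`, which lines up with the five shards reported in `_shards`.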
@deverton
Contributor

I've written this up as an integration test and used git bisect to track down where this broke between 0.90.5 and 0.90.6. Commit 9950e44 appears to be the culprit.

This is the test method I'm using.

    @Test
    public void simpleScrolledHasChildFilteredQuery() throws Exception {
        client().admin().indices().prepareCreate("test")
                .setSettings(ImmutableSettings.settingsBuilder().put("index.number_of_shards", 1).put("index.number_of_replicas", 0))
                .execute().actionGet();
        client().admin().cluster().prepareHealth().setWaitForEvents(Priority.LANGUID).setWaitForGreenStatus().execute().actionGet();
        client().admin()
                .indices()
                .preparePutMapping("test")
                .setType("child")
                .setSource(
                        jsonBuilder().startObject().startObject("child").startObject("_parent").field("type", "parent").endObject()
                                .endObject().endObject()).execute().actionGet();


        for (int i = 0; i < 10; i++) {
            client().prepareIndex("test", "parent", "p" + i).setSource("{}").execute().actionGet();
            client().prepareIndex("test", "child", "c" + i).setSource("{}").setParent("p" + i).execute().actionGet();
        }

        client().admin().indices().prepareRefresh().execute().actionGet();

        final SearchResponse scrollResponse = client().prepareSearch("test")
                .setScroll(TimeValue.timeValueSeconds(30))
                .setSize(1)
                .addField("_id")
                .setTypes("parent")
                .setQuery(filteredQuery(matchAllQuery(), FilterBuilders.hasChildFilter("child", matchAllQuery())))
                .execute()
                .actionGet();

        final SearchResponse firstScroll = client().prepareSearchScroll(scrollResponse.getScrollId()).setScroll(TimeValue.timeValueSeconds(30)).execute().actionGet();
        final SearchResponse secondScroll = client().prepareSearchScroll(firstScroll.getScrollId()).setScroll(TimeValue.timeValueSeconds(30)).execute().actionGet();

        client().prepareClearScroll().addScrollId(secondScroll.getScrollId()).execute().actionGet();

        assertThat(scrollResponse.getFailedShards(), equalTo(0));
        assertThat(scrollResponse.getHits().totalHits(), equalTo(10L));

        assertThat(firstScroll.getFailedShards(), equalTo(0));
        assertThat(firstScroll.getHits().getHits().length, equalTo(1));

        assertThat(secondScroll.getFailedShards(), equalTo(0));
        assertThat(secondScroll.getHits().getHits().length, equalTo(1));
    }

ghost assigned martijnvg Jan 14, 2014
@martijnvg
Member

Nice catch!

I looked into this further, and the error only seems to occur with the has_child or has_parent filter, not with the has_child / has_parent query.
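
Going by that observation, one thing to try on affected versions might be to express the same condition with the `has_child` query instead of the filter. The sketch below only prints a request body that could replace the one in the test script above. `has_child_query_body` is a hypothetical helper name, and whether this actually avoids the bug on 0.90.6 through 0.90.10 has not been verified in this thread.

```bash
#!/bin/bash

# Print a search body that uses the has_child *query* (no "filtered" wrapper).
# $1 is the child type, e.g. "homes" from the test script.
has_child_query_body() {
    local child_type=$1
    cat <<EOF
{
    "size" : 1,
    "fields" : ["_id"],
    "query" : {
        "has_child" : {
            "type" : "${child_type}",
            "query" : {
                "match_all" : {}
            }
        }
    }
}
EOF
}

has_child_query_body homes
```

The output could be passed to the first request in the script with `-d "$(has_child_query_body homes)"`.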

govindm commented Jan 15, 2014

Thanks for fixing it.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 16, 2014
…nts instead of keeping the weight around and build a DocIdSet when a segment is being processed. This fixes issues where the has_child / has_parent filter produce no results or errors on subsequent scan requests.

Also made CustomQueryWrappingFilter implement Releasable in order to cleanup the pre-computed DocIdSets.

Closes elastic#4703
martijnvg added a commit that referenced this issue Jan 20, 2014
…nts instead of keeping the weight around and build a DocIdSet when a s

 Also made CustomQueryWrappingFilter implement Releasable in order to cleanup the pre-computed DocIdSets.

 Closes #4703
mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015
…nts instead of keeping the weight around and build a DocIdSet when a s

 Also made CustomQueryWrappingFilter implement Releasable in order to cleanup the pre-computed DocIdSets.

 Closes elastic#4703

gcabero commented Jun 13, 2018

This issue seems to still be happening in 6.2? It's a different API, but no results are fetched when combining a post filter with scrolling.
