Alias with filter is not reliable #10135

Closed
mikiot opened this issue Mar 18, 2015 · 15 comments
Labels
>bug :Data Management/Indices APIs APIs to create and manage indices and templates

Comments

@mikiot

mikiot commented Mar 18, 2015

I am using elasticsearch 1.4.4 and I want to create a filtered alias.
The index contains 3 types: A, B and C. B and C are children of A. The filter extracts all the types from the index, applying some conditions. For one of the types (let's say it's A) the filter contains a nested and a has_child filter.
The alias gets created and works fine until I try to update the filter. I need to update the filter because some entities are soft deleted, and the alias should not return them. In this case the filter changes to something like: extract all documents with type A where B deleted date > some date.

After I update the filter and query for A with a simple search, it sometimes returns all A (applying the filter) and sometimes returns only some of the documents, excluding others that should be returned. This happens on consecutive queries sent seconds apart. Do you have any idea what is happening? Would it be helpful to post the filter?

The filter works fine if not used in the alias context.

@javanna
Member

javanna commented Mar 18, 2015

Hi @mikiot can you please post a complete curl recreation of the issue you are facing so we can have a look at what's causing it?

@mikiot
Author

mikiot commented Mar 20, 2015

  1. I noticed that sometimes when I run a search on the alias I receive this error:
    "QueryPhaseExecutionException[[esbug][0]: query[ConstantScore(+cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@16c808f0) +BooleanFilter(BooleanFilter(+cache(_type:person) +QueryWrapperFilter(ToParentBlockJoinQuery (filtered(ConstantScore(cache(address.number:[2 TO 2])))->random_access(_type:__address))) +CustomQueryWrappingFilter(child_filter[order/person](filtered%28ConstantScore%28BooleanFilter%28+cache%28_type:order%29 NotFilter%28cache%28_parent:person#AUw3EnRZSNvEdeURvB8t%29%29 BooleanFilter%28+cache%28_parent:person#AUw3EnRZSNvEdeURvB8t%29 +org.elasticsearch.index.query.QueryParseContext$1@3c7dd88c%29%29%29%29->cache%28_type:order%29))) BooleanFilter(+cache(_type:order) NotFilter(cache(_parent:person#AUw3EnRZSNvEdeURvB8t)) BooleanFilter(+cache(_parent:person#AUw3EnRZSNvEdeURvB8t) +org.elasticsearch.index.query.QueryParseContext$1@6a13ee5a))))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: NullPointerException; "
  2. I have an index, which I called esbug, with the following mapping:
{
    "mappings":
    {
        "person" : {
            "properties": {
                "name": {"type": "string", "index": "not_analyzed"},
                "address": {
                 "type" : "nested",
                    "properties" : {
                        "street": {"type": "string", "index": "not_analyzed"},
                        "number": {"type": "integer"}
                    }
                }
            }
        },
        "order" : {
            "_routing": {"store": false},
            "_parent" : { "type" : "person" },
            "properties" : {
                "name" : { "type": "string"},
                "created" : { "type": "date"}
            }
        }
    }
} 

I add the following documents to it

POST esbug/person/AUw3EnRZSNvEdeURvB8t
{
"name": "Person One",
"address": {
        "street": "NA",
        "number": "2"
    }
}

POST esbug/person/AUw3EiZpSNvEdeURvB8s
{
"name": "Person Two",
"address": {
        "street": "NA",
        "number": "2"
    }
}

POST esbug/order?parent=AUw3EnRZSNvEdeURvB8t
{
    "name": "Order One",
    "created": "2015-03-20T08:58:50.941Z"
}


POST esbug/order?parent=AUw3EnRZSNvEdeURvB8t
{
    "name": "Order Two",
    "created": "2015-03-20T09:01:50.941Z"
}

POST esbug/order?parent=AUw3FHJUSNvEdeURvB8w
{
    "name": "Order Three",
    "created": "2015-03-20T09:01:50.941Z"
}

Then I create the index alias

POST _aliases
{
    "actions" : [
        {
            "add" : {
                 "index" : "esbug",
                 "alias" : "esbug_123",
                 "filter":{
               "bool": {
                  "should": [
                     {
                        "bool": {
                           "must": [
                              {
                                 "term": {
                                    "_type": [
                                       "person"
                                    ]
                                 }
                              },
                              {
                                 "nested": {
                                    "path": "address",
                                    "filter": {
                                       "term": {
                                          "number": 2
                                       }
                                    }
                                 }
                              },
                              {
                                    "has_child": {
                                        "type": "order",
                                        "filter": {
                                            "bool": {
                                               "must": [
                                                  {
                                                     "term": {
                                                        "_type": "order"
                                                     }
                                                  }
                                               ],
                                               "should": [
                                                  {
                                                     "not": {
                                                        "terms": {
                                                           "_parent": [
                                                              "AUw3EnRZSNvEdeURvB8t"
                                                           ]
                                                        }
                                                     }
                                                  },
                                                  {
                                                     "bool": {
                                                        "must": [
                                                           {
                                                              "term": {
                                                                 "_parent": "AUw3EnRZSNvEdeURvB8t"
                                                              }
                                                           },
                                                           {
                                                              "range": {
                                                                 "created": {
                                                                    "gt": "2015-03-20T09:00:50.941Z"
                                                                 }
                                                              }
                                                           }
                                                        ]
                                                     }
                                                  }
                                               ]
                                            }
                                        }
                                    }
                                } 
                           ]
                        }
                     },
                     {
                        "bool": {
                           "must": [
                              {
                                 "term": {
                                    "_type": "order"
                                 }
                              }
                           ],
                           "should": [
                              {
                                 "not": {
                                    "terms": {
                                       "_parent": [
                                          "AUw3EnRZSNvEdeURvB8t"
                                       ]
                                    }
                                 }
                              },
                              {
                                 "bool": {
                                    "must": [
                                       {
                                          "term": {
                                             "_parent": "AUw3EnRZSNvEdeURvB8t"
                                          }
                                       },
                                       {
                                          "range": {
                                             "created": {
                                                "gt": "2015-03-20T09:00:50.941Z"
                                             }
                                          }
                                       }
                                    ]
                                 }
                              }
                           ]
                        }
                     }
                  ]
               }
            }
            }
        }
    ]
}

Performing a simple query like GET esbug_123/_search produces inconsistent results.

@javanna javanna self-assigned this Mar 20, 2015
@javanna
Member

javanna commented Mar 24, 2015

Hi @mikiot I'm having trouble parsing your alias request, the filter doesn't fit on my screen :) Did you try to trim it down a bit and see exactly which part is causing the problem that you see? Did you try executing the filter by itself, without going through the alias?

@mikiot
Author

mikiot commented Mar 24, 2015

What do you mean by it does not fit in the screen?
The problem is caused by the has_child filter. The filter works fine on its own.
I removed some parts of the filter and removed indentation so you can see the filter.

{"actions":[{"add":{"index":"esbug","alias":"esbug_123","filter":{"bool":{"should":[{"bool":{"must":[{"term":{"_type":["person"]}},{"has_child":{"type":"order","filter":{"bool":{"must":[{"term":{"_type":"order"}}],"should":[{"not":{"terms":{"_parent":["AUw3EnRZSNvEdeURvB8t"]}}},{"bool":{"must":[{"term":{"_parent":"AUw3EnRZSNvEdeURvB8t"}},{"range":{"created":{"gt":"2015-03-20T09:00:50.941Z"}}}]}}]}}}}]}}]}}}}]}

@javanna
Member

javanna commented Mar 25, 2015

sorry @mikiot about the irony, I just meant that the filter is very big and hard to read. Much better now, thanks a lot for trimming it down. I managed to reproduce the problem, it does seem like a bug.

@martijnvg can you have a look please? Here is the stacktrace (run from current 1.x); it seems like docIdsSet is null here. I'm not sure why, though. Might it be that the filter gets closed before the end of its execution?

Caused by: java.lang.NullPointerException
    at org.elasticsearch.index.search.child.CustomQueryWrappingFilter.getDocIdSet(CustomQueryWrappingFilter.java:82)
    at org.elasticsearch.common.lucene.search.XBooleanFilter.getDocIdSet(XBooleanFilter.java:83)
    at org.elasticsearch.common.lucene.search.XBooleanFilter.getDocIdSet(XBooleanFilter.java:59)
    at org.elasticsearch.common.lucene.search.AndFilter.getDocIdSet(AndFilter.java:54)
    at org.elasticsearch.common.lucene.search.ApplyAcceptedDocsFilter.getDocIdSet(ApplyAcceptedDocsFilter.java:46)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:157)
    at org.apache.lucene.search.Weight.bulkScorer(Weight.java:131)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.bulkScorer(ConstantScoreQuery.java:141)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:618)
    at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:191)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:491)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:448)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269)
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:157)

@javanna javanna added the >bug label Mar 25, 2015
@javanna javanna assigned martijnvg and unassigned javanna Mar 25, 2015
@mikiot
Author

mikiot commented Mar 25, 2015

@javanna You know you can always use a json formatter like this one https://www.jsoneditoronline.org/ for big jsons. This is what I used to remove the indentation. ;)

@javanna
Member

javanna commented Mar 25, 2015

@mikiot I know, and I spent some time doing it yesterday, but somehow elasticsearch would refuse that filter although the json seemed valid. Now it's better, thanks again

@javanna javanna assigned javanna and unassigned martijnvg and javanna Mar 28, 2015
@javanna
Member

javanna commented Mar 30, 2015

Talked with @martijnvg (parent-child wizard) and he says that has_child & has_parent don't work properly as part of alias filters and percolator. There is currently no way to support them properly, thus we are leaning towards rejecting them completely when presented as part of alias filters or percolator queries.

@jpountz
Contributor

jpountz commented Mar 30, 2015

+1 to reject p/c filters from aliases for now.

One issue with those filters is that the set of documents that match on a segment also depends on data that are stored in other segments because of the join. However, filters work per segment and it is expected that they always produce the same set of matches per segment (otherwise we could never cache). This does not work for p/c queries. We have some hacks in place to make sure when using p/c filters that they are never cached (NoCacheFilter) and that the join is performed only once (the horrible CustomQueryWrappingFilter) but it does not work with aliases which enforce caching.

We might have opportunities to make it better in 2.x with the queries/filter merge, but that would still be a lot of work and until then it is probably wiser to just reject such filters instead of adding on the existing hacks.
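
A toy sketch of the point above, not Elasticsearch or Lucene code: segments are modelled as plain Python lists, and `term_filter` / `has_child_filter` are hypothetical helpers invented for illustration. It only shows why a join result for one segment cannot be cached per segment the way an ordinary filter result can.

```python
def term_filter(segment, field, value):
    """Per-segment filter: the result depends only on this segment's docs."""
    return frozenset(i for i, doc in enumerate(segment) if doc.get(field) == value)


def has_child_filter(segment, all_segments, child_type):
    """Join filter: which parents match in `segment` depends on children
    that may live in any other segment."""
    parents_with_children = {
        doc["parent"]
        for seg in all_segments
        for doc in seg
        if doc.get("type") == child_type
    }
    return frozenset(
        i for i, doc in enumerate(segment) if doc.get("id") in parents_with_children
    )


seg0 = [{"id": "p1", "type": "person"}, {"id": "p2", "type": "person"}]
seg1 = [{"type": "order", "parent": "p1"}]
seg2 = [{"type": "order", "parent": "p2"}]  # written later, into a new segment

# seg0 itself never changes, so a per-segment cache entry stays valid
# for the term filter...
term_before = term_filter(seg0, "type", "person")
term_after = term_filter(seg0, "type", "person")

# ...but the join result for seg0 changes once seg2 appears elsewhere.
join_before = has_child_filter(seg0, [seg0, seg1], "order")
join_after = has_child_filter(seg0, [seg0, seg1, seg2], "order")
```

In this model a cached `join_before` for `seg0` would keep hiding `p2` even after its first order is indexed, which is the kind of staleness that forced caching would introduce.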

@martijnvg
Member

What I realized was that the actual issue here with CustomQueryWrappingFilter is that it keeps state and that alias filters produce a Lucene filter that is shared on the index level. So many shard level requests use the same CustomQueryWrappingFilter instance and each request modifies the docIdSets field...

I think the only sane thing to do here is to prohibit the use of p/c filters in index aliases. Also when #8134 gets fixed then p/c filters can no longer be used in index aliases.
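
The shared-state problem can be sketched roughly as follows. This is a deliberately simplified model, assuming a filter that stores its per-request join result on the instance; `StatefulJoinFilter`, `prepare`, `close` and the interleaving order are hypothetical and do not mirror the real `CustomQueryWrappingFilter` code. The interleaving is written out sequentially so the outcome is deterministic.

```python
class StatefulJoinFilter:
    """Hypothetical stand-in for a filter that keeps per-request state."""

    def __init__(self):
        self.doc_id_sets = None  # filled in per request, cleared afterwards

    def prepare(self, matches_per_segment):
        # Each request runs the join once and stores the result on the filter.
        self.doc_id_sets = dict(matches_per_segment)

    def get_doc_id_set(self, segment):
        # Fails if another request has already cleared the shared state.
        return self.doc_id_sets[segment]

    def close(self):
        self.doc_id_sets = None


def run_interleaved(shared):
    """Two requests, A and B, using the same filter instance."""
    shared.prepare({"seg0": {0, 1}})            # request A computes its join
    shared.prepare({"seg0": {0}})               # request B overwrites A's state
    a_early = shared.get_doc_id_set("seg0")     # A now reads B's result
    shared.close()                              # request B finishes first
    try:
        a_late = shared.get_doc_id_set("seg0")  # A reads again: state is gone
    except TypeError:                           # Python's analogue of the NPE
        a_late = "failed"
    return a_early, a_late


a_early, a_late = run_interleaved(StatefulJoinFilter())
```

Under these assumptions request A either sees another request's matches or fails outright, which lines up with both symptoms reported in this thread: inconsistent hits and the intermittent NullPointerException.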

@mikiot
Author

mikiot commented Apr 3, 2015

So you are saying it cannot be fixed for now? Should I find another solution for my problem?

@jpountz
Contributor

jpountz commented Apr 3, 2015

@mikiot Sadly yes. A work-around is to provide the parent/child filter on every query execution instead of relying on the alias.
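
A minimal sketch of that work-around, assuming the 1.x `filtered` query and the trimmed filter posted earlier in this thread (copied as posted). `build_search_body` is a hypothetical helper name; the snippet only builds the JSON body and does not contact a cluster. The resulting body would be sent to the index itself (for example `GET esbug/_search`) rather than to the `esbug_123` alias.

```python
import json

# The trimmed parent/child filter from the alias definition above.
PARENT_CHILD_FILTER = {
    "bool": {"should": [{"bool": {"must": [
        {"term": {"_type": ["person"]}},
        {"has_child": {"type": "order", "filter": {"bool": {
            "must": [{"term": {"_type": "order"}}],
            "should": [
                {"not": {"terms": {"_parent": ["AUw3EnRZSNvEdeURvB8t"]}}},
                {"bool": {"must": [
                    {"term": {"_parent": "AUw3EnRZSNvEdeURvB8t"}},
                    {"range": {"created": {"gt": "2015-03-20T09:00:50.941Z"}}},
                ]}},
            ],
        }}}},
    ]}}]}
}


def build_search_body(user_query=None):
    """Wrap the caller's query in a `filtered` query that carries the
    parent/child filter on every request (hypothetical helper)."""
    return {
        "query": {
            "filtered": {
                "query": user_query or {"match_all": {}},
                "filter": PARENT_CHILD_FILTER,
            }
        }
    }


body = build_search_body()
print(json.dumps(body, indent=2))
```

The trade-off is that every client has to attach the filter itself, so the restriction is no longer enforced centrally by the alias.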

@martijnvg
Member

If alias filters were parsed at search time instead of alias creation time then this problem wouldn't exist. Maybe for certain alias filters this would make sense? Also, this not-so-nice workaround could then be removed: #8534
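
To illustrate the difference between the two parsing strategies, here is a rough sketch. All class names (`ParsedFilter`, `CreationTimeAlias`, `SearchTimeAlias`) are hypothetical and only model the idea; they are not how aliases are implemented in Elasticsearch.

```python
class ParsedFilter:
    """Stand-in for a parsed filter that may hold per-request state."""

    def __init__(self, source):
        self.source = source
        self.doc_id_sets = None


class CreationTimeAlias:
    """Parses the filter once, so every request shares one instance."""

    def __init__(self, source):
        self._filter = ParsedFilter(source)

    def filter_for_request(self):
        return self._filter


class SearchTimeAlias:
    """Keeps only the source and parses a fresh filter for each request."""

    def __init__(self, source):
        self._source = source

    def filter_for_request(self):
        return ParsedFilter(self._source)


source = {"has_child": {"type": "order", "filter": {"match_all": {}}}}

eager = CreationTimeAlias(source)
shared_a, shared_b = eager.filter_for_request(), eager.filter_for_request()

lazy = SearchTimeAlias(source)
own_a, own_b = lazy.filter_for_request(), lazy.filter_for_request()

# Request A mutating its filter is visible to B only in the shared case.
shared_a.doc_id_sets = {"seg0": {0}}
own_a.doc_id_sets = {"seg0": {0}}
```

With per-request parsing, state written by one request is never visible to another, at the cost of re-parsing the filter source on every search.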

@kimchy
Member

kimchy commented Apr 4, 2015

@martijnvg ++ on parsing alias filters at search time each time, it's cheap and safer

@clintongormley

Closing in favour of #10485

@clintongormley clintongormley added :Data Management/Indices APIs APIs to create and manage indices and templates and removed :Aliases labels Feb 13, 2018

6 participants