Support min_children & max_children for nested docs #10043

Open
jsuchal opened this issue Mar 9, 2015 · 18 comments
Labels
>feature help wanted adoptme :Search/Search Search-related issues that do not fall into other categories stalled Team:Search Meta label for search team

Comments

@jsuchal

jsuchal commented Mar 9, 2015

I am opening this as a separate issue since the previous issue was closed with support for parent-child docs (#6019 (comment)).

We would love to have support for min_children & max_children or similar also for nested filters/docs. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html#_min_max_children_2

Thanks, and keep up the great work.

@martijnvg
Member

+1 we should add this! I think we should also open an issue for this in Lucene, because the nested query uses the ToParentBlockJoinQuery Lucene query to do the actual work.

@martijnvg
Member

I opened: https://issues.apache.org/jira/browse/LUCENE-6354 to get this in Lucene

@gmenegatti

+1

@kmcs

kmcs commented Apr 16, 2015

+1

@lonre

lonre commented Sep 19, 2016

+1

@guilherme-santos

+1

@asuiu

asuiu commented Apr 11, 2017

+1

@conan

conan commented May 10, 2017

+1

@voran

voran commented Aug 14, 2017

+1

@voran

voran commented Aug 16, 2017

This would be a great feature to have. The only reason why we are using parent/child instead of nested mapping is the lack of min_children/max_children options in the nested query. Considering that:

  1. Nested queries are much faster than has_child queries;
  2. Elasticsearch is moving in the one-type-per-index direction;

I would very much like to see this implemented. Please let me know if there's anything I can do to help.

@turp1twin

Any update on this? Would be really, really good to have this feature!

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Nested Docs labels Feb 14, 2018
@andyb-elastic
Contributor

@elastic/es-search-aggs

@colings86
Contributor

Stalled waiting for https://issues.apache.org/jira/browse/LUCENE-6354 to be completed and merged

@bw2

bw2 commented Oct 26, 2018

+1

@thaDude

thaDude commented Nov 16, 2018

+1

I'd love to see this.

To give some context: the main reason we migrated from nested to a parent/child model was indexing speed, but we also did so because of the min_children and max_children feature.

However, we have become painfully aware of the cost of has_child queries (joins) as the number of child documents and/or the complexity of queries increases. OOM exceptions have become too frequent for comfort.

For stability reasons, we are reconsidering the nested model even if it means decreased indexing speed. Knowing that min_children and max_children are still planned for that model would reassure us.

Thank you!

@xethorn

xethorn commented Apr 28, 2019

Note: for comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4.

Alternative solution

Until nested queries support min_children and max_children natively, you can emulate them with a function_score query. To make this concrete, assume an index called Person with the following mapping:

  • first_name (text)
  • email (text)
  • children (nested)
    • first_name (text)
    • last_name (text)
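
As a rough sketch, the mapping listed above could be written as the following index body. The field names come from the list; the typeless "mappings" layout is an assumption that fits Elasticsearch 7+ (earlier versions expect a type name under "mappings"), and the variable name is purely illustrative.

```python
# Hypothetical index body for the Person example.
# Assumption: Elasticsearch 7+ typeless mapping layout.
person_index_body = {
    "mappings": {
        "properties": {
            "first_name": {"type": "text"},
            "email": {"type": "text"},
            # "nested" keeps each child as its own hidden document,
            # which is what lets the nested query score children one by one.
            "children": {
                "type": "nested",
                "properties": {
                    "first_name": {"type": "text"},
                    "last_name": {"type": "text"},
                },
            },
        }
    }
}
```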

Cases

To support min_children and max_children effectively, there are several queries to consider:

  • Find all Persons who have no children.
  • Find all Persons who have at least n children:
    • requires: n > 0
    • if n is 0, the query matches any Person, whether or not they have children.
  • Find all Persons who have between n and m children:
    • requires: n > 0
    • if n is 0, this reduces to finding all Persons who have at most m children.
  • Find all Persons who have at most n children.

Depending on the scenario, the request will look different.
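
The case analysis above can be sketched as a small dispatcher. The function name, the returned labels, and the choice to treat both bounds as inclusive are assumptions made for illustration; they are not part of Elasticsearch.

```python
from typing import Optional


def pick_strategy(min_children: Optional[int], max_children: Optional[int]) -> str:
    """Map a (min, max) request onto one of the cases listed above.

    Both bounds are treated as inclusive; None means "no bound".
    """
    lo = min_children or 0
    if lo < 0 or (max_children is not None and max_children < lo):
        raise ValueError("expected 0 <= min_children <= max_children")
    if max_children == 0:
        return "no_children"  # must_not + nested exists
    if max_children is None:
        # A minimum of 0 places no constraint at all.
        return "at_least" if lo > 0 else "match_all"
    # An upper bound without a lower bound is the negated "at least" query.
    return "between" if lo > 0 else "at_most"
```

For example, `pick_strategy(2, 5)` selects the "between" request, while `pick_strategy(None, 3)` selects the negated form.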

Notes about function_score:

  • function_score supports min_score. It filters out any document where the score is lower than the min_score.
  • function_score has a max_boost. This doesn't filter documents returned, it simply caps the score to a specific value. For instance: if after calculating the score, you end up with 500, and the max_boost is 50, 50 will be returned.
  • If you don't want this function_score to pollute the overall score of the document, apply a boost of 0.

Find all persons who have no children

Explanation: this is the easiest query: you only have to verify that there are no nested documents. It is significantly faster than using function_score.

{
    "query": {
        "bool": {
            "must_not": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    }
                }
            }    
        }
    }
}

Find all persons who have a minimum of n children

Explanation: each matching nested document contributes a score of 10 (the boost), and the nested query sums these scores per parent (score_mode: sum). The function_score then filters out any parent whose total is below min_score.

Example: find all persons who have a minimum of 2 children. The boost applied here is 10 (you can pick any positive number), so min_score is 20 (2 * 10).

Before Elastic 7:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
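
To make the min_score arithmetic easier to reuse, here is a minimal sketch that builds the 7+ request above for an arbitrary n. The function names and the `would_match` helper are hypothetical; the helper is only a local model of the scoring described in the explanation (a parent with k children scores k * boost), not something Elasticsearch exposes.

```python
def nested_exists_clause(path: str, field: str, boost: int) -> dict:
    """Nested clause from the 7+ example: every matching child scores 1,
    the scores are summed per parent and multiplied by `boost`."""
    return {
        "nested": {
            "path": path,
            "boost": boost,
            "score_mode": "sum",
            "query": {
                "constant_score": {
                    "boost": 1,
                    "filter": {"exists": {"field": field}},
                }
            },
        }
    }


def at_least_children_body(n: int, boost: int = 10, path: str = "children",
                           field: str = "children.first_name") -> dict:
    """Request body matching parents with at least `n` nested children."""
    if n < 1:
        raise ValueError("n must be >= 1; n = 0 places no constraint")
    return {
        "query": {
            "function_score": {
                "min_score": n * boost,  # e.g. 2 children * boost 10 = 20
                "boost": 1,
                "score_mode": "multiply",
                "boost_mode": "replace",
                "query": nested_exists_clause(path, field, boost),
            }
        }
    }


def would_match(child_count: int, body: dict, boost: int = 10) -> bool:
    """Local model only: compare k * boost against the body's min_score."""
    return child_count * boost >= body["query"]["function_score"]["min_score"]
```

With the defaults, `at_least_children_body(2)` reproduces the request shown above, and the model keeps parents with 2 or more children.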

Find all persons who have between n and m children (n > 0).

Explanation: same as above, except that a script alters the score. The script checks whether the summed score exceeds m * boost; if it does, it returns 0, which guarantees that the document is excluded (0 < min_score).

Example: find all persons who have a minimum of 2 children and a maximum of 5 children (inclusive).

Before Elastic 7:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "boost_mode": "replace",
            "functions": [
                {
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "query": {
                        "exists": {
                            "field": "children.first_name"
                        }
                    },
                    "boost": 10,
                    "score_mode": "sum"
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "function_score": {
            "min_score": 20,
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "functions": [
                {
                    "filter": {
                        "match_all": {
                            "boost": 1
                        }
                    },
                    "script_score": {
                        "script": {
                            "source": "if (_score > 50) { return 0; } return _score;",
                            "lang": "painless"
                        }
                    }
                }
            ],
            "query": {
                "nested": {
                    "path": "children",
                    "boost": 10,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
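
The hard-coded thresholds (20 and 50) can be derived from n, m, and the boost. The sketch below is one way to do that; the function names are hypothetical, and passing the upper threshold through the script's `params` instead of embedding it in the source string is a design choice of this sketch, not something the original request does. `modeled_match` is a local model of the scoring, assuming each child contributes exactly `boost`.

```python
def between_children_body(n: int, m: int, boost: int = 10, path: str = "children",
                          field: str = "children.first_name") -> dict:
    """Request body matching parents with n..m nested children (inclusive)."""
    if not 1 <= n <= m:
        raise ValueError("expected 1 <= n <= m")
    script = {
        "lang": "painless",
        # Zero the score once the summed boosts exceed m * boost.
        "source": "if (_score > params.max_score) { return 0; } return _score;",
        "params": {"max_score": m * boost},
    }
    return {
        "query": {
            "function_score": {
                "min_score": n * boost,
                "boost": 1,
                "score_mode": "multiply",
                # "replace" makes the script result the final score.
                "boost_mode": "replace",
                "functions": [{"script_score": {"script": script}}],
                "query": {
                    "nested": {
                        "path": path,
                        "boost": boost,
                        "score_mode": "sum",
                        "query": {
                            "constant_score": {
                                "boost": 1,
                                "filter": {"exists": {"field": field}},
                            }
                        },
                    }
                },
            }
        }
    }


def modeled_match(child_count: int, n: int, m: int, boost: int = 10) -> bool:
    """Local model of the scoring: k children give k * boost, zeroed above m * boost."""
    score = child_count * boost
    if score > m * boost:
        score = 0
    return score >= n * boost
```

Keeping the threshold in `params` means the script source stays identical across requests, which tends to play better with script caching.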

Find all persons who have at most n children (n > 0).

Explanation: this request covers persons who have no children as well as persons who have between 1 and n children. Expressing this directly with Elasticsearch can be tricky, so it is easier to take the negation: exclude all persons who have at least n + 1 children by wrapping the "minimum" query in must_not with a min_score of (n + 1) * boost. For strictly fewer than n children, use n * boost instead.

Example: find all persons who have at most 2 children (min_score is 30, i.e. (2 + 1) * 10).

Before Elastic 7:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "query": {
                        "nested": {
                            "path": "children",
                            "query": {
                                "exists": {
                                    "field": "children.first_name"
                                }
                            },
                            "boost": 10,
                            "score_mode": "sum"
                        }
                    }
                }
            }
        }
    }
}

Elastic 7+:

{
    "query": {
        "bool": {
            "must_not": {
                "function_score": {
                    "min_score": 30,
                    "boost": 1,
                    "score_mode": "multiply",
                    "boost_mode": "replace",
                    "query": {
                        "nested": {
                            "path": "children",
                            "boost": 10,
                            "score_mode": "sum",
                            "query": {
                                "constant_score": {
                                    "boost": 1,
                                    "filter": {
                                        "exists": {
                                            "field": "children.first_name"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
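
Note that with a boost of 10, the min_score of 30 used above excludes parents with 3 or more children, so the request keeps parents with 0, 1, or 2 children. A minimal sketch that parameterizes this on an inclusive upper bound follows; the function names are hypothetical, and `modeled_match` is only a local model that assumes each child contributes exactly `boost`.

```python
def at_most_children_body(n: int, boost: int = 10, path: str = "children",
                          field: str = "children.first_name") -> dict:
    """Request body matching parents with at most `n` nested children.

    Built as the negation of "at least n + 1 children".
    """
    if n < 1:
        raise ValueError("n must be >= 1; use the plain must_not query for 0")
    at_least_n_plus_one = {
        "function_score": {
            "min_score": (n + 1) * boost,  # n = 2, boost = 10 gives 30
            "boost": 1,
            "score_mode": "multiply",
            "boost_mode": "replace",
            "query": {
                "nested": {
                    "path": path,
                    "boost": boost,
                    "score_mode": "sum",
                    "query": {
                        "constant_score": {
                            "boost": 1,
                            "filter": {"exists": {"field": field}},
                        }
                    },
                }
            },
        }
    }
    return {"query": {"bool": {"must_not": at_least_n_plus_one}}}


def modeled_match(child_count: int, n: int, boost: int = 10) -> bool:
    """Local model: a parent is kept unless k * boost reaches (n + 1) * boost."""
    return not (child_count * boost >= (n + 1) * boost)
```

To get a strict "fewer than n" bound with this sketch, call it with `n - 1` (for n > 1).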

Hope this helps.


@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@DustinJSilk

Any update on this?

I'm trying to filter my results based on an exact length. @xethorn I can't seem to get your solution working with filters, could you point me in the right direction?

Here's my search with filters, which don't support scoring:

GET /test/_search
{
  "query" : {
    "function_score": {
      "min_score": 20,
      "boost": 1,
      "functions": [
        {
          "script_score": {
            "script": {
                "source": "if (_score > 20) { return - 1; } return _score;"
            }
          }
        }
      ],
      "query": {
        "bool" : {
          "filter": [
            { "range": { "distance": { "lt": 5 }}},
            {
              "nested": {
                "score_mode": "sum",
                "boost": 10,
                "path": "dates",
                "query": {
                  "bool": {
                    "filter": [
                      { "range": { "dates.rooms": { "gte": 1 } } },
                      { "range": { "dates.timestamp": { "lte": 2 }}},
                      { "range": { "dates.timestamp": { "gte": 1 }}}
                    ]
                  }
                }
              }
            }
          ]
        }
      }
    }
  }
}

A few more details here: https://stackoverflow.com/questions/63226805/filter-query-by-length-of-nested-objects-ie-min-child
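
One possible direction, sketched here under assumptions and not necessarily the answer given on Stack Overflow: clauses in a bool filter run in filter context and contribute nothing to _score, so the nested clause never produces the per-child sum that min_score relies on. The sketch moves the nested clause into must, wraps the child filters in constant_score so each matching child scores 1, and keeps the distance range as a filter. The function name and the `target` script parameter are hypothetical; the field names and ranges are copied from the request above.

```python
def exact_children_body(count: int, boost: int = 10) -> dict:
    """Sketch: match parents with exactly `count` matching nested "dates" entries."""
    if count < 1:
        raise ValueError("count must be >= 1")
    target = count * boost
    child_filters = [
        {"range": {"dates.rooms": {"gte": 1}}},
        {"range": {"dates.timestamp": {"lte": 2}}},
        {"range": {"dates.timestamp": {"gte": 1}}},
    ]
    nested = {
        "nested": {
            "path": "dates",
            "score_mode": "sum",
            "boost": boost,
            "query": {
                # constant_score gives each matching child a score of 1
                # even though its conditions are plain filters.
                "constant_score": {
                    "boost": 1,
                    "filter": {"bool": {"filter": child_filters}},
                }
            },
        }
    }
    script = {
        "lang": "painless",
        # Return 0 (not a negative number) when there are too many matches.
        "source": "if (_score > params.target) { return 0; } return _score;",
        "params": {"target": target},
    }
    return {
        "query": {
            "function_score": {
                "min_score": target,
                "boost_mode": "replace",
                "functions": [{"script_score": {"script": script}}],
                "query": {
                    "bool": {
                        # Non-scoring condition on the parent stays a filter.
                        "filter": [{"range": {"distance": {"lt": 5}}}],
                        # Scoring clause: must, so the summed boosts reach _score.
                        "must": [nested],
                    }
                },
            }
        }
    }
```

With `exact_children_body(2)`, min_score and the script threshold are both 20, so under this model only parents with exactly two matching dates entries keep a passing score.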

@xethorn

xethorn commented Nov 8, 2020

The question was answered on Stack Overflow. For comments and last updates, please refer to: GIST 290e31176f493814823a20f281e82fd4. :)
