Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filtering on results of an aggregation #4404

Closed
jrick1977 opened this issue Dec 10, 2013 · 38 comments
Closed

Filtering on results of an aggregation #4404

jrick1977 opened this issue Dec 10, 2013 · 38 comments
Assignees

Comments

@jrick1977
Copy link

The case here can best be described by referring to the HAVING clause in SQL. Basically we need to be able do an aggregation but limit the results of the query based on the value of the aggregation. For example if I wanted to look at what age bands have an average height greater that 5 feet, my query would look something like this:

 "aggs": {
        "genders": {
            "terms": {
                "field": "gender"
            },
            "aggs": {
                "age_groups" : {
                    "range" : {
                        "field" : "age",
                        "ranges" : [
                            { "to" : 5 },
                            { "from" : 5, "to" : 10 },
                            { "from" : 10, "to" : 15 },
                            { "from" : 15}
                        ]
                    },
                    "aggs" : {
                        "avg_height" : { 
                            "avg" : { "field" : "height" } ,
                            "having" : { "from" : 60 }
                        }
                    }
                }
            }
        }
    }
@lusid
Copy link

lusid commented Jan 16, 2014

We also need a way to filter a multi valued aggregate down to a single value so we don't have to get so much data back. We retrieve values like biggest month, smallest month, busiest month, slowest month, etc, and I was hoping to at least be able to do something like below, but there is no "size" field on the histograms, so even though the results are ordered the way I want, I have to get back thousands of results just so I can print a single number on a website.

{
    "query": { "match_all": {} },
    "aggs": {
        "biggest_month": {
            "date_histogram": {
                "field": "PO.IssuedDate",
                "interval": "month",
                "order": { "sum_of_po_totals": "desc" },
                "format": "yyyy-MM",
                "size": 1  // no size???
            },
            "aggs": {
                "sum_of_po_totals": {
                    "sum": { "field": "PO.IssuedAmount" }
                }
            }
        },
        "smallest_month": {
            "date_histogram": {
                "field": "PO.IssuedDate",
                "interval": "month",
                "order": { "sum_of_po_totals": "asc" },
                "format": "yyyy-MM",
                "size": 1  // no size???
            },
            "aggs": {
                "sum_of_po_totals": {
                    "sum": { "field": "PO.IssuedAmount" }
                }
            }
        },
        "busiest_month": {
            "date_histogram": {
                "field": "PO.IssuedDate",
                "interval": "month",
                "order": { "_count": "desc" },
                "format": "yyyy-MM",
                "size": 1  // no size???
            }
        },
        "slowest_month": {
            "date_histogram": {
                "field": "PO.IssuedDate",
                "interval": "month",
                "order": { "_count": "asc" },
                "format": "yyyy-MM",
                "size": 1  // no size???
            }
        }
    },
    "size": 0
}

Better yet, I would love to see the single value aggregations (or something like them) work on multi value aggregations as a filter (which is currently not being used for anything), so something like this:

{
    "query": { "match_all": {} },
    "aggs": {
        "biggest_month": {
            "max": { "field": "by_month.sum_of_po_totals" }
        },
        "smallest_month": {
            "min": { "field": "by_month.sum_of_po_totals" }
        },
        "busiest_month": {
            "max": { "field": "by_month.count_of_pos" }
        },
        "slowest_month": {
            "min": { "field": "by_month.count_of_pos" }
        },
        "aggs": {
            "by_month": {
                "date_histogram": {
                    "field": "PO.IssuedDate",
                    "interval": "month",
                    "format": "yyyy-MM"
                },
                "aggs": {
                    "sum_of_po_totals": {
                        "sum": { "field": "PO.IssuedAmount" }
                    },
                    "count_of_pos": {
                        "value_count": { "field": "PO.ID" }
                    }
                }
            }
        }
    },
    "size": 0
}

Of course, the "having" modifier mentioned originally would also be fantastic for doing range queries, etc on these aggregate results as well, and would also solve our issue if we could use a "max" aggregate in a "having" clause (even though I personally find it a little less intuitive to have it at the bottom of the hierarchy, but I'm guessing it would probably be easier to implement that way since it is still a single valued aggregate operating under a multi-valued bucket aggregate).

No matter what the solution, this is still a pretty massive hole for our needs in an otherwise awesome aggregations framework, which is unfortunate.

@SeyZ
Copy link

SeyZ commented Feb 13, 2014

If I well understand this issue, you would like to have the equivalent of the facet_filter of the facets but for aggregations?

@lusid
Copy link

lusid commented Feb 13, 2014

The filter functionality already provides something like facet_filter, in that it filters the results prior to aggregating them. http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filter-aggregation.html

I think what we are looking for is a way to further filter the result of the aggregation after the fact (i.e. the MAX of the SUM).

@kajal23
Copy link

kajal23 commented Apr 7, 2014

Also, would it be possible to apply aggregation in the query filter? We need to limit search result based on the aggregation result. For Example, return all the employees in a department "dep1" whose salary is less than the average salary of a department "dep1".

@kyle-stachowiak-relativity

A filter similar to "having" would be useful, I have been trying to find groups of documents with more than a certain number of matches overall. (i.e. SELECT id ... GROUP BY id HAVING COUNT(id) > 4)

@xenji
Copy link

xenji commented Jul 29, 2014

Are there any plans to implement such a feature? We really need this for one of our "big data" projects. Are there any approaches to implement this via a plugin? I would be very thankful for any hints.

@xenji
Copy link

xenji commented Jul 30, 2014

We are willing to provide financial support for this feature if that helps. Just let me know how to get in contact.

@hjz
Copy link

hjz commented Jul 30, 2014

+1. This feature is very much needed. Happy to provide resources to help if necessary.

@flowbehappy
Copy link

Mark. We also want to migrate our database to ES but stopped by this issue.

@mrdanadams
Copy link

+1

7 similar comments
@boboland
Copy link

+1

@nicollette
Copy link

+1

@hiteshagja
Copy link

+1

@deepak-mohanakrishnan
Copy link

+1

@headstar
Copy link

+1

@Jeedey
Copy link

Jeedey commented Sep 2, 2014

+1

@ja-ilija
Copy link

ja-ilija commented Sep 5, 2014

+1

@Schnouki
Copy link

Schnouki commented Sep 8, 2014

+1 as well.

@villadora
Copy link

+1

@jan-molak
Copy link

+1 this functionality would be very useful

@fehmicansaglam
Copy link

+1

1 similar comment
@aaronlevin
Copy link

+1

@jswartsel
Copy link

+1

2 similar comments
@brupm
Copy link

brupm commented Oct 10, 2014

👍

@DanielNill
Copy link

+1

@colings86
Copy link
Contributor

Closing in favour of #8110

@paullo0106
Copy link

+1

5 similar comments
@ignaciovazquez
Copy link

+1

@nealvarner
Copy link

+1

@troodon
Copy link

troodon commented May 5, 2015

+1

@bogdanovich
Copy link

+1

@kishoreannapureddy
Copy link

+1

@emilgerebo
Copy link

1+

@GabrielKast
Copy link

+1

1 similar comment
@osykora
Copy link

osykora commented Jul 3, 2015

+1

@yehosef
Copy link

yehosef commented Jul 20, 2015

@clintongormley
Copy link

@yehosef have a look at the bucket selector agg

@yehosef
Copy link

yehosef commented Jul 20, 2015

@clintongormley - great, thanks for the reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests