Sorting on SingleBucketAggregation's should be possible #5253

Closed
thanodnl opened this issue Feb 25, 2014 · 2 comments · Fixed by #5340

Comments

@thanodnl
Contributor

Aggregations are a great feature for building table views of analytics with ease, and support for sorting on sub-aggregations lets users sort on any column, as they are used to. However, not every conceivable metric can be used in a sort.

For example, if you have two columns showing the number of males and females occurring with a term, you would use an aggregation like this:

{
  "size": 0,
  "aggs": {
    "someterm": {
      "terms": {
        "field": "somefield"
      },
      "aggs": {
        "male": {
          "filter": {
            "term": {
              "gender": "male"
            }
          }
        },
        "female": {
          "filter": {
            "term": {
              "gender": "female"
            }
          }
        }
      }
    }
  }   
}

Unfortunately, as soon as you try to sort the terms on a specific gender, you get an error saying this is not possible because the filter aggregation is not of a metric type.
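Until ordering like this is supported on the server, the buckets can be reordered on the client. The following is a minimal sketch, assuming a response shaped like a terms aggregation with filter sub-aggregations; the sample values and the helper name `sort_buckets_by_sub_count` are hypothetical. It only reorders the buckets that the terms aggregation already returned, so it is not equivalent to a server-side order.

```python
# Hypothetical response for the request above; all values are made up.
response = {
    "aggregations": {
        "someterm": {
            "buckets": [
                {"key": "a", "doc_count": 10, "male": {"doc_count": 3}, "female": {"doc_count": 7}},
                {"key": "b", "doc_count": 8, "male": {"doc_count": 6}, "female": {"doc_count": 2}},
                {"key": "c", "doc_count": 5, "male": {"doc_count": 4}, "female": {"doc_count": 1}},
            ]
        }
    }
}


def sort_buckets_by_sub_count(buckets, sub_agg, descending=True):
    """Return the buckets ordered by the doc_count of one filter sub-aggregation."""
    return sorted(buckets, key=lambda b: b[sub_agg]["doc_count"], reverse=descending)


buckets = response["aggregations"]["someterm"]["buckets"]
by_male = sort_buckets_by_sub_count(buckets, "male")
by_female = sort_buckets_by_sub_count(buckets, "female")
print([b["key"] for b in by_male])
```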

Looking into the code, I think it would be technically possible to sort on the count of a filter aggregation, but this is not yet implemented. I propose a way to sort on sub-aggregations of terms by path.

Take the following aggregation, where the order key male._count would sort on the doc_count of the 'male' sub-aggregation in descending order:

{
  "size": 0,
  "aggs": {
    "someterm": {
      "terms": {
        "field": "somefield",
        "order": {
          "male._count": "desc"
        }
      },
      "aggs": {
        "male": {
          "filter": {
            "term": {
              "gender": "male"
            }
          }
        },
        "female": {
          "filter": {
            "term": {
              "gender": "female"
            }
          }
        }
      }
    }
  }   
}

Currently this request fails with an error explaining that the sort can only be on metric aggregations. This does not seem to be a technical limitation. The only obstacle is that the comparators in InternalOrder (and, for that matter, in MultiBucketsAggregation) are implemented for MetricsAggregation only.

By implementing a sorter for instances of SingleBucketAggregation, we could sort not only on the count but also on aggregations nested inside the filter aggregation (e.g. an average computed over the filtered documents). The path to the nested aggregation would be specified the same way a path is specified now, with one extra element.

{
  "size": 0,
  "aggs": {
    "some_agg": {
      "terms": {
        "field": "some_field",
        "order": {
          "count_of_a.again_some_field.avg": "desc"
        }
      },
      "aggs": {
        "count_of_a": {
          "filter": {
            "term": {
              "some_other_field": "a"
            }
          },
          "aggs": {
            "again_some_field": {
              "stats": {
                "field": "again_some_field"
              }
            }
          }
        }
      }
    }
  }
}

In this example, the terms would be sorted on the average value of again_some_field, computed over the documents where some_other_field has the value a.
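The path lookup proposed above can be sketched on the client side as follows. This is an illustration under stated assumptions, not the Elasticsearch implementation: the function name `resolve_order_path` and the sample bucket are hypothetical, and the dotted path syntax is the one proposed in this issue.

```python
def resolve_order_path(bucket, path):
    """Resolve a dotted order path against one terms bucket.

    Every element before the last names a nested sub-aggregation.
    "_count" yields the doc_count of the bucket reached so far; otherwise
    the last element names a metric value such as "avg".
    """
    node = bucket
    for element in path.split("."):
        if not isinstance(node, dict):
            raise KeyError(f"cannot descend into {element!r} in path {path!r}")
        if element == "_count":
            return node["doc_count"]
        if element not in node:
            raise KeyError(f"cannot resolve {element!r} in path {path!r}")
        node = node[element]
    if isinstance(node, dict):
        # The path ended on an aggregation rather than on a plain number.
        if "value" in node:        # single-value metric such as avg
            return node["value"]
        if "doc_count" in node:    # single-bucket aggregation such as filter
            return node["doc_count"]
        raise ValueError(f"path {path!r} does not point at a sortable value")
    return node


# Hypothetical bucket from the terms aggregation "some_agg".
bucket = {
    "key": "x",
    "doc_count": 10,
    "count_of_a": {
        "doc_count": 4,
        "again_some_field": {"count": 4, "min": 1.0, "max": 4.0, "avg": 2.5, "sum": 10.0},
    },
}

print(resolve_order_path(bucket, "count_of_a.again_some_field.avg"))
print(resolve_order_path(bucket, "count_of_a._count"))
```

A list of buckets could then be ordered with `sorted(buckets, key=lambda b: resolve_order_path(b, path), reverse=True)`.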

To show that it is possible, I hacked together a POC, but it turned out to be more complex than I first imagined. It supports at least sorting on the count of a filter aggregation, and even on the sub-metrics of such a filter. I hope you can pick this up in the roadmap of Elasticsearch v1.*

@uboness uboness closed this as completed in 9d0fc76 Mar 5, 2014
uboness added a commit that referenced this issue Mar 5, 2014
 Supports sorting on sub-aggs down the current hierarchy. This is supported as long as the aggregations in the specified order path are of a single-bucket type, where the last aggregation in the path points to either a single-bucket aggregation or a metrics one. If it's a single-bucket aggregation, the sort will be applied on the document count in the bucket (i.e. doc_count), and if it is a metrics type, the sort will be applied on the pointed-out metric (in the case of a single-metric aggregation, such as avg, the sort will be applied on the single metric value).

 NOTE: this commit adds a constraint on what should be considered a valid aggregation name. Aggregation names must be alpha-numeric and may contain '-' and '_'.

 Closes #5253
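
The naming constraint in the note above can be checked on the client before a request is sent. This is a minimal sketch, assuming that "alpha-numeric" means ASCII letters and digits; the pattern and the helper name `is_valid_agg_name` are hypothetical and are not copied from the Elasticsearch source.

```python
import re

# Assumed pattern derived from the commit note, not taken from Elasticsearch.
VALID_AGG_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_agg_name(name):
    """Return True if name uses only ASCII letters, digits, '-' and '_'."""
    return VALID_AGG_NAME.fullmatch(name) is not None


for candidate in ["count_of_a", "male-count", "male.count", "a b", ""]:
    print(repr(candidate), is_valid_agg_name(candidate))
```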
@felixbarny
Member

From release notes:

Aggregations: aggregation names can now only contain alpha-numeric, hyphen (“-”) and underscore (“_”) characters, due to the enhancement which allows sub-aggregation sorting #5253

I don't understand why only alpha-numeric characters are allowed. Wouldn't it be possible to allow any character by working with hexadecimal-encoded strings internally? I really wish arbitrary characters were still allowed. My current workaround is to do the hex encoding manually. My aggregation names are computed dynamically and displayed on the frontend, so avoiding non-alpha-numeric characters is not an option for me.
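
The manual hex-encoding workaround mentioned above could look roughly like this. It is a minimal sketch; the helper names `encode_agg_name` and `decode_agg_name` are hypothetical. The encoded form contains only the characters 0-9 and a-f, so it satisfies the new naming constraint, and the display name is recovered when the response is rendered. An empty display name encodes to an empty string and would need separate handling.

```python
def encode_agg_name(display_name):
    """Encode an arbitrary display name as lowercase hex of its UTF-8 bytes."""
    return display_name.encode("utf-8").hex()


def decode_agg_name(encoded):
    """Recover the display name from its hex-encoded form."""
    return bytes.fromhex(encoded).decode("utf-8")


display_name = "Männer > 30 (%)"
encoded = encode_agg_name(display_name)
print(encoded)
print(decode_agg_name(encoded))
```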

@colings86
Contributor

Opened issue #6702 to address the above comment
