
Small Feature Request: Add a flag to prevent aggs and postAggs from being sent back. #95

Closed
vogievetsky opened this issue Mar 1, 2013 · 11 comments

Comments

@vogievetsky
Contributor

I would like to have a flag that I can set on aggs and postAggs that would remove them from the output.

Proposed:

"suppress": true

(false by default)
This can be added with perfect backward compatibility.
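
For illustration, an aggregator spec using the proposed flag might look like this (hypothetical; the "suppress" field does not exist in Druid today, it is exactly the addition being requested):

{
  "type": "doubleSum",
  "name": "_sum_count",
  "fieldName": "count",
  "suppress": true
}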

Why?

Sometimes I need helper aggs and postAggs for a calculation but do not want to show them to the user (e.g. if the user downloads the data, those columns need to be removed).

Use case for aggs:

Let's say that I want to compute the average count per bucket for wiki_edits.
I do not care about the count or the sum themselves!
I do not want anything starting with a _ to be sent back.

Query:

{
  "dataSource": "wikipedia_editstream",
  "intervals": [
    "2013-02-26T00:00:00.000/2013-02-27T00:00:00.000"
  ],
  "queryType": "timeseries",
  "granularity": "all",
  "filter": {
    "type": "selector",
    "dimension": "namespace",
    "value": "article"
  },
  "dimension": {
    "type": "default",
    "dimension": "page",
    "outputName": "Page"
  },
  "aggregations": [
    {
      "type": "count",
      "name": "_count"
    },
    {
      "type": "doubleSum",
      "name": "_sum_count",
      "fieldName": "count"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "AvgCount",
      "fn": "/",
      "fields": [
        {
          "type": "fieldAccess",
          "fieldName": "_sum_count"
        },
        {
          "type": "fieldAccess",
          "fieldName": "_count"
        }
      ]
    }
  ]
}

Result:

[ {
  "timestamp" : "2013-02-26T00:00:00.000Z",
  "result" : {
    "_count" : 492635,
    "AvgCount" : 1.0816080871233267,
    "_sum_count" : 532838.0
  }
} ]
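
With "suppress": true set on both helper aggs, the expected response would shrink to something like this (illustrative only):

[ {
  "timestamp" : "2013-02-26T00:00:00.000Z",
  "result" : {
    "AvgCount" : 1.0816080871233267
  }
} ]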

Conclusion

By being able to suppress everything that currently starts with an "_", we could save network bandwidth and spare me the work of removing these columns explicitly before sending the data to the user.

@ghost ghost assigned cheddar Mar 1, 2013
@xvrl xvrl removed this from the Druid Release -- Next milestone Apr 14, 2014
@fjy fjy added this to the Druid 0.7.x milestone Apr 14, 2014
@xvrl xvrl removed this from the 0.7.0 milestone Feb 13, 2015
@stale

stale bot commented Jun 21, 2019

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jun 21, 2019
@vogievetsky
Contributor Author

How are we treating issues that still make sense for native Druid queries but are solved by DruidSQL? Probably this can be closed, right?
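
(A DruidSQL query along these lines would return only the derived column; a sketch against the wikipedia_editstream datasource from the example above, untested:)

SELECT SUM("count") * 1.0 / COUNT(*) AS "AvgCount"
FROM "wikipedia_editstream"
WHERE "namespace" = 'article'
  AND "__time" >= TIMESTAMP '2013-02-26 00:00:00'
  AND "__time" <  TIMESTAMP '2013-02-27 00:00:00'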

@stale

stale bot commented Jun 21, 2019

This issue is no longer marked as stale.

@stale stale bot removed the stale label Jun 21, 2019
@reemasaluja

@vogievetsky , any update on this ticket?

maytasm pushed a commit to maytasm/druid that referenced this issue May 20, 2020
@cswarth

cswarth commented Apr 28, 2021

I'd love to see some movement on this issue (unless I'm missing some way to suppress aggregations in native queries).
I'm aggregating over large objects then applying a post aggregator to produce a small derived metric. The aggregated objects can be quite large when base64 encoded in the response (like 80K+) and I don't actually need them in the response.

@benkrug
Contributor

benkrug commented Sep 17, 2021

I haven't tried it yet, but is it possible to wrap the query as an inner query and select the desired fields from it in an outer query? (As a workaround until there's a flag.)
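
A sketch of that workaround using a nested query datasource (untested; it uses groupBy rather than timeseries, since nested query datasources have historically only been supported for groupBy, and omits the filter from the example above for brevity):

{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "wikipedia_editstream",
      "intervals": ["2013-02-26/2013-02-27"],
      "granularity": "all",
      "dimensions": [],
      "aggregations": [
        { "type": "count", "name": "_count" },
        { "type": "doubleSum", "name": "_sum_count", "fieldName": "count" }
      ],
      "postAggregations": [
        {
          "type": "arithmetic", "name": "AvgCount", "fn": "/",
          "fields": [
            { "type": "fieldAccess", "fieldName": "_sum_count" },
            { "type": "fieldAccess", "fieldName": "_count" }
          ]
        }
      ]
    }
  },
  "intervals": ["2013-02-26/2013-02-27"],
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    { "type": "doubleMax", "name": "AvgCount", "fieldName": "AvgCount" }
  ]
}

The outer query re-aggregates only "AvgCount" (doubleMax over the single inner row passes the value through unchanged), so the helper columns never appear in the response.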

@Fryuni

Fryuni commented Jan 3, 2022

We're having the same problem.
For now, most of our queries are wrapped in secondary queries just to hide the intermediary fields, but not all query types can be wrapped in other queries (e.g. the movingAverage query).

Some of our queries are timeseries with about 240 entries, and each entry contains a 50KB thetaSketch that is generated by an intersection postAgg that we need. That is 12MB of useless data for each chart on our dashboards, while the actual data is less than 2KB.

DruidSQL is not a solution since it does not support many of the aggs and postAggs that we use (some from extensions that may not even try to support DruidSQL).

@jrobin42

Same thing here, especially when using second granularity: unused fields waste too much bandwidth. Is anyone working on this issue? Thanks!

@github-actions

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label May 13, 2023
@github-actions

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@github-actions github-actions bot closed this as not planned Jun 10, 2023
@Fryuni

Fryuni commented Jun 16, 2023

AFAICT this is still a problem, with only the wacky workaround mentioned above.
