
Web console: Estimate rollup ratio for a datasource#8727

Closed
renevan10 wants to merge 24 commits into apache:master from implydata:rollup-estimator


@renevan10
Contributor

@renevan10 renevan10 commented Oct 23, 2019

This PR adds a dialog, under the datasources section, to preview a datasource and estimate its rollup. The top 20 rows of the datasource are previewed. The user can get the estimated rollup ratio by:

  • changing the interval of the data,
  • selecting columns to exclude from the rollup (which highlights the entire column in grey),
  • changing the granularity.

If the ingested data was previously rolled up, the original rollup ratio will also be displayed.

image

Note that you must leave at least one column de-selected for the calculation.
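
For readers unfamiliar with the term, here is a minimal sketch of how the three knobs above (interval, excluded columns, granularity) interact, computed locally over a handful of in-memory rows. It is an illustration only, not the code in this PR: the `Row` shape, `floorTime`, and `estimateRollup` are hypothetical names, and the ratio is assumed to mean original rows divided by rolled-up rows.

```typescript
// Hypothetical row shape: a millisecond timestamp plus string dimensions.
interface Row {
  time: number;
  dims: Record<string, string>;
}

// Floor a timestamp to the start of its bucket of `granularityMs` milliseconds.
function floorTime(time: number, granularityMs: number): number {
  return Math.floor(time / granularityMs) * granularityMs;
}

// Rows that share a time bucket and the same values for every kept
// (non-excluded) dimension would collapse into a single rolled-up row.
function estimateRollup(rows: Row[], excluded: string[], granularityMs: number) {
  const seen = new Set<string>();
  for (const row of rows) {
    const kept = Object.keys(row.dims)
      .filter(d => !excluded.includes(d))
      .sort()
      .map(d => [d, row.dims[d]]);
    seen.add(JSON.stringify([floorTime(row.time, granularityMs), kept]));
  }
  const rolledUpRows = seen.size;
  return {
    originalRows: rows.length,
    rolledUpRows,
    // Assumed definition: original rows per rolled-up row (0 for empty input).
    ratio: rolledUpRows === 0 ? 0 : rows.length / rolledUpRows,
  };
}
```

Excluding a high-cardinality column or choosing a coarser granularity can only merge more rows, so the ratio stays the same or grows.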

@fjy
Contributor

fjy commented Oct 23, 2019

@renevan10 this supports selecting an arbitrary number of columns to exclude, right?

@renevan10
Contributor Author

@fjy yes! You can select multiple columns.

@vogievetsky changed the title from "Estimate rollup ratio for a datasource" to "Web console: Estimate rollup ratio for a datasource" on Oct 23, 2019
@lgtm-com

lgtm-com bot commented Oct 24, 2019

This pull request introduces 2 alerts when merging 8298a79b8ec8a4983625302225618f767cd5b68a into a8b674e - view on LGTM.com

new alerts:

  • 2 for Unused or undefined state property

@fjy
Contributor

fjy commented Oct 24, 2019

@renevan10 @vogievetsky does the algorithm only look at the first 20 rows? I don't think that is going to accurately estimate the rollup ratio.

@renevan10
Contributor Author

@fjy No, it will calculate the ratio for the entire datasource over the selected interval; the first 20 rows just serve as a preview for the user to look at.

@fjy
Contributor

fjy commented Oct 24, 2019

@fjy No, it will calculate the ratio for the entire datasource over the selected interval; the first 20 rows just serve as a preview for the user to look at.

What happens if the interval covers a lot of data?

@vogievetsky
Contributor

I think you are missing a snapshot test here

@vogievetsky
Contributor

@fjy I think if there is a lot of data for the selected interval then the tool will be slow. It should probably mention it in the blurb

@fjy
Contributor

fjy commented Oct 24, 2019

@fjy I think if there is a lot of data for the selected interval then the tool will be slow. It should probably mention it in the blurb

Uh... it seems like we should think about this more.

@vogievetsky
Contributor

@fjy what do you mean?

@vogievetsky
Contributor

FYI this view is powered by a cardinality(byRow) query. It still needs some tweaks to make the slow state work nicely and to set good priorities and timeouts, so more UI work. But the idea is solid.
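
To make the above concrete, here is a rough sketch of what such a query could look like as a Druid native timeseries query, with a `count` aggregator next to a `cardinality` aggregator that has `byRow` set to `true`, and with `timeout` and `priority` set in the query context. This is an assumption about the shape, not the query this PR actually sends: `RollupQueryOptions`, `buildRollupQuery`, and the output names `total_rows` and `distinct_rows` are hypothetical, and the sketch does not fold the chosen granularity into the distinct count.

```typescript
// Hypothetical options; none of these names are taken from the PR.
interface RollupQueryOptions {
  dataSource: string;
  interval: string; // ISO 8601 interval, e.g. "2019-10-01/2019-10-02"
  columns: string[]; // columns kept for the rollup (excluded ones are left out)
  timeoutMs: number; // query context timeout, in milliseconds
  priority: number; // query context priority; lower values yield to other queries
}

// Build a native timeseries query that returns the stored row count and an
// approximate count of distinct combinations of the kept columns.
function buildRollupQuery(opts: RollupQueryOptions) {
  return {
    queryType: 'timeseries',
    dataSource: opts.dataSource,
    intervals: [opts.interval],
    granularity: 'all',
    aggregations: [
      { type: 'count', name: 'total_rows' },
      {
        type: 'cardinality',
        name: 'distinct_rows',
        fields: opts.columns,
        byRow: true, // count distinct value combinations, not distinct values
      },
    ],
    context: { timeout: opts.timeoutMs, priority: opts.priority },
  };
}
```

Since the cardinality aggregator is approximate, the resulting ratio is an estimate; a low priority and a bounded timeout would keep a scan over a large interval from competing with regular queries.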

@renevan10 renevan10 marked this pull request as ready for review October 24, 2019 18:54
@stale

stale bot commented Jan 7, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jan 7, 2020
@stale

stale bot commented Feb 4, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Feb 4, 2020