Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ExpressionVirtualColumn capabilities; fix groupBy's improper uses of StorageAdapter#getColumnCapabilities. #8013

Merged
merged 3 commits into from
Jul 5, 2019

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Jul 2, 2019

  1. A usage in "isArrayAggregateApplicable" that would potentially incorrectly use
    array-based aggregation on a virtual column that shadows a real column.
  2. A usage in "process" that would potentially use the more expensive multi-value
    aggregation path on a singly-valued virtual column. (No correctness issue, but
    a performance issue.)

Also makes ExpressionVirtualColumn always report that it is multi-valued. Previously,
it always set its capabilities to be singly-valued, which was bug ever since #7588, since
it might actually be multi-valued.

1) A usage in "isArrayAggregateApplicable" that would potentially incorrectly use
   array-based aggregation on a virtual column that shadows a real column.
2) A usage in "process" that would potentially use the more expensive multi-value
   aggregation path on a singly-valued virtual column. (No correctness issue, but
   a performance issue.)
Copy link
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this. +1 after CI.

@fjy
Copy link
Contributor

fjy commented Jul 3, 2019

@gianm I think this is failing UT

@gianm
Copy link
Contributor Author

gianm commented Jul 4, 2019

It looks like the tests were failing because ExpressionVirtualColumn always set its capabilities to be singly-valued, which is a bug, since ever since #7588 they might be multi-valued. However, that bug was probably not detected since it was masked by this bug (which prevents groupBy from using its all-singly-valued-dimension optimization if some of the columns involved are virtual columns).

I pushed a fix for the ExpressionVirtualColumn issue and updated the top comment. In this fix I just set it to always be "true". This isn't ideal, since it means singly-valued optimizations won't work on top of it, but I didn't see an easy way for the ExpressionVirtualColumn to determine upfront if it will be singly-valued or not. I think this should be possible in the future as we add more upfront type info to the expression system, so I added a comment saying as much.

/cc @clintropolis

@gianm gianm changed the title GroupBy: Fix improper uses of StorageAdapter#getColumnCapabilities. Fix ExpressionVirtualColumn capabilities; fix groupBy's improper uses of StorageAdapter#getColumnCapabilities. Jul 4, 2019
@clintropolis
Copy link
Member

I pushed a fix for the ExpressionVirtualColumn issue and updated the top comment. In this fix I just set it to always be "true". This isn't ideal, since it means singly-valued optimizations won't work on top of it, but I didn't see an easy way for the ExpressionVirtualColumn to determine upfront if it will be singly-valued or not. I think this should be possible in the future as we add more upfront type info to the expression system, so I added a comment saying as much.

👍 I think this makes sense for now, I will follow this up with a fix to allow us to determine when a single input column will produce an scalar or array output so we can have this optimization again where possible.

Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤘

@clintropolis clintropolis merged commit 9b499df into apache:master Jul 5, 2019
gianm added a commit to implydata/druid-public that referenced this pull request Jul 5, 2019
… of StorageAdapter#getColumnCapabilities. (apache#8013)

* GroupBy: Fix improper uses of StorageAdapter#getColumnCapabilities.

1) A usage in "isArrayAggregateApplicable" that would potentially incorrectly use
   array-based aggregation on a virtual column that shadows a real column.
2) A usage in "process" that would potentially use the more expensive multi-value
   aggregation path on a singly-valued virtual column. (No correctness issue, but
   a performance issue.)

* Add addl javadoc.

* ExpressionVirtualColumn: Set multi-value flag.
@gianm gianm deleted the fix-groupby-cc branch July 5, 2019 20:27
clintropolis pushed a commit that referenced this pull request Jul 24, 2019
… of StorageAdapter#getColumnCapabilities. (#8013)

* GroupBy: Fix improper uses of StorageAdapter#getColumnCapabilities.

1) A usage in "isArrayAggregateApplicable" that would potentially incorrectly use
   array-based aggregation on a virtual column that shadows a real column.
2) A usage in "process" that would potentially use the more expensive multi-value
   aggregation path on a singly-valued virtual column. (No correctness issue, but
   a performance issue.)

* Add addl javadoc.

* ExpressionVirtualColumn: Set multi-value flag.
@clintropolis clintropolis added this to the 0.15.1 milestone Jul 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants