New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add virtual column types, holder serde, and safety features. #3823
Conversation
Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column.
{ | ||
final VirtualColumn virtualColumn = getVirtualColumn(dimensionSpec.getDimension()); | ||
if (virtualColumn == null) { | ||
return dimensionSpec.decorate(NullDimensionSelector.instance()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this behaviour (returning NullDimensionSelector
if virtualColumn is not found) hide mistakes, or as intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is an extension of an existing convention where reading from a non-existent column is treated as reading from a column full of nulls, e.g.:
final Column columnDesc = index.getColumn(dimension);
if (columnDesc == null) {
return NULL_DIMENSION_SELECTOR;
}
from QueryableIndexStorageAdapter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended, like @jon-wei said, the idea is that if you query a column that doesn't exist, it's treated like a column full of nulls. This makes it easy to make queries that span across segments that do have a column and segments that don't have a column.
{ | ||
final VirtualColumn virtualColumn = getVirtualColumn(columnName); | ||
if (virtualColumn == null) { | ||
return ZeroFloatColumnSelector.instance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same answer as above!
{ | ||
final VirtualColumn virtualColumn = getVirtualColumn(columnName); | ||
if (virtualColumn == null) { | ||
return ZeroLongColumnSelector.instance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same answer as above
|
||
final VirtualColumn dependency = getVirtualColumn(columnName); | ||
if (dependency != null) { | ||
detectCycles(dependency, ImmutableSet.copyOf(nextSet)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Shouldn't virtualColumn's name be added to nextSet before going into recursion?
- copyOf() not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It's added above on the line
if (!nextSet.add(columnName)) {
- Reworked to avoid copying
|
||
package io.druid.segment; | ||
|
||
public class ZeroFloatColumnSelector implements FloatColumnSelector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add final
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
|
||
package io.druid.segment; | ||
|
||
public class ZeroLongColumnSelector implements LongColumnSelector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add final
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
@Override | ||
public int getValueCardinality() | ||
{ | ||
return DimensionSelector.CARDINALITY_UNKNOWN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that either this method should return 1, i. e. this DimensionSelector returns the same value on each iteration, or getRow()
shouldn't return ZeroIndexedInts
, if DimensionSelector could return different values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it looks OK to me, withdrawing this
import io.druid.segment.data.IndexedInts; | ||
import io.druid.segment.data.ZeroIndexedInts; | ||
|
||
public abstract class BaseSingleValueDimensionSelector implements DimensionSelector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just SingleValueDimensionSelector
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's Base because impls need to override getValue()
. The idea is this selector always returns 0 as the int, but that int will return different strings via lookupName
on different rows.
Cardinality being CARDINALITY_UNKNOWN
is a signal that callers should expect this behavior (see javadocs for DimensionSelector.getValueCardinality).
{ | ||
@Override | ||
public long get() | ||
{ | ||
final Number number = baseSelector.get(); | ||
return number != null ? number.longValue() : nullValue; | ||
} | ||
}; | ||
} | ||
return new ExpressionLongColumnSelector(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for giving names :)
@Override | ||
protected String getValue() | ||
{ | ||
return extractionFn.apply(baseSelector.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If baseSelector could return different values on each iteration, it might be not possible to extend BaseSingleValueDimensionSelector
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it looks OK to me, withdrawing this
@leventov, thanks for review. I pushed a new commit addressing your comments. |
Fixed conflicts with master. |
Pushed fixes to tests that were failing due to that last merge with master. |
👍 |
…3823) * Add virtual column types, holder serde, and safety features. Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column. * Fix ExtractionDimensionSpecs with virtual dimensions. * Fix unused imports. * CR comments * Merge one more time, with feeling.
Similar (ish) to #3758 but less aggressive. This PR does not change the aggregator api (they still support expressions). It also does not VCs to any new query types. It also doesn't remove makeMathExpressionSelector, but that was already done in #3815.
@navis I know you commented on #3758 that you thought some of these were good ideas, and that you also had some of your own PRs you wanted to submit that would conflict. I added you as a reviewer for this, please help me out by letting me know if the stuff here is aligned with what you're doing.
Virtual columns:
Storage adapters:
surprises when a new base column is added that happens to have the same name as a
virtual column.
future work (whoever does it) would likely involve adding virtual columns to all query types, allowing filtering on virtual columns, and allowing grouping on virtual columns.