Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add virtual column types, holder serde, and safety features. #3823

Merged
merged 7 commits into from Jan 27, 2017

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Jan 6, 2017

Similar (ish) to #3758 but less aggressive. This PR does not change the aggregator api (they still support expressions). It also does not VCs to any new query types. It also doesn't remove makeMathExpressionSelector, but that was already done in #3815.

@navis I know you commented on #3758 that you thought some of these were good ideas, and that you also had some of your own PRs you wanted to submit that would conflict. I added you as a reviewer for this, please help me out by letting me know if the stuff here is aligned with what you're doing.

Virtual columns:

  • add long, float, dimension selectors
  • put cache IDs in VirtualColumnCacheHelper
  • adjust serde so VirtualColumns can be the holder object for Jackson
  • safety feature: add fail-fast validation for cycle detection and duplicates
  • add expression virtual column in core

Storage adapters:

  • safety feature: move virtual column hooks before checking base columns, to prevent
    surprises when a new base column is added that happens to have the same name as a
    virtual column.

future work (whoever does it) would likely involve adding virtual columns to all query types, allowing filtering on virtual columns, and allowing grouping on virtual columns.

Virtual columns:
- add long, float, dimension selectors
- put cache IDs in VirtualColumnCacheHelper
- adjust serde so VirtualColumns can be the holder object for Jackson
- add fail-fast validation for cycle detection and duplicates
- add expression virtual column in core

Storage adapters:
- move virtual column hooks before checking base columns, to prevent surprises
  when a new base column is added that happens to have the same name as a
  virtual column.
@leventov leventov self-assigned this Jan 11, 2017
{
final VirtualColumn virtualColumn = getVirtualColumn(dimensionSpec.getDimension());
if (virtualColumn == null) {
return dimensionSpec.decorate(NullDimensionSelector.instance());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this behaviour (returning NullDimensionSelector if virtualColumn is not found) hide mistakes, or as intended?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an extension of an existing convention where reading from a non-existent column is treated as reading from a column full of nulls, e.g.:

final Column columnDesc = index.getColumn(dimension);
if (columnDesc == null) {
  return NULL_DIMENSION_SELECTOR;
}

from QueryableIndexStorageAdapter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended, like @jon-wei said, the idea is that if you query a column that doesn't exist, it's treated like a column full of nulls. This makes it easy to make queries that span across segments that do have a column and segments that don't have a column.

{
final VirtualColumn virtualColumn = getVirtualColumn(columnName);
if (virtualColumn == null) {
return ZeroFloatColumnSelector.instance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above!

{
final VirtualColumn virtualColumn = getVirtualColumn(columnName);
if (virtualColumn == null) {
return ZeroLongColumnSelector.instance();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same answer as above


final VirtualColumn dependency = getVirtualColumn(columnName);
if (dependency != null) {
detectCycles(dependency, ImmutableSet.copyOf(nextSet));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Shouldn't virtualColumn's name be added to nextSet before going into recursion?
  2. copyOf() not needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. It's added above on the line if (!nextSet.add(columnName)) {
  2. Reworked to avoid copying


package io.druid.segment;

public class ZeroFloatColumnSelector implements FloatColumnSelector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure


package io.druid.segment;

public class ZeroLongColumnSelector implements LongColumnSelector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add final?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@Override
public int getValueCardinality()
{
return DimensionSelector.CARDINALITY_UNKNOWN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to me that either this method should return 1, i. e. this DimensionSelector returns the same value on each iteration, or getRow() shouldn't return ZeroIndexedInts, if DimensionSelector could return different values.

Copy link
Member

@leventov leventov Jan 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks OK to me, withdrawing this

import io.druid.segment.data.IndexedInts;
import io.druid.segment.data.ZeroIndexedInts;

public abstract class BaseSingleValueDimensionSelector implements DimensionSelector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just SingleValueDimensionSelector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's Base because impls need to override getValue(). The idea is this selector always returns 0 as the int, but that int will return different strings via lookupName on different rows.

Cardinality being CARDINALITY_UNKNOWN is a signal that callers should expect this behavior (see javadocs for DimensionSelector.getValueCardinality).

{
@Override
public long get()
{
final Number number = baseSelector.get();
return number != null ? number.longValue() : nullValue;
}
};
}
return new ExpressionLongColumnSelector();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for giving names :)

@Override
protected String getValue()
{
return extractionFn.apply(baseSelector.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If baseSelector could return different values on each iteration, it might be not possible to extend BaseSingleValueDimensionSelector here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it looks OK to me, withdrawing this

@gianm
Copy link
Contributor Author

gianm commented Jan 18, 2017

@leventov, thanks for review. I pushed a new commit addressing your comments.

@gianm
Copy link
Contributor Author

gianm commented Jan 26, 2017

Fixed conflicts with master.

@gianm
Copy link
Contributor Author

gianm commented Jan 26, 2017

Pushed fixes to tests that were failing due to that last merge with master.

@jon-wei
Copy link
Contributor

jon-wei commented Jan 27, 2017

👍

@jon-wei jon-wei merged commit d3a3b7b into apache:master Jan 27, 2017
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
…3823)

* Add virtual column types, holder serde, and safety features.

Virtual columns:
- add long, float, dimension selectors
- put cache IDs in VirtualColumnCacheHelper
- adjust serde so VirtualColumns can be the holder object for Jackson
- add fail-fast validation for cycle detection and duplicates
- add expression virtual column in core

Storage adapters:
- move virtual column hooks before checking base columns, to prevent surprises
  when a new base column is added that happens to have the same name as a
  virtual column.

* Fix ExtractionDimensionSpecs with virtual dimensions.

* Fix unused imports.

* CR comments

* Merge one more time, with feeling.
@gianm gianm deleted the vc-types branch September 23, 2022 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants