Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add output type information to ExpressionPostAggregator #11818

Merged
merged 5 commits into from
Oct 22, 2021

Conversation

clintropolis
Copy link
Member

Description

Currently, the PostAggregator interface defines a method getType which returns a ColumnType for use when populating a RowSignature. Many PostAggregator implementations have fixed return types, and are so independent of the underlying RowSignature, but some, such as FieldAccessPostAggregator and ExpressionAggregator, are sensitive to their inputs.

Currently these dynamic return type PostAggregator implementations are unable to produce an output type until they have been decorated which provides access to a map of column names to AggregatorFactory, which provide their own type information. The only place using the output type of PostAggregators is RowSignature, so if we modify the getType method to accept a ColumnInspector, we can compute the output prior to decoration, if the RowSignature provides itself as the ColumnInspector.

The ExpressionPostAggregator was missing an output type completely, despite the fact that Druid expression system for some time now has been able to infer the output type of an expression if the inputs to all of its free variables are known, so it has been updated to support producing the output type from either the inspector, or from the decorate method, if it has been called.

I also updated FieldAccessPostAggregator to be able to compute it's output type from the ColumnInspector if decorate has not been called, but did not yet do FinalizingFieldAccessPostAggregator, as I think we would need a clearer split between intermediary and finalized RowSignature that is not yet in place.

I left the 'decoration computes type' pattern in place, and the code prefers those over the ColumnInspector computed types because I wasn't completely sure if there would be any consequences to this, but it might very be possible to remove them from these aggregators in the future if it isn't needed.


Key changed/added classes in this PR
  • PostAggregator
  • ExpressionPostAggregator
  • FieldAccessPostAggregator

This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

…aggs to compute their output type based on input types
}

@Override
public ExpressionPostAggregator decorate(final Map<String, AggregatorFactory> aggregators)
{
final ColumnInspector aggInspector = AggregatorUtil.inspectorForAggregatorFactoryMap(aggregators);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look necessary… how about dropping it and also the new function in AggregatorUtil?

if (type != null) {
return type;
}
final ColumnCapabilities capabilities = signature.getColumnCapabilities(fieldName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offtopic: it'd be nice to have signature.getColumnType(fieldName).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RowSignature already has a getColumnType though it returns an optional instead of nullable, so we would have to match it (or i guess give another name).

Also, I still have a dream that someday ExpressionType and ColumnType will be the same so then getType would serve that purpose, but not quite there yet.

@@ -277,7 +277,8 @@ public Builder addPostAggregators(final List<PostAggregator> postAggregators)
);

// unlike aggregators, the type we see here is what we get, no further finalization will occur
add(name, postAggregator.getType());
// feed it the existing RowSignature for PostAggregator implementations whose output is dependent on input types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternate comment:

        // It's OK to call getType in the order that post-aggregators appear, because post-aggregators are only
        // allowed to refer to *earlier* post-aggregators (not later ones; the order is meaningful).

@gianm
Copy link
Contributor

gianm commented Oct 22, 2021

All CI passed except for "(Compile=openjdk8, Run=openjdk8, Cluster Build On K8s) ITNestedQueryPushDownTest integration test", which seems broken (a python issue?) and unrelated. So I'll merge this.

@gianm gianm merged commit 741b4ed into apache:master Oct 22, 2021
@clintropolis clintropolis deleted the expr-postagg-output-type branch October 22, 2021 21:08
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants