Introduce SQL interface for distinct count extension #13927
}
if (lhs == null) {
  return ((Number) rhs).longValue();
}
return ((Number) lhs).longValue() + ((Number) rhs).longValue();
This change makes combine no longer work on nulls; was that not needed for some reason?
Reverted
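For readers following the thread, a null-tolerant combine along the lines discussed above might look like the minimal sketch below. The class name NullSafeCombine is hypothetical, and returning 0 when both inputs are null is an assumption that the hunk shown here does not confirm.

```java
// Illustrative sketch only; not the extension's actual code.
// NullSafeCombine is a hypothetical name used for this example.
final class NullSafeCombine {
  private NullSafeCombine() {}

  // Adds two partial counts, treating a null partial as "no contribution".
  static long combine(Object lhs, Object rhs) {
    if (lhs == null && rhs == null) {
      return 0L; // assumption: two missing partials sum to zero
    }
    if (rhs == null) {
      return ((Number) lhs).longValue();
    }
    if (lhs == null) {
      return ((Number) rhs).longValue();
    }
    return ((Number) lhs).longValue() + ((Number) rhs).longValue();
  }
}
```

The point of the null branches is that a segment with no rows for the dimension can still be merged with one that has a count, without a NullPointerException.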
OperandTypes.ANY,
OperandTypes.and(
  OperandTypes.sequence(SIGNATURE, OperandTypes.ANY, OperandTypes.LITERAL),
  OperandTypes.family(SqlTypeFamily.ANY, SqlTypeFamily.STRING)
I don't see the LITERAL STRING argument being used in the function body. Is that intentional?
We had a look back at some other classes that extend SqlAggFunction, particularly ApproxCountDistinctSqlAggFunction, and noticed that it doesn't take the bitmap factory argument. So we decided to simplify SEGMENT_DISTINCT in the same way. Is that OK?
.../java/org/apache/druid/query/aggregation/distinctcount/sql/SegmentDistinctSqlAggregator.java
Fixed
final ColumnType inputType = Calcites.getColumnTypeForRelDataType(dataType);

if (inputType == null) {
  throw new ISE(
You should use org.apache.druid.sql.calcite.planner.UnsupportedSQLQueryException instead of ISE. Please refer to the class documentation for why the former is preferred.
Also, can you please explain why this inputType check is required? If we don't create the dimensionSpec below (as mentioned in another comment of mine), we probably won't run into an error with inputType being null in this code.
Would nullity of inputType cause any issue in the aggregation, and if so can you please update with a comment?
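To make the two review points above concrete, here is a minimal, self-contained sketch of a type check that fails with a query-level exception and documents why the check exists. UnsupportedSqlException below is a hypothetical stand-in written for this example; it is not the Druid class named in the review, and the message text and helper name are assumptions.

```java
// Illustrative sketch only; all names here are hypothetical stand-ins.
final class InputTypeCheck {
  private InputTypeCheck() {}

  // Stand-in for a "this SQL query is not supported" exception type.
  static final class UnsupportedSqlException extends RuntimeException {
    UnsupportedSqlException(String message) {
      super(message);
    }
  }

  // Returns the resolved column type, or fails with a query-level error.
  // A null resolvedType models "the SQL type has no native column type",
  // which is the case the reviewer asks to have explained in a comment.
  static String requireInputType(String resolvedType, String sqlTypeName) {
    if (resolvedType == null) {
      throw new UnsupportedSqlException(
          String.format("SEGMENT_DISTINCT cannot be applied to type [%s]", sqlTypeName));
    }
    return resolvedType;
  }
}
```

If the check turns out to be unnecessary once the dimension spec is no longer built, the simpler outcome is to delete it rather than change the exception type.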
@@ -45,6 +45,7 @@ public void aggregate()
IndexedInts row = selector.getRow();
for (int i = 0, rowSize = row.size(); i < rowSize; i++) {
  int index = row.get(i);

nit: We can revert this change
  dimensionSpec = new DefaultDimensionSpec(virtualColumnName, null, inputType);
}

aggregatorFactory = new DistinctCountAggregatorFactory(name, dimensionSpec.getDimension(), null);
It seems slightly counter-intuitive that we create a dimension spec in the above cases just to call dimensionSpec.getDimension() when creating the final aggregator.
Instead, at Line#116 can we do dimensionName = columnArg.getSimpleExtraction().getColumn() (since it's a direct column access), at Line#122 do dimensionName = virtualColumnName, and then pass dimensionName to the aggregator factory?
Thanks for the feedback, will try that out!
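The suggestion above can be sketched roughly as follows: resolve a plain dimension name in each branch and hand that to the factory, with no intermediate dimension spec. ColumnArg and DimensionNameResolver are hypothetical stand-ins for the planner objects in the PR, written only so the example compiles on its own.

```java
// Illustrative sketch only; ColumnArg is a hypothetical stand-in for the
// planner's column argument, not a Druid class.
final class DimensionNameResolver {
  private DimensionNameResolver() {}

  static final class ColumnArg {
    private final String directColumn; // null when the argument is an expression

    ColumnArg(String directColumn) {
      this.directColumn = directColumn;
    }

    boolean isDirectColumnAccess() {
      return directColumn != null;
    }

    String getColumn() {
      return directColumn;
    }
  }

  // Picks the name to pass to the aggregator factory: the physical column for
  // a direct access, otherwise the name of the registered virtual column.
  static String resolve(ColumnArg arg, String virtualColumnName) {
    return arg.isDirectColumnAccess() ? arg.getColumn() : virtualColumnName;
  }
}
```

With this shape the input type is only needed where the virtual column is created, which may also answer the earlier question about the null check.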
After re-reviewing the PR, here are a few high-level comments:
- Since the extension only works when certain pre-conditions are met, there should be some form of validation in the aggregator or the SQL function that errors out when those pre-conditions are not met.
- A test run is failing probably because the test cases don't take into account the behaviour for nulls. Can you fix those as well?
Hi, @sduffey-partnerize! Did you make progress on the PR?
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that
Description
Introduce a SQL interface for the distinctcount extension, via a new function SEGMENT_DISTINCT.
Added calcite and druid-sql as dependencies of distinctcount, then introduced SegmentDistinctSqlAggregator, an implementation of Calcite's SqlAggregator.
Need some direction on documentation. For example, would we want to see the SQL equivalents of the examples that already exist here? Anything else?
Release note
New: You can now use distinct count in a SQL query with SEGMENT_DISTINCT
Key changed/added classes in this PR
org.apache.druid.query.aggregation.distinctcount.sql.SegmentDistinctSqlAggregator
This PR has: