
SQL timeseries no longer skip empty buckets with all granularity #11188

Merged

Conversation

@clintropolis clintropolis commented May 2, 2021

Fixes #11145 (maybe others since this has been an issue for quite a while now, but I couldn't find any while searching)

Description

This PR modifies the Druid SQL planner so that it no longer skips empty buckets when it generates a timeseries query with 'all' granularity, which should produce more SQL-compatible query results.

In SQL, when running a query with only aggregators, such as SELECT COUNT(*), SUM(x) FROM y WHERE z = 'someval', the aggregation is done as if there is a single universal grouping key, so this example query should return a row even when no values match 'someval', producing [0, null] in this case.

We plan queries like this example into a Druid native timeseries query, and prior to this patch we would always set the skipEmptyBuckets query context parameter to 'true'. A timeseries query with a granularity other than 'ALL' is equivalent to grouping by a time floor expression, so removing empty results is correct in that case. But with 'ALL' granularity, as in the example query, there is no time floor expression to group on, so this setting isn't appropriate: it makes Druid return empty results instead of the initialized aggregator values for the universal group, which is what it should correctly return.

I ran into this issue again when trying to write a test for #11157 to verify that something like SELECT ARRAY_AGG(x) FROM y WHERE z = 'missing value' produces a single [null] result, but then remembered that I couldn't write that test because of this issue 😅.

The actual changes are small: ~5 lines in DruidQuery that, when constructing a timeseries query, check for 'ALL' granularity before setting the 'skipEmptyBuckets' context parameter. Everything else is test changes.
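
For illustration, here is a minimal sketch of the shape of that check. It is not the exact diff: the helper method, parameter names, and the exact merge order of the user context are assumptions based on the description above.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.druid.java.util.common.granularity.Granularities;
import org.apache.druid.java.util.common.granularity.Granularity;
import org.apache.druid.query.timeseries.TimeseriesQuery;

public final class TimeseriesContextSketch
{
  // Sketch of the context-building step in DruidQuery when planning a
  // timeseries query: only non-ALL granularities bucket by time, so only
  // they should skip empty buckets.
  public static Map<String, Object> buildTimeseriesContext(
      final Granularity queryGranularity,
      final Map<String, Object> userContext
  )
  {
    final Map<String, Object> theContext = new HashMap<>();
    if (!Granularities.ALL.equals(queryGranularity)) {
      // Non-ALL granularity is equivalent to grouping on a time floor
      // expression, so skipping empty buckets remains correct there.
      theContext.put(TimeseriesQuery.SKIP_EMPTY_BUCKETS, true);
    }
    // The user-provided context is applied last, so explicitly setting
    // skipEmptyBuckets overrules the planner's choice (as noted below).
    theContext.putAll(userContext);
    return theContext;
  }
}
```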

I went through and added tests for, I think, almost all of our SQL aggregator functions to capture their current outputs in both default mode and SQL-compatible null handling mode for this empty timeseries result case. I got a bit carried away refactoring these tests to be simplified and extend BaseCalciteQueryTest, which is why the PR is sort of large (but it is a nice improvement for all of these SQL tests).

I'm not actually sure all of these aggregator functions behave correctly, notably some of the quantiles sketches which return Double.NaN in both modes, and some other sketches which return an empty sketch in both modes, but I'll save changing the behavior of any of these aggregators for another day.

This changes the default behavior of query results: SQL queries which plan to timeseries queries with ALL granularity will now always produce a single result row instead of nothing. The old behavior can be retained by explicitly setting skipEmptyBuckets to true, which overrules the planner behavior.

Key changed/added classes in this PR
  • DruidQuery.java

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

"(\"m1\" * 2)",
ValueType.FLOAT,
TestExprMacroTable.INSTANCE
cannotVectorize();
clintropolis (Member, Author):

I think it is a bug that this cannot vectorize, btw. The aggregator's canVectorize method takes a ColumnInspector, but (in TimeseriesQueryEngine at least) this is the segment adapter, not wrapped with the virtual columns, so it does not find column capabilities for v0, since v0 doesn't exist on the actual segment, and so it reports that it is not able to vectorize. The engine itself will confirm that v0 can vectorize, so the aggregator doesn't need to worry about that, I think, but we might want a way to wrap the adapter with a new VirtualizedColumnInspector or something (think VirtualColumns.wrap, but for a ColumnInspector instead of a ColumnSelectorFactory) so that the column capabilities of the virtual columns are available to the aggregator to help it decide whether or not it can vectorize.
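
To make that concrete, here is a hypothetical sketch of such a wrapper. The class name comes from the comment above, and the VirtualColumns method names are assumptions that may differ by Druid version.

```java
import javax.annotation.Nullable;
import org.apache.druid.segment.ColumnInspector;
import org.apache.druid.segment.VirtualColumns;
import org.apache.druid.segment.column.ColumnCapabilities;

// Hypothetical ColumnInspector that consults the virtual columns first,
// falling back to the underlying segment inspector, so aggregators can see
// capabilities for columns like "v0" that only exist virtually.
public class VirtualizedColumnInspector implements ColumnInspector
{
  private final ColumnInspector baseInspector;
  private final VirtualColumns virtualColumns;

  public VirtualizedColumnInspector(
      final ColumnInspector baseInspector,
      final VirtualColumns virtualColumns
  )
  {
    this.baseInspector = baseInspector;
    this.virtualColumns = virtualColumns;
  }

  @Nullable
  @Override
  public ColumnCapabilities getColumnCapabilities(String column)
  {
    if (virtualColumns.exists(column)) {
      // Capabilities of the virtual column itself (assumed API).
      return virtualColumns.getColumnCapabilities(baseInspector, column);
    }
    return baseInspector.getColumnCapabilities(column);
  }
}
```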


clintropolis commented May 3, 2021

If #11157 goes in before this PR, this PR should be updated to add ARRAY_AGG to the non-vectorized built-in aggs test case. If #11182 is merged before this PR, that test case should be moved to the 'vectorized' built-in aggs test case. Likewise, if this PR is merged first, those PRs should make the corresponding changes themselves.

|`COUNT(DISTINCT expr)`|Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). To get exact counts set "useApproximateCountDistinct" to "false". If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See also `APPROX_COUNT_DISTINCT(expr)`. In exact mode, only one distinct count per query is permitted unless `useGroupingSetForExactDistinct` is set to true in query contexts or broker configurations.|`0`|
|`SUM(expr)`|Sums numbers.|`0` in 'default' mode, `null` in SQL compatible mode|
|`MIN(expr)`|Takes the minimum of numbers.|`Long.MAX_VALUE` in 'default' mode, `null` in SQL compatible mode|
|`MAX(expr)`|Takes the maximum of numbers.|`Long.MIN_VALUE` in 'default' mode, `null` in SQL compatible mode|
Contributor:

We should spell out what the value is. Druid users aren't necessarily going to be familiar with Java, so we shouldn't reference Java programming concepts.

|--------|-----|-------|
|`COUNT(*)`|Counts the number of rows.|`0`|
|`COUNT(DISTINCT expr)`|Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). To get exact counts set "useApproximateCountDistinct" to "false". If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See also `APPROX_COUNT_DISTINCT(expr)`. In exact mode, only one distinct count per query is permitted unless `useGroupingSetForExactDistinct` is set to true in query contexts or broker configurations.|`0`|
|`SUM(expr)`|Sums numbers.|`0` in 'default' mode, `null` in SQL compatible mode|
Contributor:

Hmm, I think we should consider using different phrases for these modes. For some reason, "default mode" doesn't evoke the proper concept for me. Maybe this instead:

null when druid.generic.useDefaultValueForNull = false, 0 when true

clintropolis (Member, Author):

Changed, and also flipped so that druid.generic.useDefaultValueForNull=false is described first, since that behavior is more sane on a page about SQL. Though I hope putting it first isn't too confusing to readers, since it isn't yet the default behavior.

|`APPROX_COUNT_DISTINCT_DS_HLL(expr, [lgK, tgtHllType])`|Counts distinct values of expr, which can be a regular column or an [HLL sketch](../development/extensions-core/datasketches-hll.md) column. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|`0`|
|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of expr, which can be a regular column or a [Theta sketch](../development/extensions-core/datasketches-theta.md) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|`0`|
|`DS_HLL(expr, [lgK, tgtHllType])`|Creates an [HLL sketch](../development/extensions-core/datasketches-hll.md) on the values of expr, which can be a regular column or a column containing HLL sketches. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|`'0'` (STRING)|
|`DS_THETA(expr, [size])`|Creates a [Theta sketch](../development/extensions-core/datasketches-theta.md) on the values of expr, which can be a regular column or a column containing Theta sketches. The `size` parameter is described in the Theta sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|`'0.0'` (STRING)|
Contributor:

Err, is this good? Why does it return a string instead of a theta sketch?

clintropolis (Member, Author):

Hmm, it actually returns a double, but we don't examine the finalized type, so Calcite thinks it is complex, which I guess is how it ends up as a string instead of a double, because it just tries to serialize complex values (and I assume the same is true of the other sketches that return a string result). The Theta sketch aggregator, along with many of the quantiles, finalizes its sketch by default, producing the estimate, instead of serializing the sketch like some of the other complex types do. There is a bit more discussion about it in #9419, but it doesn't really look like it went anywhere.

I guess this also brings up the question of whether we need to describe the difference between intermediary types and finalized types here. Since this table currently shows finalized types, I called it the default value, because that is what people will see, rather than the initialized value of the aggregator, which for all of these complex types would be an empty sketch.

Contributor:

> Hmm, it actually returns a double, but we don't examine the finalized type so calcite thinks it is complex which is I guess how it ends up as a string instead of double because it just tries to serialize complex values (and I assume the same is true of the other sketches that return a string result).

Hmm, weird, but, OK. We might want to change it to something saner later, but we don't have to do that right now.

> I guess this also brings up the question of if we need to describe the difference between intermediary types and finalized types here

I think the way you did it is right.

I don't think we need to describe the intermediary / finalized type difference in user-facing documentation. That should be an internal detail. IMO, if there are cases where users might benefit from seeing the non-finalized values, it'd be better to expose them through postaggregators that have well-defined semantics (like sketch_bytes_as_base64 or something).

|`APPROX_COUNT_DISTINCT_DS_THETA(expr, [size])`|Counts distinct values of expr, which can be a regular column or a [Theta sketch](../development/extensions-core/datasketches-theta.md) column. The `size` parameter is described in the Theta sketch documentation. This is always approximate, regardless of the value of "useApproximateCountDistinct". See also `COUNT(DISTINCT expr)`. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|`0`|
|`DS_HLL(expr, [lgK, tgtHllType])`|Creates an [HLL sketch](../development/extensions-core/datasketches-hll.md) on the values of expr, which can be a regular column or a column containing HLL sketches. The `lgK` and `tgtHllType` parameters are described in the HLL sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|`'0'` (STRING)|
|`DS_THETA(expr, [size])`|Creates a [Theta sketch](../development/extensions-core/datasketches-theta.md) on the values of expr, which can be a regular column or a column containing Theta sketches. The `size` parameter is described in the Theta sketch documentation. The [DataSketches extension](../development/extensions-core/datasketches-extension.md) must be loaded to use this function.|`'0.0'` (STRING)|
|`APPROX_QUANTILE(expr, probability, [resolution])`|Computes approximate quantiles on numeric or [approxHistogram](../development/extensions-core/approximate-histograms.md#approximate-histogram-aggregator) exprs. The "probability" should be between 0 and 1 (exclusive). The "resolution" is the number of centroids to use for the computation. Higher resolutions will give more precise results but also have higher overhead. If not provided, the default resolution is 50. The [approximate histogram extension](../development/extensions-core/approximate-histograms.md) must be loaded to use this function.|`Double.NaN`|
Contributor:

Same comment as above: we shouldn't reference Java concepts. Let's just go with NaN.

|`ANY_VALUE(expr)`|Returns any value of `expr` including null. `expr` must be numeric. This aggregator can simplify and optimize performance by returning the first encountered value (including null).|
|`ANY_VALUE(expr, maxBytesPerString)`|Like `ANY_VALUE(expr)`, but for strings. The `maxBytesPerString` parameter determines how much aggregation space to allocate per string. Strings longer than this limit will be truncated. This parameter should be set as low as possible, since high values will lead to wasted memory.|
|`GROUPING(expr, expr...)`|Returns a number to indicate which groupBy dimension is included in a row, when using `GROUPING SETS`. Refer to [additional documentation](aggregations.md#grouping-aggregator) on how to infer this number.|
|Function|Notes|Default|
Contributor:

I hate to ask, but have you double-checked that for all these aggregators, the three implementations return the same values in the two null-handling modes? There are six cases to check: {aggregator, buffer aggregator, vector aggregator} × {SQL-compliant, replace-with-default}.

Contributor:

Or does it not matter because we'll always use one of the impls and never use the other two? (If that's the case, I suggest adding a comment somewhere discussing that, perhaps in TimeseriesQueryQueryToolChest or in the javadocs for the aggregators themselves.)

clintropolis (Member, Author):

I did eyeball it, but that sounds lame when I say it out loud. I guess the added unit tests will cover heap and vector aggregators for the vectorized aggs, so at least those will be confirmed to match, but the ones which cannot be vectorized only test heap aggs.

I'll add a similar group by query test that uses filtered aggs to match nothing to cover the buffer agg path to make sure everything matches up, which seems pretty useful to have anyway.

clintropolis (Member, Author):

Added near-duplicates of the timeseries tests using group by queries with filtered aggregations that match nothing, to get additional coverage on buffer aggregators, and luckily everything matched up.
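
As a tiny standalone illustration of the parity these tests check (the aggregator class names and constructors here are from memory and should be treated as assumptions), a heap aggregator and a buffer aggregator that never see a row should report the same initialized value:

```java
import java.nio.ByteBuffer;
import org.apache.druid.query.aggregation.CountAggregator;
import org.apache.druid.query.aggregation.CountBufferAggregator;

// When no rows are aggregated, the heap and buffer implementations of an
// aggregator should report the same initialized value; this is the property
// the filtered-aggregation group by tests exercise across all aggregators.
public class InitializedValueParitySketch
{
  public static void main(String[] args)
  {
    final CountAggregator onHeap = new CountAggregator();

    final CountBufferAggregator buffered = new CountBufferAggregator();
    final ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
    buffered.init(buf, 0);

    // Neither aggregator has seen a row; both should report 0.
    System.out.println(onHeap.getLong() + " == " + buffered.getLong(buf, 0));
  }
}
```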


gianm commented May 3, 2021

The concept LGTM (I'm pleasantly surprised it seems so easy); my comments at this point are all about the docs.

|`COUNT(DISTINCT expr)`|Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of [HyperLogLog](http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf). To get exact counts set "useApproximateCountDistinct" to "false". If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See also `APPROX_COUNT_DISTINCT(expr)`. In exact mode, only one distinct count per query is permitted unless `useGroupingSetForExactDistinct` is set to true in query contexts or broker configurations.|`0`|
|`SUM(expr)`|Sums numbers.|`null` if `druid.generic.useDefaultValueForNull=false`, otherwise `0`|
|`MIN(expr)`|Takes the minimum of numbers.|`null` if `druid.generic.useDefaultValueForNull=false`, otherwise `9223372036854775807` (maximum LONG value)|
|`MAX(expr)`|Takes the maximum of numbers.|`0` in 'default' mode, `null` in SQL compatible mode|
Contributor:

Missed this one?


@@ -313,48 +313,51 @@ possible for two aggregators in the same SQL query to have different filters.

Only the COUNT and ARRAY_AGG aggregations can accept DISTINCT.

When no rows are selected, aggregate functions will return their initialized value for the grouping they belong to. What this value is exactly for a given aggregator is dependent on the configuration of Druid's SQL compatible null handling mode, controlled by `druid.generic.useDefaultValueForNull`. The table below defines the initial values for all aggregate functions in both modes.
Contributor:

Hmm this paragraph reads oddly to me for two reasons:

  • At first blush, I think a typical user would think it's impossible for groups to exist that do not have any rows. (It isn't typical SQLy behavior.) So we should list some examples of when the default value will show up. I can think of two cases: grand total (aggregations with no group by) and filtered aggregators where the filter does not match any rows within the group.
  • Not all aggregators have behavior dependent on druid.generic.useDefaultValueForNull, so it's not technically correct to say this categorically. I don't think we need to mention druid.generic.useDefaultValueForNull at all here, actually, because the individual aggregators in the table below call it out when appropriate. Or, if we do mention it, we could just say that it "may depend on" rather than "is dependent on".

Welcome to opinions from others about how to express this most clearly.


gianm commented May 6, 2021

@clintropolis If you want to merge this patch and then update the docs in a followup, that's fine by me.

@abhishekagarwal87 (Contributor) left a comment:

LGTM. Thank you @clintropolis for adding these tests and documentation for these aggregators.

Comment on lines 304 to 305
and querying, so for correct behavior, it should be set on all Druid service types to be available at both ingestion time and query time. There is some overhead
associated with the ability to handle NULLs; see the [segment internals](../design/segments.md#sql-compatible-null-handling)
Contributor:

Suggested change:
- and querying, so for correct behavior, it should be set on all Druid service types to be available at both ingestion time and query time. There is some overhead
- associated with the ability to handle NULLs; see the [segment internals](../design/segments.md#sql-compatible-null-handling)
+ and querying, so for correct behavior, it should be set on all Druid service types to be available at both ingestion time and query time. There is some overhead associated with the ability to handle NULLs;
+ see the [segment internals](../design/segments.md#sql-compatible-null-handling)

clintropolis (Member, Author):

Oops, I missed this one earlier. Is it just line break changes? Since line breaks in this file are already sort of a mess, I'm going to hold off on adding this so I don't have to churn through CI again, unless I have to make any more changes, in which case I will add it.

clintropolis (Member, Author):

Oof, I do have to make changes, because changes in master have caused soft conflicts leading to a compilation failure, since TIMESERIES_CONTEXT_DEFAULT is no longer a thing after this PR...

clintropolis (Member, Author):

Hmm, we're going to need to watch out for this for a while in any PR that adds SQL tests using this context parameter.

@clintropolis clintropolis merged commit 691d7a1 into apache:master May 10, 2021
@clintropolis clintropolis deleted the sql-timeseries-dont-skip-for-all-gran branch May 10, 2021 17:13
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021
abhishekagarwal87 pushed a commit that referenced this pull request Jan 7, 2022
…ery (#12065)

Related to #11188

The above-mentioned PR allowed timeseries queries to return a default result when queries of the type select count(*) from table where dim1="_not_present_dim_" were executed. Before that PR, such a query returned no row; after it, it returns a row with the value of count(*) as 0 (as expected by the SQL standards of different DBs).

In Grouping#applyProject, we can sometimes optimize a groupBy query into a timeseries query when the keys of the groupBy are constants (as generated by automated tools). For example, in select count(*) from table where dim1="_present_dim_" group by "dummy_key", the groupBy clause can be removed. However, when the filter doesn't match anything, i.e. select count(*) from table where dim1="_not_present_dim_" group by "dummy_key", the behavior of general databases would be to return nothing, while Druid (due to the above change) returns an empty row. This PR aims to fix this divergence of behavior.

Example cases:

select count(*) from table where dim1="_not_present_dim_" group by "dummy_key".
CURRENT: Returns a row with count(*) = 0
EXPECTED: Return no row

select 'A', dim1 from foo where m1 = 123123 and dim1 = '_not_present_again_' group by dim1
CURRENT: Returns a row with ('A', 'wat')
EXPECTED: Return no row

To do this, a boolean droppedDimensionsWhileApplyingProject has been added to Grouping, which is true whenever the optimization changes the original shape of the query. Hence, if a timeseries query has a grouping with this flag set to true, we set skipEmptyBuckets=true in the query context (i.e., do not return any row).
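
As a rough sketch of that check, extending the earlier buildTimeseriesContext sketch (assumed names and shape, not the exact diff):

```java
// Same imports as the earlier sketch. The droppedDimensionsWhileApplyingProject
// flag comes from Grouping, per the description above.
public static Map<String, Object> buildTimeseriesContext(
    final Granularity queryGranularity,
    final boolean droppedDimensionsWhileApplyingProject,
    final Map<String, Object> userContext
)
{
  final Map<String, Object> theContext = new HashMap<>();
  // Skip empty buckets when bucketing by time, or when the query began as a
  // GROUP BY whose constant dimensions were dropped: SQL semantics say an
  // empty match for a real GROUP BY should yield zero rows, not one row.
  if (!Granularities.ALL.equals(queryGranularity) || droppedDimensionsWhileApplyingProject) {
    theContext.put(TimeseriesQuery.SKIP_EMPTY_BUCKETS, true);
  }
  theContext.putAll(userContext);
  return theContext;
}
```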
Successfully merging this pull request may close these issues.

sql count(*) returns no results instead of 0 when no rows meet filter