Skip to content

MSQ: Subclass CalciteJoinQueryTest, other supporting changes.#14105

Merged
gianm merged 4 commits intoapache:masterfrom
gianm:msq-join-testing
Apr 25, 2023
Merged

MSQ: Subclass CalciteJoinQueryTest, other supporting changes.#14105
gianm merged 4 commits intoapache:masterfrom
gianm:msq-join-testing

Conversation

@gianm
Copy link
Contributor

@gianm gianm commented Apr 18, 2023

The main change is the new tests: we now subclass CalciteJoinQueryTest in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for SortMerge.

Two supporting production changes for default-value mode:

  1. InputNumberDataSource is marked as concrete, to allow leftFilter to
    be pushed down to it.

  2. In default-value mode, numeric frame field readers can now return nulls.
    This is necessary when stacking joins on top of joins: nulls must be
    preserved for semantics that match broadcast joins and native queries.

  3. In default-value mode, StringFieldReader.isNull returns true on empty
    strings in addition to nulls. This is more consistent with the behavior
    of the selectors, which map empty strings to null as well in that mode.

As an effect of change (2), the InsertTimeNull change from #14020 (to replace null timestamps with default timestamps) is reverted. IMO, this is fine, as either behavior is defensible, and the change from #14020 hasn't been released yet.

The main change is the new tests: we now subclass CalciteJoinQueryTest
in CalciteSelectJoinQueryMSQTest twice, once for Broadcast and once for
SortMerge.

Two supporting production changes for default-value mode:

1) InputNumberDataSource is marked as concrete, to allow leftFilter to
   be pushed down to it.

2) In default-value mode, numeric frame field readers can now return nulls.
   This is necessary when stacking joins on top of joins: nulls must be
   preserved for semantics that match broadcast joins and native queries.

3) In default-value mode, StringFieldReader.isNull returns true on empty
   strings in addition to nulls. This is more consistent with the behavior
   of the selectors, which map empty strings to null as well in that mode.

As an effect of change (2), the InsertTimeNull change from apache#14020 (to
replace null timestamps with default timestamps) is reverted. IMO, this
is fine, as either behavior is defensible, and the change from apache#14020
hasn't been released yet.
@gianm gianm added the Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 label Apr 18, 2023
/**
* Runs {@link CalciteJoinQueryTest} but with MSQ engine.
*/
@RunWith(Enclosed.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice trick

+ "FROM foo LEFT JOIN lookup.lookyloo ON foo.dim2 = lookyloo.k\n"
+ "WHERE lookyloo.v <> 'xa' OR lookyloo.v IS NULL\n"
+ "GROUP BY lookyloo.v",
+ "FROM foo LEFT JOIN lookup.lookyloo ON foo.dim2 = lookyloo.k\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lot of formatting changes hence its very easy to skip the actual change.

  • Couple of tests are marked notMsqCompatible();
  • sortIfSortBased result validation.

Are there more changes in this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, that was the result of applying formatting in IntelliJ. I guess the last time this file was updated, the formatting wasn't applied. To help with reading the diff, Github has this "hide whitespace" feature for PR diffs. If you turn that on it looks like this:

https://github.com/apache/druid/pull/14105/files/ec47a89a785eaf5144aaf9db4752b29e002457c0?w=1#diff-4d1d101fd375322e3a765ae6aa738283f84967d508cf8afccf6f1572af2cd1c7

The substantive changes are indeed adding various notMsqCompatible and sortIfSortBased.

Copy link
Contributor

@cryptoe cryptoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since #14025 is already in 26.0, we need to merge this in 26.0 as well.

@cryptoe cryptoe added this to the 26.0 milestone Apr 25, 2023
@gianm gianm merged commit 89e7948 into apache:master Apr 25, 2023
@gianm gianm deleted the msq-join-testing branch April 25, 2023 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants