Skip to content

Comments

fix bug join query with require time condition#14890

Closed
EungsopYoo wants to merge 3 commits intoapache:masterfrom
EungsopYoo:fix-join-with-requireTimeCondition
Closed

fix bug join query with require time condition#14890
EungsopYoo wants to merge 3 commits intoapache:masterfrom
EungsopYoo:fix-join-with-requireTimeCondition

Conversation

@EungsopYoo
Copy link

@EungsopYoo EungsopYoo commented Aug 22, 2023

Description

Fixed bug join query with require time condition

If druid.sql.planner.requireTimeCondition is set true, the query fails but every subqueries has the filter on __time column.

SELECT a.__time, a.cnt, b.cnt
FROM (
  SELECT __time, SUM(cnt) cnt
  FROM druid.foo
  WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
  GROUP BY __time
) a
JOIN (
  SELECT __time, SUM(cnt) cnt
  FROM druid.foo
  WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
  GROUP BY __time
) b on (a.__time = b.__time)
Error: Unknown exception

requireTimeCondition is enabled, all queries must include a filter condition on the __time column

org.apache.druid.sql.calcite.rel.CannotBuildQueryException

This fails too.

SELECT a.__time, a.cnt, b.cnt
FROM (
  SELECT __time, SUM(cnt) cnt
  FROM druid.foo
  WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
  GROUP BY __time
) a
JOIN (
  SELECT __time, SUM(cnt) cnt
  FROM druid.foo
  WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
  GROUP BY __time
) b on (a.__time = b.__time)
WHERE a.__time >= TIMESTAMP '2000-01-01 00:00:00' AND a.__time < TIMESTAMP '2000-01-02 00:00:00'

So I fixed that find the base intervals of left and right data sources, instead of outer query, when the dataSource is a instance of JoinDataSource.

  • Choice of algorithms
    • If dataSource is a instance of JoinDataSource, find the base intervals of left and right data sources.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@abhishekagarwal87
Copy link
Contributor

can you explain what the bug is?

@somu-imply
Copy link
Contributor

Can you please add an example as in what was not working and what is the bug

@EungsopYoo
Copy link
Author

EungsopYoo commented Aug 23, 2023

Description updated

@EungsopYoo EungsopYoo force-pushed the fix-join-with-requireTimeCondition branch 2 times, most recently from 9777412 to 32c440d Compare August 24, 2023 06:29
@EungsopYoo EungsopYoo force-pushed the fix-join-with-requireTimeCondition branch from 32c440d to 35eca2b Compare August 24, 2023 06:30
@adarshsanjeev
Copy link
Contributor

Hey @EungsopYoo! Thanks for the PR, I'm going through it currently. Could you please resolve the conflict with master?

@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Oct 4, 2023
Copy link
Contributor

@LakshSingla LakshSingla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. I have added a review regarding the testing methodology and code practices we follow.
Though I haven't looked at the changes, why do you think JOIN queries should be exempted from a __time filter if the subqueries have the filter?
If that's the case, then say something like UNION should also be exempt from the restriction.

Comment on lines +177 to +197

@Ignore
@Override
public void testEquiJoin2()
{

}

@Ignore
@Override
public void testEquiJoin3()
{

}

@Ignore
@Override
public void testEquiJoin4()
{

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can you add a comment on why these joins are not working with MSQ? Per our knowledge, MSQ supports joins and hence the inability to work with one is concerning and should be documented.
  2. There's a msqIncompatible flag that you can use instead, which is for this purpose itself unless this error happens in the planning phase.

Comment on lines +65 to +70
@Override
@Ignore
public void testEquiJoin2()
{

}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try using @DecoupledIgnore annotation on the original test method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests should be moved to CalciteJoinQueryTest

}

@Test
public void testEquiJoin()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, can you please use descriptive names for the test, where they represent what the test is primarily supposed to be verifying, or documents the regression why the test was added.

For example: testEquiJoinWhereLeftHandIsConstant

@EungsopYoo
Copy link
Author

EungsopYoo commented Oct 18, 2023

Thanks for the PR. I have added a review regarding the testing methodology and code practices we follow. Though I haven't looked at the changes, why do you think JOIN queries should be exempted from a __time filter if the subqueries have the filter?

Actually join queries inherit __time filters from their sub queries, not exempt the __time filtering.

If that's the case, then say something like UNION should also be exempt from the restriction.

I think UNION ALL works basically well when requireTimeCondition=true.

SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time
UNION ALL
SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time

// OK
SELECT __time, SUM(cnt) cnt
FROM druid.foo
GROUP BY __time
UNION ALL
SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time

// Error: Unknown exception
// requireTimeCondition is enabled, all queries must include a filter condition on the __time column
// org.apache.druid.sql.calcite.rel.CannotBuildQueryException

But, its error message is a little weird in this condition.

SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time
UNION ALL
SELECT __time, SUM(cnt) cnt
FROM druid.foo
GROUP BY __time

// Query results were truncated midstream! This may indicate a server-side error or a client-side issue. Try re-running your query, or using a lower limit or a longer timeout.

@LakshSingla
Copy link
Contributor

I revisited the PR, and it seems that you have added a condition on the Join clause on the top datasource, however the join datasource can be nested however deep, and we'd need to take that into account while performing the check. Something like the following structure


     Q
      |
     Q
     ...
     J
   /.    \
  Q.    .....
  |
  Q
 /. \
Da Db

DataSource rightInner = ((QueryDataSource) right).getQuery().getDataSource();
DataSourceAnalysis rightAnlaysisInner = rightInner.getAnalysis();
if (rightInner instanceof JoinDataSource) {
return findMaxBaseDataSourceIntervals(rightInner, rightAnlaysisInner, defaults);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The right thing is to ensure that both left and right have time filters. Since this method is used only to check the eternity interval, it could be refactored to checkThatDataSourceHaveFiniteInterval and then the method throws an exception wherever there is an eternity interval list. and that check should be performed on children.
what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also a DruidQuery#getFiltration method that checks and creates filtration on a data source. We can refactor the whole thing and move the check to there instead as well, however, that would need further testing because we don't want the method's semantics to change. Also, it would only be applicable to SQL queries.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what's the purpose of the requireTimeCondition? If it is to prevent the query from hitting all of the segments on the historicals, then it probably doesn't do a very good job of it, and perhaps can must be changed. The problems I see with it:

  1. Doesn't take into account the data source type. IMO should only be affecting the table data source since other data sources don't involve segments.
  2. Even if the SQL query has a time filtration, it isn't guaranteed that the data source has the time filtration as well. Ideally, it should, but there was a bug in UNNEST (earlier) that didn't push down the filtration to the data source, and the query would end up reading the whole data source. Perhaps other nuances in translation might also cause the underlying data source to not have a time filtration.

@github-actions
Copy link

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Feb 28, 2024
@github-actions
Copy link

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying Bug stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants