fix bug join query with require time condition #14890
EungsopYoo wants to merge 3 commits into apache:master
Can you explain what the bug is?
Can you please add an example showing what was not working and what the bug is?
Description updated.
9777412 to 32c440d
32c440d to 35eca2b
Hey @EungsopYoo! Thanks for the PR, I'm going through it currently. Could you please resolve the conflict with master?
LakshSingla left a comment
Thanks for the PR. I have added a review regarding the testing methodology and code practices we follow.
Though I haven't looked at the changes, why do you think JOIN queries should be exempted from a __time filter if the subqueries have the filter?
If that's the case, then something like UNION should also be exempt from the restriction.
@Ignore
@Override
public void testEquiJoin2()
{
}

@Ignore
@Override
public void testEquiJoin3()
{
}

@Ignore
@Override
public void testEquiJoin4()
{
}
- Can you add a comment on why these joins are not working with MSQ? To our knowledge, MSQ supports joins, so the inability to run one is concerning and should be documented.
- There's an msqIncompatible flag that you can use instead, which exists for exactly this purpose, unless this error happens in the planning phase.
@Override
@Ignore
public void testEquiJoin2()
{
}
Try using the @DecoupledIgnore annotation on the original test method.
The tests should be moved to CalciteJoinQueryTest.
}

@Test
public void testEquiJoin()
Also, can you please use descriptive test names that state what the test is primarily meant to verify, or that document the regression that prompted the test?
For example: testEquiJoinWhereLeftHandIsConstant
Actually, join queries inherit __time filters from their subqueries; they are not exempt from __time filtering.
I think UNION ALL basically works well when requireTimeCondition=true.

SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time
UNION ALL
SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time
// OK

SELECT __time, SUM(cnt) cnt
FROM druid.foo
GROUP BY __time
UNION ALL
SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time
// Error: Unknown exception
// requireTimeCondition is enabled, all queries must include a filter condition on the __time column
// org.apache.druid.sql.calcite.rel.CannotBuildQueryException

But its error message is a little weird in this case.

SELECT __time, SUM(cnt) cnt
FROM druid.foo
WHERE __time >= TIMESTAMP '2000-01-01 00:00:00' AND __time < TIMESTAMP '2000-01-02 00:00:00'
GROUP BY __time
UNION ALL
SELECT __time, SUM(cnt) cnt
FROM druid.foo
GROUP BY __time
// Query results were truncated midstream! This may indicate a server-side error or a client-side issue. Try re-running your query, or using a lower limit or a longer timeout.
I revisited the PR, and it seems that you have added a condition for the join clause on the top-level data source. However, the join data source can be nested arbitrarily deep, and we'd need to take that into account while performing the check. Something like the following structure
DataSource rightInner = ((QueryDataSource) right).getQuery().getDataSource();
DataSourceAnalysis rightAnlaysisInner = rightInner.getAnalysis();
if (rightInner instanceof JoinDataSource) {
  return findMaxBaseDataSourceIntervals(rightInner, rightAnlaysisInner, defaults);
The right thing is to ensure that both the left and right sides have time filters. Since this method is used only to check for the eternity interval, it could be refactored into checkThatDataSourceHaveFiniteInterval, which throws an exception wherever there is an eternity interval list, and that check should be performed on the children.
What do you think?
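The recursive check proposed above could look roughly like the following. This is only a sketch on a toy model: Source, Table, Subquery, Join, and FiniteIntervalCheck are hypothetical stand-ins, not Druid's DataSource classes, and a boolean replaces the real interval list. It also assumes that a time filter on an enclosing subquery covers everything beneath it, which may not be the right rule for the right-hand side of a join.

```java
// Toy stand-ins for a data source tree. These are NOT Druid's real classes.
interface Source {}

// A leaf table: the only place where segments would be read.
final class Table implements Source {
  final String name;

  Table(String name) {
    this.name = name;
  }
}

// A subquery wrapping another source. hasTimeFilter == false models an
// "eternity" interval at this level.
final class Subquery implements Source {
  final Source inner;
  final boolean hasTimeFilter;

  Subquery(Source inner, boolean hasTimeFilter) {
    this.inner = inner;
    this.hasTimeFilter = hasTimeFilter;
  }
}

// A join of two sources; either side may itself be a join or a subquery.
final class Join implements Source {
  final Source left;
  final Source right;

  Join(Source left, Source right) {
    this.left = left;
    this.right = right;
  }
}

final class FiniteIntervalCheck {
  private FiniteIntervalCheck() {}

  // Entry point: at the root, no enclosing filter has been seen yet.
  static void check(Source root) {
    check(root, false);
  }

  // Walks the whole tree and throws as soon as a table is reached without
  // having passed through any time filter on the way down.
  private static void check(Source source, boolean filteredAbove) {
    if (source instanceof Table) {
      if (!filteredAbove) {
        throw new IllegalStateException(
            "no __time filter reaches table " + ((Table) source).name);
      }
    } else if (source instanceof Subquery) {
      Subquery query = (Subquery) source;
      check(query.inner, filteredAbove || query.hasTimeFilter);
    } else if (source instanceof Join) {
      // Both children are checked, however deeply the joins are nested.
      Join join = (Join) source;
      check(join.left, filteredAbove);
      check(join.right, filteredAbove);
    } else {
      throw new IllegalArgumentException("unknown source type");
    }
  }
}
```

Throwing from the traversal, rather than returning a merged interval list, keeps the method focused on the one question it is used for: whether any branch is unbounded in time.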
There is also a DruidQuery#getFiltration method that checks and creates filtration on a data source. We could refactor the whole thing and move the check there instead; however, that would need further testing because we don't want the method's semantics to change. Also, it would only be applicable to SQL queries.
Also, what's the purpose of requireTimeCondition? If it is to prevent the query from hitting all of the segments on the historicals, then it probably doesn't do a very good job of it, and perhaps must be changed. The problems I see with it:
- Doesn't take into account the data source type. IMO should only be affecting the table data source since other data sources don't involve segments.
- Even if the SQL query has a time filtration, it isn't guaranteed that the data source has the time filtration as well. Ideally, it should, but there was a bug in UNNEST (earlier) that didn't push down the filtration to the data source, and the query would end up reading the whole data source. Perhaps other nuances in translation might also cause the underlying data source to not have a time filtration.
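To illustrate the first point, here is a minimal sketch of a check that only looks at table leaves and ignores data sources that don't involve segments. Leaf, Kind, and TimeConditionPolicy are hypothetical names for a toy model, not Druid's API, and the timeFiltered flag is assumed to reflect the filter that actually reaches the data source after translation, not merely what the SQL text says.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a leaf data source. These are NOT Druid's real classes.
final class Leaf {
  enum Kind { TABLE, LOOKUP, INLINE }

  final String name;
  final Kind kind;
  // True if a finite __time interval actually reaches this leaf.
  final boolean timeFiltered;

  Leaf(String name, Kind kind, boolean timeFiltered) {
    this.name = name;
    this.kind = kind;
    this.timeFiltered = timeFiltered;
  }
}

final class TimeConditionPolicy {
  private TimeConditionPolicy() {}

  // Returns the names of table leaves that would be scanned without a time
  // bound. Lookup and inline leaves are skipped because they hold no segments.
  static List<String> unfilteredTables(List<Leaf> leaves) {
    List<String> offenders = new ArrayList<>();
    for (Leaf leaf : leaves) {
      if (leaf.kind == Leaf.Kind.TABLE && !leaf.timeFiltered) {
        offenders.add(leaf.name);
      }
    }
    return offenders;
  }
}
```

Returning the offending names instead of a bare boolean would also make it possible to report which table lacks the filter in the error message.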
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that
Description
Fixed a bug in join queries when a time condition is required.
If druid.sql.planner.requireTimeCondition is set to true, the query fails even though every subquery has a filter on the __time column.
This fails too.
So I changed the logic to find the base intervals of the left and right data sources, instead of those of the outer query, when the dataSource is an instance of JoinDataSource.
This PR has: