New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix filtering by virtual columns with OR filter in query #55418
Conversation
This is an automated comment for commit 769ed2e with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page Successful checks
|
probably it should fix #55288 as well |
The problem with the initial implementation ClickHouse#52653 was: - OR can have multiple arguments - It simply not correct to assume that if there are two arguments this is OK. Consider the following example: "WHERE (column_not_from_partition_by = 1) OR false OR false" Will be converted to: "WHERE false OR false" And it will simply read nothing. Yes, we could apply some optimization for bool, but this will not always work, since to optimize things like "0 = 1" we need to execute it. And the only way to make handle this correctly (with ability to ignore some commands during filtering) is to make is_constant() function return has it use something from the input block, so that we can be sure, that we have some sensible, and not just "false". Plus we cannot simply ignore the difference of the input and output arguments of handling OR, we need to add always-true (1/true) if the size is different, since otherwise it could break invariants (see comment in the code). This includes (but not limited to): - _part* filtering for MergeTree - _path/_file for various File/HDFS/... engines - _table for Merge - ... P.S. analyzer does not have this bug, since it execute expression as whole, and this is what filterBlockWithQuery() should do actually instead, but this will be a more complex patch. Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Interesting,
Looks similar to moby/moby#35888, moby/moby#38877 Maybe some oddities in the env...
|
Test failures looks unrelated now |
Is it possible to replace Docker with Podman? |
…31ead83b29094897fa0e8243041e8 Cherry pick #55418 to 23.7: Fix filtering by virtual columns with OR filter in query
…31ead83b29094897fa0e8243041e8 Cherry pick #55418 to 23.8: Fix filtering by virtual columns with OR filter in query
…31ead83b29094897fa0e8243041e8 Cherry pick #55418 to 23.9: Fix filtering by virtual columns with OR filter in query
Backport #55418 to 23.7: Fix filtering by virtual columns with OR filter in query
Backport #55418 to 23.8: Fix filtering by virtual columns with OR filter in query
Backport #55418 to 23.9: Fix filtering by virtual columns with OR filter in query
AFAIR I thought about this and it wasn't that easy, and you are right - likely it can be fixed that way (even though they use some common libraries unlikely this could be related in this particular case), but it also can have some podman-related issues that will pops up in CI. Maybe it will be simpler to find the problem in the docker itself, or maybe updating it will be enough... |
@azat it looks like |
Revert it. |
@antonio2368 you are right. That was over optimistic from my side to assume that my fix for this test #55627 addresses this problems, however the problem was completely different, since after my patch So I've resubmitted it here - #55678 with a fix for the test. Also note, that this PR had been already backported, while your revert had not, so you need to apply only the test hunk from the resubmit (and to avoid conflicts you need to apply #55627 first). |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix filtering by virtual columns with OR filter in query (
_part*
filtering forMergeTree
,_path
/_file
for variousFile
/HDFS
/... engines,_table
forMerge
)The problem with the initial implementation #52653 was:
OR can have multiple arguments
It simply not correct to assume that if there are two arguments this is OK.
Consider the following example:
"WHERE (column_not_from_partition_by = 1) OR false OR false"
Will be converted to:
"WHERE false OR false"
And it will simply read nothing.
Yes, we could apply some optimization for bool, but this will not always
work, since to optimize things like "0 = 1" we need to execute it.
And the only way I see to handle this correctly (with ability to ignore
some expressions during filtering) is to make is_constant() function return
has it use something from the input block, so that we can be sure, that
we have something sensible, and not just "false".
Plus we cannot simply ignore the difference of the input and output
arguments of handling OR, we need to add always-true (1/true) if the
size is different, since otherwise it could break invariants (see
comment in the code).
P.S. analyzer does not have this bug, since it execute expression as
whole, and this is what filterBlockWithQuery() should do actually
instead, but this will be a more complex patch.
Fixes: #52653 (cc @alexey-milovidov )
Fixes: #54779 (cc @slanderous-mambo )
Fixes: #55288