Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cherry pick #55418 to 23.3: Fix filtering by virtual columns with OR filter in query #55631

Conversation

robot-ch-test-poll1
Copy link
Contributor

Original pull-request #55418

This pull-request is a first step of an automated backporting.
It contains changes like after calling a local command git cherry-pick.
If you intend to continue backporting this changes, then resolve all conflicts if any.
Otherwise, if you do not want to backport them, then just close this pull-request.

The check results does not matter at this step - you can safely ignore them.

Note

This pull-request will be merged automatically as it reaches the mergeable state, do not merge it manually.

If the PR was closed and then reopened

If it stuck, check #55418 for pr-backports-created and delete it if necessary. Manually merging will do nothing, since pr-backports-created prevents the original PR #55418 from being processed.

If you want to recreate the PR: delete the pr-cherrypick label and delete this branch.
You may also need to delete the pr-backports-created label from the original PR.

azat and others added 3 commits October 10, 2023 20:47
The problem with the initial implementation #52653 was:
- OR can have multiple arguments
- It simply not correct to assume that if there are two arguments this is OK.
  Consider the following example:

    "WHERE (column_not_from_partition_by = 1) OR false OR false"

  Will be converted to:

    "WHERE false OR false"

And it will simply read nothing.

Yes, we could apply some optimization for bool, but this will not always
work, since to optimize things like "0 = 1" we need to execute it.

And the only way to make handle this correctly (with ability to ignore
some commands during filtering) is to make is_constant() function return
has it use something from the input block, so that we can be sure, that
we have some sensible, and not just "false".

Plus we cannot simply ignore the difference of the input and output
arguments of handling OR, we need to add always-true (1/true) if the
size is different, since otherwise it could break invariants (see
comment in the code).

This includes (but not limited to):
- _part* filtering for MergeTree
- _path/_file for various File/HDFS/... engines
- _table for Merge
- ...

P.S. analyzer does not have this bug, since it execute expression as
whole, and this is what filterBlockWithQuery() should do actually
instead, but this will be a more complex patch.

Signed-off-by: Azat Khuzhin <a.khuzhin@semrush.com>
Fix filtering by virtual columns with OR filter in query
@robot-ch-test-poll1 robot-ch-test-poll1 added pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only! do not test disable testing on pull request labels Oct 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do not test disable testing on pull request pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants