Fix ambiguous reference error in filter plan #1925
Unsurprisingly, this change causes a failure in the [...]. I'm not sure what the correct behavior is here. Should the projected schema take precedence to resolve ambiguity, with a fallback to the combined schema if a column reference is not found?
That sounds reasonable to me at a high level. @houqp wrote up the desired output name semantics here: https://arrow.apache.org/datafusion/specification/output-field-name-semantic.html, which might help guide you on your question.
(Thank you @jonmmease for raising this PR.)
Thank you @jonmmease for taking this on. I agree with @alamb that the order you outlined in your comment sounds like a good one.
then fall back to using all schemas
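The lookup order agreed on above (projected schema first, then all schemas) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Field`, `ResolveError`, `field`, `find_unique`, and `resolve` are hypothetical names made up for the example and are not DataFusion APIs.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

#[derive(Debug, PartialEq)]
enum ResolveError {
    Ambiguous(String),
    NotFound(String),
}

/// Convenience constructor used only to keep the examples short.
fn field(qualifier: Option<&str>, name: &str) -> Field {
    Field {
        qualifier: qualifier.map(|q| q.to_string()),
        name: name.to_string(),
    }
}

/// Look up an unqualified column name in one list of fields.
/// `Ok(None)` means "not present"; more than one match is an ambiguity error.
fn find_unique<'a>(fields: &'a [Field], name: &str) -> Result<Option<&'a Field>, ResolveError> {
    let mut matches = fields.iter().filter(|f| f.name == name);
    match (matches.next(), matches.next()) {
        (None, _) => Ok(None),
        (Some(f), None) => Ok(Some(f)),
        (Some(_), Some(_)) => Err(ResolveError::Ambiguous(name.to_string())),
    }
}

/// Try the projected (input) schema first. Only if the column is absent
/// there, fall back to the fields gathered from every schema in the plan.
fn resolve<'a>(
    projected: &'a [Field],
    all: &'a [Field],
    name: &str,
) -> Result<&'a Field, ResolveError> {
    if let Some(f) = find_unique(projected, name)? {
        return Ok(f);
    }
    find_unique(all, name)?.ok_or_else(|| ResolveError::NotFound(name.to_string()))
}
```

With a projected schema holding only `t1.id` and a combined schema holding `t1.id`, `t2.id`, and `t2.score`, `resolve` returns `t1.id` for `id` without raising an ambiguity error, and still finds `score` through the fallback.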
Thanks for taking a look @alamb and @houqp! I made the proposed change. This error popped up in https://github.com/apache/arrow-datafusion/runs/5470235871?check_suite_focus=true:
Maybe something flaky?
Thanks @alamb, not sure how that got in there 🤦 Should be reverted now.
Tests now passing.
Thanks @jonmmease
Fall back to the merged schema from the whole logical plan if the input schema was not sufficient to resolve the datatype of a sub-expression. This re-enables the fallback logic added in 3860cd3 (apache#1925).
* CommonSubexprEliminate: Fix additional col schema
* Use correct types in test id_array_visitor
* Re-enable fall back schema for datatype resolution
  Fall back to the merged schema from the whole logical plan if the input schema was not sufficient to resolve the datatype of a sub-expression. This re-enables the fallback logic added in 3860cd3 (#1925).
* Add comment on fall-back logic using all schemas
  Point out that it can likely be removed.
Which issue does this PR close?
Closes #1411.
cc @kszucs
Rationale for this change
This PR adds an initially failing test in a060b40 that reproduces the behavior described in #1411.
The change in 2042fbb may not be correct overall, but it addresses this particular failure.
The core issue seems to be that filter optimization examines fields across all plans and then fails because there is an ambiguity. At least in the DataFrame context, I would expect there to be no ambiguity since only one of the columns is projected prior to filtering.
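To make the ambiguity concrete, here is a small Rust sketch. It assumes a join of two tables that share a column name, followed by a projection of one of them and then a filter; that shape, the table names `t1`/`t2`, and the column `id` are illustrative assumptions, not details taken from #1411. The `Plan` enum, `Col` struct, and helper functions are hypothetical and are not DataFusion's `LogicalPlan` or schema types.

```rust
// Hypothetical miniature plan tree; NOT DataFusion's LogicalPlan.
#[derive(Debug, Clone, PartialEq)]
struct Col {
    table: &'static str,
    name: &'static str,
}

enum Plan {
    Scan(Vec<Col>),
    Join(Box<Plan>, Box<Plan>),
    Projection(Vec<Col>, Box<Plan>),
    Filter(Box<Plan>),
}

/// Output schema of a node: the columns the node above it can actually see.
fn output_schema(plan: &Plan) -> Vec<Col> {
    match plan {
        Plan::Scan(cols) => cols.clone(),
        Plan::Join(left, right) => {
            let mut cols = output_schema(left);
            cols.extend(output_schema(right));
            cols
        }
        Plan::Projection(cols, _) => cols.clone(),
        Plan::Filter(input) => output_schema(input),
    }
}

/// Columns of every node in the plan, concatenated (duplicates are kept).
fn all_schemas(plan: &Plan) -> Vec<Col> {
    let mut cols = output_schema(plan);
    match plan {
        Plan::Scan(_) => {}
        Plan::Join(left, right) => {
            cols.extend(all_schemas(left));
            cols.extend(all_schemas(right));
        }
        Plan::Projection(_, input) | Plan::Filter(input) => cols.extend(all_schemas(input)),
    }
    cols
}

/// Distinct qualified columns that match an unqualified name.
/// More than one entry means the bare name is ambiguous.
fn distinct_matches(cols: &[Col], name: &str) -> Vec<Col> {
    let mut found: Vec<Col> = Vec::new();
    for c in cols.iter().filter(|c| c.name == name) {
        if !found.contains(c) {
            found.push(c.clone());
        }
    }
    found
}

/// Filter over a projection of `t1.id` over a join of `t1` and `t2`.
fn example_plan() -> Plan {
    let t1 = Plan::Scan(vec![Col { table: "t1", name: "id" }]);
    let t2 = Plan::Scan(vec![Col { table: "t2", name: "id" }]);
    let join = Plan::Join(Box::new(t1), Box::new(t2));
    let proj = Plan::Projection(vec![Col { table: "t1", name: "id" }], Box::new(join));
    Plan::Filter(Box::new(proj))
}
```

In this sketch, looking up `id` in `output_schema(&example_plan())` yields a single match, while looking it up in `all_schemas(&example_plan())` yields both `t1.id` and `t2.id`, which is the kind of ambiguity described above.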