Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ambiguous reference error in filter plan #1925

Merged
merged 5 commits into from Mar 9, 2022
Merged

Conversation

jonmmease
Copy link
Contributor

Which issue does this PR close?

Closes #1411.

cc @kszucs

Rationale for this change

This PR adds an initially failing test in a060b40 that reproduces the behavior described in #1411.

The change in 2042fbb may not be correct overall, but it addresses this particular failure.

The core issue seems to be that filter optimization examines fields across all plans and then fails because there is an ambiguity. At least in the DataFrame context, I would expect there to be no ambiguity since only one of the columns is projected prior to filtering.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Mar 4, 2022
@jonmmease jonmmease changed the title Towards fixing ambiguous reference error 1411 Towards fixing ambiguous reference error Mar 4, 2022
@jonmmease jonmmease changed the title Towards fixing ambiguous reference error Towards fixing ambiguous reference error in filter plan Mar 4, 2022
@jonmmease
Copy link
Contributor Author

unsurprisingly, this change causes a failure in the execution::context::tests::unprojected_filter test.

I'm not sure what the correct behavior here is. Should the projected schema take precedence to resolve ambiguity and then fall back to the combined schema if a column reference is not found?

@alamb
Copy link
Contributor

alamb commented Mar 7, 2022

I'm not sure what the correct behavior here is. Should the projected schema take precedence to resolve ambiguity and then fall back to the combined schema if a column reference is not found?

That sounds reasonable to me at a high level

@houqp wrote up the desired output name semantics here: https://arrow.apache.org/datafusion/specification/output-field-name-semantic.html

which might serve to guide you in your question

@alamb
Copy link
Contributor

alamb commented Mar 7, 2022

(thank you @jonmmease for raising this PR)

@houqp
Copy link
Member

houqp commented Mar 8, 2022

thank you @jonmmease for taking on this, I agree with @alamb that the order you outlined in your comment sounds like a good order to me.

@houqp houqp added the bug Something isn't working label Mar 8, 2022
@jonmmease jonmmease changed the title Towards fixing ambiguous reference error in filter plan Fix ambiguous reference error in filter plan Mar 8, 2022
@jonmmease
Copy link
Contributor Author

jonmmease commented Mar 8, 2022

Thanks for taking a look @alamb and @houqp! I made the proposed change.

This error poppup up in https://github.com/apache/arrow-datafusion/runs/5470235871?check_suite_focus=true:

 failures:

---- datasource::file_format::avro::tests::test stdout ----
thread 'datasource::file_format::avro::tests::test' panicked at 'Local file metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', datafusion/src/datasource/object_store/local.rs:178:40


failures:
    datasource::file_format::avro::tests::test

Maybe something flaky?

@alamb
Copy link
Contributor

alamb commented Mar 8, 2022

Maybe something flaky?

I think this is related to the fact that somehow this PR has changes to the testing submodule which I suspect was not intended:

Screen Shot 2022-03-08 at 3 06 16 PM

@jonmmease
Copy link
Contributor Author

Thanks @alamb, not sure how that got in there 🤦 Should be reverted now

@jonmmease
Copy link
Contributor Author

tests now passing

@houqp houqp requested a review from alamb March 9, 2022 04:04
@houqp
Copy link
Member

houqp commented Mar 9, 2022

Thanks @jonmmease

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jonmmease

@alamb alamb merged commit 3860cd3 into apache:master Mar 9, 2022
alex-spies pushed a commit to alex-spies/arrow-datafusion that referenced this pull request Oct 5, 2022
Fall back to the merged schema from the whole logical plan if the input
schema was not sufficient to resolve the datatype of a sub-expression.
This re-enables the fallback logic added in 3860cd3 (apache#1925).
alex-spies pushed a commit to alex-spies/arrow-datafusion that referenced this pull request Oct 5, 2022
Fall back to the merged schema from the whole logical plan if the input
schema was not sufficient to resolve the datatype of a sub-expression.
This re-enables the fallback logic added in 3860cd3 (apache#1925).
alamb pushed a commit that referenced this pull request Oct 11, 2022
* CommonSubexprEliminate: Fix additional col schema

* Use correct types in test id_array_visitor

* Re-enable fall back schema for datatype resolution

Fall back to the merged schema from the whole logical plan if the input
schema was not sufficient to resolve the datatype of a sub-expression.
This re-enables the fallback logic added in 3860cd3 (#1925).

* Add comment on fall-back logic using all schemas

Point out that it can likely be removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ambiguous reference to aliased column
3 participants