Skip to content

Conversation

@seddonm1
Copy link
Contributor

Which issue does this PR close?

Reopens #777.

Rationale for this change

The previous resolution strategy ignored the qualifications on the columns if they were present so it required unique column names for resolution. This one is smarter and allows out-of-order qualified fields in the join condition.

What changes are included in this PR?

This change makes better use of the information provided to the join logical plan builder to better resolve column relations. More work could be done to deal with other 'error' conditions (like referring to the same 'left' relation twice) but I wanted to get feedback first.

There are still outstanding issues unrelated that i am working through.

Are there any user-facing changes?

More queries will behave like the SQL standard.

@seddonm1 seddonm1 marked this pull request as draft July 31, 2021 02:37
@seddonm1 seddonm1 marked this pull request as draft July 31, 2021 02:37
@alamb
Copy link
Contributor

alamb commented Aug 1, 2021

Thanks @seddonm1 -- I'll plan to check this one out more carefully tomorrow

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @seddonm1 -- thank you. I had a few comments, but this looks like it is heading in the right direction to me. 👍

(true, _, _, true) => (Ok(l), Ok(r)),
(_, true, true, _) => (Ok(r), Ok(l)),
(_, _, _, _) => (
Err(DataFusionError::Plan(format!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldn't this error also happen if the column was found in both the left and right input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was leaving this until an initial review to understand if this was completely in the wrong direction. I can add further conditions and appropriate messages.

(Some(lr), Some(rr)) => {
let l_is_left =
self.plan.all_schemas().iter().any(|schema| {
schema.fields().iter().any(|field| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be cool if we could avoid the repetition here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I will see what we can do. I was trying to avoid heavy work for non-qualified tables but will revisit.

let mut ctx = create_join_context_qualified()?;
let equivalent_sql = [
"SELECT t1.a, t2.b FROM t1 INNER JOIN t2 ON t1.a = t2.a ORDER BY t1.a",
"SELECT t1.a, t2.b FROM t1 INNER JOIN t2 ON t2.a = t1.a ORDER BY t1.a",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about something like

        "SELECT t1.a, t2.b FROM t1 INNER JOIN t2 ON a = t1.a ORDER BY t1.a",

? I would expect it to given an error about ambiguous relation name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks I will add more tests.

let l_is_left =
self.plan.all_schemas().iter().any(|schema| {
schema.fields().iter().any(|field| {
field.qualifier().unwrap() == &lr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we ensuring that field.qualifier() is not None (aka how do we know that unwrap() won't panic?

Maybe this would be safer if expressed something like

field.qualifier().map(|q| q == &lr).unwrap_or(false)

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Thanks. Let me go through this more carefully.

@seddonm1
Copy link
Contributor Author

seddonm1 commented Aug 3, 2021

Thank you for the review @alamb.

I have refactored to reduce repetition (by reusing some existing methods) but still have the ambiguous column problem outstanding.

I would like to know your thoughts on merging this one first and then I will raise another issue and make a PR to address the ambiguity (which is an issue on master already).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking good @seddonm1 👍 Addressing ambiguity as a follow on PR sounds like a great idea to me

@seddonm1
Copy link
Contributor Author

seddonm1 commented Aug 4, 2021

@alamb I have done a minor cleaning pass (removing unnecessary clone) and standardising the match. I think this is good to go.

FYI I am finding quite a few bugs in the current Datafusion JOIN behaviour that I will try to work through.

@seddonm1 seddonm1 marked this pull request as ready for review August 4, 2021 21:35
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @seddonm1 !

@alamb alamb merged commit db214bc into apache:master Aug 5, 2021
@alamb
Copy link
Contributor

alamb commented Aug 5, 2021

Thanks again @seddonm1

@houqp houqp added sql SQL Planner bug Something isn't working labels Aug 6, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request Mar 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants