Skip to content

Conversation

@liukun4515
Copy link
Contributor

Which issue does this PR close?

part of #4247

  • fix bug for right semi join with filter and add test case for this bug
  • add test for left semi join with filter
  • change the name of test cases

TODO in the next PR: fix the right anti join with filter; Meet some issue about the codebase, will file a new PR to fix that.

@alamb @Dandandan

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label Nov 22, 2022
let stream = join.execute(0, task_ctx)?;
let batches = common::collect(stream).await?;

let expected = vec![
Copy link
Contributor Author

@liukun4515 liukun4515 Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If don't fix this bug, the result will be

[
    "+----+----+----+",
    "| a1 | b1 | c1 |",
    "+----+----+----+",
    "+----+----+----+",
]

@liukun4515 liukun4515 force-pushed the hash_join_right_semi_#4247 branch from 64fa5dc to 3f95e42 Compare November 22, 2022 13:50
@liukun4515 liukun4515 force-pushed the hash_join_right_semi_#4247 branch from 3f95e42 to d884bb5 Compare November 22, 2022 13:52
}

#[tokio::test]
async fn join_left_semi_with_filter() -> Result<()> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test cases to ensure left semi with filter is correct.

liukun4515 referenced this pull request Nov 22, 2022
@liukun4515
Copy link
Contributor Author

This bug is caused by this commit d6f0b12#r90728800

&keys_values,
*null_equals_null,
)? {
left_indices.append(i);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good find. I missed the case where this would be used.
Thanks for adding the unit test!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @liukun4515 and @Dandandan

Column::new_with_schema("b1", &right.schema())?,
)];

// build filter right.b2 > 4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the key is this filter. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the key is this filter. 👍

Yes, we can add filter to filter left row or right row, and get the new result.

@alamb alamb merged commit 92325bf into apache:master Nov 22, 2022
@ursabot
Copy link

ursabot commented Nov 22, 2022

Benchmark runs are scheduled for baseline = 94cd982 and contender = 92325bf. 92325bf is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@liukun4515
Copy link
Contributor Author

liukun4515 commented Nov 23, 2022

Thanks @liukun4515 and @Dandandan

After I go through the code of HashJoin, I am inspired by the thought in that, and find some point which need to be improved.
The refactor will help to fix the bug of right anti with filter.

@liukun4515
Copy link
Contributor Author

find a new bug for the right semi join again for break line https://github.com/liukun4515/arrow-datafusion/blob/d884bb50a312b7fb01d6e3ffd204919100ae3893/datafusion/core/src/physical_plan/joins/hash_join.rs#L857

for example

left 
a  b
1   2
1   3
2  4
right
a
1

with sql

left  `right semi` join right on left.a=right.a and left.b!=2

In the current code, we will get the empty result because of break in the first match

@mingmwang
Copy link
Contributor

Yes, it is a bug indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants