Skip to content

Conversation

@tglanz
Copy link
Contributor

@tglanz tglanz commented Nov 17, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Take correct columns

Are these changes tested?

Yes,

  • added rust tests to hit invalid code paths
  • sqltests
  • fuzz tests enhancement to fuzzify columns count

Fuzz tests are taken from @rluvaton 's #18788 , excluding those this PR doesn't fix:

    fuzz_cases::join_fuzz::test_right_anti_join_1k_binary_filtered
    fuzz_cases::join_fuzz::test_right_anti_join_1k_filtered
    fuzz_cases::join_fuzz::test_right_semi_join_1k_filtered

Are there any user-facing changes?

let mut left_columns = null_joined_batch
.columns()
.iter()
.take(right_columns_length)
Copy link
Member

@rluvaton rluvaton Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should also be changed to:

Suggested change
.take(left_columns_length)

@rluvaton
Copy link
Member

I was able to reproduce the bug when changing the sort_merge_join.slt to have 3 columns in t2 rather than 2 like in t1 (below is the diff I changed in the sort_merge_join.slt file).

Can you please update the description and update/add tests (I would update the sort_merge_join.slt like in the diff below to make sure all tests are testing that. make sure to add a comment on why 3 columns and t1 2).

Updated the slt file to reproduce the error
diff --git a/datafusion/sqllogictest/test_files/sort_merge_join.slt b/datafusion/sqllogictest/test_files/sort_merge_join.slt
--- a/datafusion/sqllogictest/test_files/sort_merge_join.slt	(revision f3980641660997345af6061dc3b34f365020bd07)
+++ b/datafusion/sqllogictest/test_files/sort_merge_join.slt	(date 1763411346050)
@@ -26,7 +26,7 @@
 CREATE TABLE t1(a text, b int) AS VALUES ('Alice', 50), ('Alice', 100), ('Bob', 1);
 
 statement ok
-CREATE TABLE t2(a text, b int) AS VALUES ('Alice', 2), ('Alice', 1);
+CREATE TABLE t2(a text, b int, c int) AS VALUES ('Alice', 2, 77), ('Alice', 1, 66);
 
 # inner join query plan with join filter
 query TT
@@ -64,83 +64,83 @@
 ----
 
 # left join without join filter
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
 
 # left join with join filter
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b * 50 <= t1.b
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Bob 1 NULL NULL NULL
 
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.a AND t2.b < t1.b
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
 
 # right join without join filter
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 RIGHT JOIN t2 ON t1.a = t2.a
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
 
 # right join with join filter
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 RIGHT JOIN t2 ON t1.a = t2.a AND t2.b * 50 <= t1.b
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
 
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 RIGHT JOIN t2 ON t1.a = t2.a AND t1.b > t2.b
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
 
 # full join without join filter
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 FULL JOIN t2 ON t1.a = t2.a
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 Alice 1
-Alice 50 Alice 2
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 Alice 1 66
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
 
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 FULL JOIN t2 ON t1.a = t2.a AND t2.b * 50 > t1.b
 ----
-Alice 100 NULL NULL
-Alice 50 Alice 2
-Bob 1 NULL NULL
-NULL NULL Alice 1
+Alice 100 NULL NULL NULL
+Alice 50 Alice 2 77
+Bob 1 NULL NULL NULL
+NULL NULL Alice 1 66
 
-query TITI rowsort
+query TITII rowsort
 SELECT * FROM t1 FULL JOIN t2 ON t1.a = t2.a AND t1.b > t2.b + 50
 ----
-Alice 100 Alice 1
-Alice 100 Alice 2
-Alice 50 NULL NULL
-Bob 1 NULL NULL
+Alice 100 Alice 1 66
+Alice 100 Alice 2 77
+Alice 50 NULL NULL NULL
+Bob 1 NULL NULL NULL
 
 statement ok
 DROP TABLE t1;

@tglanz
Copy link
Contributor Author

tglanz commented Nov 17, 2025

Thanks @rluvaton, Sure

@tglanz tglanz force-pushed the fix/smj-take-correct-columns branch 2 times, most recently from 4c629e2 to 9df5882 Compare November 18, 2025 13:36
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 18, 2025
@tglanz tglanz force-pushed the fix/smj-take-correct-columns branch 2 times, most recently from ec3ca20 to 11a25cb Compare November 18, 2025 13:44
@tglanz tglanz force-pushed the fix/smj-take-correct-columns branch from 11a25cb to e1de055 Compare November 18, 2025 15:45
@tglanz tglanz changed the title fix: SMJ Right take correct columns fix: Pick correct columns in Sort Merge Equijoin Nov 18, 2025
@tglanz tglanz marked this pull request as ready for review November 18, 2025 15:46
@tglanz tglanz force-pushed the fix/smj-take-correct-columns branch from 8eaeff5 to 29eae5b Compare November 18, 2025 15:57
@github-actions github-actions bot added the core Core DataFusion crate label Nov 18, 2025
@tglanz tglanz force-pushed the fix/smj-take-correct-columns branch from 5c530b0 to 29eae5b Compare November 18, 2025 16:36
@github-actions github-actions bot removed the core Core DataFusion crate label Nov 18, 2025
@tglanz tglanz force-pushed the fix/smj-take-correct-columns branch from 29eae5b to e7d80ae Compare November 18, 2025 16:36
@github-actions github-actions bot added the core Core DataFusion crate label Nov 18, 2025
@tglanz
Copy link
Contributor Author

tglanz commented Nov 18, 2025

Fuzz tests are taken from @rluvaton 's #18788 , excluding those this PR doesn't fix:

    fuzz_cases::join_fuzz::test_right_anti_join_1k_binary_filtered
    fuzz_cases::join_fuzz::test_right_anti_join_1k_filtered
    fuzz_cases::join_fuzz::test_right_semi_join_1k_filtered

@tglanz tglanz force-pushed the fix/smj-take-correct-columns branch from dba2a02 to 0c457ba Compare November 18, 2025 16:56
Copy link
Member

@rluvaton rluvaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tglanz LGTM.

Hope to see you contributing again soon

@rluvaton rluvaton added this pull request to the merge queue Nov 18, 2025
Merged via the queue into apache:main with commit 984d210 Nov 18, 2025
34 checks passed
@tglanz tglanz deleted the fix/smj-take-correct-columns branch November 19, 2025 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Panic in Sort Merge Equijoin when tables have different columns count

2 participants