Disable co-located and single-repartition joins for append tables #5359
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #5359      +/-   ##
==========================================
- Coverage   94.27%   94.23%   -0.05%
==========================================
  Files         215      215
  Lines       42941    42903      -38
==========================================
- Hits        40484    40430      -54
- Misses       2457     2473      +16
@@ -504,6 +528,12 @@ RestrictionEquivalenceForPartitionKeys(PlannerRestrictionContext *restrictionCon
		/* there is a single distributed relation, no need to continue */
		return true;
	}
	else if (ContextContainsAppendRelation(
Well, I'm not sure if this is a good idea or not, but I have an interesting observation here. If we move this check above ContainsMultipleDistributedRelations, I think we wouldn't have this problem. However, that might make the SQL support for append tables too limited -- but consistent.
The problem is that the recursive planner now kicks in nicely when there is only a single distributed append table, and fails when there are multiple.
For example:
-- fails
select * from append_table_1 t1 where a IN (SELECT t2.a FROM append_table_1 t2);
ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns
-- works via non-colocated subquery joins
select * from append_table_1 t1 where a NOT IN (SELECT t2.a FROM append_table_1 t2);
Or,
-- works as there is one table in the join
WITH cte_1(a,value) AS (VALUES (1,1), (2,2))
SELECT * FROM cte_1 JOIN append_table_1 USING (a);
-- fails because there are multiple tables
WITH cte_1(a,value) AS (VALUES (1,1), (2,2))
SELECT * FROM cte_1 JOIN append_table_1 USING (a) JOIN append_table_2 USING (a);
ERROR: complex joins are only supported when all distributed tables are co-located and joined on their distribution columns
We can come up with many more such cases. So, I guess we'd want to prevent this at the cost of limited SQL coverage, right?
Yes, everything seems to work correctly, but I don't really have time to write all the tests right now, so I think I'll move it up and address it later.
I changed my mind. Even without the SQL coverage we would probably want a test file to see whether we exercise the errors correctly. I left subquery pushdown in place and added a test file.
I think allowing this makes the user experience a little inconsistent.
But, still, there are already lots of inconsistencies with append tables anyway, and keeping some SQL support could be valuable for some users. So, I don't have objections.
I think we are good to merge now, thanks for adding the tests!
 124 | hij
(3 rows)

SELECT key, row_number() OVER () FROM (SELECT key FROM append_table ORDER BY key) LIMIT 3;
syntax error -- the subquery in FROM is missing an alias
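A minimal sketch of the fix, since PostgreSQL requires an alias for a subquery in FROM (the alias name sub is made up):
-- hypothetical fix: give the derived table an alias
SELECT key, row_number() OVER ()
FROM (SELECT key FROM append_table ORDER BY key) sub
LIMIT 3;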
(3 rows)

-- try some joins in subqueries
SELECT key, count(*) FROM (SELECT * FROM append_table a JOIN append_table b USING (key)) u GROUP BY key ORDER BY 1,2 LIMIT 3;
This seems to be pulled up, so maybe add random().
Also, is it intentional to disable repartitioning here?
Yes, it shows that there is no attempt to push down.
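For reference, a sketch of the random() suggestion above (the exact query is an assumption, not necessarily what was committed): a volatile function in the subquery target list keeps the standard planner from pulling the subquery up into the outer query, so the test exercises the subquery planning path for real.
-- sketch: random() prevents subquery pull-up, so the planner must
-- treat this as a genuine subquery rather than a flattened join
SELECT key, count(*)
FROM (SELECT *, random() FROM append_table a JOIN append_table b USING (key)) u
GROUP BY key
ORDER BY 1, 2
LIMIT 3;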
@@ -234,6 +234,12 @@ TargetListOnPartitionColumn(Query *query, List *targetEntryList)
		continue;
	}

	/* append-distributed tables do not have a strict partition column */
Just noting as a reference: having this also prevents DISTINCT/window function pushdowns, which is what we want.
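As an illustration (a made-up query, not from the test file): with a strict partition column, a query like the following could be deduplicated per shard and pushed down; for append-distributed tables that guarantee does not hold, so the planner must not take that shortcut.
-- illustrative only: DISTINCT on the would-be distribution column
SELECT DISTINCT key FROM append_table;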
Append-distributed tables have ad-hoc logic for when co-located or single-repartition (towards append) joins are allowed. For instance, the planner does not check whether shards are actually in the same place when planning a co-located join. Moreover, shards in append-distributed tables often have NULL shardminvalue/shardmaxvalue (that's how master_create_empty_shard creates them), in which case neither of those joins works.
This PR disables co-located and single-repartition joins for append-distributed tables to simplify the planner.
Our range-distributed users manually set the colocationid to mark tables as co-located because CoPartitionedTables has too much overhead otherwise, so this does not break co-located joins for range-distributed tables.
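For context, a sketch of how the NULL shard bounds mentioned above come about (the table name events is hypothetical; this follows the usual append-distribution workflow):
-- create an append-distributed table and an empty shard
CREATE TABLE events (key int, value text);
SELECT create_distributed_table('events', 'key', 'append');
SELECT master_create_empty_shard('events');
-- shardminvalue and shardmaxvalue are NULL until shard bounds are set
SELECT shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'events'::regclass;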