Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable co-located and single-repartition joins for append tables #5359

Merged
merged 5 commits into from
Oct 18, 2021

Conversation

marcocitus
Copy link
Member

Append-distributed tables have ad-hoc logic for when co-located or single-repartition (towards append) joins are allowed. For instance, the planner does not check whether shards are actually in the same place when planning a co-located join. Moreover, shards in append-distributed tables often have NULL shardminvalue/shardmaxvalue (that's how master_create_empty_shard creates them), in which case neither of those joins work.

This PR disables co-located and single-repartition joins for append-distributed tables to simplify the planner.

Our range-distributed users manually set the colocationid to mark tables as co-located because CoPartitionedTables has too much overhead otherwise, so this does not break co-located joins for range-distributed tables.

@codecov
Copy link

codecov bot commented Oct 8, 2021

Codecov Report

Merging #5359 (bece86b) into master (6ff2083) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5359      +/-   ##
==========================================
- Coverage   94.27%   94.23%   -0.05%     
==========================================
  Files         215      215              
  Lines       42941    42903      -38     
==========================================
- Hits        40484    40430      -54     
- Misses       2457     2473      +16     

Base automatically changed from marcocitus/remove-append-2 to master October 8, 2021 19:39
@marcocitus marcocitus force-pushed the marcocitus/remove-append-3 branch 4 times, most recently from 769a7fd to 06f42a6 Compare October 16, 2021 12:53
@@ -504,6 +528,12 @@ RestrictionEquivalenceForPartitionKeys(PlannerRestrictionContext *restrictionCon
/* there is a single distributed relation, no need to continue */
return true;
}
else if (ContextContainsAppendRelation(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'm not sure if this is a good idea or not, but I have an interesting observation here. If we move this check above ContainsMultipleDistributedRelations, I think we wouldn't have this problem. However, that might make the SQL support for append tables too limited -- but consistent.

The problem is that recursive planner now kicks in nicely when there are only a single distributed append table, and fails when there are multiple.

For example:

-- fails
select * from append_table_1 t1 where a IN (SELECT t2.a FROM append_table_1 t2);
ERROR:  complex joins are only supported when all distributed tables are co-located and joined on their distribution columns

-- works via non-colocated subquery joins
select * from append_table_1 t1 where a NOT IN (SELECT t2.a FROM append_table_1 t2);

Or,

-- works as there is one table in the join
WITH cte_1(a,value) AS (VALUES (1,1), (2,2))
SELECT * FROM cte_1 JOIN append_table_1 USING (a);

-- fails because there are multiple tables
WITH cte_1(a,value) AS (VALUES (1,1), (2,2))
SELECT * FROM cte_1 JOIN append_table_1 USING (a) JOIN append_table_2 USING (a);
ERROR:  complex joins are only supported when all distributed tables are co-located and joined on their distribution columns

We can come up with many more such cases. So, I guess we'd want to prevent this with the cost of limited SQL coverage, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, everything seems to all work correctly, but I don't really have time to write all the tests right now, so I think I'll move it up and address it later.

Copy link
Member Author

@marcocitus marcocitus Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed my mind. Even without the SQL coverage we would probably want a test file to see whether we exercise the errors correctly. I left subquery pushdown in place and added a test file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think allowing this make the user experience a little inconsistent.

But, still, there are already lots of inconsistencies with append tables anyway. And, keeping some SQL support could be valuable for some users. So, I don't have objections

@marcocitus marcocitus force-pushed the marcocitus/remove-append-3 branch 4 times, most recently from c9921b8 to f002dc5 Compare October 18, 2021 16:28
Copy link
Member

@onderkalaci onderkalaci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are good to merge now, thanks for adding the tests!

@@ -504,6 +528,12 @@ RestrictionEquivalenceForPartitionKeys(PlannerRestrictionContext *restrictionCon
/* there is a single distributed relation, no need to continue */
return true;
}
else if (ContextContainsAppendRelation(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think allowing this make the user experience a little inconsistent.

But, still, there are already lots of inconsistencies with append tables anyway. And, keeping some SQL support could be valuable for some users. So, I don't have objections

124 | hij
(3 rows)

SELECT key, row_number() OVER () FROM (SELECT key FROM append_table ORDER BY key) LIMIT 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax error

(3 rows)

-- try some joins in subqueries
SELECT key, count(*) FROM (SELECT * FROM append_table a JOIN append_table b USING (key)) u GROUP BY key ORDER BY 1,2 LIMIT 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be pulled up. So, maybe add random()

And, is it intentional to disable repartitioning here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it shows that there is no attempt to push down

@@ -234,6 +234,12 @@ TargetListOnPartitionColumn(Query *query, List *targetEntryList)
continue;
}

/* append-distributed tables do not have a strict partition column */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting as a reference, having this also prevents DISTINCT/WiindowFunction pushdowns, which is what we want

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants