[SPARK-29439][SQL] DDL commands should not use DataSourceV2Relation#26091
cloud-fan wants to merge 1 commit into apache:master
I disagree. It is a relation. It is converted to a scan node when we convert to a physical plan.
Early pushdown does introduce some risk. I don't think there is a case where early pushdown is introducing errors with DELETE FROM, is there? If so, we should consider how to address that, but I doubt it is by creating another nearly identical relation node.
This also creates two different ways that tables are resolved: one for DDL statements and one for other statements. That's adding unnecessary complexity that will hurt later when we need to decide which resolution path to use.
If you want to fix possible problems with early pushdown, then let's find out what those problems are and address them directly. I don't think there is much utility to making a change like this. It just introduces needless churn. I'm -1.
I think the solution to the problem is to change the way
The same problem exists for write plans: the table that will be written to is resolved to a relation. That relation should not be replaced or modified by a rule like early push-down, so the relation is not added as a child of the plan. If it is not a child, then rules are not automatically run on it. We can fix this problem the same way.
Since these problems would be introduced by the
@cloud-fan, does that sound like a good solution?
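The point about children can be illustrated with a small, language-neutral toy model. This is only a sketch, not Spark's Catalyst API: `Plan`, `Relation`, `ScanRelation`, `Filter`, `DeleteFrom`, `transform_down`, and `early_pushdown` are all hypothetical stand-ins. It assumes a rule framework that only recurses into whatever a node reports as its children, so a relation stored as a plain field is never visited.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Plan:
    """Toy plan node. Rules only ever see what children() returns."""

    def children(self) -> Tuple["Plan", ...]:
        return ()

    def with_children(self, new_children: Tuple["Plan", ...]) -> "Plan":
        return self


@dataclass(frozen=True)
class Relation(Plan):
    table: str


@dataclass(frozen=True)
class ScanRelation(Plan):
    table: str


@dataclass(frozen=True)
class Filter(Plan):
    condition: str
    child: Plan

    def children(self) -> Tuple[Plan, ...]:
        return (self.child,)

    def with_children(self, new_children: Tuple[Plan, ...]) -> Plan:
        return Filter(self.condition, new_children[0])


@dataclass(frozen=True)
class DeleteFrom(Plan):
    # The target is a plain field, NOT a child, so rules never reach it.
    target: Relation
    condition: str


def transform_down(plan: Plan, rule: Callable[[Plan], Plan]) -> Plan:
    """Apply `rule` to a node, then recursively to its children."""
    rewritten = rule(plan)
    new_children = tuple(transform_down(c, rule) for c in rewritten.children())
    return rewritten.with_children(new_children)


def early_pushdown(plan: Plan) -> Plan:
    """Toy stand-in for an early push-down rule: relation -> scan."""
    if isinstance(plan, Relation):
        return ScanRelation(plan.table)
    return plan


query = transform_down(Filter("id > 1", Relation("t")), early_pushdown)
delete = transform_down(DeleteFrom(Relation("t"), "id > 1"), early_pushdown)
```

In this model the relation under `Filter` is rewritten to a `ScanRelation`, while the `target` of `DeleteFrom` is left alone because the traversal never descends into it.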
Merged build finished. Test PASSed.
I like that @rdblue. Thank you both!
@rdblue yea that works too. I'm closing it and will wait for your early pushdown PR.
What changes were proposed in this pull request?
Create a new node ResolvedV2Table, and resolve UnresolvedV2Relation to this new node instead of DataSourceV2Relation.
Why are the changes needed?
DataSourceV2Relation is a scan node. It should be used when we want to scan a v2 table. However, the DDL commands do not need to scan a v2 table; they just need a node to hold the v2 table. It's possible that there are rules trying to match DataSourceV2Relation and convert it to something else (e.g. a new DataSourceV2ScanRelation) for better data scan. Unfortunately, doing this will break these DDL commands. It's better to have a separate node to hold the v2 table for DDL commands.
Does this PR introduce any user-facing change?
no
How was this patch tested?
existing tests
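The failure mode described under "Why are the changes needed?" can be sketched with a toy model. This is an illustration under stated assumptions, not Spark code: the classes below only borrow the names DataSourceV2Relation, DataSourceV2ScanRelation, and ResolvedV2Table from the description, while `DescribeTable`, `rewrite_scans`, and `plan_describe` are made-up helpers standing in for a DDL command, a scan-rewriting rule, and the planning step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSourceV2Relation:
    """Toy stand-in: a node meant for scanning a table."""
    table: str


@dataclass(frozen=True)
class DataSourceV2ScanRelation:
    """Toy stand-in: what a scan-rewriting rule might produce."""
    table: str


@dataclass(frozen=True)
class ResolvedV2Table:
    """Toy stand-in: a node that only holds the table."""
    table: str


@dataclass(frozen=True)
class DescribeTable:
    """Hypothetical DDL-style command that needs a table, not a scan."""
    child: object


def rewrite_scans(node):
    """Toy rule: replace every relation it finds with a scan relation."""
    if isinstance(node, DataSourceV2Relation):
        return DataSourceV2ScanRelation(node.table)
    if isinstance(node, DescribeTable):
        return DescribeTable(rewrite_scans(node.child))
    return node


def plan_describe(node: DescribeTable) -> str:
    """Toy planner: succeeds only if the child still holds the table."""
    if isinstance(node.child, (DataSourceV2Relation, ResolvedV2Table)):
        return f"describe {node.child.table}"
    raise ValueError(f"cannot plan {node!r}")


# The dedicated holder node is untouched by the rule and still plans.
ok = plan_describe(rewrite_scans(DescribeTable(ResolvedV2Table("t"))))

# The scan node is rewritten, so the command no longer matches.
try:
    plan_describe(rewrite_scans(DescribeTable(DataSourceV2Relation("t"))))
    broke = False
except ValueError:
    broke = True
```

In this sketch the command built on the holder node survives the rule, while the one built on the scan node is rewritten out from under the planner. The review thread proposes a different remedy for the same problem: keep the relation out of the plan's children so rules never reach it.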