[SPARK-28351][SQL] Support DELETE in DataSource V2 #25115
Since this always throws `AnalysisException`, I think this case should be removed. Instead, the next case should match and the `V2SessionCatalog` should be used. If the table loaded by the v2 session catalog doesn't support delete, then conversion to the physical plan will fail when `asDeletable` is called.
Then users can still call v2 deletes for formats like `parquet` that have a v2 implementation that will work.
FYI @brkyvz.
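To illustrate the "fail when `asDeletable` is called" behavior described above, here is a minimal sketch. The traits and classes are simplified stand-ins written for this example (only the names `SupportsDelete` and `asDeletable` echo the discussion), so treat it as an assumption about the shape of the check rather than Spark's actual code.

```scala
// Toy stand-ins for the v2 table interfaces. These definitions are
// assumptions made for illustration, not Spark's real API.
trait Table { def name: String }

trait SupportsDelete extends Table {
  def deleteWhere(filters: Seq[String]): Unit
}

// Hypothetical error type standing in for AnalysisException.
final class AnalysisError(message: String) extends Exception(message)

object DeleteSupport {
  // Narrow a table to its delete capability; fail if it has none.
  def asDeletable(table: Table): SupportsDelete = table match {
    case deletable: SupportsDelete => deletable
    case other =>
      throw new AnalysisError(s"Table does not support deletes: ${other.name}")
  }
}

// A table without delete support.
final class ReadOnlyTable(val name: String) extends Table

// A table that records the filters it was asked to delete by.
final class InMemoryTable(val name: String) extends SupportsDelete {
  var deleted: Seq[String] = Seq.empty
  def deleteWhere(filters: Seq[String]): Unit = {
    deleted = deleted ++ filters
  }
}
```

With this shape, any table a catalog returns can reach planning, and only tables lacking the capability are rejected at the point where the delete is actually planned.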
Thank you, @rdblue. I removed this case and now fall back to `sessionCatalog` in `resolveTables` for `DeleteFromTable`.
I think it's worse to move this case from here to https://github.com/apache/spark/pull/25115/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2R657 .
If we can't merge these 2 cases into one here, let's keep it as it was.
If I understand correctly, one purpose of removing the first case is that we can execute delete on the `parquet` format via this API (if we implement it later), as @rdblue mentioned. The key point here is that we resolve the table using `V2SessionCatalog` as the fallback catalog. The original `resolveTable` doesn't provide any fallback-to-sessionCatalog mechanism (if no catalog is found, it falls back to `resolveRelation`). So maybe we can modify `resolveTable` and let it treat `V2SessionCatalog` as an option to try.
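The fallback idea in the comment above could look roughly like the following. This is a minimal sketch with hypothetical names (`MapCatalog`, `Resolver`, and the signature of `resolveTable` are all invented for illustration); it is not the code from this PR or from Spark.

```scala
// Hypothetical, simplified model of catalog lookup with a
// session-catalog fallback. None of these types are Spark's.
final case class Table(catalog: String, name: String)

trait Catalog {
  def name: String
  def loadTable(ident: String): Option[Table]
}

final class MapCatalog(val name: String, tables: Set[String]) extends Catalog {
  def loadTable(ident: String): Option[Table] =
    if (tables.contains(ident)) Some(Table(name, ident)) else None
}

object Resolver {
  // Use the named catalog when the first name part matches one;
  // otherwise try the session catalog. A None result stands for
  // "leave the relation to the existing resolveRelation path".
  def resolveTable(
      catalogs: Map[String, Catalog],
      sessionCatalog: Catalog,
      nameParts: Seq[String]): Option[Table] = nameParts match {
    case Seq(catalogName, ident) if catalogs.contains(catalogName) =>
      catalogs(catalogName).loadTable(ident)
    case parts =>
      sessionCatalog.loadTable(parts.mkString("."))
  }
}
```

Returning `Option` keeps the session catalog a "try" step: a miss does not raise an error, so the caller can still fall through to the older resolution path.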
I don't think we need to update `ResolveTables`, though I do see that it would be nice to use `ResolveTables` as the only rule that resolves `UnresolvedRelation` for v2 tables.
There is already another rule that loads tables from a catalog, `ResolveInsertInto`.
I considered updating that rule and moving the table resolution part into `ResolveTables` as well, but I think it is a little cleaner to resolve the table when converting the statement (in `DataSourceResolution`), as @cloud-fan is suggesting.
One of the reasons to do this for the insert plans is that those plans don't include the target relation as a child. Instead, those plans have the data to insert as a child node, which means that the unresolved relation won't be visible to the `ResolveTables` rule.
Taking the same approach in this PR would also make this a little cleaner. If `DeleteFrom` didn't expose the relation as a child, it could be a `UnaryNode` and you wouldn't need to update some of the other rules to explicitly include `DeleteFrom`.
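The point about which relations a tree-walking rule can see may be easier to follow with a toy plan tree. The node classes below are simplified assumptions that only borrow names from the discussion; they are not Catalyst's classes, and `ResolveTablesSketch` is a hypothetical rule.

```scala
// Toy logical-plan tree. Names echo the discussion, but these are
// simplified assumptions and not Catalyst's real node classes.
sealed trait Plan { def children: Seq[Plan] }

final case class UnresolvedRelation(name: String) extends Plan {
  val children: Seq[Plan] = Nil
}
final case class ResolvedTable(name: String) extends Plan {
  val children: Seq[Plan] = Nil
}
final case class Query(source: String) extends Plan {
  val children: Seq[Plan] = Nil
}

// The delete exposes its target as a child, so rules can reach it.
final case class DeleteFrom(child: Plan, condition: String) extends Plan {
  val children: Seq[Plan] = Seq(child)
}

// The insert keeps its target as a plain field; only the data is a child.
final case class InsertInto(target: UnresolvedRelation, query: Plan) extends Plan {
  val children: Seq[Plan] = Seq(query)
}

object ResolveTablesSketch {
  // Rewrites only the unresolved relations reachable through children.
  def apply(plan: Plan): Plan = plan match {
    case UnresolvedRelation(name) => ResolvedTable(name)
    case DeleteFrom(child, cond)  => DeleteFrom(apply(child), cond)
    case InsertInto(target, query) => InsertInto(target, apply(query))
    case other => other
  }
}
```

In this sketch the delete's target is resolved by the generic rule, while the insert's target stays unresolved and has to be handled when the statement itself is converted.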
Okay, I rolled back the resolve rules for `DeleteFromTable` to what they were, as @cloud-fan suggested. For cases like deleting from file formats or `V2SessionCatalog` support, let's open another PR. Another PR for the resolve rules is also needed, because I found other issues related to them. Does this sound reasonable?
We can remove this case after #25402, which updates `ResolveTable` to fall back to the v2 session catalog.
I saw the code in #25402. I think it's the best choice.
Can we also test a correlated subquery?
Is it necessary to test a correlated subquery? A correlated subquery is a kind of subquery, and we forbid subqueries here, so correlated subqueries are forbidden as well.
My thought is that later I want to add pre-execution subquery support for DELETE while still forbidding correlated subqueries, so we can modify the test cases at that time.
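The argument above can be sketched as follows: a check that rejects any subquery in the delete condition rejects correlated ones too. The expression classes, the `outerRefs` field, and the error message are hypothetical and written for this example only; they are not Catalyst's types.

```scala
// Toy expression tree. A simplified assumption, not Catalyst's
// Expression classes.
sealed trait Expr { def children: Seq[Expr] }

final case class Column(name: String) extends Expr {
  val children: Seq[Expr] = Nil
}
final case class Literal(value: Int) extends Expr {
  val children: Seq[Expr] = Nil
}
final case class Equals(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}
// outerRefs is non-empty when the subquery is correlated.
final case class Subquery(sql: String, outerRefs: Seq[String]) extends Expr {
  val children: Seq[Expr] = Nil
}

object DeleteCheck {
  // True if any node in the condition is a subquery, correlated or not.
  def containsSubquery(e: Expr): Boolean = e match {
    case _: Subquery => true
    case other       => other.children.exists(containsSubquery)
  }

  // Reject every condition that contains a subquery (hypothetical message).
  def validate(condition: Expr): Either[String, Expr] =
    if (containsSubquery(condition)) Left("Delete by condition with subquery is not supported")
    else Right(condition)
}
```

If pre-execution subqueries were allowed later, the check would need to inspect `outerRefs` instead of rejecting every `Subquery`, which is the point at which a dedicated correlated-subquery test becomes meaningful.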
This sounds reasonable to me.