[SPARK-33448][SQL] Support CACHE/UNCACHE TABLE commands for v2 tables #30403
After more thought, I think CACHE TABLE is not a DDL command that needs to interact with catalogs, and it doesn't need a v2 version.
The current problem is that `CacheTableCommand` only takes a v1 table identifier and can't cache v2 tables with n-part names. Maybe we can fix `CacheTableCommand` directly?
I updated `CacheTableCommand` and `uncacheTable` to support multi-part names (without resolving the identifier). Please let me know what you think of the new approach. Thanks.
Also, since it's not resolving to catalogs, should we move it out of `ResolveSessionCatalog`?
What's the behavior if the temp view already exists? Does it overwrite?
It would fail with:
The next thing we can do is refactor it using the v2 framework (without adding a v2 version). The benefits are: 1. moving the logical plan to catalyst; 2. resolving the table in the analyzer. e.g.
OK, will do.
One issue I am encountering in moving to the v2 framework (for v2 tables) is the following.
When `CACHE TABLE testcat.tbl` is run, `tbl` is changed from `DataSourceV2Relation` to `DataSourceV2ScanRelation` by the `V2ScanRelationPushDown` rule, now that the plan goes through the analyzer, optimizer, etc. But if I run `spark.table("testcat.tbl")`, the query execution has `tbl` as `DataSourceV2Relation`, so the cache is not applied.
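
The mismatch described above can be sketched with a toy model. This is plain Python, not Spark code: the classes `Relation` and `ScanRelation` and the helpers `push_down`, `cache`, and `is_cached` are hypothetical stand-ins, and the only assumption carried over from the comment is that a cached plan is found by comparing plans for equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for DataSourceV2Relation."""
    table: str


@dataclass(frozen=True)
class ScanRelation:
    """Hypothetical stand-in for DataSourceV2ScanRelation."""
    table: str


def push_down(plan):
    """Toy version of a pushdown rewrite: Relation becomes ScanRelation."""
    return ScanRelation(plan.table) if isinstance(plan, Relation) else plan


cached_plans = []  # toy cache: plans are matched by equality


def cache(plan):
    cached_plans.append(plan)


def is_cached(plan):
    return plan in cached_plans


# Caching after the rewrite stores the ScanRelation form of the plan.
cache(push_down(Relation("testcat.tbl")))

# A later query is looked up in its pre-rewrite form, so it misses.
miss = is_cached(Relation("testcat.tbl"))

# Caching the pre-rewrite plan instead makes the same lookup succeed.
cache(Relation("testcat.tbl"))
hit = is_cached(Relation("testcat.tbl"))
```

In this sketch the two plan forms never compare equal even though they name the same table, which is why the first lookup misses.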
Ah, one solution is to follow `InsertIntoStatement` and not make the `table` a child. Then we resolve the `UnresolvedRelation` inside `CacheTable` manually in `ResolveTempViews` and other resolution rules.
Should we include the catalog name in `tableName`? Otherwise, different catalogs may have tables with the same name.
`tableName` is used only for display purposes (e.g., `InMemoryTableScanExec`). The `cachedData` is matched by the logical plan, so I think the current approach is OK.
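
A minimal sketch of why a display-only name need not be unique, assuming cache entries are found by plan equality as stated above. It is plain Python rather than Spark code, and `TablePlan`, `CacheEntry`, `cache_table`, and `lookup` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TablePlan:
    """Hypothetical plan node; the catalog is part of the plan's identity."""
    catalog: str
    name: str


@dataclass
class CacheEntry:
    plan: TablePlan
    display_name: str  # label for output only; never consulted during lookup


entries: List[CacheEntry] = []


def cache_table(plan: TablePlan, display_name: str) -> None:
    entries.append(CacheEntry(plan, display_name))


def lookup(plan: TablePlan) -> Optional[CacheEntry]:
    """Return the entry whose plan equals the given plan, if any."""
    return next((e for e in entries if e.plan == plan), None)


# Two tables in different catalogs share the same display name.
cache_table(TablePlan("cat1", "tbl"), "tbl")
cache_table(TablePlan("cat2", "tbl"), "tbl")

first = lookup(TablePlan("cat1", "tbl"))
second = lookup(TablePlan("cat2", "tbl"))
absent = lookup(TablePlan("cat3", "tbl"))
```

Here both entries carry the label `tbl`, yet each lookup returns the entry for its own catalog because only the plan is compared.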