[SPARK-15864] [SQL] Fix Inconsistent Behaviors when Uncaching Non-cached Tables #13593
gatorsmile wants to merge 83 commits into apache:master from
Test build #60278 has finished for PR 13593 at commit
cc @rxin @hvanhovell @liancheng This is another issue related to Thanks! |
From the perspective of better consistency, I'd prefer the current fix in the PR plus a new @rxin What do you think? (BTW, it's "Dataset" rather than "DataSet".) |
We do have a private method |
Hmmm maybe just throw an exception if the table does not exist, but no-op if the table is already uncached. I don't think we should change the behavior here. |
@rxin @liancheng I see. Since the existing Dataset API Will follow what @rxin said. No-op if the table is already uncached. |
@liancheng
I plan to unregister the accumulators in both APIs. Does that make sense? Thanks! |
# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
     */
    override def uncacheTable(tableName: String): Unit = {
    -  sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName))
    +  sparkSession.sharedState.cacheManager.tryUncacheQuery(query = sparkSession.table(tableName))
After this change, nobody is calling sparkSession.sharedState.cacheManager.uncacheQuery. Should we remove this API?
Yeah, let's remove it, rename tryUncacheQuery to uncacheQuery, and document that it's a no-op if the table is already uncached.
Sure, will do it. Thanks!
Test build #60340 has finished for PR 13593 at commit
hi @gatorsmile , do you wanna update it? now both |
# Conflicts: # sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
@cloud-fan Thank you for letting me know it. Just updated the PR. Please let me know if anything needs a change. Thanks! |
Test build #60467 has finished for PR 13593 at commit
Test build #60474 has finished for PR 13593 at commit
LGTM, can you update the PR description? cc @rxin for final sign-off as this is kind of a behaviour change. |
Behavior change LGTM. @gatorsmile can you update the pr description? It is outdated. |
@cloud-fan @rxin @liancheng Thank you for your reviews! The PR description is updated. Let me know if any change is needed. Thanks! |
typo? |
@cloud-fan Sorry, please refresh the browser. Just finished the changes. |
thanks, merging to master/2.0! |
…ed Tables
#### What changes were proposed in this pull request?
There are three ways to uncache a table:
- _SQL interface_: `UNCACHE TABLE`
- _Dataset API_: `sparkSession.catalog.uncacheTable`
- _Dataset API_: `sparkSession.table(tableName).unpersist()`
When the table is not cached, the three behave differently:
- _SQL interface_: `UNCACHE TABLE non-cachedTable` -> **no error message**
- _Dataset API_: `sparkSession.catalog.uncacheTable("non-cachedTable")` -> **report a strange error message:**
```requirement failed: Table [a: int] is not cached```
- _Dataset API_: `sparkSession.table("non-cachedTable").unpersist()` -> **no error message**
This PR makes them consistent: uncaching is a no-op if the table is not cached or has already been uncached.
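A minimal sketch of the intended behavior, assuming a local `SparkSession` and a hypothetical temporary view named `t` that is registered but never cached. The view name, app name, and object name are illustrative only and do not come from the patch.

```scala
import org.apache.spark.sql.SparkSession

object UncacheNonCachedTable {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("uncache-sketch")
      .getOrCreate()

    // "t" is a hypothetical temporary view that exists but is never cached.
    spark.range(3).toDF("a").createOrReplaceTempView("t")
    assert(!spark.catalog.isCached("t"))

    // With this change, each of the three paths is expected to be a no-op here.
    spark.sql("UNCACHE TABLE t")
    spark.catalog.uncacheTable("t")
    spark.table("t").unpersist()

    // The view is still not cached, and no exception was raised.
    assert(!spark.catalog.isCached("t"))
    spark.stop()
  }
}
```

Before this change, the `spark.catalog.uncacheTable("t")` call is the one that would have failed with the `requirement failed` message quoted above.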
In addition, this PR removes the old `uncacheQuery`, renames `tryUncacheQuery` to `uncacheQuery`, and documents that it is a no-op if the table has already been uncached.
#### How was this patch tested?
Improved the existing test case to cover uncaching a table that has not been cached.
Also added test cases for uncaching a table that does not exist.
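The two cases above can be modelled without Spark. The sketch below is a toy stand-in, not Spark's `CacheManager`; `ToyCacheManager` and its methods are hypothetical names. It assumes, as suggested in the review discussion, that uncaching a nonexistent table remains an error while uncaching an existing but non-cached table does nothing.

```scala
import scala.collection.mutable
import scala.util.Try

// Hypothetical stand-in for a cache manager; not Spark's CacheManager.
final class ToyCacheManager(knownTables: Set[String]) {
  private val cached = mutable.Set.empty[String]

  // Unknown tables are rejected up front.
  private def requireExists(table: String): Unit =
    if (!knownTables.contains(table))
      throw new NoSuchElementException(s"Table not found: $table")

  def cacheTable(table: String): Unit = {
    requireExists(table)
    cached += table
  }

  // No-op when the table exists but is not cached; fails only for unknown tables.
  def uncacheTable(table: String): Unit = {
    requireExists(table)
    cached -= table
  }

  def isCached(table: String): Boolean = cached.contains(table)
}

object ToyCacheManagerDemo {
  def main(args: Array[String]): Unit = {
    val manager = new ToyCacheManager(Set("t"))

    manager.uncacheTable("t") // never cached: nothing happens
    manager.cacheTable("t")
    assert(manager.isCached("t"))
    manager.uncacheTable("t")
    manager.uncacheTable("t") // already uncached: still a no-op
    assert(!manager.isCached("t"))

    // A table that does not exist is still reported as an error.
    assert(Try(manager.uncacheTable("missing")).isFailure)
  }
}
```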
Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>
Closes #13593 from gatorsmile/uncacheNonCachedTable.
(cherry picked from commit df4ea66)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>