Skip to content

Comments

[SPARK-15864] [SQL] Fix Inconsistent Behaviors when Uncaching Non-cached Tables#13593

Closed
gatorsmile wants to merge 83 commits intoapache:masterfrom
gatorsmile:uncacheNonCachedTable
Closed

[SPARK-15864] [SQL] Fix Inconsistent Behaviors when Uncaching Non-cached Tables#13593
gatorsmile wants to merge 83 commits intoapache:masterfrom
gatorsmile:uncacheNonCachedTable

Conversation

@gatorsmile
Copy link
Member

@gatorsmile gatorsmile commented Jun 10, 2016

What changes were proposed in this pull request?

To uncache a table, we have three different ways:

  • SQL interface: UNCACHE TABLE
  • DataSet API: sparkSession.catalog.uncacheTable
  • DataSet API: sparkSession.table(tableName).unpersist()

When the table is not cached,

  • SQL interface: UNCACHE TABLE non-cachedTable -> no error message
  • Dataset API: sparkSession.catalog.uncacheTable("non-cachedTable") -> report a strange error message:
    requirement failed: Table [a: int] is not cached
  • Dataset API: sparkSession.table("non-cachedTable").unpersist() -> no error message

This PR will make them consistent. No operation if the table has already been uncached.

In addition, this PR also removes uncacheQuery and renames tryUncacheQuery to uncacheQuery, and documents it that it's noop if the table has already been uncached

How was this patch tested?

Improved the existing test case for verifying the cases when the table has not been cached.
Also added test cases for verifying the cases when the table does not exist

gatorsmile and others added 30 commits November 13, 2015 14:50
@SparkQA
Copy link

SparkQA commented Jun 10, 2016

Test build #60278 has finished for PR 13593 at commit f4b334e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

cc @rxin @hvanhovell @liancheng This is another issue related to Cache and Uncache. Actually, I am not sure if we should provide a SQL interface UNCACHE TABLE IF EXISTS.

Thanks!

@liancheng
Copy link
Contributor

From the perspective of better consistency, I'd prefer the current fix in the PR plus a new UNCACHE TABLE ... IF EXISTS syntax. However, this PR does introduce behavior breaking change.

@rxin What do you think?

(BTW, it's "Dataset" rather than "DataSet".)

@liancheng
Copy link
Contributor

We do have a private method tryUncacheQuery in CacheManager, which exposes exactly the same semantics as UNCACHE TABLE ... IF EXISTS. But Catalog doesn't have an equivalent public method.

@rxin
Copy link
Contributor

rxin commented Jun 10, 2016

Hmmm maybe just throw an exception if the table does not exist, but no-op if the table is already uncached. I don't think we should change the behavior here.

@gatorsmile
Copy link
Member Author

@rxin @liancheng I see. Since the existing Dataset API sparkSession.catalog.uncacheTable("non-cachedTable") issues an error if uncaching non-cached tables. Thus, to ensure both SQL statements and Dataset APIs have the same behavior. We still need to change one of them, right?

Will follow what @rxin said. No-op if the table is already uncached.

@gatorsmile
Copy link
Member Author

@liancheng tryUncacheQuery and uncacheQuery are different. tryUncacheQuery does not unregister the accumulators, but uncacheQuery does it. This difference is also confusing.

uncacheQuery:

cachedData(dataIndex).cachedRepresentation.uncache(blocking)

tryUncacheQuery:

cachedData(dataIndex).cachedRepresentation.cachedColumnBuffers.unpersist(blocking)

I plan to unregister the accumulators in both APIs. Does that make sense? Thanks!

# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
*/
override def uncacheTable(tableName: String): Unit = {
sparkSession.sharedState.cacheManager.uncacheQuery(sparkSession.table(tableName))
sparkSession.sharedState.cacheManager.tryUncacheQuery(query = sparkSession.table(tableName))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, nobody is calling sparkSession.sharedState.cacheManager.uncacheQuery. Should we remove this API?

Copy link
Contributor

@cloud-fan cloud-fan Jun 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, let's remove it and rename tryUncacheQuery to uncacheQuery, and document it that it's a noop if table is already uncached

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will do it. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 11, 2016

Test build #60340 has finished for PR 13593 at commit 17a0885.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

hi @gatorsmile , do you wanna update it? now both tryUncacheQuery and uncacheQuery won't unregister accumulator

# Conflicts:
#	sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
@gatorsmile
Copy link
Member Author

@cloud-fan Thank you for letting me know it. Just updated the PR. Please let me know if anything needs a change. Thanks!

@SparkQA
Copy link

SparkQA commented Jun 14, 2016

Test build #60467 has finished for PR 13593 at commit ef3572c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 14, 2016

Test build #60474 has finished for PR 13593 at commit 6999bd2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

LGTM, can you update the PR description?

cc @rxin for final sing-off as this is kind of a behaviour change.

@rxin
Copy link
Contributor

rxin commented Jun 14, 2016

Behavior change LGTM.

@gatorsmile can you update the pr description? It is outdated.

@gatorsmile
Copy link
Member Author

@cloud-fan @rxin @liancheng Thank you for your reviews!

The PR description is updated. Let me know if any change is needed. Thanks!

@cloud-fan
Copy link
Contributor

cloud-fan commented Jun 14, 2016

Dataset API: sparkSession.catalog.uncacheTable("non-cachedTable") -> report a strange error message: requirement failed: Table [a: int] is not cached
Dataset API: sparkSession.catalog.uncacheTable("non-cachedTable") -> no error message

typo?

@gatorsmile
Copy link
Member Author

gatorsmile commented Jun 14, 2016

@cloud-fan Sorry, please refresh the browser. Just finished the changes.

@cloud-fan
Copy link
Contributor

thanks, merging to master/2.0!

@asfgit asfgit closed this in df4ea66 Jun 14, 2016
asfgit pushed a commit that referenced this pull request Jun 14, 2016
…ed Tables

#### What changes were proposed in this pull request?
To uncache a table, we have three different ways:
- _SQL interface_: `UNCACHE TABLE`
- _DataSet API_: `sparkSession.catalog.uncacheTable`
- _DataSet API_: `sparkSession.table(tableName).unpersist()`

When the table is not cached,
- _SQL interface_: `UNCACHE TABLE non-cachedTable` -> **no error message**
- _Dataset API_: `sparkSession.catalog.uncacheTable("non-cachedTable")` -> **report a strange error message:**
```requirement failed: Table [a: int] is not cached```
- _Dataset API_: `sparkSession.table("non-cachedTable").unpersist()` -> **no error message**

This PR will make them consistent. No operation if the table has already been uncached.

In addition, this PR also removes `uncacheQuery` and renames `tryUncacheQuery` to `uncacheQuery`, and documents it that it's noop if the table has already been uncached

#### How was this patch tested?
Improved the existing test case for verifying the cases when the table has not been cached.
Also added test cases for verifying the cases when the table does not exist

Author: gatorsmile <gatorsmile@gmail.com>
Author: xiaoli <lixiao1983@gmail.com>
Author: Xiao Li <xiaoli@Xiaos-MacBook-Pro.local>

Closes #13593 from gatorsmile/uncacheNonCachedTable.

(cherry picked from commit df4ea66)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants