[SPARK-34076][SQL] SQLContext.dropTempTable fails if cache is non-empty #31136
Conversation
Test build #133946 has finished for PR 31136 at commit
Kubernetes integration test starting
Kubernetes integration test status success
sql(s"CREATE TEMPORARY VIEW $v AS SELECT key FROM src LIMIT 10")
sql(s"CREATE TABLE $t AS SELECT * FROM src")
sql(s"CACHE TABLE $t")
dropTempTable(v)
Is the table `$t` related to the temp view `$v`? Why does caching the table `$t` affect the temp view `$v`?
This is just to make sure the cache is non-empty, so that a comparison is done when dropping the view. The issue occurs during that comparison.
  sparkSession.sharedState.cacheManager.uncacheQuery(
-   sparkSession, viewDef, cascade = false)
+   sparkSession.table(viewName), cascade = false)
Shall we add an assert to `uncacheQuery` to make sure all passed-in logical plans are analyzed? e.g.,
def uncacheQuery(
    spark: SparkSession,
    plan: LogicalPlan,
    cascade: Boolean,
    blocking: Boolean = false): Unit = {
  assert(plan.analyzed, "the plan input to `uncacheQuery` should be analyzed")
  ...
}
I'm not totally sure about this. It seems common for a temporary view to be dropped after the view/table it depends on is already gone, as demonstrated by the test failures, so this assertion would fail in those cases.
Another approach is to ignore possible non-fatal errors during query analysis and uncache, similar to what's done in `DropTableCommand`.
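To make the "ignore non-fatal errors and proceed" approach concrete, here is a minimal sketch in plain Scala (a toy model, not Spark's actual code; `dropViewSafely` and its parameters are hypothetical names for illustration). The point is that a best-effort uncache must never block the drop itself:

```scala
import scala.util.control.NonFatal

// Hypothetical sketch: attempt to uncache first, but swallow any non-fatal
// failure (e.g. a dependent view that is already gone) and proceed to drop.
def dropViewSafely(uncache: () => Unit, drop: () => Boolean): Boolean = {
  try {
    uncache()
  } catch {
    case NonFatal(_) => // ignore: uncaching is best-effort
  }
  drop()
}
```

With this shape, a failing uncache (as in the dropped-dependency scenario discussed below) still lets the view be removed.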
Kubernetes integration test starting
Kubernetes integration test status success
Test build #133980 has finished for PR 31136 at commit
  sparkSession.sharedState.cacheManager.uncacheQuery(
    sparkSession, plan.analyzed, cascade = false)
} catch {
  case NonFatal(_) => // ignore
If there is a table-not-found or view-not-found error, does it mean we skip uncaching the temp view?
This is a good question. I think currently this won't happen because:
- When dropping a table or permanent view, we drop all the caches that reference it in a cascading fashion, so by the time we drop the caches themselves they are already invalidated.
- On the other hand, we currently store temp views as analyzed logical plans, so they won't be analyzed again upon retrieval, which means we won't run into the error you mentioned. However, this also means the plans themselves can become stale and potentially generate incorrect results. [SPARK-34052][SQL] Store SQL text for a temp view created using "CACHE TABLE .. AS SELECT" (#31107) proposes to change this, following similar changes done by [SPARK-33142][SPARK-33647][SQL] Store SQL text for SQL temp view (#30567), so that the behavior of temporary views and their caches is more aligned with that of permanent views.

cc @cloud-fan and @MaxGekk too FYI
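The staleness concern above can be illustrated with a toy model in plain Scala (purely an analogy, not Spark code): storing a resolved snapshot at definition time versus re-evaluating the definition on each access.

```scala
import scala.collection.mutable.ArrayBuffer

// The underlying "table" data.
val underlying = ArrayBuffer(1, 2, 3)

// "Analyzed plan" style: resolved once at creation time,
// so it keeps a snapshot and never sees later changes.
val snapshotView: Seq[Int] = underlying.toList

// "SQL text" style: the definition is re-evaluated on each access,
// so it always reflects the current state.
def textView: Seq[Int] = underlying.toList

underlying += 4
// snapshotView still has 3 elements; textView now has 4
```

This is why storing SQL text (and re-analyzing it) keeps temporary views consistent with how permanent views behave.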
  sparkSession.sharedState.cacheManager.uncacheQuery(
    sparkSession, plan.analyzed, cascade = false)
} catch {
  case NonFatal(_) => // ignore
Is it safer to not add this try-catch? If there are failures, then the uncache fails and we should not swallow the exception?
This is also discussed above. Suppose we have:
CREATE TEMPORARY VIEW v1 AS SELECT * FROM v2
a Spark user can drop `v2` first, followed by `v1`. In this case `uncacheQuery` will fail because `v2` is already gone, and consequently the view would not be dropped. This seems to be a fairly common scenario.
Also, this follows `DropTableCommand`, which swallows any non-fatal exception and proceeds to drop the view.
thanks for the explanation!
thanks, merging to master/3.1!
### What changes were proposed in this pull request?

This changes `CatalogImpl.dropTempView` and `CatalogImpl.dropGlobalTempView` to use the analyzed logical plan instead of `viewDef`, which is unresolved.

### Why are the changes needed?

Currently, `CatalogImpl.dropTempView` is implemented as follows:

```scala
override def dropTempView(viewName: String): Boolean = {
  sparkSession.sessionState.catalog.getTempView(viewName).exists { viewDef =>
    sparkSession.sharedState.cacheManager.uncacheQuery(
      sparkSession, viewDef, cascade = false)
    sessionCatalog.dropTempView(viewName)
  }
}
```

Here, the logical plan `viewDef` is not resolved, and when passed to `uncacheQuery`, it can fail at the `sameResult` call, where canonicalized plans are compared. The error message looks like:

```
Invalid call to qualifier on unresolved object, tree: 'key
```

This can be reproduced via:

```scala
sql(s"CREATE TEMPORARY VIEW $v AS SELECT key FROM src LIMIT 10")
sql(s"CREATE TABLE $t AS SELECT * FROM src")
sql(s"CACHE TABLE $t")
dropTempTable(v)
```

### Does this PR introduce _any_ user-facing change?

The only user-facing change is that `SQLContext.dropTempTable` previously may fail in the above scenario but will work with this fix.

### How was this patch tested?

Added new unit tests.

Closes #31136 from sunchao/SPARK-34076.

Authored-by: Chao Sun <sunchao@apple.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 62d82b5)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
lgtm too.