[SPARK-34197][SQL] SessionCatalog.refreshTable() should not invalidate the relation cache for temporary views #31265
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #134292 has finished for PR 31265 at commit
@cloud-fan @dongjoon-hyun @HyukjinKwon @sunchao Could you take a look at this small fix?
Thanks @MaxGekk for pinging. Overall looks good to me. BTW, do you know whether the refreshTable method is used in places where there could be both a temp view and a table with the same name, and where we explicitly want to refresh only the table relation cache and not the temp view?
@@ -994,15 +994,17 @@ class SessionCatalog(
    // Go through temporary views and invalidate them.
    // If the database is defined, this may be a global temporary view.
    // If the database is not defined, there is a good chance this is a temp view.
    if (name.database.isEmpty) {
      tempViews.get(tableName).foreach(_.refresh())
    val isTempView = if (name.database.isEmpty) {
nit: the code structure here looks pretty similar to isTemporaryTable, and it would be nice if we can do:
lookupTemporaryTable(name).map(_.refresh()).getOrElse {
  // Also invalidate the table relation cache.
  val qualifiedTableName = QualifiedTableName(dbName, tableName)
  tableRelationCache.invalidate(qualifiedTableName)
}
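A minimal sketch of the behavior this suggestion aims for, using toy stand-ins rather than Spark's real classes. `ToySessionCatalog`, `ToyTempView`, and the simplified `TableIdentifier` below are hypothetical, and the relation cache is modelled as a plain mutable map. The point it illustrates: the cached relation is invalidated only when no temp view matches the name.

```scala
import scala.collection.mutable

// Simplified, hypothetical stand-ins; these are not Spark's real classes.
final case class TableIdentifier(table: String, database: Option[String] = None)

final class ToyTempView {
  var refreshCount: Int = 0
  def refresh(): Unit = { refreshCount += 1 }
}

final class ToySessionCatalog(val currentDb: String = "default") {
  // Local temporary views, keyed by view name.
  val tempViews: mutable.Map[String, ToyTempView] = mutable.Map.empty
  // Cached relations, keyed by (database, table).
  val tableRelationCache: mutable.Map[(String, String), String] = mutable.Map.empty

  // In this toy model only unqualified names can refer to a temp view.
  private def lookupTempView(name: TableIdentifier): Option[ToyTempView] =
    if (name.database.isEmpty) tempViews.get(name.table) else None

  def refreshTable(name: TableIdentifier): Unit =
    lookupTempView(name) match {
      case Some(view) =>
        // A temp view matched: refresh it and leave the relation cache alone.
        view.refresh()
      case None =>
        // No temp view matched: invalidate the cached relation of the table.
        val dbName = name.database.getOrElse(currentDb)
        tableRelationCache.remove((dbName, name.table))
    }
}
```

In this model, refreshing `TableIdentifier("t")` while a temp view `t` exists only bumps the view's refresh counter and keeps the `("default", "t")` cache entry, whereas refreshing the qualified name `default.t` removes that entry.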
I did that.
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #134342 has finished for PR 31265 at commit
Theoretically, I could imagine a situation where a user runs any v1 command for a table, for instance:
case class AlterTableDropPartitionCommand(
    tableName: TableIdentifier,
    ...
  override def run(sparkSession: SparkSession): Seq[Row] = {
    val table = catalog.getTableMetadata(tableName)
    ...
    // USER CREATES A TEMP VIEW WITH THE SAME NAME
    ...
    sparkSession.catalog.refreshTable(table.identifier.quotedString)
And at the end of
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134360 has finished for PR 31265 at commit
@@ -877,21 +877,27 @@ class SessionCatalog(
    isTemporaryTable(nameParts.asTableIdentifier)
  }

  private def lookupTempView(name: TableIdentifier): Option[LogicalPlan] = {
    val table = formatTableName(name.table)
    val dbName = formatDatabaseName(name.database.getOrElse(currentDb))
We don't need to check the current database, since global_temp can't be set as the current database.
The previous code didn't check the current database either.
@cloud-fan This is the "previous" code:
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
Line 998 in e79c1cd
val dbName = formatDatabaseName(name.database.getOrElse(currentDb))
Or are you referring to some other previous code?
Sorry, do you propose to call database.get?
yes, let's follow that "previous" code.
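A rough sketch of the lookup this thread converges on, under stated assumptions: `ViewName` and `TempViewLookupSketch` are hypothetical names, views are modelled as plain strings instead of logical plans, and the database name `global_temp` is an assumed placeholder for the global temp database. It shows that the current database never has to be consulted: an unqualified name can only be a local temp view, and a qualified name is a temp view only if its database is the global temp database.

```scala
// Hypothetical, simplified identifier; not Spark's TableIdentifier.
final case class ViewName(table: String, database: Option[String] = None)

object TempViewLookupSketch {
  // Views are modelled as plain strings standing in for logical plans.
  def lookupTempView(
      name: ViewName,
      localViews: Map[String, String],
      globalViews: Map[String, String],
      globalTempDb: String = "global_temp"): Option[String] =
    name.database match {
      // Unqualified name: only a local temp view can match.
      case None => localViews.get(name.table)
      // Qualified with the global temp database: look among global temp views.
      case Some(db) if db == globalTempDb => globalViews.get(name.table)
      // Any other database: not a temp view; the current database is never consulted.
      case Some(_) => None
    }
}
```

Because the database is only read when it is explicitly present, the `Some(db)` branch plays the role of calling `database.get` after the `isEmpty` check, without a fallback to the current database.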
I did that. ... The function name isTemporaryTable looks odd, so I would replace it with:
def isTempView(name: TableIdentifier): Boolean
Here is the PR #31295, which renames isTemporaryTable().
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134374 has finished for PR 31265 at commit
…-refresh-table # Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
Test build #134402 has finished for PR 31265 at commit
Any objections to the changes?
Thanks, merging to master!
…ate the relation cache for temporary views

### What changes were proposed in this pull request?
Check the name passed to `SessionCatalog.refreshTable`, and if it belongs to a temporary view, do not invalidate the relation cache.

### Why are the changes needed?
When `SessionCatalog.refreshTable` refreshes a temporary or global temporary view, it should not invalidate an entry in the relation cache associated to a table with the same name.

### Does this PR introduce _any_ user-facing change?
Should not. The change might improve performance slightly.

### How was this patch tested?
By running new UT:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *SessionCatalogSuite"
```

Closes apache#31265 from MaxGekk/fix-session-catalog-refresh-table.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Check the name passed to SessionCatalog.refreshTable, and if it belongs to a temporary view, do not invalidate the relation cache.
Why are the changes needed?
When SessionCatalog.refreshTable refreshes a temporary or global temporary view, it should not invalidate an entry in the relation cache associated with a table of the same name.
Does this PR introduce any user-facing change?
It should not. The change might improve performance slightly.
How was this patch tested?
By running the new UT:
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *SessionCatalogSuite"