[HUDI-6866] When invalidating the table in the Spark SQL query cache, verify if the… #9425
zhangyue19921010 merged 1 commit into apache:master from empcl:master_hivesync_db_exists
… hive-async database exists
@hudi-bot run azure
- val qualifiedTableName = String.join(".", hoodieConfig.getStringOrDefault(HIVE_DATABASE), name)
- if (spark.catalog.tableExists(qualifiedTableName)) {
+ val syncDb = hoodieConfig.getStringOrDefault(HIVE_DATABASE)
+ val qualifiedTableName = String.join(".", syncDb, name)
Reasonable, should we also take the default database name into consideration?
Hello, let me first explain the background. When I use Spark's Hive sync with Spark 3.1 or below, the step that invalidates the synced Hive table runs into a problem with the database. After reviewing the source code, I found that in Spark 3.1 tableExists first verifies that the database exists:
protected def requireDbExists(db: String): Unit = {
  if (!databaseExists(db)) {
    throw new NoSuchDatabaseException(db)
  }
}
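The failure mode described here can be illustrated with a small toy model in plain Scala. This is only a sketch, not Spark code: `ToyCatalog` and `NoSuchDatabase` are hypothetical names invented for illustration, and the model assumes nothing beyond the `requireDbExists` pattern quoted above.

```scala
import scala.util.Try

// Hypothetical stand-in for Spark's NoSuchDatabaseException.
final class NoSuchDatabase(db: String)
  extends RuntimeException(s"Database '$db' not found")

// Toy catalog: maps each registered database to its set of table names.
final class ToyCatalog(dbs: Map[String, Set[String]]) {
  def databaseExists(db: String): Boolean = dbs.contains(db)

  // Same shape as the quoted requireDbExists: throw instead of returning false.
  private def requireDbExists(db: String): Unit =
    if (!databaseExists(db)) throw new NoSuchDatabase(db)

  def tableExists(db: String, table: String): Boolean = {
    requireDbExists(db)
    dbs(db).contains(table)
  }
}

object ToyCatalogDemo {
  val catalog = new ToyCatalog(Map("default" -> Set("t1")))

  def main(args: Array[String]): Unit = {
    // Registered database: the lookup returns a plain Boolean.
    println(catalog.tableExists("default", "t1"))
    // Unregistered database: the lookup throws rather than returning false.
    println(Try(catalog.tableExists("missing_db", "t1")).isFailure)
  }
}
```

The point of the model is that a caller who expects `false` for an unknown database gets an exception instead, which is why an unguarded existence check can fail.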
- if (spark.catalog.tableExists(qualifiedTableName)) {
+ val syncDb = hoodieConfig.getStringOrDefault(HIVE_DATABASE)
+ val qualifiedTableName = String.join(".", syncDb, name)
+ if (spark.catalog.databaseExists(syncDb) && spark.catalog.tableExists(qualifiedTableName)) {
spark.catalog.tableExists(qualifiedTableName) already includes dbName when checking the table, so why do we need to check the database first?
Hello, in Spark 3.1 and earlier, checking whether a table exists requires the database to exist. Here, however, the database has not been registered in the catalog in advance.
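A minimal sketch of the guarded check, assuming `spark-sql` is on the classpath. The helper name `tableExistsSafely` and the database and table names are hypothetical; only `Catalog.databaseExists` and `Catalog.tableExists` are real Spark APIs, and this is not the actual Hudi code path.

```scala
import org.apache.spark.sql.SparkSession

object SafeTableCheck {
  // Hypothetical helper: && short-circuits, so tableExists is never called
  // for a database that the catalog does not know about.
  def tableExistsSafely(spark: SparkSession, db: String, table: String): Boolean =
    spark.catalog.databaseExists(db) && spark.catalog.tableExists(db, table)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("safe-table-check")
      .getOrCreate()
    try {
      // "db_not_registered" and "some_table" are made-up names for illustration.
      println(tableExistsSafely(spark, "db_not_registered", "some_table"))
    } finally {
      spark.stop()
    }
  }
}
```

Checking the database first trades one extra catalog lookup for not having to catch an exception on the older Spark versions mentioned in this thread.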
OK, correct me if I'm wrong: this means that if I use default as dbName for the check, it throws an error, so checking the database first is reasonable. But I have a question: in which scenario is the parameter META_SYNC_DATABASE_NAME not set? I think we need to fix that.
Okay, I think what you're saying makes sense. Let me see where the registration of databases and tables should be completed.
@danny0405 @KnightChess Hello, the invalidate table operation should only be executed when Hive support is enabled via enableHiveSupport(), but sometimes we do not need enableHiveSupport(), for example in tests.
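One way to tell the two situations apart is sketched below, assuming `spark-sql` is on the classpath. The helper `usesHiveCatalog` is hypothetical and is not part of this PR; it relies on the Spark setting `spark.sql.catalogImplementation`, which `enableHiveSupport()` sets to `hive` and which otherwise defaults to `in-memory`.

```scala
import org.apache.spark.sql.SparkSession

object HiveSupportCheck {
  // Hypothetical helper: reports whether the session uses the Hive catalog.
  // Falls back to "in-memory" if the setting is not present on the session.
  def usesHiveCatalog(spark: SparkSession): Boolean =
    spark.conf.get("spark.sql.catalogImplementation", "in-memory") == "hive"

  def main(args: Array[String]): Unit = {
    // Built without enableHiveSupport(), as a test session might be.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("hive-support-check")
      .getOrCreate()
    try {
      println(usesHiveCatalog(spark))
    } finally {
      spark.stop()
    }
  }
}
```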
@yihua Hello, do you have time to take a look at this PR? Thanks.
zhangyue19921010 left a comment
LGTM. Nice catch.
… hive-async database exists (#9425) Co-authored-by: chenlei677 <chenlei677@jd.com>
Change Logs
When invalidating the table in the Spark SQL query cache, verify that the hive-async database exists.
Impact
When invalidating the table in the Spark SQL query cache, verify that the hive-async database exists.
Risk level (write none, low, medium or high below)
none
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change