[WIP] Core: Fix drop table without purge for hadoop catalog #5283
ajantha-bhat wants to merge 1 commit into apache:master
The test cases fail because Spark issues a plain "DROP TABLE" by default, which does not purge the data. With this change the Hadoop catalog honors purge = false, so the tests no longer clean up the data and later runs hit a "table exists" error. Also, even without the version-hint file, the Hadoop catalog derives the version by reading the metadata file names. I probably need to either modify the test cases to use "DROP TABLE PURGE" or stop deriving the version when the version file doesn't exist (but I am not sure about the impact).
CatalogUtil.dropTableData(ops.io(), lastMetadata);
return fs.delete(tablePath, true /* recursive */);
} else {
// just drop the version-hint.txt file
The version hint is a hint. If any metadata file exists, then the table will still exist.
I think that the original version is correct. For Hadoop tables, dropping a table means deleting its directory. The confusion here is one reason why Hadoop tables are not recommended for production use.
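To illustrate why removing only the hint file does not drop the table, here is a minimal, self-contained sketch of an existence check that falls back to scanning metadata file names. It is not Iceberg's actual code: the class and method names are hypothetical, and the `v<N>.metadata.json` naming pattern is an assumption made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical sketch, not Iceberg code.
final class VersionScanSketch {
    // Assumed metadata file naming for this sketch: v<N>.metadata.json
    private static final Pattern METADATA_FILE =
        Pattern.compile("v(\\d+)\\.metadata\\.json");

    // Finds the highest version by listing the metadata directory.
    // No hint file is consulted, so deleting the hint changes nothing here.
    static OptionalInt latestVersion(Path metadataDir) throws IOException {
        if (!Files.isDirectory(metadataDir)) {
            return OptionalInt.empty();
        }
        try (Stream<Path> files = Files.list(metadataDir)) {
            return files
                .map(p -> METADATA_FILE.matcher(p.getFileName().toString()))
                .filter(Matcher::matches)
                .mapToInt(m -> Integer.parseInt(m.group(1)))
                .max();
        }
    }

    // The table "exists" as long as any versioned metadata file is present.
    static boolean tableExists(Path metadataDir) throws IOException {
        return latestVersion(metadataDir).isPresent();
    }

    public static void main(String[] args) throws IOException {
        Path metadataDir = Files.createTempDirectory("table").resolve("metadata");
        Files.createDirectories(metadataDir);
        Files.createFile(metadataDir.resolve("v1.metadata.json"));
        Files.createFile(metadataDir.resolve("v2.metadata.json"));
        // Hint file name taken from the comment in the diff above.
        Path hint = Files.createFile(metadataDir.resolve("version-hint.txt"));

        Files.delete(hint); // drop only the hint, as the PR's else-branch does
        System.out.println("exists after hint removal: " + tableExists(metadataDir));
    }
}
```

Under these assumptions the table is still discoverable after the hint is deleted, which matches the "table exists" failures seen in the tests.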
@rdblue: Yes, that's exactly why the test cases are failing.
Also, it seems odd to me that HadoopCatalog extends BaseMetastoreCatalog, since it is a file system catalog rather than a metastore catalog.
I will most likely close this PR.
We are working on a catalog migration API (from any catalog to any catalog, and of course we will contribute it here). The API is simple, but adding test cases that span catalogs is more work.
While adding the API, we call drop table with purge = false on the source catalog after migration. At that point the Hadoop catalog was also cleaning up the migrated table's data, which is why I opened this PR.
When dropTable() is called with purge=false, it deletes the data and metadata files. The expected behaviour for purge = false is that it should not clean the data files and metadata files. Only the catalog's entries should be deleted.
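The expected semantics can be sketched with a toy catalog that keeps its entries separately from the table files. This is only an illustration under stated assumptions: the class, its in-memory map, and the method bodies are hypothetical and do not reflect how HadoopCatalog or any other Iceberg catalog is implemented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical toy catalog: entries live in a map, files live on disk.
final class DropSemanticsSketch {
    private final Map<String, Path> entries = new HashMap<>();

    void register(String name, Path location) {
        entries.put(name, location);
    }

    boolean tableExists(String name) {
        return entries.containsKey(name);
    }

    // purge = false: remove only the catalog entry.
    // purge = true: remove the entry and delete the table's files.
    boolean dropTable(String name, boolean purge) throws IOException {
        Path location = entries.remove(name);
        if (location == null) {
            return false; // nothing to drop
        }
        if (purge) {
            deleteRecursively(location);
        }
        return true;
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path location = Files.createTempDirectory("tbl");
        Files.createFile(location.resolve("data.parquet"));

        DropSemanticsSketch catalog = new DropSemanticsSketch();
        catalog.register("db.tbl", location);
        catalog.dropTable("db.tbl", false);

        System.out.println("entry exists: " + catalog.tableExists("db.tbl"));
        System.out.println("files exist: " + Files.exists(location.resolve("data.parquet")));
    }
}
```

The sketch only works because the entry and the files are stored separately. In a directory-based catalog the directory itself acts as the entry, which is the tension raised in the review comment above.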