AWS: Handle S3 Table Bucket purge gracefully in GlueCatalog (#14449)#16073
AWS: Handle S3 Table Bucket purge gracefully in GlueCatalog (#14449)#16073yadavay-amzn wants to merge 1 commit into
Conversation
…4449) When calling GlueCatalog.dropTable() with purge=true on a table in an S3 Table Bucket, the purge fails because S3 Table Buckets do not allow manual file deletion. This change wraps CatalogUtil.dropTableData() in a try-catch so that purge failures are logged as warnings instead of propagating and failing the entire drop operation. Closes apache#14449
840236c to
e996c80
Compare
| LOG.info("Glue table {} data purged", identifier); | ||
| try { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); | ||
| LOG.info("Glue table {} data purged", identifier); |
There was a problem hiding this comment.
Is there any way to check whether the target table exists in S3 Table bucket?
There was a problem hiding this comment.
The location could be checked for S3 Table Bucket ARN patterns, but catching the exception is more robust as it handles any case where purge fails (permissions, bucket policies, etc.) without needing to enumerate all possible URI formats.
Looks like this also aligns with the Trino approach you linked!
Happy to add a URI check if you'd prefer a more targeted approach though.
There was a problem hiding this comment.
I was wondering if we could do both (S3 Table check + try-catch) to avoid redundant S3 requests and warning logs. I think we should keep the try-catch regardless of S3 Table because it may fail for other reasons.
The "Enumerate all possible URI formats" approach doesn't look straightforward. Only adding try-catch looks good to me.
| LOG.info("Glue table {} data purged", identifier); | ||
| } catch (Exception e) { | ||
| LOG.warn( | ||
| "Failed to purge data for table: {}, continuing drop without purge", identifier, e); |
There was a problem hiding this comment.
The table has already been dropped by the time we reach this line, so this change makes sense to me.
The Trino Iceberg connector also suppresses failures when it cannot delete data using the Glue catalog:
| try { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); | ||
| LOG.info("Glue table {} data purged", identifier); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
is it possible to catch the Specific exception rather than catch all Exception ?
There was a problem hiding this comment.
Intentionally catching broad Exception here — the table metadata has already been dropped by this point, so the try-catch is a safety net to prevent any unexpected failure from blocking the drop operation. Narrowing to a specific exception risks missing edge cases from different IO implementations (S3, GCS, HDFS, etc.). This also aligns with the Trino approach that @ebyhr linked.
There was a problem hiding this comment.
I'm hesitant against this broad of an exception. We should be strict in what we are catching here it's a purge that other use cases could hit.
There was a problem hiding this comment.
Fair point — two reviewers have raised this now. What exception type would you prefer here? The challenge is that CatalogUtil.dropTableData delegates to different IO implementations (S3, GCS, HDFS) which throw different exception types. Would RuntimeException be narrow enough, or would you prefer something S3-specific like SdkServiceException?
There was a problem hiding this comment.
Stepping back here, this PR is the first time in OSS (on the Java side) that GlueCatalog acknowledges the "federation" story from Glue. Where Glue can talk to linked catalogs, whether they're IRC backed or, in this case, S3 Tables. And in some cases federation is painful, like this managed storage story where S3 Tables limits normal S3 operations, and uses work arounds for location allocation.
With that said, it's worth being explicit about that rather than implicit with a broad try/catch.
So the root cause is that this Glue Table is federated from an S3 Tables catalog where data is managed server side, so our client side CatalogUtil.dropTableData is unnecessary and will fail. Catching Exception around dropTableData can swallow unrelated failures like iam issues or bugs we otherwise would want surfaced in non s3table catalogs especially during a purge.
There are 3 options here:
-
Build support for Glues federation use case. Glue responses expose a federated connection type. So in this case if its s3 tables we could skip client side purge. This is what we do today.
-
Decide
GlueCatalogdoesn't model federation yet and keep this scoped down. If we're not ready to introduce the glue federation support. Then this PR at a minimum narrow this try/catch to what S3Tables throws on delete and log. Furthermore, we need some documentation on this. -
Block all requests against federated tables. This is quite difficult if I remember correctly because a federated table can be linked to a database? I'd need to think this one through more.
I'd perfer option 1 because it will scale with the federation stories in Glue.
WDYT?
| try { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); | ||
| LOG.info("Glue table {} data purged", identifier); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
I'm hesitant against this broad of an exception. We should be strict in what we are catching here it's a purge that other use cases could hit.
| .build()); | ||
| LOG.info("Successfully dropped table {} from Glue", identifier); | ||
| if (purge && lastMetadata != null) { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); |
There was a problem hiding this comment.
Technically, s3tables is purging data behind the scenes right. So in the case of dropTable(identifier, false) we should throw and force the user to be specific about purging.
There was a problem hiding this comment.
Good point about S3 Tables purging behind the scenes. The current fix only affects the purge=true path — when purge=false, dropTableData is never called so there's no change in behavior. But I agree the broader S3 Tables story may need more thought beyond this PR.
There was a problem hiding this comment.
This is a matter of expected behavior someone could call drop table with purge set to false, and their data will still be deleted by S3Tables. I'm leaning towards forcing the user to specify purge when dropping otherwise fail.
| } | ||
|
|
||
| @Test | ||
| public void testDropTableWithPurgeFailure() { |
There was a problem hiding this comment.
Were you able to test this functionality against a real s3tables catalog?
There was a problem hiding this comment.
Haven't tested against a real S3 Tables catalog. Is that strictly necessary for this change, or would the unit test coverage be sufficient?
There was a problem hiding this comment.
Well we'd know what the actual exceptions to account for in catch above with some real testing right. We should be strict in handling here to avoid masking any real bugs.
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
When calling
GlueCatalog.dropTable()withpurge=trueon a table in an S3 Table Bucket, the purge fails because S3 Table Buckets do not allow manual file deletion.This change wraps
CatalogUtil.dropTableData()in a try-catch so that purge failures are logged as warnings instead of propagating and failing the entire drop operation. The table is still successfully dropped from the Glue catalog.Closes #14449