-
Notifications
You must be signed in to change notification settings - Fork 3.3k
AWS: Handle S3 Table Bucket purge gracefully in GlueCatalog (#14449) #16073
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -356,8 +356,13 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) { | |
| .build()); | ||
| LOG.info("Successfully dropped table {} from Glue", identifier); | ||
| if (purge && lastMetadata != null) { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); | ||
| LOG.info("Glue table {} data purged", identifier); | ||
| try { | ||
| CatalogUtil.dropTableData(ops.io(), lastMetadata); | ||
| LOG.info("Glue table {} data purged", identifier); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any way to check whether the target table exists in S3 Table bucket?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The location could be checked for S3 Table Bucket ARN patterns, but catching the exception is more robust as it handles any case where purge fails (permissions, bucket policies, etc.) without needing to enumerate all possible URI formats. Happy to add a URI check if you'd prefer a more targeted approach though.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was wondering if we could do both (S3 Table check + try-catch) to avoid redundant S3 requests and warning logs. I think we should keep the try-catch regardless of S3 Table because it may fail for other reasons. The "Enumerate all possible URI formats" approach doesn't look straightforward. Only adding try-catch looks good to me. |
||
| } catch (Exception e) { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is it possible to catch the Specific exception rather than catch all Exception ?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Intentionally catching broad
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm hesitant against this broad of an exception. We should be strict in what we are catching here it's a purge that other use cases could hit.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair point — two reviewers have raised this now. What exception type would you prefer here? The challenge is that |
||
| LOG.warn( | ||
| "Failed to purge data for table: {}, continuing drop without purge", identifier, e); | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The table has already been dropped by the time we reach this line, so this change makes sense to me. The Trino Iceberg connector also suppresses failures when it cannot delete data using the Glue catalog: |
||
| } | ||
| } | ||
| LOG.info("Dropped table: {}", identifier); | ||
| return true; | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -26,7 +26,10 @@ | |
| import java.util.UUID; | ||
| import java.util.concurrent.atomic.AtomicInteger; | ||
| import org.apache.iceberg.BaseMetastoreTableOperations; | ||
| import org.apache.iceberg.CatalogUtil; | ||
| import org.apache.iceberg.Schema; | ||
| import org.apache.iceberg.TableMetadata; | ||
| import org.apache.iceberg.TableOperations; | ||
| import org.apache.iceberg.aws.AwsProperties; | ||
| import org.apache.iceberg.aws.s3.S3FileIOProperties; | ||
| import org.apache.iceberg.catalog.Namespace; | ||
|
|
@@ -40,6 +43,7 @@ | |
| import org.apache.iceberg.util.LockManagers; | ||
| import org.junit.jupiter.api.BeforeEach; | ||
| import org.junit.jupiter.api.Test; | ||
| import org.mockito.MockedStatic; | ||
| import org.mockito.Mockito; | ||
| import org.mockito.invocation.InvocationOnMock; | ||
| import org.mockito.stubbing.Answer; | ||
|
|
@@ -312,6 +316,44 @@ public void testDropTable() { | |
| glueCatalog.dropTable(TableIdentifier.of("db1", "t1")); | ||
| } | ||
|
|
||
| @Test | ||
| public void testDropTableWithPurgeFailure() { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Were you able to test this functionality against a real s3tables catalog?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Haven't tested against a real S3 Tables catalog. Is that strictly necessary for this change, or would the unit test coverage be sufficient?
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well we'd know what the actual exceptions to account for in catch above with some real testing right. We should be strict in handling here to avoid masking any real bugs. |
||
| Map<String, String> properties = Maps.newHashMap(); | ||
| properties.put( | ||
| BaseMetastoreTableOperations.TABLE_TYPE_PROP, | ||
| BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE); | ||
| Mockito.doReturn( | ||
| GetTableResponse.builder() | ||
| .table( | ||
| Table.builder().databaseName("db1").name("t1").parameters(properties).build()) | ||
| .build()) | ||
| .when(glue) | ||
| .getTable(Mockito.any(GetTableRequest.class)); | ||
| Mockito.doReturn( | ||
| GetDatabaseResponse.builder().database(Database.builder().name("db1").build()).build()) | ||
| .when(glue) | ||
| .getDatabase(Mockito.any(GetDatabaseRequest.class)); | ||
| Mockito.doReturn(DeleteTableResponse.builder().build()) | ||
| .when(glue) | ||
| .deleteTable(Mockito.any(DeleteTableRequest.class)); | ||
|
|
||
| TableMetadata mockMetadata = Mockito.mock(TableMetadata.class); | ||
| TableOperations mockOps = Mockito.mock(TableOperations.class); | ||
| Mockito.when(mockOps.current()).thenReturn(mockMetadata); | ||
|
|
||
| GlueCatalog spyCatalog = Mockito.spy(glueCatalog); | ||
| Mockito.doReturn(mockOps).when(spyCatalog).newTableOps(Mockito.any(TableIdentifier.class)); | ||
|
|
||
| try (MockedStatic<CatalogUtil> mockedCatalogUtil = Mockito.mockStatic(CatalogUtil.class)) { | ||
| mockedCatalogUtil | ||
| .when(() -> CatalogUtil.dropTableData(Mockito.any(), Mockito.any())) | ||
| .thenThrow(new RuntimeException("Cannot delete files from S3 Table Bucket")); | ||
|
|
||
| boolean result = spyCatalog.dropTable(TableIdentifier.of("db1", "t1"), true); | ||
| assertThat(result).isTrue(); | ||
| } | ||
| } | ||
|
|
||
| @Test | ||
| public void testRenameTable() { | ||
| AtomicInteger counter = new AtomicInteger(1); | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, s3tables is purging data behind the scenes right. So in the case of
dropTable(identifier, false)we should throw and force the user to be specific about purging.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point about S3 Tables purging behind the scenes. The current fix only affects the
purge=truepath — whenpurge=false,dropTableDatais never called so there's no change in behavior. But I agree the broader S3 Tables story may need more thought beyond this PR.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a matter of expected behavior someone could call drop table with purge set to false, and their data will still be deleted by S3Tables. I'm leaning towards forcing the user to specify purge when dropping otherwise fail.