Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -356,8 +356,13 @@ public boolean dropTable(TableIdentifier identifier, boolean purge) {
.build());
LOG.info("Successfully dropped table {} from Glue", identifier);
if (purge && lastMetadata != null) {
CatalogUtil.dropTableData(ops.io(), lastMetadata);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, s3tables is purging data behind the scenes right. So in the case of dropTable(identifier, false) we should throw and force the user to be specific about purging.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point about S3 Tables purging behind the scenes. The current fix only affects the purge=true path — when purge=false, dropTableData is never called so there's no change in behavior. But I agree the broader S3 Tables story may need more thought beyond this PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a matter of expected behavior someone could call drop table with purge set to false, and their data will still be deleted by S3Tables. I'm leaning towards forcing the user to specify purge when dropping otherwise fail.

LOG.info("Glue table {} data purged", identifier);
try {
CatalogUtil.dropTableData(ops.io(), lastMetadata);
LOG.info("Glue table {} data purged", identifier);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to check whether the target table exists in S3 Table bucket?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The location could be checked for S3 Table Bucket ARN patterns, but catching the exception is more robust as it handles any case where purge fails (permissions, bucket policies, etc.) without needing to enumerate all possible URI formats.
Looks like this also aligns with the Trino approach you linked!

Happy to add a URI check if you'd prefer a more targeted approach though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if we could do both (S3 Table check + try-catch) to avoid redundant S3 requests and warning logs. I think we should keep the try-catch regardless of S3 Table because it may fail for other reasons.

The "Enumerate all possible URI formats" approach doesn't look straightforward. Only adding try-catch looks good to me.

} catch (Exception e) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to catch the Specific exception rather than catch all Exception ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally catching broad Exception here — the table metadata has already been dropped by this point, so the try-catch is a safety net to prevent any unexpected failure from blocking the drop operation. Narrowing to a specific exception risks missing edge cases from different IO implementations (S3, GCS, HDFS, etc.). This also aligns with the Trino approach that @ebyhr linked.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant against this broad of an exception. We should be strict in what we are catching here it's a purge that other use cases could hit.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point — two reviewers have raised this now. What exception type would you prefer here? The challenge is that CatalogUtil.dropTableData delegates to different IO implementations (S3, GCS, HDFS) which throw different exception types. Would RuntimeException be narrow enough, or would you prefer something S3-specific like SdkServiceException?

LOG.warn(
"Failed to purge data for table: {}, continuing drop without purge", identifier, e);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table has already been dropped by the time we reach this line, so this change makes sense to me.

The Trino Iceberg connector also suppresses failures when it cannot delete data using the Glue catalog:

https://github.com/trinodb/trino/blob/5a116341b53f9f3a3b29b8b405773010e307e40b/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalog.java#L676-L696

}
}
LOG.info("Dropped table: {}", identifier);
return true;
Expand Down
42 changes: 42 additions & 0 deletions aws/src/test/java/org/apache/iceberg/aws/glue/TestGlueCatalog.java
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,10 @@
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.iceberg.BaseMetastoreTableOperations;
import org.apache.iceberg.CatalogUtil;
import org.apache.iceberg.Schema;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableOperations;
import org.apache.iceberg.aws.AwsProperties;
import org.apache.iceberg.aws.s3.S3FileIOProperties;
import org.apache.iceberg.catalog.Namespace;
Expand All @@ -40,6 +43,7 @@
import org.apache.iceberg.util.LockManagers;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.mockito.MockedStatic;
import org.mockito.Mockito;
import org.mockito.invocation.InvocationOnMock;
import org.mockito.stubbing.Answer;
Expand Down Expand Up @@ -312,6 +316,44 @@ public void testDropTable() {
glueCatalog.dropTable(TableIdentifier.of("db1", "t1"));
}

@Test
public void testDropTableWithPurgeFailure() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were you able to test this functionality against a real s3tables catalog?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't tested against a real S3 Tables catalog. Is that strictly necessary for this change, or would the unit test coverage be sufficient?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we'd know what the actual exceptions to account for in catch above with some real testing right. We should be strict in handling here to avoid masking any real bugs.

Map<String, String> properties = Maps.newHashMap();
properties.put(
BaseMetastoreTableOperations.TABLE_TYPE_PROP,
BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE);
Mockito.doReturn(
GetTableResponse.builder()
.table(
Table.builder().databaseName("db1").name("t1").parameters(properties).build())
.build())
.when(glue)
.getTable(Mockito.any(GetTableRequest.class));
Mockito.doReturn(
GetDatabaseResponse.builder().database(Database.builder().name("db1").build()).build())
.when(glue)
.getDatabase(Mockito.any(GetDatabaseRequest.class));
Mockito.doReturn(DeleteTableResponse.builder().build())
.when(glue)
.deleteTable(Mockito.any(DeleteTableRequest.class));

TableMetadata mockMetadata = Mockito.mock(TableMetadata.class);
TableOperations mockOps = Mockito.mock(TableOperations.class);
Mockito.when(mockOps.current()).thenReturn(mockMetadata);

GlueCatalog spyCatalog = Mockito.spy(glueCatalog);
Mockito.doReturn(mockOps).when(spyCatalog).newTableOps(Mockito.any(TableIdentifier.class));

try (MockedStatic<CatalogUtil> mockedCatalogUtil = Mockito.mockStatic(CatalogUtil.class)) {
mockedCatalogUtil
.when(() -> CatalogUtil.dropTableData(Mockito.any(), Mockito.any()))
.thenThrow(new RuntimeException("Cannot delete files from S3 Table Bucket"));

boolean result = spyCatalog.dropTable(TableIdentifier.of("db1", "t1"), true);
assertThat(result).isTrue();
}
}

@Test
public void testRenameTable() {
AtomicInteger counter = new AtomicInteger(1);
Expand Down
Loading