Skip to content

Core: Close FileIO on cache eviction to prevent thread leaks#15910

Open
utafrali wants to merge 3 commits intoapache:mainfrom
utafrali:fix/issue-15898-cachingcatalog-does-not-close-fileio-on-
Open

Core: Close FileIO on cache eviction to prevent thread leaks#15910
utafrali wants to merge 3 commits intoapache:mainfrom
utafrali:fix/issue-15898-cachingcatalog-does-not-close-fileio-on-

Conversation

@utafrali
Copy link
Copy Markdown

@utafrali utafrali commented Apr 8, 2026

Fixes #15898

When tables get evicted from the cache, the FileIO wasn't being closed. This leaves S3FileIO with open thread pools that never get cleaned up, which is a problem in long-running applications. Added a try-catch around the close call since FileIO implementations can throw exceptions, and added a test to verify it gets called.

When tables get evicted from the cache, the FileIO wasn't being closed. This leaves S3FileIO with open thread pools that never get cleaned up, which is a problem in long-running applications. Added a try-catch around the close call since FileIO implementations can throw exceptions, and added a test to verify it gets called.

Fixes apache#15898
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses resource/thread leaks in CachingCatalog by ensuring a table’s FileIO is closed when the table is evicted from the cache, preventing long-lived background threads (notably from S3FileIO) from accumulating in long-running applications.

Changes:

  • Close table.io() when a cached table entry is removed due to expiration.
  • Add a unit test verifying FileIO.close() is invoked on cache eviction.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
core/src/main/java/org/apache/iceberg/CachingCatalog.java Adds FileIO.close() in the cache removal listener for expired base tables.
core/src/test/java/org/apache/iceberg/hadoop/TestCachingCatalog.java Adds a Mockito-based test asserting FileIO.close() is called after TTL eviction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +98 to +106
if (RemovalCause.EXPIRED.equals(cause)) {
if (!MetadataTableUtils.hasMetadataTableName(tableIdentifier)) {
tableCache.invalidateAll(metadataTableIdentifiers(tableIdentifier));
if (table != null) {
try {
table.io().close();
} catch (Exception e) {
LOG.warn("Failed to close FileIO for evicted table {}", tableIdentifier, e);
}
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The close logic only runs when the removal cause is EXPIRED. Caffeine can also remove entries due to COLLECTED (softValues GC) and potentially SIZE (if a max-size/weight policy is added); in those cases the FileIO would still not be closed and the thread leak described in #15898 can persist. Consider running the same invalidation/close path for all eviction causes (e.g., cause.wasEvicted()), while still skipping metadata table identifiers.

Copilot uses AI. Check for mistakes.
@utafrali
Copy link
Copy Markdown
Author

utafrali commented Apr 9, 2026

Done, pushed the fix.

tableCache.invalidateAll(metadataTableIdentifiers(tableIdentifier));
if (table != null) {
try {
table.io().close();
Copy link
Copy Markdown
Contributor

@anoopj anoopj Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be safe for all catalogs. e.g. HadoopCatalog creates a common fileIO at the catalog level and not table level. The same fileIO reference is shared with the table object. So the close() here would break tables in the catalog that are still actively used.

Can this be just handled as part of the catalog's close() instead?

@utafrali
Copy link
Copy Markdown
Author

Done, pushed the fix.

@anoopj
Copy link
Copy Markdown
Contributor

anoopj commented Apr 10, 2026

Thanks for removing the table.io().close() call. That addresses the shared FileIO problem. The remaining change looks correct for broadening metadata table invalidation.

But the test testFileIOClosedOnCacheEviction() still asserts close() in the mock class, which will fail since the close call was removed. I recommend modifying the test to verify the wasEvicted() behavior instead (e.g., assert that metadata tables are invalidated for non-EXPIRED eviction causes like COLLECTED).

Also the PR description still references closing FileIO - you might want to fix that as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CachingCatalog does not close FileIO on cache eviction, causing S3FileIO / SDK v2 thread leak in long-running applications

3 participants