[#9673] Reuse the FileSystem cache in the fileset server and client modules (#9708)

manojks1999 wants to merge 1 commit into apache:main
Conversation
Pull request overview
This pull request refactors the FileSystem cache implementation so it can be reused by both the fileset server module and the GVFS client module. The changes extract the cache and cache-key classes from both modules into a shared location (catalogs/hadoop-common) and introduce a unified `FileSystemCache` wrapper around a Caffeine cache.
Changes:
- Moved `FileSystemCacheKey` and the cache implementation from both modules to catalogs/hadoop-common
- Created a new `FileSystemCache` wrapper class with a builder pattern for consistent cache configuration
- Updated both the server (`FilesetCatalogOperations`) and the client (`BaseGVFSOperations`) to use the shared cache implementation
- Added comprehensive unit tests for the new shared cache classes
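As a rough illustration of the builder-style configuration the overview describes, a plain-JDK sketch follows. The class and method names (`DemoCacheConfig`, `maxSize`, `expireAfterAccess`) and the defaults are hypothetical, not the actual `FileSystemCache` API:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of builder-based cache configuration; names and
// defaults are illustrative, not the real FileSystemCache API.
public class DemoCacheConfig {
  private final int maxSize;
  private final long expireAfterAccessMillis;

  private DemoCacheConfig(Builder b) {
    this.maxSize = b.maxSize;
    this.expireAfterAccessMillis = b.expireAfterAccessMillis;
  }

  public int maxSize() { return maxSize; }

  public long expireAfterAccessMillis() { return expireAfterAccessMillis; }

  public static class Builder {
    private int maxSize = 20; // assumed default
    private long expireAfterAccessMillis = TimeUnit.HOURS.toMillis(1); // assumed default

    public Builder maxSize(int maxSize) {
      this.maxSize = maxSize;
      return this;
    }

    public Builder expireAfterAccess(long millis) {
      this.expireAfterAccessMillis = millis;
      return this;
    }

    public DemoCacheConfig build() {
      return new DemoCacheConfig(this);
    }
  }

  public static void main(String[] args) {
    // Both server and client can build the same cache shape from one place.
    DemoCacheConfig cfg = new Builder().maxSize(50).expireAfterAccess(60_000).build();
    System.out.println(cfg.maxSize());                 // 50
    System.out.println(cfg.expireAfterAccessMillis()); // 60000
  }
}
```

Centralizing defaults in one builder is what makes the server and client configurations consistent instead of drifting apart in two copies.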
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| catalogs/hadoop-common/src/main/java/org/apache/gravitino/catalog/hadoop/fs/FileSystemCache.java | New shared cache implementation with builder pattern and cleanup logic |
| catalogs/hadoop-common/src/main/java/org/apache/gravitino/catalog/hadoop/fs/FileSystemCacheKey.java | New shared cache key supporting both UGI-based and config-based caching |
| catalogs/hadoop-common/src/test/java/org/apache/gravitino/catalog/hadoop/fs/TestFileSystemCache.java | Comprehensive unit tests for the cache implementation |
| catalogs/hadoop-common/src/test/java/org/apache/gravitino/catalog/hadoop/fs/TestFileSystemCacheKey.java | Unit tests for cache key equality and hashing |
| catalogs/hadoop-common/build.gradle.kts | Added caffeine and slf4j-api dependencies |
| clients/filesystem-hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/BaseGVFSOperations.java | Removed the duplicate cache implementation; now uses the shared cache |
| clients/filesystem-hadoop3/src/test/java/org/apache/gravitino/filesystem/hadoop/TestGvfsBase.java | Updated tests to use the new cache type |
| clients/filesystem-hadoop3/src/test/java/org/apache/gravitino/filesystem/hadoop/TestFileSystemCacheKey.java | Updated to test the shared cache key class |
| catalogs/catalog-fileset/src/main/java/org/apache/gravitino/catalog/fileset/FilesetCatalogOperations.java | Removed the duplicate cache implementation; now uses the shared cache |
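The shared key's dual equality modes (UGI-based on the server, config-based in the client) can be sketched with a plain-JDK stand-in. `SimpleCacheKey` and its fields below are hypothetical, not the actual `FileSystemCacheKey` API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the shared cache key: two keys are equal when
// their URI scheme/authority and identity (a UGI user name, or a fingerprint
// of the relevant configuration) all match, so equal keys reuse one FileSystem.
public class SimpleCacheKey {
  private final String scheme;
  private final String authority;
  private final String identity;

  public SimpleCacheKey(String scheme, String authority, String identity) {
    this.scheme = scheme;
    this.authority = authority;
    this.identity = identity;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SimpleCacheKey)) return false;
    SimpleCacheKey that = (SimpleCacheKey) o;
    return Objects.equals(scheme, that.scheme)
        && Objects.equals(authority, that.authority)
        && Objects.equals(identity, that.identity);
  }

  @Override
  public int hashCode() {
    return Objects.hash(scheme, authority, identity);
  }

  public static void main(String[] args) {
    Map<SimpleCacheKey, String> cache = new HashMap<>();
    cache.put(new SimpleCacheKey("hdfs", "nn1:8020", "alice"), "fs-1");
    // Same scheme/authority/identity -> cache hit, the FileSystem is reused.
    System.out.println(cache.containsKey(new SimpleCacheKey("hdfs", "nn1:8020", "alice")));
    // Different identity -> cache miss, a new FileSystem would be created.
    System.out.println(cache.containsKey(new SimpleCacheKey("hdfs", "nn1:8020", "bob")));
  }
}
```

Keeping `equals` and `hashCode` consistent is what makes one key class safe for both the server's UGI-keyed cache and the client's config-keyed cache.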
```java
@VisibleForTesting
@VisibleForTesting
Cache<FileSystemCacheKey, FileSystem> internalFileSystemCache() {
```
Duplicate `@VisibleForTesting` annotation on method `internalFileSystemCache()`. Remove one of the duplicate annotations.

Suggested change:

```java
@VisibleForTesting
```
```java
          } catch (IOException e) {
            LOG.warn("Failed to close FileSystem instance in cache for key: {}", k, e);
          }
        });
```
The `close()` method should call `cache.invalidateAll()` after closing the file systems and before `cleanUp()` to ensure all entries are removed from the cache. While `cleanUp()` performs maintenance, `invalidateAll()` explicitly clears all entries. This makes the cache state more predictable after `close()` and is consistent with standard resource-cleanup patterns.
Suggested change:

```java
        });
    cache.invalidateAll();
```
```java
    // Close all cached file systems
    cache
        .asMap()
        .forEach(
            (k, v) -> {
              try {
                v.close();
              } catch (IOException e) {
                LOG.warn("Failed to close FileSystem instance in cache for key: {}", k, e);
              }
            });
    cache.cleanUp();

    // Shutdown the scheduler if it was created
    if (scheduler != null) {
      scheduler.shutdownNow();
    }
```
Consider collecting all `IOException`s raised during FileSystem closure and throwing a composite exception at the end. Currently every exception is only logged as a warning, which may hide important cleanup failures. Collecting them makes error handling explicit and lets callers detect partial failures during cache cleanup.
Suggested change:

```java
    // Close all cached file systems, collecting any IOExceptions
    IOException closeException = null;
    for (Map.Entry<FileSystemCacheKey, FileSystem> entry : cache.asMap().entrySet()) {
      FileSystemCacheKey key = entry.getKey();
      FileSystem fileSystem = entry.getValue();
      try {
        fileSystem.close();
      } catch (IOException e) {
        LOG.warn("Failed to close FileSystem instance in cache for key: {}", key, e);
        if (closeException == null) {
          closeException = e;
        } else {
          closeException.addSuppressed(e);
        }
      }
    }
    cache.cleanUp();
    // Shutdown the scheduler if it was created
    if (scheduler != null) {
      scheduler.shutdownNow();
    }
    if (closeException != null) {
      throw closeException;
    }
```
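The suppressed-exception pattern this suggestion relies on can be exercised with plain JDK `Closeable`s. `CloseAllDemo` and its helper are illustrative stand-ins, not Gravitino code:

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Demonstrates closing many resources while keeping the first failure as the
// primary exception and attaching later failures via Throwable.addSuppressed().
public class CloseAllDemo {
  static IOException closeAll(List<Closeable> resources) {
    IOException first = null;
    for (Closeable c : resources) {
      try {
        c.close();
      } catch (IOException e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    return first; // null when every close() succeeded
  }

  public static void main(String[] args) {
    // Closeable is a functional interface, so lambdas work as fake resources.
    Closeable ok = () -> {};
    Closeable bad1 = () -> { throw new IOException("fs-1 close failed"); };
    Closeable bad2 = () -> { throw new IOException("fs-2 close failed"); };

    IOException result = closeAll(Arrays.asList(ok, bad1, bad2));
    System.out.println(result.getMessage());                    // fs-1 close failed
    System.out.println(result.getSuppressed()[0].getMessage()); // fs-2 close failed
  }
}
```

Every resource still gets a `close()` attempt even when an earlier one fails, and the caller sees one exception carrying the full failure history.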
@manojks1999 |
Fix for #9673.