Branch fs spi review#62024
Merged
morningman merged 2 commits intoapache:branch-fs-spifrom Apr 2, 2026
Merged
Conversation
…odule ### What problem does this PR solve? Issue Number: N/A Problem Summary: Code review of fe/fe-filesystem/ identified 2 critical bugs, 5 major issues, and 5 minor issues. This commit fixes all of them: **Critical bugs fixed:** - ObjFileSystem.exists(): called location.withoutScheme() which strips the URI scheme, causing S3Uri/AzureUri parsing to throw on every exists() call. Fixed to use location.uri() (full URI with scheme). - S3FileSystem.globListWithLimit(): pagination loop exited prematurely after hitting the limit, so nextMatchAfterLimit was only searched in the remaining items of the current S3 page. Fixed to continue paginating until the boundary key is found. **Major issues fixed:** - RemoteObject.modificationTime() renamed to getModificationTime() for JavaBean consistency with the other getters (getKey, getSize, getEtag, getRelativePath). - S3ObjStorage.buildClient(): removed hard requirement for PROP_ENDPOINT; when absent the AWS SDK uses region-only routing for native AWS S3 access. - BrokerInputStream FD leak: added clientInvalidated flag; close() now skips the closeReader RPC when the client was already invalidated by a prior RPC failure, preventing a double-invalidate and broker-side FD leak. - DFSFileSystem.list(): replaced eager listStatus(path) (returns full FileStatus[]) with lazy listStatusIterator(path) backed by HdfsFileIterator; each hasNext() and next() call runs inside the HadoopAuthenticator context for Kerberos safety. - COS/OBS/OSS providers: added explicit _STORAGE_TYPE_ check so VPC/custom-domain endpoints are correctly routed without relying on domain-name pattern matching. **Minor issues fixed:** - RemoteIterator, SimpleRemoteIterator, FileSystemIOException removed from SPI (unused; FileIterator already covers the same purpose). - RequestBody.content() no longer declares throws IOException (no IO occurs). - LocalSeekableInputStream.read() and seek() now throw IOException when closed. - AzureFileSystem.rename() throws IOException instead of UnsupportedOperationException for directory renames, matching the FileSystem contract. - BrokerFileSystemProvider.create(): null check added for BROKER_PORT before Integer.parseInt to give a clear error message instead of NPE. ### Release note Fix ObjFileSystem.exists() always failing due to withoutScheme(); fix S3 glob pagination losing results after limit; support AWS region-only S3 access without an explicit endpoint; fix broker reader FD leak on RPC failure; HDFS listing is now lazy/streaming. ### Check List (For Author) - Test: Manual compile verification (all 10 fe-filesystem modules BUILD SUCCESS) - Behavior changed: Yes — S3 clients no longer require PROP_ENDPOINT for native AWS S3; ObjFileSystem.exists() now works correctly; HDFS list is lazy - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve? Issue Number: N/A Problem Summary: fe-filesystem-spi mixed consumer-facing API types (FileSystem, Location, FileEntry, etc.) with provider-facing SPI contracts (ObjStorage, HadoopAuthenticator, etc.) in the same module and package (org.apache.doris.filesystem.spi). This made it impossible to depend on just the consumer API without pulling in all provider-implementation contracts. ### Changes Created a new fe-filesystem-api module with package org.apache.doris.filesystem, containing the 16 consumer-facing types: - FileSystem, FileSystemProvider (plugin discovery interface) - Location, FileEntry, BlockInfo (value types) - FileIterator, DorisInputFile, DorisInputStream, DorisOutputFile (I/O types) - GlobListing, FileSystemType, FileSystemUtil, FileSystemTransferUtil (utilities) - FileSystemIOException, RemoteIterator, SimpleRemoteIterator (exceptions/iteration) Retained in fe-filesystem-spi (package org.apache.doris.filesystem.spi), the 9 provider-only contracts: - ObjFileSystem, ObjStorage, HadoopAuthenticator, IOCallable - RemoteObject, RemoteObjects, RequestBody, StsCredentials, UploadPartResult All 8 impl modules (S3/Azure/HDFS/Broker/COS/OBS/OSS/Local) and fe-core have their imports updated to the new package. META-INF/services files renamed from org.apache.doris.filesystem.spi.FileSystemProvider to org.apache.doris.filesystem.FileSystemProvider to match the moved interface. build.sh updated to include fe-filesystem-api in the build module list and exclude it from plugin dependency copy (it is on the main FE classpath). ### Release note None ### Check List (For Author) - Test: Build verified with ./build.sh --fe -j4 → Successfully build Doris - Behavior changed: No (API-compatible refactor, same public contracts) - Does this need documentation: No Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.