[draft] Branch fs spi phase5 clean master #62022

Closed
morningman wants to merge 43 commits into apache:master from morningman:branch-fs-spi-phase5-clean-master

@morningman
Contributor

No description provided.

morningman and others added 30 commits March 30, 2026 12:48
### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Before splitting filesystem implementations into independent
Maven modules (Phase 3), several compile-time couplings must be eliminated.
This commit completes all Phase 0 prerequisite decoupling tasks:

- P0.1: Introduce FsStorageType enum in fe-foundation (zero-dep module) to
  replace StorageBackend.StorageType (Thrift-generated) in PersistentFileSystem.
  Add FsStorageTypeAdapter for bidirectional Thrift conversion. Update all
  subclasses and callers (Repository, BackupJob, RestoreJob, CloudRestoreJob).

- P0.2: Add IOException-based default bridge methods to ObjStorage interface
  (checkObjectExists, getObjectChecked, putObjectChecked, deleteObjectChecked,
  deleteObjectsChecked, copyObjectChecked, listObjectsChecked). Add
  ObjStorageStatusAdapter for Status→IOException conversion. Zero changes to
  existing implementations.

- P0.3: Decouple SwitchingFileSystem from ExternalMetaCacheMgr via new
  FileSystemLookup functional interface. FileSystemProviderImpl passes a lambda.

- P0.4: Extract MultipartUploadCapable interface from ObjFileSystem, removing
  the forced abstract method. S3FileSystem and AzureFileSystem implement it.
  HMSTransaction now uses instanceof check instead of ObjFileSystem cast.

- P0.5: Introduce FileSystemDescriptor POJO for Repository metadata serialization,
  replacing direct PersistentFileSystem subclass serialization. Migrate GsonUtils
  to string-based Class.forName() reflection for legacy format backward compat,
  removing 7 compile-time imports of concrete filesystem classes.

- P0.6: Add FileSystemSpiProvider interface skeleton in fs/spi/ as the future
  ServiceLoader contract for Phase 3 module split.

### Release note

None

### Check List (For Author)

- Test: No need to test (pure refactor; all changes are backward compatible;
  three successful FE builds verified during development)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Maven build cache was caching checkstyle:check results and emitting
'Skipping plugin execution (cached)' even when sources had changed.
Two fixes:

1. Add check/checkstyle/ to global cache input so that changes to
   checkstyle rules (checkstyle.xml, suppressions.xml, etc.) correctly
   invalidate all module caches.

2. Mark the checkstyle:check execution (id: validate) as runAlways in
   executionControl so it is never skipped regardless of cache state.
   Checkstyle is a quality gate and must always execute.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ObjFileSystem: remove unused Map import (was used by the now-removed
  abstract completeMultipartUpload method)
- GsonUtils: fix CustomImportOrder violations - LogManager/Logger imports
  were inserted before com.google.* imports; move them after all com.* imports
  in correct lexicographical order

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pache#61862)

## Summary

This PR completes **Phase 0** of the [FE filesystem SPI
refactoring](apache#61860) — removing
compile-time couplings that would otherwise prevent splitting filesystem
implementations into independent Maven modules in later phases.

## Changes

### P0.1 — FsStorageType enum migration
- Introduce `FsStorageType` enum in `fe-foundation` (zero-dependency
module) to replace Thrift-generated `StorageBackend.StorageType` in
`PersistentFileSystem`
- Add `FsStorageTypeAdapter` in `fe-core` for bidirectional
Thrift↔FsStorageType conversion
- Update all subclasses and callers: `Repository`, `BackupJob`,
`RestoreJob`, `CloudRestoreJob`
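
A minimal sketch of the enum-plus-adapter split described in P0.1. The `ThriftStorageType` stand-in, the enum values, and the `toThrift`/`fromThrift` method names are assumptions made for illustration; the PR text only names `FsStorageType` and `FsStorageTypeAdapter`.

```java
import java.util.EnumMap;
import java.util.Map;

// Stand-in for the Thrift-generated StorageBackend.StorageType (hypothetical subset of values).
enum ThriftStorageType { BROKER, S3, HDFS, AZURE }

// Dependency-free enum of the kind P0.1 places in fe-foundation (values are assumptions).
enum FsStorageTypeSketch { BROKER, S3, HDFS, AZURE }

// Bidirectional adapter that lives on the Thrift-aware side, so the enum module stays Thrift-free.
final class StorageTypeAdapterSketch {
    private static final Map<FsStorageTypeSketch, ThriftStorageType> TO_THRIFT =
            new EnumMap<>(FsStorageTypeSketch.class);
    private static final Map<ThriftStorageType, FsStorageTypeSketch> FROM_THRIFT =
            new EnumMap<>(ThriftStorageType.class);

    static {
        register(FsStorageTypeSketch.BROKER, ThriftStorageType.BROKER);
        register(FsStorageTypeSketch.S3, ThriftStorageType.S3);
        register(FsStorageTypeSketch.HDFS, ThriftStorageType.HDFS);
        register(FsStorageTypeSketch.AZURE, ThriftStorageType.AZURE);
    }

    private StorageTypeAdapterSketch() {
    }

    // Records one pair in both directions so the two maps cannot drift apart.
    private static void register(FsStorageTypeSketch fs, ThriftStorageType thrift) {
        TO_THRIFT.put(fs, thrift);
        FROM_THRIFT.put(thrift, fs);
    }

    static ThriftStorageType toThrift(FsStorageTypeSketch type) {
        return lookup(TO_THRIFT, type);
    }

    static FsStorageTypeSketch fromThrift(ThriftStorageType type) {
        return lookup(FROM_THRIFT, type);
    }

    // Fails loudly on an unmapped value instead of returning null.
    private static <K, V> V lookup(Map<K, V> map, K key) {
        V value = map.get(key);
        if (value == null) {
            throw new IllegalArgumentException("Unmapped storage type: " + key);
        }
        return value;
    }
}
```

Registering each pair once keeps the conversion symmetric, which is what callers such as backup and restore jobs would rely on when moving between the two representations.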

### P0.2 — ObjStorage IOException bridge
- Add `IOException`-based `default` bridge methods to `ObjStorage`
interface
- Add `ObjStorageStatusAdapter` for `Status→IOException` conversion
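
The bridge idea in P0.2 can be sketched as follows. The `Status` shape, the helper class, and the single-method interface are simplified stand-ins, not the actual Doris types; only the `deleteObjectChecked` name comes from the commit message.

```java
import java.io.IOException;

// Minimal stand-in for the legacy Status result type (hypothetical shape).
final class Status {
    static final Status OK = new Status(true, "");
    private final boolean ok;
    private final String message;

    private Status(boolean ok, String message) {
        this.ok = ok;
        this.message = message;
    }

    static Status error(String message) {
        return new Status(false, message);
    }

    boolean ok() {
        return ok;
    }

    String message() {
        return message;
    }
}

// Converts a failed Status into an IOException; a successful Status is a no-op.
final class StatusToIoException {
    private StatusToIoException() {
    }

    static void throwIfFailed(Status status, String operation) throws IOException {
        if (!status.ok()) {
            throw new IOException(operation + " failed: " + status.message());
        }
    }
}

interface ObjStorageSketch {
    // Legacy Status-returning method that existing implementations already provide.
    Status deleteObject(String remotePath);

    // Default bridge: every implementation inherits it without being modified.
    default void deleteObjectChecked(String remotePath) throws IOException {
        StatusToIoException.throwIfFailed(deleteObject(remotePath), "deleteObject(" + remotePath + ")");
    }
}
```

Because the bridge is a `default` method, new callers can use the exception-based form while existing implementations keep compiling unchanged.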

### P0.3 — SwitchingFileSystem decoupling
- Introduce `FileSystemLookup` functional interface
- Decouple `SwitchingFileSystem` from `ExternalMetaCacheMgr`
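
As a rough illustration of P0.3, the switching filesystem can depend on a one-method lookup interface instead of a concrete cache manager. All names below (`FsHandle`, `FileSystemLookupSketch`, `SwitchingFsSketch`) are hypothetical.

```java
// Hypothetical minimal filesystem handle.
interface FsHandle {
    String name();
}

// One-method lookup contract; any cache manager can satisfy it with a lambda or method reference.
@FunctionalInterface
interface FileSystemLookupSketch {
    FsHandle lookup(String storageKey);
}

// Depends only on the lookup function, not on the class that owns the cache.
final class SwitchingFsSketch {
    private final FileSystemLookupSketch lookup;

    SwitchingFsSketch(FileSystemLookupSketch lookup) {
        this.lookup = lookup;
    }

    String resolveName(String storageKey) {
        return lookup.lookup(storageKey).name();
    }
}
```

A caller that owns a cache could pass `cache::get` (or any lambda) at construction time, which is the role the commit assigns to `FileSystemProviderImpl`.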

### P0.4 — MultipartUploadCapable interface
- Extract `MultipartUploadCapable` interface from `ObjFileSystem`
- `S3FileSystem` and `AzureFileSystem` implement it; `HMSTransaction`
uses `instanceof` check
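
The capability-interface pattern from P0.4 might look like the sketch below. The class names and the `Map<Integer, String>` part map are assumptions; the source only states that the caller switches from a cast to an `instanceof` check.

```java
import java.util.Map;

interface FileSystemSketch {
    String scheme();
}

// Optional capability: only filesystems that support multipart upload implement it.
interface MultipartUploadCapableSketch {
    void completeMultipartUpload(String path, String uploadId, Map<Integer, String> partETags);
}

final class S3Like implements FileSystemSketch, MultipartUploadCapableSketch {
    String lastCompleted;

    @Override
    public String scheme() {
        return "s3";
    }

    @Override
    public void completeMultipartUpload(String path, String uploadId, Map<Integer, String> partETags) {
        // Record the call so the behaviour is observable in this sketch.
        lastCompleted = path + "#" + uploadId + "#" + partETags.size();
    }
}

final class HdfsLike implements FileSystemSketch {
    @Override
    public String scheme() {
        return "hdfs";
    }
}

final class CommitHelper {
    private CommitHelper() {
    }

    // Returns true when the upload was completed, false when the filesystem lacks the capability.
    static boolean completeIfSupported(FileSystemSketch fs, String path, String uploadId,
            Map<Integer, String> partETags) {
        if (fs instanceof MultipartUploadCapableSketch) {
            ((MultipartUploadCapableSketch) fs).completeMultipartUpload(path, uploadId, partETags);
            return true;
        }
        return false;
    }
}
```

Checking for the capability rather than casting to a base class means filesystems without multipart upload no longer need a forced abstract method.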

### P0.5 — GsonUtils compile-time decoupling
- Introduce `FileSystemDescriptor` POJO for `Repository` metadata
serialization
- `GsonUtils` removes 7 compile-time concrete class imports, uses
`Class.forName()` reflection
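
A small sketch of string-based class resolution in the spirit of P0.5. `LegacyTypeRegistry` and its methods are hypothetical; the point is only that the registering file names classes as strings and therefore needs no compile-time import of them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LegacyTypeRegistry {
    private final Map<String, Class<?>> byLabel = new LinkedHashMap<>();

    // Registers a legacy type label against a class name given as a string.
    // Returns false when the class is absent from the classpath instead of failing the build.
    boolean registerIfPresent(String label, String className) {
        try {
            byLabel.put(label, Class.forName(className));
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Returns the resolved class, or null when the label was never registered.
    Class<?> resolve(String label) {
        return byLabel.get(label);
    }
}
```

The trade-off is that a renamed or removed class is detected at runtime rather than at compile time, so a registry like this would typically log or test for missing entries.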

### P0.6 — FileSystemSpiProvider skeleton
- Add `FileSystemSpiProvider` interface in `fs/spi/`

### Build
- Fix Maven build cache incorrectly skipping `checkstyle:check`
- Fix checkstyle violations (unused import, import order)

## Testing
- FE build: ✅  Checkstyle: 0 violations ✅

Closes part of apache#61860

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…stem API and value objects (apache#61908)

### What problem does this PR solve?

Issue Number: apache#61860

Problem Summary: The existing FileSystem interface uses Status-based
return values, bare String paths, and Hadoop-dependent RemoteFile
objects throughout the FE codebase, making it hard to test and
impossible to isolate from Hadoop at the module boundary. Phase 1
introduces the new clean IOException-based FileSystem API with typed
Location value objects, while preserving full backward compatibility via
LegacyFileSystemApi.

### Release note

None

### Check List (For Author)

- Test: Manual build verification (./build.sh --fe) passes with zero
errors
- Behavior changed: No (all existing code paths preserved via
LegacyFileSystemApi)
- Does this need documentation: No

New files:
- Location.java: immutable URI value object replacing bare String paths
- FileEntry.java: immutable file/dir descriptor replacing
Hadoop-dependent RemoteFile
- FileIterator.java: lazy Closeable iterator interface for directory
listing
- LegacyFileSystemApi.java: @deprecated copy of old FileSystem interface
(Status-based)
- LegacyFileSystemAdapter.java: abstract bridge implementing new
FileSystem via legacy* methods
- LegacyToNewFsAdapter.java: wraps any LegacyFileSystemApi as new
FileSystem
- MemoryFileSystem.java: in-memory FileSystem for unit testing

Modified files:
- FileSystem.java: replaced with new clean IOException-based interface
- PersistentFileSystem, LocalDfsFileSystem, SwitchingFileSystem:
implements LegacyFileSystemApi
- FileSystemProvider, FileSystemLookup: return LegacyFileSystemApi
- DorisInputFile, DorisOutputFile: add location() method; deprecate
path()
- HdfsInputFile, HdfsOutputFile, HdfsInputStream: use Location instead
of ParsedPath
- ParsedPath: @deprecated + toLocation() conversion method
- RemoteFile: @deprecated + toFileEntry()/fromFileEntry() conversion
methods
- RemoteFiles, RemoteFileRemoteIterator: @deprecated
- All callers updated to use LegacyFileSystemApi type
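
To make the "typed Location instead of bare String" idea concrete, here is a minimal immutable value object. It is a sketch only: the real `Location.java` is not shown in this PR text, and every method name here is an assumption.

```java
import java.net.URI;
import java.util.Objects;

// Immutable URI-backed value object; equality is by normalized URI.
final class LocationSketch {
    private final URI uri;

    private LocationSketch(URI uri) {
        this.uri = uri;
    }

    // Parses and normalizes the raw string (removes "." and ".." segments).
    static LocationSketch of(String raw) {
        Objects.requireNonNull(raw, "raw");
        return new LocationSketch(URI.create(raw).normalize());
    }

    String scheme() {
        return uri.getScheme();
    }

    String authority() {
        return uri.getAuthority();
    }

    String path() {
        return uri.getPath();
    }

    // Returns a new Location for a child name; this instance is never modified.
    LocationSketch resolve(String child) {
        String base = uri.toString();
        String separator = base.endsWith("/") ? "" : "/";
        return of(base + separator + child);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof LocationSketch && uri.equals(((LocationSketch) other).uri);
    }

    @Override
    public int hashCode() {
        return uri.hashCode();
    }

    @Override
    public String toString() {
        return uri.toString();
    }
}
```

With a type like this, scheme and authority checks happen once at construction, and two spellings of the same path compare equal, which is hard to guarantee with raw strings.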

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#61909)

followup apache#61908

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…module (apache#61911)

### What problem does this PR solve?

Issue Number: apache#61860

Problem Summary: Eliminates duplicate object storage abstraction in the
Cloud mode codebase. Previously, cloud.storage.RemoteBase and its
subclasses (S3Remote, OssRemote, CosRemote, ObsRemote, BosRemote,
AzureRemote, etc.) provided object storage operations independently from
the fs.obj module. This change migrates all functionality into fs.obj
and removes the entire Remote* hierarchy, so all Cloud Stage/Copy
operations now go through ObjFileSystem.

Key changes:
- Add OssObjStorage, CosObjStorage, ObsObjStorage, BosObjStorage as new
ObjStorage implementations with STS token, presigned URL, list, and head
support
- Extend AzureObjStorage and S3ObjStorage with getStsToken,
getPresignedUrl, listObjectsWithPrefix, headObjectWithMeta
- Add ObjFileSystem passthrough methods: getStsToken, getPresignedUrl,
listObjectsWithPrefix, headObjectWithMeta, deleteObjectsByKeys
- Extract RemoteBase.ObjectInfo inner class to standalone
cloud.storage.ObjectInfo
- Rewrite ObjectInfoAdapter: no RemoteBase dependency; uses
ObjFileSystem for STS in ARN flow
- Migrate all callers (StageUtil, CopyLoadPendingTask, CleanCopyJobTask,
CopyIntoAction, CreateStageCommand, CopyIntoInfo) to ObjFileSystem
- Delete RemoteBase, S3Remote, OssRemote, CosRemote, ObsRemote,
BosRemote, TosRemote, AzureRemote, DefaultRemote, MockRemote
- Rewrite CopyLoadPendingTaskTest and StageUtilTest to use
MockUp<ObjFileSystem>
- Fix Maven build cache causing test execution to be skipped
(run-fe-ut.sh)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ent SPI modules (apache#61919)

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
Refactor the Apache Doris FE filesystem layer so that each storage backend
(S3, OSS, COS, OBS, Azure, HDFS, Local) lives in its own independent Maven
module and is discovered at runtime via Java ServiceLoader (SPI pattern).
Adding a new storage backend now requires zero changes to fe-core.

Changes by step:

**P3.0b** – Rename `FileSystemProvider` → `LegacyFileSystemProviderFactory`
  to free up the name `FileSystemProvider` for the new SPI interface.

**Step 1** – Create `fe/fe-filesystem/` aggregator POM and `fe-filesystem-spi`
  module. Defines the stable SPI contracts:
  - `FileSystem`, `ObjStorage`, `FileSystemProvider` interfaces
  - `HadoopAuthenticator`, `IOCallable` for HDFS/Kerberos
  - Value objects: `FileEntry`, `FileIterator`, `InputFile`, `OutputFile`,
    `RequestBody`, `ObjectMetadata`, `MultipartPart`, `UploadedPart`

**Step 2** – `fe-filesystem-s3` module:
  - `S3Uri`, `S3ObjStorage`, `S3FileSystem`, `S3OutputStream`
  - `S3FileSystemProvider` registered via META-INF/services

**Step 3** – `fe-filesystem-oss`, `fe-filesystem-cos`, `fe-filesystem-obs`:
  - Thin providers that reuse S3FileSystem via S3-compatible APIs
  - Each translates vendor-specific property keys to AWS_* keys

**Step 3b** – `fe-filesystem-azure` module:
  - `AzureUri` (wasb/wasbs/abfs/abfss/https), `AzureObjStorage`
  - Supports shared-key, service-principal, and default-credential auth
  - Multipart upload via Azure block-blob stageBlock/commitBlockList

**Step 4** – `fe-filesystem-hdfs` module:
  - `HdfsConfigBuilder`, `SimpleHadoopAuthenticator`,
    `KerberosHadoopAuthenticator`
  - `DFSFileSystem` implements `spi.FileSystem` directly
  - Supports hdfs/viewfs/ofs/jfs/oss schemes

**Step 6** – `fe-filesystem-local` (test-only):
  - `LocalFileSystem` using `java.nio.file` for unit-test isolation

**Step 7** – `fe-core` integration:
  - `fe-filesystem-spi` added as compile dep; all impl modules as runtime deps
  - New `StoragePropertiesConverter` translates `StorageProperties` → Map
  - `FileSystemFactory` extended with `getFileSystem(Map)` /
    `getFileSystem(StorageProperties)` using ServiceLoader (double-checked
    locking); old `get()` methods kept `@Deprecated` for Phase-4 migration
  - Fix checkstyle violations (import order, line length) in HMS classes
    introduced by P3.0b rename
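
The provider-discovery flow described above can be sketched roughly as follows. The `supports`/`create` methods, the registry class, and the `String` return type are assumptions for illustration; the real `FileSystemProvider` contract is not reproduced in this text. `ServiceLoader` is an `Iterable`, so the registry accepts any `Iterable` and can be exercised without `META-INF/services` entries.

```java
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical provider contract: each backend module would ship one implementation.
interface FsProviderSketch {
    boolean supports(Map<String, String> properties);

    // Returns a description string here; a real provider would return a filesystem object.
    String create(Map<String, String> properties);
}

final class ProviderRegistrySketch {
    private final Iterable<FsProviderSketch> providers;

    ProviderRegistrySketch(Iterable<FsProviderSketch> providers) {
        this.providers = providers;
    }

    // Classpath discovery: ServiceLoader reads META-INF/services entries from every jar it can see.
    static ProviderRegistrySketch fromServiceLoader() {
        return new ProviderRegistrySketch(ServiceLoader.load(FsProviderSketch.class));
    }

    // The first provider that claims the properties wins; the core module never names a backend class.
    String createFileSystem(Map<String, String> properties) {
        for (FsProviderSketch provider : providers) {
            if (provider.supports(properties)) {
                return provider.create(properties);
            }
        }
        throw new IllegalArgumentException("No filesystem provider supports keys: " + properties.keySet());
    }
}
```

Under this arrangement, adding a backend means adding a jar with a provider and a service entry; the selection loop stays untouched.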

### Release note

None (internal refactoring; no user-visible behavior change)

### Check List (For Author)

- Test: `mvn install -pl fe-core -am -DskipTests` passes (BUILD SUCCESS)
- Behavior changed: No (legacy API fully preserved; new SPI API added
alongside)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Phase 4 step P4.0 — add FileSystemTransferUtil, a utility
class that provides higher-level transfer operations (download, upload,
directUpload, globList) built on top of the spi.FileSystem primitives.
This utility is a prerequisite for migrating all 21 FileSystemFactory.get()
call sites to the new SPI API.

Also adds fe-filesystem-local as a test-scope dependency in fe-core/pom.xml
to enable unit tests using LocalFileSystemProvider.

### Release note

None

### Check List (For Author)

- Test: Unit Test (FileSystemTransferUtilTest — 12 tests, all passing)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ading infrastructure

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Phase 4 P4.1 introduces the plugin-directory loading mechanism so
fe-core no longer bundles cloud filesystem providers as runtime JARs. Providers are
now discovered at startup by FileSystemPluginManager via DirectoryPluginRuntimeManager
(production) or ServiceLoader (tests / classpath). This decouples fe-core from
transitive cloud-SDK dependencies.

Changes:
- fe-filesystem-spi/pom.xml: add fe-extension-spi compile dep (enables PluginFactory)
- FileSystemProvider: extends PluginFactory; add default Plugin create() bridge
- FileSystemPluginManager: new class, dual-path provider loading (ServiceLoader +
  DirectoryPluginRuntimeManager), createFileSystem() delegation
- FileSystemFactory: add initPluginManager() static init; getFileSystem() delegates
  to manager when set, falls back to ServiceLoader for tests
- Config.java + fe.conf: add filesystem_plugin_root config key
- Env.java: call initFileSystemPluginManager() during FE startup
- fe-core/pom.xml: remove 6 fe-filesystem-* runtime deps; keep local as test-scope
- build.sh: deploy each fe-filesystem-<name>.jar + deps into
  output/fe/plugins/filesystem/<name>/ at build time

### Release note

None

### Check List (For Author)

- Test: No need to test (P4.1 is infrastructure; integration covered by P4.8 tests)
- Behavior changed: Yes — filesystem providers loaded from plugin directory at runtime
  instead of bundled JARs; behavior identical when plugin dir is configured correctly
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…i.FileSystem

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Phase 4 P4.2 migrates FileSystemCache from returning legacy
RemoteFileSystem to spi.FileSystem, and updates all downstream callers to use
the new SPI API instead of LegacyFileSystemApi.

Changes:
- FileSystemCache: change cache type to spi.FileSystem; use
  FileSystemFactory.getFileSystem() in loadFileSystem(); rename
  getRemoteFileSystem() to getFileSystem(); add getProperties() to FileSystemCacheKey
- DirectoryLister: change listFiles() parameter from LegacyFileSystemApi to spi.FileSystem
- FileSystemDirectoryLister: reimplement using FileSystemTransferUtil.globList()
  + FileEntry→RemoteFile conversion (transitional, RemoteFile not yet replaced)
- TransactionScopeCachingDirectoryLister: update parameter types
- AcidUtil: change parameter types to spi.FileSystem; replace Status-based
  exists/globList/listFiles calls with spi.FileSystem equivalents
- HiveExternalMetaCache: change to use fsCache.getFileSystem()
- HiveUtil.isSplittable(): change parameter from RemoteFileSystem to spi.FileSystem
  (BrokerFileSystem instanceof check retained as no-op pending broker SPI module)
- FileSystemProviderImpl: use legacy FileSystemFactory.get() directly since
  SwitchingFileSystem is not yet migrated to spi.FileSystem
- HiveAcidTest: use spi.LocalFileSystem instead of LocalDfsFileSystem for AcidUtil calls
- TransactionScopeCachingDirectoryListerTest: update mock to match new interface

### Release note

None

### Check List (For Author)

- Test: Regression test (HiveAcidTest + TransactionScopeCachingDirectoryListerTest
  compile and pass after migration; build succeeds with -DskipTests)
- Behavior changed: Internal — FileSystem objects are now loaded from SPI providers
  rather than legacy implementations in fe-core
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…System

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Replace FileSystemFactory.get() + RemoteFileSystem.deleteDirectory()
with FileSystemFactory.getFileSystem() + spi.FileSystem.delete(Location, true)
in InsertIntoTVFCommand.deleteExistingFiles().

### Release note

None

### Check List (For Author)

- Test: Build succeeds with -DskipTests
- Behavior changed: No — same delete-directory semantics via SPI
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Type in RepositoryMgr

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Remove direct dependency on concrete S3FileSystem/AzureFileSystem
implementation classes in RepositoryMgr. Use FsStorageType enum from FileSystemDescriptor
instead of instanceof checks, so RepositoryMgr no longer needs compile-time references to
legacy filesystem implementation classes.

Changes:
- Repository: add getFileSystemDescriptor() public accessor
- RepositoryMgr: replace instanceof S3FileSystem/AzureFileSystem check with
  getFileSystemDescriptor().getStorageType() == FsStorageType.S3/AZURE;
  remove imports of S3FileSystem and AzureFileSystem

### Release note

None

### Check List (For Author)

- Test: Build succeeds with -DskipTests
- Behavior changed: No — S3FileSystem and AzureFileSystem always had storageType S3 and AZURE respectively
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…4.3)

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Cloud callers (StageUtil, CopyLoadPendingTask, CleanCopyJobTask,
ObjectInfoAdapter, CopyIntoAction, CreateStageCommand) were using the legacy
ObjFileSystem/RemoteFileSystem API. This commit migrates them to the new
org.apache.doris.filesystem.spi.FileSystem SPI, completing Phase 4.3 of the
filesystem SPI migration.

### Changes

**fe-filesystem-spi:**
- New StsCredentials class replacing Triple<String,String,String> for STS token results
- ObjStorage: added 5 default cloud-specific methods (getStsToken, listObjectsWithPrefix,
  headObjectWithMeta, getPresignedUrl, deleteObjectsByKeys) with UnsupportedOperationException
  defaults
- ObjFileSystem: added 5 delegate methods forwarding to the underlying ObjStorage

**fe-filesystem-s3:**
- S3ObjStorage: added PROP_BUCKET/PROP_ROLE_ARN/PROP_EXTERNAL_ID constants, bucket field,
  and full implementations of all 5 cloud-specific methods ported from legacy S3ObjStorage

**fe-core:**
- StoragePropertiesConverter: added AWS_BUCKET and STS key (AWS_ROLE_ARN/AWS_EXTERNAL_ID)
  pass-through for AbstractS3CompatibleProperties
- 6 cloud callers migrated from FileSystemFactory.get() to FileSystemFactory.getFileSystem():
  CleanCopyJobTask, CopyLoadPendingTask, StageUtil, ObjectInfoAdapter, CopyIntoAction,
  CreateStageCommand
- CloudInternalCatalog.filterCopyFiles: updated parameter from ObjectFile to RemoteObject
- CopyLoadPendingTaskTest: updated mocks and types to use new SPI types
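
A sketch of the "optional operation with an `UnsupportedOperationException` default" pattern used for the cloud-specific methods, together with a named credentials type in place of a positional triple. The field names of the credentials class and the two storage classes are assumptions.

```java
// Named replacement for a positional Triple<String, String, String> (field names are assumptions).
final class StsCredentialsSketch {
    private final String accessKey;
    private final String secretKey;
    private final String securityToken;

    StsCredentialsSketch(String accessKey, String secretKey, String securityToken) {
        this.accessKey = accessKey;
        this.secretKey = secretKey;
        this.securityToken = securityToken;
    }

    String accessKey() {
        return accessKey;
    }

    String secretKey() {
        return secretKey;
    }

    String securityToken() {
        return securityToken;
    }
}

interface CloudObjStorageSketch {
    // Optional operation: backends without STS support inherit this default.
    default StsCredentialsSketch getStsToken() {
        throw new UnsupportedOperationException(
                "getStsToken is not supported by " + getClass().getSimpleName());
    }
}

// A backend that does not override the optional method.
final class PlainStorage implements CloudObjStorageSketch {
}

// A backend that provides a real implementation.
final class StsStorage implements CloudObjStorageSketch {
    @Override
    public StsCredentialsSketch getStsToken() {
        return new StsCredentialsSketch("ak", "sk", "token");
    }
}
```

Named accessors remove the risk of swapping the left, middle, and right elements of a triple, and the throwing default keeps the interface source-compatible for backends that never need STS.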

### Release note

None

### Check List (For Author)

- Test: Build passes; CopyLoadPendingTaskTest updated with new SPI types
    - Build verified with ./build.sh --fe
- Behavior changed: No (same runtime behavior, different abstraction layer)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
Iceberg's DelegateFileIO and its 3 companion classes (DelegateInputFile,
DelegateOutputFile, DelegateSeekableInputStream) were using the legacy
LegacyFileSystemApi / fs.io.* APIs. This commit migrates them to the new
org.apache.doris.filesystem.spi.FileSystem SPI, completing phase P4.4 of
the filesystem SPI migration.

Changes:
- SPI layer: add DorisInputStream (seekable InputStream abstraction),
  extend DorisInputFile with exists()/lastModifiedTime() defaults and
  change newStream() return type to DorisInputStream; add
  newInputFile(Location, long) default to FileSystem
- HDFS: add HdfsSeekableInputStream wrapping FSDataInputStream; update
  HdfsInputFile to return it
- S3: add openInputStreamAt()/headObjectLastModified() to S3ObjStorage;
  update S3InputFile and add S3SeekableInputStream (lazy range-based seek)
- Azure: add openInputStreamAt()/headObjectLastModified() to
  AzureObjStorage using BlobInputStreamOptions+BlobRange; update
  AzureInputFile and add AzureSeekableInputStream
- Local: add LocalSeekableInputStream using RandomAccessFile; update
  anonymous DorisInputFile in LocalFileSystem
- MemoryFileSystem: fix Phase 2 placeholder in newStream(); add
  MemorySeekableInputStream extending fs.io.DorisInputStream
- DelegateSeekableInputStream: migrate to spi.DorisInputStream, use
  getPos() instead of getPosition()
- DelegateInputFile: migrate to spi.DorisInputFile, use location()
  instead of deprecated path()
- DelegateOutputFile: migrate constructor from
  (LegacyFileSystemApi, ParsedPath) to (spi.FileSystem, spi.Location)
- DelegateFileIO: migrate field/initialize() to use spi.FileSystem via
  FileSystemFactory.getFileSystem(); update all methods to use
  spi.Location; simplify deleteFiles() to iterate individually
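
The "lazy range-based seek" mentioned for the S3 stream can be illustrated with the sketch below. `RangeOpener` is a hypothetical hook standing in for a ranged object read; the real `S3SeekableInputStream` is not shown in this text and may differ.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical hook: opens the object starting at the given byte offset (for example a ranged GET).
interface RangeOpener {
    InputStream openAt(long offset) throws IOException;
}

final class LazySeekableStream extends InputStream {
    private final RangeOpener opener;
    private InputStream current; // null until the first read after construction or a seek
    private long pos;

    LazySeekableStream(RangeOpener opener) {
        this.opener = opener;
    }

    long getPos() {
        return pos;
    }

    // Only records the target position; no request is issued until the next read.
    void seek(long newPos) throws IOException {
        if (newPos < 0) {
            throw new IOException("Negative seek position: " + newPos);
        }
        if (newPos != pos) {
            closeCurrent();
            pos = newPos;
        }
    }

    @Override
    public int read() throws IOException {
        if (current == null) {
            current = opener.openAt(pos);
        }
        int b = current.read();
        if (b >= 0) {
            pos++;
        }
        return b;
    }

    @Override
    public void close() throws IOException {
        closeCurrent();
    }

    private void closeCurrent() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

Deferring the open until a read follows the seek avoids issuing a request for every intermediate position when a reader seeks several times in a row, as columnar file readers often do.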

### Release note

None

### Check List (For Author)

- Test: Build verification (./build.sh --fe) passes
    - Manual test: N/A (DelegateFileIO is not yet enabled in production)
- Behavior changed: No (DelegateFileIO enabling line remains commented out)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…criptor

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
BackupJob, RestoreJob and BackupHandler were calling
repo.getRemoteFileSystem().getStorageProperties().getBackendConfigProperties()
and repo.getRemoteFileSystem().getThriftStorageType() to populate BE
snapshot/upload/download task RPCs. These calls required a live
PersistentFileSystem object even though no actual I/O was performed —
only metadata extraction.

This commit adds getThriftStorageType() and getBackendConfigProperties()
to FileSystemDescriptor (the lightweight POJO already used for
serialization), then replaces all metadata-only getRemoteFileSystem()
calls with getFileSystemDescriptor() equivalents:

- FileSystemDescriptor: add getThriftStorageType() via FsStorageTypeAdapter,
  add getBackendConfigProperties() via StorageProperties.createPrimary()
- BackupJob (2 sites): updateBrokerProperties + UploadTask construction
- RestoreJob (2 sites): updateBrokerProperties + createDownloadTask
- BackupHandler (1 site): mergeProperties() property map extraction

Repository.getRemoteFileSystem() and its I/O usages (globList, upload,
download, etc.) are intentionally left unchanged — they require a live
connection and are a separate concern.

### Release note

None

### Check List (For Author)

- Test: Build verification (./build.sh --fe) passes
    - No behavior change: same values produced via different code path
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ithLimit)

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: S3SourceOffsetProvider used legacy RemoteFileSystem.globListWithLimit
which depends on fe-core's S3URI and GlobListResult. Migrates to the new SPI layer to
eliminate the dependency on legacy filesystem APIs.

Changes:
- Add spi.GlobListing: SPI replacement for GlobListResult carrying List<FileEntry>,
  bucket, prefix, and maxFile fields
- Add FileSystem.globListWithLimit() default method (throws UnsupportedOperationException)
  with full javadoc; S3FileSystem overrides it
- Implement S3FileSystem.globListWithLimit(): parses S3 URI, uses PathMatcher for glob
  filtering, paginates via ListObjectsV2 with optional startAfter and size/count limits
- S3SourceOffsetProvider: use FileSystemFactory.getFileSystem() returning spi.FileSystem,
  replace GlobListResult/RemoteFile with GlobListing/FileEntry, replace Status-based error
  handling with IOException; debug point now throws IOException directly
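
A simplified, in-memory sketch of glob filtering with `startAfter` and a count limit. It operates on an already-listed, sorted key list rather than paginating `ListObjectsV2`, and the method signature is an assumption; only the use of `PathMatcher` for glob matching is taken from the description above.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class GlobLimitSketch {
    private GlobLimitSketch() {
    }

    // sortedKeys must be in lexicographic order, as object-store listings are.
    // startAfter may be null; keys less than or equal to it are skipped.
    static List<String> globWithLimit(List<String> sortedKeys, String glob, String startAfter, int maxFiles) {
        List<String> matched = new ArrayList<>();
        if (maxFiles <= 0) {
            return matched;
        }
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        for (String key : sortedKeys) {
            if (startAfter != null && key.compareTo(startAfter) <= 0) {
                continue;
            }
            if (matcher.matches(Paths.get(key))) {
                matched.add(key);
                if (matched.size() >= maxFiles) {
                    break; // stop early once the count limit is reached
                }
            }
        }
        return matched;
    }
}
```

Passing the last key of one batch as `startAfter` for the next call yields incremental listing, which appears to be the role the offset plays in `S3SourceOffsetProvider`.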

### Release note

None

### Check List (For Author)

- Test: Build succeeded; logic is structurally identical to the legacy globListInternal
- Behavior changed: No (same S3 listing logic, same offset semantics)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
….FileSystem (P4.7)

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
HMSTransaction and its surrounding infrastructure (HMSExternalCatalog,
HiveTransactionManager, TransactionManagerFactory, FileSystemUtil) were
using the legacy LegacyFileSystemApi / SwitchingFileSystem / MultipartUploadCapable
chain. This commit migrates the entire Hive transaction write path to the
new spi.FileSystem abstraction.

Key changes:
- Add SpiSwitchingFileSystem: new spi.FileSystem implementation that
  routes per-path to the appropriate FileSystem via LocationPath +
  FileSystemFactory; caches resolved filesystems; supports a test
  constructor for injecting a delegate
- spi.FileSystem: add listFiles(), listFilesRecursive(), listDirectories(),
  renameDirectory() default methods
- spi.ObjFileSystem: add completeMultipartUpload(path, uploadId, Map)
  convenience overload
- HMSTransaction: remove LegacyFileSystemApi/SwitchingFileSystem/
  MultipartUploadCapable; field fs is now spi.FileSystem (SpiSwitchingFileSystem).
  MPU abort uses ObjFileSystem.getObjStorage().abortMultipartUpload().
  MPU commit uses ObjFileSystem.completeMultipartUpload(). Wrapper methods
  now throw IOException / RuntimeException instead of returning Status.
- FileSystemUtil: asyncRenameFiles/asyncRenameDir now accept spi.FileSystem
- HMSExternalCatalog: replaces FileSystemProviderImpl with SpiSwitchingFileSystem
- HiveTransactionManager / TransactionManagerFactory: parameter type changed
  from LegacyFileSystemProviderFactory to SpiSwitchingFileSystem
- Tests: HmsCommitTest and HMSTransactionPathTest migrated to spi.FileSystem;
  FakeFileSystem now implements spi.FileSystem; LocalDfsFileSystem replaced
  with LocalFileSystem (SPI)
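
The routing-and-caching behavior attributed to `SpiSwitchingFileSystem` can be approximated as below. Using scheme plus authority as the cache key is an assumption made for this sketch; the actual class routes via `LocationPath` and `FileSystemFactory`, whose details are not shown here.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Routes each path to a filesystem chosen by its scheme and authority, creating each one once.
final class SwitchingRouterSketch<F> {
    private final Function<String, F> factory; // creates a filesystem for a routing key
    private final Map<String, F> cache = new ConcurrentHashMap<>();

    SwitchingRouterSketch(Function<String, F> factory) {
        this.factory = factory;
    }

    F forPath(String path) {
        URI uri = URI.create(path);
        String key = uri.getScheme() + "://" + uri.getAuthority();
        // computeIfAbsent invokes the factory at most once per key, even under concurrent access.
        return cache.computeIfAbsent(key, factory);
    }
}
```

Accepting the factory as a constructor argument also gives tests a seam for injecting a delegate, similar in spirit to the test constructor mentioned in the commit.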

### Release note

None

### Check List (For Author)

- Test: Unit Test — HmsCommitTest and HMSTransactionPathTest updated and compile
- Behavior changed: No (same semantics, different API layer)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rokerUtil

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: As part of the filesystem SPI migration (Phase 5), this
commit implements the `fe-filesystem-broker` Maven module — a zero-fe-core-
dependency plugin that provides broker-based filesystem access via the unified
`FileSystem` SPI. Two legacy `FileSystemFactory.get()` + `RemoteFileSystem`
calls in `BrokerUtil` are migrated to use the new SPI.

### Release note

None

### Check List (For Author)

- Test: Manual build verification (FE build passes cleanly)
- Behavior changed: No (same broker Thrift RPC calls, new abstraction layer)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Migrates all I/O operations in Repository.java (initRepository,
ping, listSnapshots, upload, download, getSnapshotInfo) from the legacy
Status-based PersistentFileSystem API to the new IOException-based SPI
FileSystem API. This is Phase 4.6-IO of the filesystem SPI migration.

Key design decisions:
- Non-broker repos: spiFs initialized once in constructor/gsonPostProcess via
  FileSystemFactory.getFileSystem(StorageProperties)
- Broker repos: spiFs=null; acquireSpiFs() resolves a live broker endpoint per
  I/O call via BrokerMgr (matching the lazy-resolution behavior of legacy
  BrokerFileSystem)
- Legacy PersistentFileSystem field is retained for metadata-only methods
  (getLocation, getInfo, getCreateStatement, getBrokerAddress) — to be removed
  in a follow-up cleanup pass

### Release note

None

### Check List (For Author)

- Test: Build verification (FE build passes cleanly)
- Behavior changed: No (same operations, new abstraction layer)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ath (P4.6-meta)

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Repository.java previously held a transient
PersistentFileSystem field used for both I/O and metadata operations.
The I/O path was migrated to the SPI FileSystem in the previous commit.
This commit completes the metadata migration, eliminating the last
dependency on PersistentFileSystem from the live (non-persistence) path.

Changes:
- FileSystemDescriptor: add fromStorageProperties() factory method that
  maps StorageProperties → FileSystemDescriptor without going through
  a legacy PersistentFileSystem instance.
- Repository: remove transient fileSystem/getFileSystem()/
  getRemoteFileSystem() fields/methods; constructor now accepts
  StorageProperties directly; getLocation(), getBrokerAddress(),
  getInfo(), getCreateStatement() all use fileSystemDescriptor;
  gsonPostProcess() only initializes spiFs (no legacy fileSystem).
- BackupHandler: createRepository() and alterRepository() pass
  StorageProperties directly to Repository constructor; remove
  FileSystemFactory.get() (legacy API) and RemoteFileSystem imports.
- CloudRestoreJob: replace repo.getRemoteFileSystem().* calls with
  repo.getFileSystemDescriptor().* equivalents.

### Release note

None

### Check List (For Author)

- Test: Manual build verification (sh build.sh --fe)
- Behavior changed: No (same semantics, different internal path)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… FileSystem API

Issue Number: N/A

Problem Summary: After the Repository class was refactored to accept
StorageProperties instead of RemoteFileSystem, and getRemoteFileSystem()
was removed, 6 test files failed to compile (18 errors total). This
commit fixes all compilation errors by updating the tests to use the
new SPI-based API.

Changes:
- Repository.java: Fix listSnapshots() logic bug — old code called
  fs.listFiles() (files only) then filtered for isDirectory(), always
  yielding empty results. Rewritten to use fs.list() + FileIterator.
- RepositoryTest.java: Full rewrite of mocks from @mocked RemoteFileSystem
  to @mocked spi.FileSystem + MockUp<FileSystemFactory>; use BrokerProperties
  as the Repository constructor arg so getLocation() returns raw strings.
- BackupJobTest.java, RestoreJobTest.java, CloudRestoreJobTest.java:
  Replace FileSystemFactory.get(BrokerProperties.of(...)) with
  BrokerProperties.of(...) directly (constructor now takes StorageProperties).
- CreateRepositoryCommandTest.java: Replace getRemoteFileSystem().getProperties()
  with getFileSystemDescriptor().getProperties().
- S3FileSystemTest.java: Pass StorageProperties.createPrimary(properties)
  instead of an S3FileSystem instance to the Repository constructor.
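
The `listSnapshots()` bug described above follows a pattern that is easy to reproduce in isolation. The sketch below uses hypothetical entry and helper types; it is not the `Repository` code, only a demonstration of why filtering a files-only listing for directories can never match.

```java
import java.util.ArrayList;
import java.util.List;

final class EntrySketch {
    final String name;
    final boolean directory;

    EntrySketch(String name, boolean directory) {
        this.name = name;
        this.directory = directory;
    }
}

final class SnapshotListingSketch {
    private SnapshotListingSketch() {
    }

    // Mimics a files-only listing API: directories are already excluded from the result.
    static List<EntrySketch> listFilesOnly(List<EntrySketch> all) {
        List<EntrySketch> files = new ArrayList<>();
        for (EntrySketch entry : all) {
            if (!entry.directory) {
                files.add(entry);
            }
        }
        return files;
    }

    // Buggy pattern: looks for directories in a listing that cannot contain any.
    static List<String> snapshotDirsBuggy(List<EntrySketch> all) {
        List<String> names = new ArrayList<>();
        for (EntrySketch entry : listFilesOnly(all)) {
            if (entry.directory) {
                names.add(entry.name);
            }
        }
        return names;
    }

    // Fixed pattern: iterate the full listing, then keep the directories.
    static List<String> snapshotDirsFixed(List<EntrySketch> all) {
        List<String> names = new ArrayList<>();
        for (EntrySketch entry : all) {
            if (entry.directory) {
                names.add(entry.name);
            }
        }
        return names;
    }
}
```

A unit test that lists a directory containing at least one subdirectory would have caught the original behavior, since the buggy variant returns an empty list for any input.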

None

- Test: Build verification (mvn test-compile passes with no errors)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Issue Number: N/A

Problem Summary: The META-INF/services/org.apache.doris.filesystem.spi.FileSystemProvider
files in all 8 filesystem modules were missing the required Apache
Software Foundation license header, causing license check failures.

None

- Test: No need to test (license header addition only)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…build.sh

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Six issues in the fe-filesystem plugin modules and build.sh:

1. S3ObjStorage.getProperties() body was missing (accidental deletion),
   causing a compilation error.
2. S3FileSystem.java had a misindented block comment causing checkstyle failure.
3. fe-filesystem-broker files had import ordering violations (SAME_PACKAGE /
   THIRD_PARTY groups not separated by blank lines, and THIRD_PARTY imports
   appearing before SAME_PACKAGE imports in BrokerClientFactory/Pool).
4. BrokerClientFactory used a 4-arg TSocket constructor not present in
   thrift 0.16.0; replaced with the 3-arg (host, port, timeout) form.
5. BrokerClientPool used generic GenericKeyedObjectPoolConfig<T> not available
   in commons-pool2 2.2; replaced with raw type.
6. build.sh --fe did not compile the fe-filesystem plugin modules at all;
   added them to FE_MODULES so they are built alongside fe-core.

### Release note

None

### Check List (For Author)

- Test: No need to test (build fix only; all filesystem modules compile cleanly)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…plugins

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: The dependency:copy-dependencies step for filesystem
plugins was running from DORIS_HOME with 'fe/fe-filesystem/...' paths,
but there is no root-level pom.xml that includes these modules, causing
'Could not find the selected project in the reactor' errors. Fixed by
running the command from DORIS_HOME/fe with 'fe-filesystem/...' paths.
Also added the broker module to the copy loop (it was missing) and
removed the now-unused FS_SPI_MODULE variable.

### Release note

None

### Check List (For Author)

- Test: sh build.sh --fe passes with BUILD SUCCESS and no errors
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tput copy

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: When Maven Build Cache restores a cache hit, it only restores
the packaged JAR (doris-fe.jar) and skips re-running the
maven-dependency-plugin copy-dependencies goal that populates target/lib/.
This caused 'cp: cannot stat fe-core/target/lib/*: No such file or directory'
when build.sh tried to copy dependency JARs into output/fe/lib/.

Fix: detect missing target/lib/ after the Maven build phase and explicitly
run 'mvn dependency:copy-dependencies -pl fe-core' before the cp step.

### Release note

None

### Check List (For Author)

- Test: Manual test — ran 'sh build.sh --fe' successfully with BUILD SUCCESS
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…SystemAdapter

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Phase A of the P4.8 legacy filesystem class deletion. Both
LegacyToNewFsAdapter and LegacyFileSystemAdapter have no external callers
(confirmed by grep). LegacyToNewFsAdapter was a concrete adapter wrapping
LegacyFileSystemApi; LegacyFileSystemAdapter was the abstract Status→IOException
bridge. Both are now dead code after the P4.1–P4.7 caller migrations.

Also removes the stale @see LegacyFileSystemAdapter javadoc tag from FileSystem.java.

### Release note

None

### Check List (For Author)

- Test: No need to test (deleting dead code with no callers)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… SPI FileEntry

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
The DirectoryLister interface and its implementations returned RemoteFile
(a Hadoop-dependent legacy class), forcing callers like HiveExternalMetaCache
and AcidUtil to depend on Hadoop Path objects. This phase eliminates that
dependency by:

1. Adding modificationTime to SPI FileEntry and RemoteObject so no metadata
   is lost during migration.
2. All SPI providers (HDFS, S3, Azure, Broker, Local) now populate
   modificationTime from their underlying SDKs.
3. DirectoryLister interface now returns RemoteIterator<FileEntry> (SPI).
4. FileSystemDirectoryLister, SimpleRemoteIterator, and
   TransactionScopeCachingDirectoryLister all updated accordingly.
5. RemoteFileRemoteIterator and RemoteFiles deleted (no remaining callers).
6. HiveExternalMetaCache.addFile() now accepts FileEntry; converts
   List<BlockInfo> to BlockLocation[] for HiveFileStatus.
7. AcidUtil removes toRemoteFiles() bridge; works with FileEntry directly.
8. PathVisibleTest updated to use String-based isFileVisible() signature.
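
A rough sketch of the listing shape described in the steps above, assuming a FileEntry that carries modificationTime and an iterator whose methods may throw IOException. All type, field, and method names here are illustrative assumptions; the actual SPI signatures may differ.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical shapes; names are assumptions, not the actual SPI.
final class DirectoryListingSketch {
    static final class FileEntry {
        final String location;
        final long length;
        final long modificationTime; // epoch millis, populated by the provider

        FileEntry(String location, long length, long modificationTime) {
            this.location = location;
            this.length = length;
            this.modificationTime = modificationTime;
        }
    }

    // Iterator whose methods may fail with IOException, as remote listings can.
    interface RemoteIterator<T> {
        boolean hasNext() throws IOException;

        T next() throws IOException;
    }

    // Adapts an in-memory list to the RemoteIterator shape.
    static <T> RemoteIterator<T> over(List<T> items) {
        final Iterator<T> it = items.iterator();
        return new RemoteIterator<T>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public T next() {
                return it.next();
            }
        };
    }

    // Example consumer: newest modification time across a listing.
    static long newestModificationTime(RemoteIterator<FileEntry> entries) throws IOException {
        long newest = 0L;
        while (entries.hasNext()) {
            newest = Math.max(newest, entries.next().modificationTime);
        }
        return newest;
    }

    public static void main(String[] args) throws IOException {
        List<FileEntry> files = Arrays.asList(
                new FileEntry("hdfs://ns/t/a", 10L, 100L),
                new FileEntry("hdfs://ns/t/b", 20L, 200L));
        System.out.println(newestModificationTime(over(files))); // prints 200
    }
}
```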

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test / No need to test (with reason)
    - Build verification (FE build succeeds)
    - PathVisibleTest, TransactionScopeCachingDirectoryListerTest,
      RepositoryTest, CopyLoadPendingTaskTest updated and compile clean

- Behavior changed: No (modificationTime was 0L before; now populated from SDK)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
The MultipartUploadCapable interface (legacy fe-core) was the only
multipart upload contract for S3FileSystem and AzureFileSystem. All
callers have already migrated to the SPI ObjFileSystem (which has its
own completeMultipartUpload() method and ObjStorage.initiateMultipartUpload/
uploadPart/abortMultipartUpload). The interface is now dead code:
- HMSTransaction already uses SPI ObjFileSystem directly
- No code casts to MultipartUploadCapable

Delete the interface and remove its implements clause from S3FileSystem
and AzureFileSystem. The legacy completeMultipartUpload(bucket, key,
uploadId, parts) methods remain in those classes pending Phase G deletion.

### Release note

None

### Check List (For Author)

- Test: No need to test (dead interface removal, FE build verified)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gacy HDFS IO wrappers

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
- HdfsStorageVault.checkConnectivity() was instantiating legacy DFSFileSystem
  directly for makeDir/exists/delete operations using Status-based API
- Five legacy HDFS IO wrapper classes (HdfsInputFile, HdfsOutputFile,
  HdfsInputStream, HdfsOutputStream, HdfsInput) only had callers in the legacy
  DFSFileSystem.newInputFile()/newOutputFile() overrides

Changes:
- HdfsStorageVault: replace new DFSFileSystem(...) with
  FileSystemFactory.getFileSystem(StorageProperties) and use SPI
  mkdirs/exists/delete(Location) methods; IOException wrapped as DdlException
- DFSFileSystem: remove newInputFile()/newOutputFile() overrides (falls back
  to LegacyFileSystemApi default UnsupportedOperationException); remove unused
  imports (Location, HdfsInputFile, HdfsOutputFile, DorisInputFile,
  DorisOutputFile, ParsedPath)
- Delete: HdfsInputFile, HdfsOutputFile, HdfsInputStream, HdfsOutputStream,
  HdfsInput (all dead code after removing DFSFileSystem overrides)

Note: ExternalCatalog and HMSExternalCatalog still reference
DFSFileSystem.PROP_ALLOW_FALLBACK_TO_SIMPLE_AUTH and DFSFileSystem.getHdfsConf()
as static-only usage; full migration deferred to Phase G when legacy DFSFileSystem
is deleted (blocked by OSSHdfsFileSystem, JFSFileSystem, OFSFileSystem subclasses).
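
The mkdirs/exists/delete probe with IOException wrapping might look roughly like the sketch below. MiniFileSystem and DdlLikeException are hypothetical stand-ins for the SPI FileSystem and DdlException, and the probe sequence is an assumption about the shape of checkConnectivity(), not a copy of it.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

final class ConnectivityCheckSketch {
    // Hypothetical checked exception standing in for DdlException.
    static final class DdlLikeException extends Exception {
        DdlLikeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical minimal view of an IOException-based filesystem.
    interface MiniFileSystem {
        void mkdirs(String location) throws IOException;

        boolean exists(String location) throws IOException;

        void delete(String location) throws IOException;
    }

    // Probe: create a directory, confirm it is visible, then remove it.
    // Any IOException is rethrown as the DDL-level exception.
    static void checkConnectivity(MiniFileSystem fs, String probeDir) throws DdlLikeException {
        try {
            fs.mkdirs(probeDir);
            if (!fs.exists(probeDir)) {
                throw new IOException("probe directory not visible: " + probeDir);
            }
            fs.delete(probeDir);
        } catch (IOException e) {
            throw new DdlLikeException("connectivity check failed: " + e.getMessage(), e);
        }
    }

    // In-memory filesystem used only to exercise the sketch.
    static MiniFileSystem inMemory() {
        final Set<String> dirs = new HashSet<>();
        return new MiniFileSystem() {
            @Override
            public void mkdirs(String location) {
                dirs.add(location);
            }

            @Override
            public boolean exists(String location) {
                return dirs.contains(location);
            }

            @Override
            public void delete(String location) {
                dirs.remove(location);
            }
        };
    }

    // Filesystem whose mkdirs always fails, to show the wrapping path.
    static MiniFileSystem unreachable() {
        return new MiniFileSystem() {
            @Override
            public void mkdirs(String location) throws IOException {
                throw new IOException("unreachable");
            }

            @Override
            public boolean exists(String location) {
                return false;
            }

            @Override
            public void delete(String location) {
            }
        };
    }
}
```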

### Release note

None

### Check List (For Author)

- Test: No need to test (HdfsStorageVault.checkConnectivity is integration-only,
  no existing unit test; behavior identical — SPI HDFS provider uses same Hadoop
  FileSystem under the hood)
- Behavior changed: No (same HDFS operations, same error handling)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
morningman and others added 13 commits April 1, 2026 16:00
… provider infrastructure

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
FileSystemProviderImpl, LegacyFileSystemProviderFactory, SwitchingFileSystem,
and FileSystemLookup had zero callers outside their own files. The entire
legacy provider chain was dead code: no production or test code referenced
FileSystemProviderImpl or LegacyFileSystemProviderFactory after previous
phases migrated all callers to the SPI FileSystemFactory.

Changes:
- Delete FileSystemProviderImpl (no callers; only created SwitchingFileSystem)
- Delete LegacyFileSystemProviderFactory (interface with no callers)
- Delete SwitchingFileSystem (only instantiated by deleted FileSystemProviderImpl)
- Delete FileSystemLookup (FunctionalInterface only used by deleted SwitchingFileSystem)
- FileSystem.java: replace @see LegacyFileSystemApi javadoc with reference to SPI

Note: LegacyFileSystemApi is NOT deleted here — it is still implemented by
PersistentFileSystem and LocalDfsFileSystem which are Phase G deletion scope.

### Release note

None

### Check List (For Author)

- Test: No need to test (deleting unreachable dead code; build verifies no callers)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…em, AzureFileSystem and StorageTypeMapper

### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
- StorageTypeMapper was only called from FileSystemFactory.get() (deprecated),
  which had zero production callers, only dead @Disabled/@Ignore tests
- BrokerFileSystem, S3FileSystem, AzureFileSystem (legacy fe-core versions) were
  only instantiated through StorageTypeMapper; all production code already routes
  through FileSystemFactory.getFileSystem() → SPI providers
- HiveUtil.isSplittable() had a dead instanceof BrokerFileSystem branch: the fs
  parameter is always an SPI FileSystem from FileSystemCache, never the legacy
  BrokerFileSystem, so the broker-specific Thrift RPC path was unreachable

Changes:
- Delete StorageTypeMapper.java (legacy enum factory, no callers)
- Delete BrokerFileSystem.java, S3FileSystem.java, AzureFileSystem.java (legacy)
- Delete FileSystemFactory.get(StorageProperties) and get(FileSystemType, Map)
  deprecated methods (no production callers; StorageTypeMapper is gone)
- HiveUtil.isSplittable(): remove dead instanceof BrokerFileSystem branch
- IcebergHadoopCatalogTest: instantiate DFSFileSystem directly (was using deleted
  FileSystemFactory.get(); test is @Ignore)
- PaimonDlfRestCatalogTest: remove readByDorisS3FileSystem() helper (used deleted
  S3FileSystem; test is @Disabled)
- Delete BrokerStorageTest, S3FileSystemTest (tested deleted legacy classes)

Note: GsonUtils already handles missing legacy classes via reflection + try/catch
ClassNotFoundException; the BrokerFileSystem/S3FileSystem/AzureFileSystem entries
in the RuntimeTypeAdapterFactory will gracefully skip at startup.
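
The reflection-based lookup mentioned in the note can be sketched as follows. This is a generic illustration of the Class.forName() plus ClassNotFoundException pattern, not the actual GsonUtils code; the class name passed for the missing type is made up.

```java
import java.util.ArrayList;
import java.util.List;

final class OptionalClassLookupSketch {
    // Resolves each class name by reflection and skips names that no longer
    // exist on the classpath instead of failing at startup.
    static List<Class<?>> resolveAvailable(String... classNames) {
        List<Class<?>> found = new ArrayList<>();
        for (String name : classNames) {
            try {
                found.add(Class.forName(name));
            } catch (ClassNotFoundException e) {
                // The class has been deleted; nothing to register for it.
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // "org.example.removed.LegacyFs" is a made-up name for a missing class.
        List<Class<?>> classes = resolveAvailable(
                "java.lang.String", "org.example.removed.LegacyFs");
        System.out.println(classes.size()); // prints 1
    }
}
```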

### Release note

None

### Check List (For Author)

- Test: No need to test (deleting dead code with zero live callers; build
  verifies no remaining references)
- Behavior changed: No (SPI path already handled all cases; broker RPC branch
  in isSplittable was unreachable)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…d legacy ObjFileSystem

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: As part of the P4.8 legacy class deletion series, removes dead
code that has no production callers:
- OSSHdfsFileSystem, JFSFileSystem, OFSFileSystem: subclasses of DFSFileSystem
  with zero production instantiation (StorageTypeMapper which mapped them was
  deleted in P4.8-F)
- org.apache.doris.fs.remote.ObjFileSystem: legacy abstract class whose only
  subclasses (S3FileSystem, AzureFileSystem) were deleted in P4.8-F

Also removes their entries from GsonUtils reflection array (ClassNotFoundException
was already handled gracefully, but the entries serve no purpose).

Fixes StageUtilTest to import org.apache.doris.filesystem.spi.ObjFileSystem
(the SPI interface used by production StageUtil) instead of the now-deleted
legacy org.apache.doris.fs.remote.ObjFileSystem.

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test / No need to test (with reason)
    - FE build passes
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rence infrastructure

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Delete DFSFileSystem (the legacy HDFS wrapper class) and its
supporting classes now that all callers have been migrated:
- DFSFileSystem.java: deleted after migrating ExternalCatalog and HMSExternalCatalog
  away from its static members (PROP_ALLOW_FALLBACK_TO_SIMPLE_AUTH constant and
  getHdfsConf() method). These are now inlined directly in the callers using
  HdfsConfiguration and the literal string 'ipc.client.fallback-to-simple-auth-allowed'.
- DFSFileSystemPhantomReference.java: helper class for phantom reference tracking,
  only used within the dfs package
- RemoteFSPhantomManager.java: background cleanup thread for Hadoop FileSystem
  objects, only called from DFSFileSystem.nativeFileSystem()
- IcebergHadoopCatalogTest.java: @Ignore test with no assertions, purely manual
  exploration code using DFSFileSystem.nativeFileSystem()

Also removes DFSFileSystem from the GsonUtils reflection array.

### Release note

None

### Check List (For Author)

- Test: No need to test (deleted classes have no production callers; ExternalCatalog
    and HMSExternalCatalog behavior is unchanged — same Hadoop config semantics)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Completes the P4.8 legacy class deletion series by removing the
entire legacy filesystem class hierarchy now that DFSFileSystem is gone:

G.4 - RemoteFileSystem: abstract class extending PersistentFileSystem; no remaining
  concrete subclasses. Deletes RemoteFileSystemTest (only tested this abstract class).

G.5 - PersistentFileSystem: base abstract class of the legacy hierarchy; callers
  (Repository.legacyFileSystem) migrated first. LegacyFileSystemApi: interface
  implemented only by PersistentFileSystem and LocalDfsFileSystem. LocalDfsFileSystem:
  only used in HiveAcidTest to create test fixtures; replaced with java.nio.file.Files
  helper method. GsonUtils: removes buildLegacyFileSystemAdapterFactory() and its
  RuntimeTypeAdapterFactory registration.

G.6 - GlobListResult: was used only internally within S3ObjStorage (and previously
  by LegacyFileSystemApi default method); moved to a private static inner class of
  S3ObjStorage.

H.1 - Repository.legacyFileSystem: removes the @Deprecated backward-compat field
  and the deserialization branch that used it. Old serialized metadata with a 'rfs'
  field will now silently skip the legacy path (fileSystemDescriptor will be null,
  and the method returns early). New metadata uses 'fs_descriptor' exclusively.

H.2 - FileSystemDescriptor.fromPersistentFileSystem(): removes the migration helper
  method (no longer called by anyone after H.1). Also cleans up the javadoc that
  referenced deleted classes.
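
For the G.5 test-fixture change, a java.nio.file.Files helper along these lines would be enough to create fixture files without a filesystem wrapper. The helper names and the directory layout are hypothetical; the actual helper in HiveAcidTest may look different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class NioFixtureSketch {
    // Creates a fresh temporary directory to hold test fixtures.
    static Path newFixtureRoot() {
        try {
            return Files.createTempDirectory("fixture");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a file (and any missing parent directories) under root.
    static Path createFile(Path root, String relativePath, String content) {
        try {
            Path file = root.resolve(relativePath);
            Files.createDirectories(file.getParent());
            return Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = newFixtureRoot();
        Path file = createFile(root, "delta_dir/data_file", "rows");
        System.out.println(Files.isRegularFile(file)); // prints true
    }
}
```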

### Release note

None

### Check List (For Author)

- Test: No need to test (all deleted classes are dead code with zero production callers;
    HiveAcidTest behavior unchanged — same test fixture files created via NIO)
- Behavior changed: No (Repository deserialization: clusters running this code have
    already written 'fs_descriptor' format; the 'rfs' legacy field path was only needed
    for migration from very old metadata)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
1. The fallback dependency:copy-dependencies for fe-core ran without
   -DoutputDirectory, so Maven used the default target/dependency/ instead
   of target/lib/. This caused 'cp: cannot stat target/lib/*' when a
   Maven Build Cache hit skipped the copy-dependencies lifecycle phase.
   Fix: add -DoutputDirectory and -DincludeScope=runtime to match the
   pom.xml execution configuration.

2. All fe-filesystem impl modules lacked <finalName>, producing
   fe-filesystem-s3-1.2-SNAPSHOT.jar while build.sh expected
   doris-fe-filesystem-s3*.jar. Fix: add <finalName>doris-fe-filesystem-{name}</finalName>
   to each module's pom.xml, consistent with fe-filesystem-spi.

3. The filesystem plugin deployment loop used bare mvn instead of
   ${MVN_CMD}. Fix: use ${MVN_CMD} for consistency.

4. The main JAR glob pattern in the deployment loop had a spurious '-'
   before '*' (doris-fe-filesystem-s3-*.jar) that couldn't match the
   versionless finalName output. Fix: remove the '-'.

### Release note

None

### Check List (For Author)

- Test: Manual test - ran sh build.sh --fe, verified all 8 filesystem
  plugins (s3/azure/oss/cos/obs/hdfs/local/broker) deploy correctly to
  output/fe/plugins/filesystem/ with main JAR + transitive deps
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Deploy all JARs (main + transitive deps) flat into
output/fe/plugins/filesystem/{module}/ instead of using a lib/
subdirectory, matching the expected plugin loading convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ParsedPath

### What problem does this PR solve?

Problem Summary: Remove completely dead code from fe-core fs/ package.
- fs/operations/ (6 files): Broker/HDFS file operations with zero external callers.
- fs/spi/FileSystemSpiProvider.java: Superseded by fe-filesystem-spi module, zero callers.
- fs/io/ParsedPath.java: @Deprecated with zero callers; also remove the @Deprecated path()
  default methods from DorisInputFile and DorisOutputFile that referenced it.

### Release note

None

### Check List (For Author)

- Test: FE unit tests (fs.FileSystemTransferUtilTest, fs.MemoryFileSystemTest,
        fs.TransactionScopeCachingDirectoryListerTest, fs.SchemaTypeMapperTest)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…s interface layer

### What problem does this PR solve?

Problem Summary: Continue fe-core fs/ cleanup (Phase 5.2).
- Add isFile()/name() convenience methods to spi.FileEntry
- Add deleteFiles() default method to spi.FileSystem
- Rewrite MemoryFileSystem to implement filesystem.spi.FileSystem instead of the
  old fs.FileSystem; override listFilesRecursive() and listDirectories() to handle
  the implicit directory model correctly
- Update MemoryFileSystemTest to use SPI types and renamed API methods
- Delete 6 dead files: fs/FileSystem.java, fs/FileIterator.java,
  fs/io/DorisInputFile.java, fs/io/DorisInput.java, fs/io/DorisInputStream.java,
  fs/io/DorisOutputFile.java
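
The convenience methods listed above could take roughly the following shape. This is a sketch under assumptions: the types are simplified stand-ins, and the real isFile(), name(), and deleteFiles() signatures in the SPI may differ.

```java
import java.io.IOException;
import java.util.Collection;

final class SpiConvenienceSketch {
    // Simplified stand-in for the SPI FileEntry.
    static final class FileEntry {
        final String location;
        final boolean directory;

        FileEntry(String location, boolean directory) {
            this.location = location;
            this.directory = directory;
        }

        boolean isDirectory() {
            return directory;
        }

        // Convenience: anything that is not a directory counts as a file.
        boolean isFile() {
            return !directory;
        }

        // Convenience: last path segment, ignoring one trailing slash.
        String name() {
            String trimmed = location.endsWith("/")
                    ? location.substring(0, location.length() - 1)
                    : location;
            return trimmed.substring(trimmed.lastIndexOf('/') + 1);
        }
    }

    // Simplified stand-in for the SPI FileSystem.
    interface FileSystem {
        void delete(String location) throws IOException;

        // Default bulk delete: one delete() call per location, in order.
        // Implementations with a native batch API could override this.
        default void deleteFiles(Collection<String> locations) throws IOException {
            for (String location : locations) {
                delete(location);
            }
        }
    }
}
```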

### Release note

None

### Check List (For Author)

- Test: FE unit tests (MemoryFileSystemTest, FileSystemTransferUtilTest,
        TransactionScopeCachingDirectoryListerTest, SchemaTypeMapperTest) — all pass
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Phase 5.3 of the fe-core fs/ cleanup. Removes the legacy
fe-core object-storage layer (fs/obj/, fs/remote/RemoteFile, fs/FileEntry,
fs/Location) and migrates its three remaining production callers to the
modern SPI interfaces already used by the rest of the codebase.

### Release note

None

### Check List (For Author)

- Test: Regression test / Unit Test / Manual test / No need to test
    - ./build.sh --fe BUILD SUCCESS
    - MemoryFileSystemTest 27 tests passed
    - Zero remaining imports from fs.obj.*, fs.remote.*, fs.FileEntry, fs.Location
- Behavior changed: No (ping connectivity test logic preserved; multipartUpload
  now uses 3-step SPI API instead of single-method wrapper)
- Does this need documentation: No
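
The 3-step multipart flow mentioned in the checklist (initiate, upload parts, complete, with abort on failure) can be sketched as below. The MultipartStore interface and its signatures are hypothetical; only the method names are taken from the commit messages in this PR.

```java
import java.util.ArrayList;
import java.util.List;

final class MultipartSketch {
    // Hypothetical store interface; real SPI signatures may differ.
    interface MultipartStore {
        String initiateMultipartUpload(String key);

        String uploadPart(String key, String uploadId, int partNumber, byte[] data);

        void completeMultipartUpload(String key, String uploadId, List<String> etags);

        void abortMultipartUpload(String key, String uploadId);
    }

    // Step 1: initiate. Step 2: upload each part. Step 3: complete.
    // If any step fails, abort so the store can discard the parts.
    static void upload(MultipartStore store, String key, List<byte[]> chunks) {
        String uploadId = store.initiateMultipartUpload(key);
        try {
            List<String> etags = new ArrayList<>();
            int partNumber = 1;
            for (byte[] chunk : chunks) {
                etags.add(store.uploadPart(key, uploadId, partNumber++, chunk));
            }
            store.completeMultipartUpload(key, uploadId, etags);
        } catch (RuntimeException e) {
            store.abortMultipartUpload(key, uploadId);
            throw e;
        }
    }

    // Recording implementation used only to exercise the sketch.
    static final class RecordingStore implements MultipartStore {
        final List<String> calls = new ArrayList<>();

        @Override
        public String initiateMultipartUpload(String key) {
            calls.add("initiate");
            return "upload-1";
        }

        @Override
        public String uploadPart(String key, String uploadId, int partNumber, byte[] data) {
            calls.add("part" + partNumber);
            return "etag-" + partNumber;
        }

        @Override
        public void completeMultipartUpload(String key, String uploadId, List<String> etags) {
            calls.add("complete:" + etags.size());
        }

        @Override
        public void abortMultipartUpload(String key, String uploadId) {
            calls.add("abort");
        }
    }
}
```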

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dule

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: FileSystemTransferUtil had zero fe-core dependencies (used
only SPI types), but lived in org.apache.doris.fs inside fe-core. Moving it
to fe-filesystem-spi keeps all pure-SPI utilities in the SPI module and
reduces fe-core's responsibility.

Changes:
- Move FileSystemTransferUtil.java from org.apache.doris.fs (fe-core) to
  org.apache.doris.filesystem.spi (fe-filesystem-spi)
- Remove redundant same-package imports; make globToRegex() public so
  callers outside the package (e.g. test) can access it
- Update AcidUtil.java and FileSystemDirectoryLister.java import paths
- Add explicit import in FileSystemTransferUtilTest.java (same-package
  access no longer applies)
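
To give a sense of what a globToRegex() utility does, here is a minimal glob-to-regex sketch. It is not the FileSystemTransferUtil implementation: it supports only `*` and `?`, and it assumes neither wildcard crosses a `/` boundary.

```java
import java.util.regex.Pattern;

final class GlobSketch {
    // Translates a simple glob into a regex string.
    // Assumptions: '*' matches any run of non-'/' characters, '?' matches one
    // non-'/' character, and every other character is matched literally.
    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                regex.append("[^/]*");
            } else if (c == '?') {
                regex.append("[^/]");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String regex = globToRegex("part-?.parquet");
        System.out.println(Pattern.matches(regex, "part-1.parquet")); // prints true
        System.out.println(Pattern.matches(regex, "part-10.parquet")); // prints false
    }
}
```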

### Release note

None

### Check List (For Author)

- Test: Unit test (FileSystemTransferUtilTest — all 20 methods pass)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…e-filesystem-spi

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Several classes in org.apache.doris.fs had zero fe-core
dependencies and belonged logically in fe-filesystem-spi. This commit
moves them to keep the SPI module self-contained and reduce fs/ package
scope.

Changes:
- Move RemoteIterator, FileSystemIOException, SimpleRemoteIterator to
  org.apache.doris.filesystem.spi; drop unused ErrCode field from
  FileSystemIOException (no caller ever set it)
- Move FileSystemType to org.apache.doris.filesystem.spi
- Move FileSystemUtil to org.apache.doris.filesystem.spi; replace
  org.apache.hadoop.fs.Path (path concat only) with plain string
  concatenation to eliminate Hadoop dependency from SPI module
- Move MemoryFileSystem from fe-core src/main to src/test (test-only
  class should not ship in production JAR)
- Update all callers: DirectoryLister, FileSystemDirectoryLister,
  SchemaTypeMapper, TransactionScopeCachingDirectoryLister,
  HiveExternalMetaCache, HMSTransaction, LocationPath and their tests
- Remove dead ErrCode.NOT_FOUND check in HiveExternalMetaCache
  (getErrorCode() always returned empty; else-branch always executed)
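
Replacing a path-concat-only use of org.apache.hadoop.fs.Path with plain strings can be as small as the sketch below. The join() helper is hypothetical and makes no claim to match Hadoop Path in every edge case; it only guarantees exactly one separator between parent and child.

```java
final class PathJoinSketch {
    // Joins parent and child with exactly one '/' between them.
    // Assumption: both arguments are non-empty and child is a relative name.
    static String join(String parent, String child) {
        String left = parent.endsWith("/")
                ? parent.substring(0, parent.length() - 1)
                : parent;
        String right = child.startsWith("/") ? child.substring(1) : child;
        return left + "/" + right;
    }

    public static void main(String[] args) {
        System.out.println(join("s3://bucket/dir", "file"));   // prints s3://bucket/dir/file
        System.out.println(join("s3://bucket/dir/", "/file")); // prints s3://bucket/dir/file
    }
}
```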

### Release note

None

### Check List (For Author)

- Test: Unit tests (MemoryFileSystemTest, SchemaTypeMapperTest — all pass)
- Behavior changed: No (ErrCode branch was dead code; FileSystemUtil
  path concatenation is semantically identical to Hadoop Path)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Thearas
Contributor

Thearas commented Apr 1, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman morningman closed this Apr 3, 2026