[refactor](fe) Upgrade remote storage FileSystem to SPI (Phase 1) #61841

Draft
morningman wants to merge 1 commit into apache:master from morningman:wt-fs-spi

@morningman
Contributor

What problem does this PR solve?

Problem Summary:
Phase 1 of the Doris FE remote storage access layer refactoring. This consolidates the fs modules into a modern, unified, SPI-based architecture (inspired by Trino FileSystem).

Key changes:

  1. Introduced Location as a strongly-typed URI value object.
  2. Introduced FileEntry, which represents file metadata without any Hadoop dependency.
  3. Introduced FileIterator for lazy iteration to prevent OOM on large directories.
  4. Cleaned up FileSystem interface (IOException-based, Location-typed) and retained the old LegacyFileSystem (Status-based) for backward compatibility during migration.
  5. Added LegacyFileSystemAdapter to bridge old and new interfaces.
  6. Upgraded DorisInputFile and DorisOutputFile with location().
  7. Created MemoryFileSystem for comprehensive unit testing without real storage.
  8. Marked RemoteFile and ParsedPath as @deprecated with bidirectional converters.
  9. Migrated Iceberg's DelegateFileIO to natively use the new FileSystem and Location API.
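
As a rough illustration of how items 1 to 3 and 7 could fit together, here is a minimal sketch in Java. It is not taken from the PR's diff: the class `FsSpiSketch`, the `InMemoryFs` stand-in, `Location.of`, the `FileEntry` fields, and the `listNames` helper are all hypothetical names chosen for this example, and the real Doris types may differ in shape and behavior. The sketch shows a validated URI wrapper, Hadoop-free metadata, and a pull-based iterator that yields entries one at a time instead of building a full list.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Illustrative only: every type and method here is hypothetical, not the PR's API.
final class FsSpiSketch {

    /** Immutable wrapper that rejects strings without a URI scheme. */
    static final class Location {
        private final URI uri;

        private Location(URI uri) {
            this.uri = uri;
        }

        static Location of(String raw) {
            URI parsed = URI.create(raw);
            if (parsed.getScheme() == null) {
                throw new IllegalArgumentException("Location needs a scheme: " + raw);
            }
            return new Location(parsed);
        }

        @Override
        public String toString() {
            return uri.toString();
        }
    }

    /** File metadata built from plain Java types only (assumed fields). */
    static final class FileEntry {
        final Location location;
        final long length;

        FileEntry(Location location, long length) {
            this.location = location;
            this.length = length;
        }
    }

    /** Pull-based listing: entries are produced one at a time, on demand. */
    interface FileIterator {
        boolean hasNext() throws IOException;

        FileEntry next() throws IOException;
    }

    /** Sorted in-memory map standing in for remote storage in tests. */
    static final class InMemoryFs {
        private final TreeMap<String, byte[]> files = new TreeMap<>();

        void write(Location location, byte[] data) {
            files.put(location.toString(), data.clone());
        }

        FileIterator list(Location prefix) {
            String p = prefix.toString();
            // Keys sharing the prefix are contiguous in sorted order, starting at p.
            Iterator<Map.Entry<String, byte[]>> it = files.tailMap(p, true).entrySet().iterator();
            return new FileIterator() {
                private Map.Entry<String, byte[]> peeked;

                @Override
                public boolean hasNext() {
                    if (peeked == null && it.hasNext()) {
                        Map.Entry<String, byte[]> candidate = it.next();
                        if (candidate.getKey().startsWith(p)) {
                            peeked = candidate;
                        }
                    }
                    return peeked != null;
                }

                @Override
                public FileEntry next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    FileEntry entry = new FileEntry(Location.of(peeked.getKey()), peeked.getValue().length);
                    peeked = null;
                    return entry;
                }
            };
        }
    }

    /** Drains an iterator into names, converting the checked exception. */
    static List<String> listNames(InMemoryFs fs, Location prefix) {
        List<String> names = new ArrayList<>();
        try {
            FileIterator it = fs.list(prefix);
            while (it.hasNext()) {
                names.add(it.next().location.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

In this sketch the iterator interface declares `IOException` so that a backend doing network calls per page can surface failures at the point of iteration, which mirrors the IOException-based style described in item 4. The in-memory variant never throws it, which is what makes it convenient for unit tests.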

Release note

None

Check List (For Author)

  • Test:
    • Unit Test (Added comprehensive tests for Location, FileEntry, and MemoryFileSystem. Passed all 39 new tests)
  • Behavior changed: No
  • Does this need documentation: No

@morningman requested a review from CalvinKirs as a code owner on March 28, 2026, 06:18
@Thearas
Contributor

Thearas commented Mar 28, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman marked this pull request as draft on March 28, 2026, 06:18