Skip to content

fix(io): return correct list_status paths and reuse storage operators#101

Merged
JingsongLi merged 3 commits intoapache:mainfrom
QuakeWang:fix-list-status-path
Mar 3, 2026
Merged

fix(io): return correct list_status paths and reuse storage operators#101
JingsongLi merged 3 commits intoapache:mainfrom
QuakeWang:fix-list-status-path

Conversation

@QuakeWang
Copy link
Contributor

Purpose

Fix FileIO::list_status path semantics: each returned FileStatus.path now points to the actual listed entry instead of the input directory path.
Also, keep reusable operators in Storage to preserve memory backend state across calls.

Brief change log

  • Fix list_status to return real entry paths (base_path + entry.path()), and fetch required metadata via list_with(...).metakey(...).
  • Refactor Storage to reuse Operator instances across calls (including memory backend).
  • Add regression tests for list_status on both fs and memory backends.

Tests

  • cargo fmt --all -- --check
  • cargo clippy --all-targets --workspace -- -D warnings
  • cargo test -p paimon io::file_io::file_action_test
  • cargo test --all-targets --workspace

All tests passed locally.

API and Format

  • No public API signature changes.
  • No storage format changes.
  • Behavior fix only: list_status now returns correct entry paths.

Documentation

No documentation update required.

@QuakeWang
Copy link
Contributor Author

@Aitozi @JingsongLi PTAL 👀

@JingsongLi
Copy link
Contributor

cc @luoyuxia to take a review

# Conflicts:
#	crates/paimon/src/io/file_io.rs
Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuakeWang Thanks for the pr. Left minor comments. PTAL

size: meta.content_length(),
is_dir: meta.is_dir(),
path: entry.path().to_string(),
path: format!("{base_path}{}", entry.path()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, format!("{base_path}{}", entry.path()) just add leading / to entry.path()?
Considering openal will always call normalize_path to remove leading /, do we real need to add eading / to entry.path()? Could you just keep entry.path()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luoyuxia Thanks for the comment. entry.path() in OpenDAL is relative to operator root, while FileStatus.path here is expected to preserve the original FileIO path prefix (e.g., memory:/ or
file:/) for consistency with get_status and follow-up FileIO operations. normalize_path applies to Operator input paths, not to Entry::path() output.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It sounds reasonable to me to return the full path with scheme prefix.

But it doesn't seem to be necessary now for reusing the storage operator. Maybe you could create a separate PR for reusing the storage operator part in which we can revist it further it.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left minor comments.

size: meta.content_length(),
is_dir: meta.is_dir(),
path: entry.path().to_string(),
path: format!("{base_path}{}", entry.path()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It sounds reasonable to me to return the full path with scheme prefix.

But it doesn't seem to be necessary now for reusing the storage operator. Maybe you could create a separate PR for reusing the storage operator part in which we can revist it further it.

@QuakeWang
Copy link
Contributor Author

@luoyuxia Thanks for the suggestions. I narrowed this PR to the list_status path fix only and reverted the storage-operator reuse changes. I also simplified the new test by removing non-essential assertions and reducing duplicated write code. The FS path regression test is kept.

Copy link
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuakeWang Thanks. +1

Copy link
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit 22bb7b7 into apache:main Mar 3, 2026
7 checks passed
@QuakeWang QuakeWang deleted the fix-list-status-path branch March 3, 2026 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants