perf(reader): Pass data file size and delete file size to reader to avoid stat() calls#2175

Open
mbutrovich wants to merge 5 commits into apache:main from mbutrovich:file_size_passthrough
@mbutrovich (Collaborator) commented Feb 24, 2026

Which issue does this PR close?

[Image: Screenshot 2026-02-24 at 2 03 07 PM]

What changes are included in this PR?

  • Pass the data file size through to FileScanTask. Iceberg Java does this by having its FileScanTask wrap a reference to a DataFile. Here we're just cherry-picking the fields we need until we decide we need more.
  • Remove redundant data file creation code in tests.
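
The cherry-picking approach in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the actual iceberg-rust code: the struct layouts, the field names other than `file_size_in_bytes`, and the `From` conversion are all assumptions made for the example.

```rust
/// Hypothetical, trimmed-down stand-in for the data file entry read from a manifest.
#[derive(Debug, Clone)]
pub struct DataFile {
    pub file_path: String,
    pub file_size_in_bytes: u64,
    pub record_count: u64,
}

/// Hypothetical scan task that copies only the fields the reader needs,
/// instead of holding a reference to the whole `DataFile`.
#[derive(Debug, Clone, PartialEq)]
pub struct FileScanTask {
    pub data_file_path: String,
    pub file_size_in_bytes: u64,
}

impl From<&DataFile> for FileScanTask {
    fn from(file: &DataFile) -> Self {
        FileScanTask {
            data_file_path: file.file_path.clone(),
            // Known at planning time, so the reader does not have to ask storage for it.
            file_size_in_bytes: file.file_size_in_bytes,
        }
    }
}
```

Copying individual fields keeps the task small and cheap to serialize. The trade-off is that every additional field the reader needs later has to be added to the task explicitly.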

Are these changes tested?

Existing tests.

@mbutrovich self-assigned this on Feb 24, 2026
@mbutrovich changed the title from "perf(reader): Optionally pass data file size to reader through FileScanTasks to avoid stat() call" to "perf(reader): Optionally pass data file size to reader through FileScanTasks to avoid stat() call" on Feb 24, 2026

@blackmwk (Contributor) left a comment

Thanks @mbutrovich for this PR!

@mbutrovich changed the title from "perf(reader): Optionally pass data file size to reader through FileScanTasks to avoid stat() call" to "perf(reader): Pass data file size to reader through FileScanTasks to avoid stat() call" on Feb 25, 2026
@mbutrovich force-pushed the file_size_passthrough branch from 9fe0cc2 to f558510 on February 25, 2026 14:21
@mbutrovich (Collaborator, Author) commented Feb 25, 2026

Thanks for the review, @blackmwk. Hopefully I addressed your concerns. I went back to the spec: it requires manifests to have file_size_in_bytes, so the field is no longer an Option, and I don't think a size of 0 makes sense as a fallback. I also passed through the delete file size.
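
To illustrate why a known, non-optional size removes the need for a stat() call, here is a rough sketch assuming the data file is Parquet. A Parquet file ends with a 4-byte little-endian metadata length followed by the 4-byte magic `PAR1`, so a reader that already knows the file size can compute the byte range of that tail directly. The `ScanTask` struct and `tail_range` function are hypothetical names for this example, not the reader's actual API.

```rust
use std::ops::Range;

/// Length of the Parquet tail: 4-byte metadata length plus the 4-byte "PAR1" magic.
pub const PARQUET_TAIL_LEN: u64 = 8;

/// Hypothetical scan task. The size is a plain `u64` rather than an `Option`,
/// mirroring the point that the manifest always carries it.
pub struct ScanTask {
    pub data_file_path: String,
    pub file_size_in_bytes: u64,
}

/// Computes the byte range of the Parquet tail from the size alone,
/// without asking the storage layer for file metadata.
/// Returns `None` when the recorded size is too small to hold the tail.
pub fn tail_range(task: &ScanTask) -> Option<Range<u64>> {
    let end = task.file_size_in_bytes;
    let start = end.checked_sub(PARQUET_TAIL_LEN)?;
    Some(start..end)
}
```

In this sketch a size of 0 produces `None` rather than a usable range, which is one way to see why 0 would be a poor fallback value for a missing size.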

@mbutrovich changed the title from "perf(reader): Pass data file size to reader through FileScanTasks to avoid stat() call" to "perf(reader): Pass data file size and delete file size to reader to avoid stat() call" on Feb 25, 2026
@mbutrovich requested a review from blackmwk on February 25, 2026 14:42
@mbutrovich changed the title from "perf(reader): Pass data file size and delete file size to reader to avoid stat() call" to "perf(reader): Pass data file size and delete file size to reader to avoid stat() calls" on Feb 25, 2026