[HUDI-7347] Introduce SeekableDataInputStream for random access #10575

Merged: 2 commits, Feb 1, 2024


@yihua yihua commented Jan 27, 2024

Change Logs

This PR introduces SeekableDataInputStream for random access on a file, i.e., seeking to a position in the file and reading the content from there. The log scanner and reader use this random access. Before this PR, the log reader relied on FSDataInputStream, which is coupled to the Hadoop file system. To allow Hadoop-independent file systems to be used for reading log files, the SeekableDataInputStream interface provides random access APIs without relying on FSDataInputStream.

HadoopSeekableDataInputStream implements SeekableDataInputStream using an FSDataInputStream instance and keeps the same logic as before (no behavior change).
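
To illustrate the idea, here is a minimal sketch of what a seekable data input stream abstraction could look like, along with an in-memory implementation that needs no Hadoop classes. This is not the code from this PR: the method names getPos and seek, the abstract-class shape, and the helper classes SeekableByteArrayInputStream and InMemorySeekableDataInputStream are assumptions made for illustration. A Hadoop-backed implementation would presumably delegate the same two calls to the wrapped FSDataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

class SeekableSketch {

  // Hypothetical shape of the abstraction: sequential DataInput reads plus
  // position tracking and seeking. The actual class in the PR may differ.
  abstract static class SeekableDataInputStream extends DataInputStream {
    SeekableDataInputStream(InputStream in) {
      super(in);
    }

    // Returns the current read offset from the start of the data.
    abstract long getPos() throws IOException;

    // Moves the read offset so that the next read starts at pos.
    abstract void seek(long pos) throws IOException;
  }

  // Hypothetical helper: a byte-array stream whose read cursor can be moved.
  static final class SeekableByteArrayInputStream extends ByteArrayInputStream {
    SeekableByteArrayInputStream(byte[] data) {
      super(data);
    }

    long getPos() {
      return pos; // protected cursor inherited from ByteArrayInputStream
    }

    void seek(long newPos) throws IOException {
      if (newPos < 0 || newPos > count) {
        throw new IOException("Seek position out of range: " + newPos);
      }
      pos = (int) newPos;
    }
  }

  // Hadoop-independent implementation (illustrative only).
  static final class InMemorySeekableDataInputStream extends SeekableDataInputStream {
    private final SeekableByteArrayInputStream stream;

    InMemorySeekableDataInputStream(byte[] data) {
      this(new SeekableByteArrayInputStream(data));
    }

    private InMemorySeekableDataInputStream(SeekableByteArrayInputStream stream) {
      super(stream);
      this.stream = stream;
    }

    @Override
    long getPos() {
      return stream.getPos();
    }

    @Override
    void seek(long pos) throws IOException {
      stream.seek(pos);
    }
  }

  public static void main(String[] args) throws IOException {
    // Layout: int 7 at offset 0, long 42 at offset 4, int 99 at offset 12.
    byte[] data = ByteBuffer.allocate(16).putInt(7).putLong(42L).putInt(99).array();
    try (InMemorySeekableDataInputStream in = new InMemorySeekableDataInputStream(data)) {
      in.seek(4);                        // jump past the leading int
      System.out.println(in.readLong()); // prints 42
      System.out.println(in.getPos());   // prints 12
    }
  }
}
```

A caller such as a log reader can then be written against the abstract type only, so swapping the storage backend does not change the reading logic.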

This is part of the effort to provide a Hudi storage abstraction and decouple hudi-common from Hadoop dependencies. For reference, the single large PR containing all the changes is #10360.

Impact

Makes random access on files and log reading independent of Hadoop APIs. No behavior change.

Risk level

none

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@hudi-bot

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@yihua yihua changed the title from "[HUDI-7347][Stacked on HUDI-7335] Introduce SeekableDataInputStream for random access" to "[HUDI-7347] Introduce SeekableDataInputStream for random access" Jan 31, 2024
@danny0405 danny0405 merged commit e23f402 into apache:master Feb 1, 2024
32 checks passed