[HUDI-7347] Introduce SeekableDataInputStream for random access #10575
Change Logs
This PR introduces SeekableDataInputStream for random access on a file, i.e., seeking to a position in a file and reading the content from there. Random access is used by the log scanner and the log reader. Before this PR, the log reader relied on FSDataInputStream, which is coupled with the Hadoop file system. To allow Hadoop-independent file systems to be used for reading log files, the SeekableDataInputStream interface is introduced to provide random-access APIs without relying on FSDataInputStream. HadoopSeekableDataInputStream implements SeekableDataInputStream with an FSDataInputStream instance and is used to realize the same logic as before (no behavior change).

This is part of the effort to provide a Hudi storage abstraction and decouple hudi-common from Hadoop dependencies. For reference, the single big-change PR can be found here: #10360.

Impact
Makes random access on files and log reading independent of Hadoop APIs. No behavior change.
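To illustrate what random access without Hadoop APIs can look like, here is a minimal, hypothetical Java sketch. It is not Hudi's actual code. It assumes SeekableDataInputStream is an abstract class extending java.io.DataInputStream with getPos() and seek(long). ByteArraySeekableDataInputStream, LogBlockReaderSketch, and readBlockAt are illustrative names only. An in-memory byte array stands in for a file, so no Hadoop classes are needed. HadoopSeekableDataInputStream would fill the same role by delegating getPos() and seek() to its FSDataInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Assumed shape of the abstraction: DataInputStream reads plus seek/position.
abstract class SeekableDataInputStream extends DataInputStream {
  protected SeekableDataInputStream(InputStream in) {
    super(in);
  }

  /** Returns the current byte offset in the underlying source. */
  public abstract long getPos() throws IOException;

  /** Moves the read position to the given byte offset. */
  public abstract void seek(long pos) throws IOException;
}

// Hypothetical Hadoop-free implementation backed by a byte array.
class ByteArraySeekableDataInputStream extends SeekableDataInputStream {
  private final PositionedByteArrayInputStream source;

  ByteArraySeekableDataInputStream(byte[] data) {
    this(new PositionedByteArrayInputStream(data));
  }

  private ByteArraySeekableDataInputStream(PositionedByteArrayInputStream source) {
    super(source);
    this.source = source;
  }

  @Override
  public long getPos() {
    return source.position();
  }

  @Override
  public void seek(long pos) throws IOException {
    source.seekTo(pos);
  }

  // ByteArrayInputStream exposes its cursor ("pos") and length ("count") to subclasses.
  private static final class PositionedByteArrayInputStream extends ByteArrayInputStream {
    PositionedByteArrayInputStream(byte[] data) {
      super(data);
    }

    long position() {
      return pos;
    }

    void seekTo(long target) throws IOException {
      if (target < 0 || target > count) {
        throw new IOException("Seek out of range: " + target);
      }
      pos = (int) target;
    }
  }
}

// Hypothetical log-reader-style helper: jump to a block offset, read a length-prefixed block.
final class LogBlockReaderSketch {
  private LogBlockReaderSketch() {}

  static byte[] readBlockAt(SeekableDataInputStream in, long offset) throws IOException {
    in.seek(offset);
    int length = in.readInt(); // 4-byte big-endian length prefix (assumed layout)
    byte[] block = new byte[length];
    in.readFully(block);
    return block;
  }
}
```

Because DataInputStream does no read-ahead buffering of its own, moving the underlying cursor is enough for the next readInt() or readFully() call to start at the new offset. Code like readBlockAt depends only on the abstract type, so it works the same whether the stream wraps an FSDataInputStream or a non-Hadoop source.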
Risk level
none
Documentation Update
N/A
Contributor's checklist