[HUDI-7336] Introduce new HoodieStorage abstraction #10567
Looks good in general. I'm wondering: do we have a reference system for the HoodieStorage abstraction?
Yes, it's mainly based on hadoop's |
Nice abstraction. LGTM
This commit introduces the `HoodieStorage` abstraction and Hudi's counterpart classes for Hadoop File System classes (`org.apache.hadoop.fs.`[`FileSystem`, `Path`, `PathFilter`, `FileStatus`]) to decouple Hudi's implementation from Hadoop classes, so it's much easier to plug in different file system implementations.
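To make the decoupling concrete, here is a minimal sketch of the general pattern: a storage interface plus small location and file-status classes that carry no Hadoop types, with one backing implementation. All names here (`SketchStorage`, `SketchLocation`, `SketchLocationFilter`, `SketchFileStatus`, `LocalSketchStorage`) and all method signatures are hypothetical and are not the API added in this PR. The backing implementation uses `java.nio` on the local file system purely for illustration, whereas the PR's implementation wraps Hadoop's `FileSystem`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical location class: wraps a path string and exposes no Hadoop types.
final class SketchLocation {
  private final String path;
  SketchLocation(String path) { this.path = path; }
  SketchLocation child(String name) {
    return new SketchLocation(path.endsWith("/") ? path + name : path + "/" + name);
  }
  String getName() { return path.substring(path.lastIndexOf('/') + 1); }
  @Override public String toString() { return path; }
}

// Hypothetical file status class: location, length, directory flag, modification time.
final class SketchFileStatus {
  private final SketchLocation location;
  private final long length;
  private final boolean directory;
  private final long modificationTime;
  SketchFileStatus(SketchLocation location, long length, boolean directory, long modificationTime) {
    this.location = location;
    this.length = length;
    this.directory = directory;
    this.modificationTime = modificationTime;
  }
  SketchLocation getLocation() { return location; }
  long getLength() { return length; }
  boolean isDirectory() { return directory; }
  long getModificationTime() { return modificationTime; }
}

// Hypothetical filter interface, analogous in spirit to a path filter.
interface SketchLocationFilter {
  boolean accept(SketchLocation location);
}

// Hypothetical storage interface: callers depend only on this, never on the backend.
interface SketchStorage {
  OutputStream create(SketchLocation location, boolean overwrite) throws IOException;
  InputStream open(SketchLocation location) throws IOException;
  boolean exists(SketchLocation location) throws IOException;
  SketchFileStatus getFileStatus(SketchLocation location) throws IOException;
  List<SketchFileStatus> listDirectEntries(SketchLocation dir, SketchLocationFilter filter) throws IOException;
}

// One possible backend (an assumption for this sketch): the local file system via java.nio.
final class LocalSketchStorage implements SketchStorage {
  private static Path toNio(SketchLocation location) { return Paths.get(location.toString()); }

  @Override public OutputStream create(SketchLocation location, boolean overwrite) throws IOException {
    return overwrite
        ? Files.newOutputStream(toNio(location)) // creates or truncates
        : Files.newOutputStream(toNio(location), StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
  }

  @Override public InputStream open(SketchLocation location) throws IOException {
    return Files.newInputStream(toNio(location));
  }

  @Override public boolean exists(SketchLocation location) {
    return Files.exists(toNio(location));
  }

  @Override public SketchFileStatus getFileStatus(SketchLocation location) throws IOException {
    Path p = toNio(location);
    boolean dir = Files.isDirectory(p);
    return new SketchFileStatus(location, dir ? 0L : Files.size(p), dir,
        Files.getLastModifiedTime(p).toMillis());
  }

  @Override public List<SketchFileStatus> listDirectEntries(SketchLocation dir, SketchLocationFilter filter)
      throws IOException {
    List<SketchFileStatus> result = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(toNio(dir))) {
      for (Path entry : entries) {
        SketchLocation location = new SketchLocation(entry.toString());
        if (filter.accept(location)) {
          result.add(getFileStatus(location));
        }
      }
    }
    return result;
  }
}

final class StorageSketchDemo {
  public static void main(String[] args) throws IOException {
    SketchStorage storage = new LocalSketchStorage();
    SketchLocation dir = new SketchLocation(Files.createTempDirectory("storage-sketch").toString());
    try (OutputStream out = storage.create(dir.child("data.txt"), true)) {
      out.write("hello".getBytes(StandardCharsets.UTF_8));
    }
    for (SketchFileStatus status : storage.listDirectEntries(dir, l -> l.getName().endsWith(".txt"))) {
      System.out.println(status.getLocation() + " length=" + status.getLength());
    }
  }
}
```

Because the demo code only touches the `SketchStorage` interface, swapping in a different backend would mean adding another implementation class without changing any caller, which is the kind of pluggability the commit message aims for.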
…mmon module (#10591) This commit makes the changes to replace most `FileSystem`, `Path`, and `FileStatus` usage with `HoodieStorage`, `StoragePath` and `StoragePathInfo` (introduced in #10567, renamed in #10672) in `hudi-common` module, to remove dependency on Hadoop FS abstraction which is not essential to most Hudi core read and write logic. This commit still keeps using the Hadoop FileSystem-based implementation under the hood. A follow-up PR will make `HoodieStorage` and I/O implementation pluggable.
Change Logs
This PR introduces the `HoodieStorage` abstraction and Hudi's counterpart classes for Hadoop File System classes (`org.apache.hadoop.fs.`[`FileSystem`, `Path`, `PathFilter`, `FileStatus`]) to decouple Hudi's implementation from Hadoop classes, so it's much easier to plug in different file system implementations. Detailed changes include:
- `HoodieStorage` interface: the counterpart class for Hadoop's `FileSystem`. This provides all I/O APIs on files and directories on storage, such as `open`, `read`, etc. This can also contain storage layer optimizations like caching, federated storage layout, hot/cold storage separation, etc. This needs to be implemented based on particular systems.
- `HoodieHadoopStorage` implements `HoodieStorage` with Hadoop's `FileSystem`.
- `HoodieLocation`: the counterpart class for Hadoop's `Path`. We migrate and simplify path parsing logic in this class.
- `HoodieLocationFilter` interface: the counterpart class for Hadoop's `PathFilter`.
- `HoodieFileStatus`: the counterpart class for Hadoop's `FileStatus`. This keeps the location, length, isDirectory, and modification time, which are used by Hudi.

This is part of the effort to provide Hudi storage abstraction and decouple `hudi-common` from Hadoop dependencies. For reference, the single big-change PR can be found here: #10360.
Impact
No impact, as this PR does not integrate the new classes yet.
Risk level
none
Documentation Update
N/A
Contributor's checklist