[HUDI-7776] Simplify HoodieStorage instance fetching #11259
yihua merged 12 commits into apache:master
+ "by Hudi. The provided class should implement `org.apache.hudi.io.storage.HoodieIOFactory`.");
public static final ConfigProperty<String> HOODIE_STORAGE_CLASS = ConfigProperty
Do we need a separate config? Can we infer it from HOODIE_IO_FACTORY_CLASS? Or maybe we can add an infer function.
I think we can replace HOODIE_IO_FACTORY_CLASS with HOODIE_STORAGE_CLASS since wherever the HoodieIOFactory is instantiated, the HoodieStorage is needed. So we can add a new API getIOFactory() in HoodieStorage to get the HoodieIOFactory instance and only use HOODIE_STORAGE_CLASS if the instantiation through reflection is needed.
On reflection, the reason we cannot directly use a getter to return the HoodieIOFactory instance from a HoodieStorage instance is that HoodieIOFactory and related classes are in the hudi-common module because they use Hudi concepts such as HoodieRecord, BloomFilter, etc., while HoodieStorage is in the hudi-io module (the hudi-common module depends on hudi-io). For now, the best way is to keep two configs. For the Hadoop-based implementation, no configs are required, as the defaults are HoodieHadoopIOFactory and HoodieHadoopStorage.
Created HUDI-7789 to track the future effort of moving HoodieIOFactory to the hudi-io module and keeping only one config.
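The two-config arrangement discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not Hudi code: `SketchStorage`, `SketchIOFactory`, their default implementations, and `SketchConfigs` are hypothetical stand-ins, and the IO factory key name `hoodie.io.factory.class` is an assumption (only `hoodie.storage.class` is named in this PR). The sketch shows each config falling back to a default class, so that a default setup needs no configuration, and the IO factory being built from an existing storage instance.

```java
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not the Hudi interfaces.
interface SketchStorage {
  String scheme();
}

interface SketchIOFactory {
  String describe();
}

// Default storage, playing the role of a Hadoop-based implementation.
final class DefaultSketchStorage implements SketchStorage {
  public DefaultSketchStorage() {}

  @Override
  public String scheme() {
    return "hadoop";
  }
}

// Default IO factory; it is constructed with the storage it reads and writes through.
final class DefaultSketchIOFactory implements SketchIOFactory {
  private final SketchStorage storage;

  public DefaultSketchIOFactory(SketchStorage storage) {
    this.storage = storage;
  }

  @Override
  public String describe() {
    return "io-factory-over-" + storage.scheme();
  }
}

final class SketchConfigs {
  static final String STORAGE_CLASS_KEY = "hoodie.storage.class";
  // Assumed key name for the IO factory config; it is not stated in this thread.
  static final String IO_FACTORY_CLASS_KEY = "hoodie.io.factory.class";

  private SketchConfigs() {}

  // Instantiates the storage class by reflection, using the default when the key is unset.
  static SketchStorage loadStorage(Map<String, String> props) {
    String className = props.getOrDefault(STORAGE_CLASS_KEY, DefaultSketchStorage.class.getName());
    try {
      return (SketchStorage) Class.forName(className).getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + className, e);
    }
  }

  // Instantiates the IO factory by reflection, handing it the already-created storage.
  static SketchIOFactory loadIOFactory(Map<String, String> props, SketchStorage storage) {
    String className = props.getOrDefault(IO_FACTORY_CLASS_KEY, DefaultSketchIOFactory.class.getName());
    try {
      return (SketchIOFactory) Class.forName(className)
          .getDeclaredConstructor(SketchStorage.class)
          .newInstance(storage);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + className, e);
    }
  }
}
```

With an empty property map, `SketchConfigs.loadStorage` and `SketchConfigs.loadIOFactory` return the default implementations, which mirrors the point that the Hadoop-based path needs no explicit configs.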
@hudi-bot run azure
Change Logs
This PR simplifies `HoodieStorage` instance fetching by passing down the `HoodieStorage` instance from the meta client as much as possible, instead of using reflection, which may not work with a provided file system instance such as a `TrinoFileSystem` instance.

The major changes in this PR include:
- Adds the config `hoodie.storage.class` for instantiation of the `HoodieStorage` class in the Spark engine only.
- Changes `HoodieIOFactory` to be instantiated with a `HoodieStorage` instance and to use that instance for creating readers and writers; makes `HoodieStorage` store the `StorageConfiguration` instance and adds a new API `#newInstance` to `HoodieStorage`.
- Avoids `HoodieStorageUtils.getStorage` on the read path in the `hudi-common` and `hudi-io` modules. Before this PR, a `HoodieStorage` instantiated through reflection from the path and storage configuration via `HoodieStorageUtils.getStorage` does not work with a provided file system instance like `TrinoFileSystem`. With this PR, the `HoodieStorage` instance is passed down from the meta client as much as possible. For engines like Spark, the reflection code for `HoodieHadoopStorage` is kept so that it works on the executor side.
- `HoodieStorage` is passed down to the places where it is needed.

Impact
As above.
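The pass-down pattern described under Change Logs might look roughly like the sketch below. All names here (`PassedStorage`, `ProvidedFsStorage`, `MetaClientSketch`, `TableReaderSketch`) are hypothetical stand-ins rather than Hudi classes, and the engine file system is represented by a plain `Object`. The point is only that a reader obtains its storage from the meta client instead of re-creating one by reflection from a path and configuration, so an engine-provided file system is preserved.

```java
// Hypothetical stand-ins for illustration only; these are not the Hudi classes.
interface PassedStorage {
  String describe();
}

// Wraps a file system object supplied by the engine. An instance like this
// cannot be rebuilt by reflection from a path and configuration alone.
final class ProvidedFsStorage implements PassedStorage {
  private final Object engineFileSystem;

  ProvidedFsStorage(Object engineFileSystem) {
    this.engineFileSystem = engineFileSystem;
  }

  @Override
  public String describe() {
    return "storage-over-" + engineFileSystem;
  }
}

// The meta client owns the storage instance and exposes it to callers.
final class MetaClientSketch {
  private final PassedStorage storage;

  MetaClientSketch(PassedStorage storage) {
    this.storage = storage;
  }

  PassedStorage getStorage() {
    return storage;
  }
}

// A read-path component takes its storage from the meta client rather than
// instantiating a new one.
final class TableReaderSketch {
  private final PassedStorage storage;

  TableReaderSketch(MetaClientSketch metaClient) {
    this.storage = metaClient.getStorage();
  }

  PassedStorage getStorage() {
    return storage;
  }
}
```

In this arrangement the reader holds the same storage object the meta client was built with, which is the property a reflection-based lookup cannot guarantee for a provided file system.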
Risk level
low
Documentation Update
none
Contributor's checklist