[WIP][HUDI-3625][RFC-60] OSSStorageStrategy POC #12460
zhangyue19921010 wants to merge 6 commits into master from
```java
public static final String FILE_ID_KEY = "hoodie_file_id";
public static final String TABLE_BASE_PATH = "hoodie_table_base_path";
public static final String TABLE_NAME = "hoodie_table_name";
public static final String TABLE_STORAGE_PATH = "hoodie_storage_path";
```
We should reuse existing config names
Sure thing, this is just for a quick POC test.
```java
  return FSUtils.makeWriteToken(getPartitionId(), getStageId(), getAttemptId());
}

protected StoragePath getPartitionPath(String partitionPath) {
```
nit: we could rename this method to `toPhysicalPath`.
Makes sense!
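To make the naming suggestion concrete, here is a minimal sketch of what a `toPhysicalPath` mapping could look like. Everything in it is hypothetical: `StorageStrategy`, `DefaultStorageStrategy`, and `HashPrefixStorageStrategy` are illustrative names, not Hudi classes; plain `String` paths stand in for `StoragePath`; and the hashed-prefix layout is an assumption about how an object-storage strategy might spread files, not necessarily what this POC does.

```java
import java.util.Locale;

/** Hypothetical: maps a logical partition path to the physical directory holding its files. */
interface StorageStrategy {
    String toPhysicalPath(String partitionPath);
}

/** Hypothetical default layout: physical path == basePath/partitionPath. */
final class DefaultStorageStrategy implements StorageStrategy {
    private final String basePath;

    DefaultStorageStrategy(String basePath) {
        this.basePath = basePath;
    }

    @Override
    public String toPhysicalPath(String partitionPath) {
        // A non-partitioned table (empty partition path) lives directly under the base path.
        return partitionPath.isEmpty() ? basePath : basePath + "/" + partitionPath;
    }
}

/**
 * Hypothetical object-storage layout (assumption): partitions are placed under a
 * hashed prefix below a separate storage path, so requests are not concentrated
 * on a single key prefix of the bucket.
 */
final class HashPrefixStorageStrategy implements StorageStrategy {
    private final String storagePath;
    private final String tableName;

    HashPrefixStorageStrategy(String storagePath, String tableName) {
        this.storagePath = storagePath;
        this.tableName = tableName;
    }

    @Override
    public String toPhysicalPath(String partitionPath) {
        // String.hashCode() is fully specified in the String Javadoc, so every JVM
        // derives the same 8-hex-digit prefix for the same table and partition.
        String prefix = String.format(Locale.ROOT, "%08x", (tableName + "/" + partitionPath).hashCode());
        String physical = storagePath + "/" + prefix + "/" + tableName;
        return partitionPath.isEmpty() ? physical : physical + "/" + partitionPath;
    }
}
```

For example, `new HashPrefixStorageStrategy("/tmp/bucketA", "trips").toPhysicalPath("2024/01/01")` yields a path of the form `/tmp/bucketA/<8 hex digits>/trips/2024/01/01`, while the default strategy keeps files under the base path. Because the prefix is derived deterministically, writers and readers compute the same location without storing it anywhere.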
```diff
 HoodiePartitionMetadata partitionMetadata = new HoodiePartitionMetadata(storage, instantTime,
-    new StoragePath(config.getBasePath()),
-    FSUtils.constructAbsolutePath(config.getBasePath(), partitionPath),
+    new StoragePath(config.getBasePath()), getPartitionPath(partitionPath),
```
Why do we need to store physical path in partition metadata?
```diff
 public StoragePath makeNewPath(String partitionPath) {
-  StoragePath path = FSUtils.constructAbsolutePath(config.getBasePath(), partitionPath);
+  StoragePath path = getPartitionPath(partitionPath);
```
I assume this change is just for the POC? Ideally, the conversion from the logical path to the physical path should happen within `HoodieStorage` at L134.
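One way to read this suggestion is as a decorator: callers keep passing logical paths, and the storage layer resolves them to physical paths before touching the file system. The sketch below is hypothetical: `SimpleStorage`, `InMemoryStorage`, and `LogicalPathStorage` stand in for `HoodieStorage` and are not Hudi APIs, and the in-memory backend exists only to make the example runnable.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical minimal storage abstraction standing in for HoodieStorage. */
interface SimpleStorage {
    void create(String path);

    boolean exists(String path);
}

/** Hypothetical in-memory backend that records created paths as-is. */
final class InMemoryStorage implements SimpleStorage {
    private final Set<String> files = new HashSet<>();

    @Override
    public void create(String path) {
        files.add(path);
    }

    @Override
    public boolean exists(String path) {
        return files.contains(path);
    }
}

/**
 * Hypothetical decorator: accepts logical paths and translates each one to a
 * physical path before delegating, so callers never convert paths themselves.
 */
final class LogicalPathStorage implements SimpleStorage {
    private final SimpleStorage delegate;
    private final UnaryOperator<String> toPhysical;

    LogicalPathStorage(SimpleStorage delegate, UnaryOperator<String> toPhysical) {
        this.delegate = delegate;
        this.toPhysical = toPhysical;
    }

    @Override
    public void create(String logicalPath) {
        delegate.create(toPhysical.apply(logicalPath));
    }

    @Override
    public boolean exists(String logicalPath) {
        return delegate.exists(toPhysical.apply(logicalPath));
    }
}
```

With the translation inside the storage layer, code such as `makeNewPath` would not need to call a conversion helper itself. The trade-off is that any caller that genuinely needs the physical path (for example, for logging or metrics) has to ask the storage layer for it.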
```diff
 List<Pair<String, StoragePath>> absolutePartitionPathList = partitionSet.stream()
     .map(partition -> Pair.of(
-        partition, FSUtils.constructAbsolutePath(metaClient.getBasePath(), partition)))
+        partition, storage.getAllLocations(partition, config).stream().findFirst().get()))
```
Just curious, would `findFirst()` work here in production too?
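`stream().findFirst().get()` throws `NoSuchElementException` when the list is empty and silently ignores every location after the first. If a partition is expected to have exactly one physical location (an assumption; the POC does not say whether `getAllLocations` can return several), a small helper such as the hypothetical `LocationResolver` below makes both failure modes explicit. Plain `String` paths stand in for `StoragePath`.

```java
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper; not part of Hudi. */
final class LocationResolver {
    private LocationResolver() {
    }

    /**
     * Returns the single physical location of a partition. Unlike
     * stream().findFirst().get(), this reports a descriptive error for zero
     * locations and refuses to silently drop extra ones.
     */
    static String singleLocation(String partition, List<String> locations) {
        if (locations.isEmpty()) {
            throw new NoSuchElementException("No storage location for partition " + partition);
        }
        if (locations.size() > 1) {
            throw new IllegalStateException(
                "Partition " + partition + " has " + locations.size() + " locations: " + locations);
        }
        return locations.get(0);
    }
}
```

If several locations per partition turn out to be legitimate in production, the caller should instead iterate over all of them rather than pick one.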
```java
HoodieStorage storage = dataMetaClient.getStorage();
String tableName = dataMetaClient.getTableConfig().getTableName();
StoragePath dataBasePath = dataMetaClient.getBasePath();
long blockSize = storage.getDefaultBlockSize(partitionPath);
```
How would this line leverage the storage strategy?
Change Logs
OSS Storage strategy POC
For local testing, data is written and queried with Spark, using a unit test (UT) as an example. Assume /tmp/bucketA/ is the user's S3 bucket. The final data distribution is as follows:
The UT passes as-is.
[Screenshots not preserved: "The distribution of metadata" and "The distribution of data", with panels labeled "Base Path" and "Data Path".]
Impact
None.
Risk level (write none, low, medium or high below)
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make changes to the website.
Contributor's checklist