feat: Add metadata record_index lookup command to Hudi CLI #17940
nsivabalan merged 6 commits into apache:master from
}

// @ShellOption(value = "--backup", help = "Backup the metadata table before delete", defaultValue = "true", arity = 1) final boolean backup
@ShellMethod(key = "metadata lookup-record-index", value = "Print Record index information for a record_key")
Should we support a list of comma-separated record keys?
Also, we have partitioned RLI support in the latest 1.x, so we might need to take the partition path as input in that case.
Even if we don't have the bandwidth to support the non-global variant, let's check the RLI index definition and, if it is the non-global variant, throw an exception for now.
For users reading records from the MDT global RLI, the lookup should succeed.
Yes, we should support partitioned RLI as well. Added the necessary code for it.
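For the comma-separated suggestion in the first comment, splitting the option value could look roughly like the sketch below. `RecordKeyArgs` and `parseRecordKeys` are hypothetical names for illustration, not part of Hudi CLI, and the sketch assumes record keys never contain commas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not part of Hudi CLI): splits a comma-separated --record_key value. */
final class RecordKeyArgs {
  private RecordKeyArgs() {}

  /** Returns trimmed, non-empty keys in input order; assumes keys never contain commas. */
  static List<String> parseRecordKeys(String raw) {
    if (raw == null) {
      return Collections.emptyList();
    }
    List<String> keys = new ArrayList<>();
    for (String part : raw.split(",")) {
      String key = part.trim();
      if (!key.isEmpty()) {
        keys.add(key);
      }
    }
    return keys;
  }
}
```

The resulting list could stand in for the single-element list built with `Collections.singletonList(recordKey)` in the lookup; keys that legitimately contain commas would need a different delimiter or a repeatable option.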
Force-pushed from f386786 to 3901452.
return HoodiePrintHelper.print(header, new HashMap<>(), "", false, Integer.MAX_VALUE, false, rows);
}

// @ShellOption(value = "--backup", help = "Backup the metadata table before delete", defaultValue = "true", arity = 1) final boolean backup
Why is this commented out? Can we remove it?
}

// @ShellOption(value = "--backup", help = "Backup the metadata table before delete", defaultValue = "true", arity = 1) final boolean backup
@ShellMethod(key = "metadata lookup-record-index", value = "Print Record index information for a record_key")
Let's fix the documentation to call out that the command takes either a record key (for global RLI) or a partition path and a record key (for partitioned RLI).
Yes, fixed the documentation.
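The documented contract (a record key alone for a global RLI, a partition path plus a record key for a partitioned RLI) could be enforced with a check like the following. This is a hedged sketch: `LookupArgs.resolvePartition` is hypothetical, the `--partition_path` name follows the rename suggested in this review, and rejecting a partition path for a global RLI is an assumption rather than behavior confirmed by the PR.

```java
import java.util.Optional;

/** Hypothetical argument check mirroring the documented contract; not the merged Hudi code. */
final class LookupArgs {
  private LookupArgs() {}

  /**
   * Returns the partition to pass to the lookup: empty for a global RLI,
   * the trimmed, non-blank partition path for a partitioned RLI.
   */
  static Optional<String> resolvePartition(boolean partitionedRli, String partitionPath) {
    boolean hasPath = partitionPath != null && !partitionPath.trim().isEmpty();
    if (partitionedRli && !hasPath) {
      throw new IllegalArgumentException(
          "--partition_path is required when the record index is partitioned");
    }
    if (!partitionedRli && hasPath) {
      // Assumption: reject rather than silently ignore a partition path for a global RLI.
      throw new IllegalArgumentException(
          "--partition_path must not be set when the record index is global");
    }
    return partitionedRli ? Optional.of(partitionPath.trim()) : Optional.empty();
  }
}
```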
hudi-cli/src/main/java/org/apache/hudi/cli/commands/MetadataCommand.java
public String getRecordIndexInfo(
    @ShellOption(value = "--record_key", help = "Record key entry whose info will be fetched")
    final String recordKey,
    @ShellOption(value = "--partition_path_for_non_global_rli", help = " Partition path needs to be provided for non Global or partition level Record index",
We can just name this --partition_path.
In the documentation/description of the option, we can call out its purpose.
I agree, making the change.
metaReader.readRecordIndexLocationsWithKeys(HoodieListData.eager(Collections.singletonList(recordKey)), dataTablePartition);
List<Pair<String, HoodieRecordGlobalLocation>> recordLocationKeyPair = recordKeyToGlobalLocationMap.collectAsList();
if (recordLocationKeyPair.isEmpty()) {
  return "[INFO] Record key " + recordKey + " not found in Record Index";
We should print the partition path as well in case of partitioned RLI.
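Including the partition path in the not-found message for a partitioned RLI might look like this minimal sketch. `LookupMessages.notFound` is a hypothetical helper that reuses the message text from the diff excerpt above; an empty `Optional` stands for the global RLI case.

```java
import java.util.Optional;

/** Hypothetical message helper; not part of Hudi CLI. */
final class LookupMessages {
  private LookupMessages() {}

  /** Builds the not-found message, naming the partition only when one was supplied. */
  static String notFound(String recordKey, Optional<String> partitionPath) {
    StringBuilder sb = new StringBuilder("[INFO] Record key ").append(recordKey);
    partitionPath.ifPresent(p -> sb.append(" in partition ").append(p));
    return sb.append(" not found in Record Index").toString();
  }
}
```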
hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java
Force-pushed from 3901452 to 6516a66.
"[ERROR] Record index partition is not enabled/initialized\n\n");

// Check if RLI is partitioned and validate partition_path is provided
if (config.isRecordLevelIndexEnabled() && !config.isGlobalRecordLevelIndexEnabled()) {
Sorry, for Hudi CLI we may not have all the writer properties, so we can't really rely on config.isGlobalRecordLevelIndexEnabled(), or on HoodieMetadataConfig alone.
That's why I suggested looking into the index definition, which the reader will have access to.
Yes, I am thinking of using indexMetadata to find out whether the RLI is partitioned. If we can write that information, we should be able to test it.
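Deciding the RLI scope from what the reader can see, rather than from writer-side HoodieMetadataConfig, could be structured as below. This sketch assumes the persisted index definition exposes whether the RLI is partitioned; `RecordIndexDefinitionView` and its `isPartitioned()` accessor are hypothetical stand-ins, not Hudi's actual index-definition API, and failing when no definition is found is likewise an assumption.

```java
import java.util.Optional;

/** Hypothetical view of what the reader can see in the persisted RLI index definition. */
@FunctionalInterface
interface RecordIndexDefinitionView {
  /** True if the persisted definition marks the RLI as partitioned (hypothetical accessor). */
  boolean isPartitioned();
}

/** Resolves the RLI scope from reader-visible metadata instead of writer configs. */
final class RliScopeResolver {
  private RliScopeResolver() {}

  static boolean isPartitioned(Optional<RecordIndexDefinitionView> definition) {
    // Assumption: with no definition available, refuse to guess between global and partitioned.
    return definition
        .map(RecordIndexDefinitionView::isPartitioned)
        .orElseThrow(() -> new IllegalStateException(
            "Record index definition not found; cannot tell global from partitioned RLI"));
  }
}
```

Keeping the decision behind a small interface also makes the test idea in the reply easy to follow: a test can supply either variant of the definition without needing the full writer configuration.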
hudi-cli/src/main/java/org/apache/hudi/cli/commands/MetadataCommand.java
}

@Test
public void testGetRecordIndexInfoForNonGlobalRLI() throws Exception {
Let's name the method testGetRecordIndexInfoForPartitionedRLI.
assertTrue(metaClient.getTableConfig().isMetadataPartitionAvailable(org.apache.hudi.metadata.MetadataPartitionType.RECORD_INDEX));

// Validate entries in the Global RLI.
validateRecordIndexOutput(recordKey, Option.empty(), newCommitTime, DEFAULT_FIRST_PARTITION_PATH);
Can we also try looking up a non-existent record key?
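The non-existent-key case is easy to reason about against a toy in-memory index. The class below is purely illustrative and is not Hudi's record index or test harness: it keys entries by record key for a global RLI and by partition path plus record key for a partitioned RLI, and returns an empty `Optional` on a miss, which is the case the suggested test would exercise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy stand-in for the record index, used only to illustrate found and not-found lookups. */
final class ToyRecordIndex {
  /** Location fields as described in the PR: partition path, file ID, instant time. */
  static final class Location {
    final String partitionPath;
    final String fileId;
    final String instantTime;

    Location(String partitionPath, String fileId, String instantTime) {
      this.partitionPath = partitionPath;
      this.fileId = fileId;
      this.instantTime = instantTime;
    }
  }

  private final boolean partitioned;
  private final Map<String, Location> entries = new HashMap<>();

  ToyRecordIndex(boolean partitioned) {
    this.partitioned = partitioned;
  }

  private String indexKey(String partitionPath, String recordKey) {
    // Global RLI: the record key alone is the lookup key.
    // Partitioned RLI: scope by partition; '\0' is assumed never to appear in paths or keys.
    return partitioned ? partitionPath + "\0" + recordKey : recordKey;
  }

  void put(String recordKey, Location location) {
    entries.put(indexKey(location.partitionPath, recordKey), location);
  }

  /** Empty result means the key is not in the index (for the given partition, if partitioned). */
  Optional<Location> lookup(String partitionPath, String recordKey) {
    return Optional.ofNullable(entries.get(indexKey(partitionPath, recordKey)));
  }
}
```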
hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestMetadataCommand.java
Force-pushed from 6516a66 to d25c821.
…e location
Summary: Using this command, the metadata table's record_index entry details, such as partitionPath, fileId, and instantTimestamp, can be fetched.
Force-pushed from d25c821 to c197d8a.
Describe the issue this Pull Request addresses
This PR adds a new CLI command to query record index information from the metadata table, allowing users to look up the global file location (partition path, file ID, and instant time) for a given record key.
Summary and Changelog
Users can now use the metadata lookup-record-index command in Hudi CLI to fetch record index entry details, including partitionPath, fileId, and instantTimestamp, for a given record key.
Changes:
- getRecordIndexInfo() method in MetadataCommand.java to implement the new CLI command
- testGetRecordIndexInfo() test case in TestMetadataCommand.java to verify the new functionality
Impact
This is a new CLI command with no impact on existing functionality. It provides users with a convenient way to debug and inspect record index entries through the Hudi CLI.
Risk Level
Low - This is a new read-only CLI command that does not modify any data or change existing behavior. It only adds new functionality for querying the metadata table.
Documentation Update
Documentation should be updated to include the new metadata lookup-record-index CLI command with usage examples.
Contributor's checklist