Skip to content

[HUDI-8519] Fix update with multiple secondary indexes#12259

Closed
codope wants to merge 2 commits intoapache:masterfrom
codope:hudi-8519-multi-sec-index
Closed

[HUDI-8519] Fix update with multiple secondary indexes#12259
codope wants to merge 2 commits intoapache:masterfrom
codope:hudi-8519-multi-sec-index

Conversation

@codope
Copy link
Member

@codope codope commented Nov 14, 2024

Change Logs

Multiple secondary indexes (or functional index) exist in different partitions but still we use the same file id prefix. So, thre is a chance of collision in the append handle when two different secondary index have same file id prefix and same shard. This PR fixes the file id prefix in such a case. Added updates to the existing test case which creates multiple secondary index.

Impact

Fix updates with multiple secondary indexes.

Risk level (write none, low medium or high below)

low

only affects sec index and func index.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Nov 14, 2024
LOG.info(msg);
final List<String> fileGroupFileIds = IntStream.range(0, fileGroupCount)
.mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i))
.mapToObj(i -> HoodieTableMetadataUtil.getFileIDForFileGroup(metadataPartition, i, partitionName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this impacts both sec index and functional index as well then ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it impacts both indexes.

public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index) {
public static String getFileIDForFileGroup(MetadataPartitionType partitionType, int index, String partitionName) {
if (MetadataPartitionType.FUNCTIONAL_INDEX.equals(partitionType) || MetadataPartitionType.SECONDARY_INDEX.equals(partitionType)) {
return String.format("%s%04d-%d", partitionName.replaceAll("_", "-").concat("-"), index, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need the replace for functional and sec index ? can you help me understand

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue impacts both functional and secondary index so fix is required for both

// update the secondary key column after creating multiple secondary indexes
spark.sql(s"update $tableName set not_record_key_col = 'xyz' where record_key_col = 'row1'")
// validate the secondary index records themselves
checkAnswer(s"select key, SecondaryIndexMetadata.isDeleted from hudi_metadata('$basePath') where type=7 and key like '%row1'")(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also validate entire contents of sec index and lets validate the partition path meta field in MDT partition as well (which will ensure diff sec index partitions are validated for its content

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

checkAnswer(s"select ts, record_key_col, not_record_key_col, partition_key_col from $tableName where record_key_col = 'row1'")(
Seq(1, "row1", "xyz", "p1")
)
verifyQueryPredicate(hudiOpts, "not_record_key_col", "abc")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also do query predicate on "ts" col (the other sec index)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@nsivabalan
Copy link
Contributor

can we write tests for functional index as well since the fix is applicable for FI as well

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@codope
Copy link
Member Author

codope commented Nov 15, 2024

Closing in favor of #12263

@codope codope closed this Nov 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S PR with lines of changes in (10, 100]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants