[HUDI-8001] Insert overwrite failed due to missing 'path' property when using Spark 3.5.1 and Hudi 1.0.0 #11646

Merged
danny0405 merged 2 commits into apache:master from majian1998:hudi-8001 on Jul 20, 2024

@majian1998
Contributor

With Spark 3.5.1, insert overwrite fails because the InsertIntoHoodieTableCommand call chain initializes HoodieFileIndex, which needs the table path. For v1 tables, the path is stored in CatalogTable#CatalogStorageFormat#storageProperties, not in CatalogTable#properties.

When Spark reloads the table, it removes the path key from CatalogTable#CatalogStorageFormat#storageProperties.

Consequently, during deduceOverwriteConfig, InsertIntoHoodieTableCommand in Hudi cannot retrieve the path from either CatalogTable#CatalogStorageFormat#storageProperties or CatalogTable#properties. Because combinedOpts then lacks the path key, initializing HoodieFileIndex fails with an error.
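
The failure mode described above can be sketched in plain Scala without any Spark dependency. This is a minimal illustration, not the change merged in this PR: `StorageFormat`, `Table`, `combinedOpts` and `withPath` are hypothetical stand-ins for the catalog classes and the option-merging step, and filling `path` from a stored table location is an assumed remedy used only to show the idea.

```scala
object PathFallbackSketch {
  // Hypothetical, simplified stand-ins for the catalog classes named above.
  final case class StorageFormat(locationUri: Option[String],
                                 properties: Map[String, String])
  final case class Table(storage: StorageFormat,
                         properties: Map[String, String])

  // Merge storage properties and table properties, as the description outlines.
  def combinedOpts(table: Table): Map[String, String] =
    table.storage.properties ++ table.properties

  // Assumed remedy: when "path" is absent from both maps,
  // fill it from the table location if one is known.
  def withPath(table: Table): Map[String, String] = {
    val opts = combinedOpts(table)
    if (opts.contains("path")) opts
    else table.storage.locationUri.fold(opts)(loc => opts + ("path" -> loc))
  }

  def main(args: Array[String]): Unit = {
    // A table as it might look after a reload: no "path" key in either map.
    val reloaded = Table(
      StorageFormat(Some("/tmp/hudi/tbl"), Map.empty),
      Map("type" -> "cow"))

    println(combinedOpts(reloaded).get("path")) // None
    println(withPath(reloaded).get("path"))     // Some(/tmp/hudi/tbl)
  }
}
```

With the fallback in place, a consumer that requires `path` (HoodieFileIndex in the description) would see the key even when the reloaded table no longer carries it in its properties.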

Change Logs

None

Impact

None

Risk level (write none, low, medium or high below)

None

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions bot added the size:XS label (PR with lines of changes in <= 10) on Jul 18, 2024
@majian1998 changed the title from "HUDI-8001 - Insert overwrite failed due to missing 'path' property when using Spark 3.5.1 and Hudi 1.0.0" to "[HUDI-8001] Insert overwrite failed due to missing 'path' property when using Spark 3.5.1 and Hudi 1.0.0" on Jul 18, 2024
@majian1998 closed this on Jul 18, 2024
@majian1998 reopened this on Jul 18, 2024
@github-actions bot added the size:S label (PR with lines of changes in (10, 100]) and removed the size:XS label (PR with lines of changes in <= 10) on Jul 19, 2024
@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 merged commit 3349839 into apache:master on Jul 20, 2024
Labels

size:S PR with lines of changes in (10, 100]