[HUDI-8842] Support skipping compaction and cluster for spark increme…#12639

Merged
danny0405 merged 1 commit into apache:master from cshuo:HUDI-8842
Jan 16, 2025

@cshuo (Collaborator) commented Jan 15, 2025

…ntal reading on mor table

Change Logs

Support skipping compaction and clustering instants in Spark incremental reads, controlled by read configurations.

Impact

Spark Structured Streaming reads can enable clustering/compaction skipping to improve read performance.
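
As a rough illustration of how such read options might be resolved into skip flags, here is a minimal, self-contained Scala sketch that mirrors the `optParams.getOrElse(...)` pattern visible in the diff. The object name, the case class, the two key strings, and the `false` defaults are all assumptions made for illustration; the authoritative keys and defaults are the ones defined in `DataSourceReadOptions`.

```scala
// Hypothetical sketch, not the Hudi implementation.
object SkipFlagsSketch {
  // Assumed key strings; check DataSourceReadOptions for the real values.
  val SkipCompactKey = "hoodie.datasource.read.incr.skip_compact"
  val SkipClusterKey = "hoodie.datasource.read.incr.skip_cluster"

  // Illustrative holder for the two flags.
  final case class SkipFlags(skipCompaction: Boolean, skipClustering: Boolean)

  // Missing keys fall back to "false" (an assumed default), so readers
  // that do not set the options keep their current behavior.
  def parse(optParams: Map[String, String]): SkipFlags =
    SkipFlags(
      skipCompaction = optParams.getOrElse(SkipCompactKey, "false").toBoolean,
      skipClustering = optParams.getOrElse(SkipClusterKey, "false").toBoolean
    )
}
```

In a real job the same key/value pairs would presumably be passed through `.option(key, "true")` on the Spark stream reader rather than through a hand-built map.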

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here, and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Jan 15, 2025
@cshuo (Collaborator, Author) commented Jan 15, 2025

@danny0405 PTAL

@hudi-bot (Collaborator)

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

.metaClient(metaClient)
.startCompletionTime(optParams(DataSourceReadOptions.START_COMMIT.key))
.endCompletionTime(optParams.getOrElse(DataSourceReadOptions.END_COMMIT.key, null))
.skipClustering(optParams.getOrElse(DataSourceReadOptions.INCREMENTAL_READ_SKIP_CLUSTER.key(),
Contributor

Does this change also fix the DeltaStreamer streaming read?

Collaborator Author

> Does this change also fix the DeltaStreamer streaming read?

No, this change only covers Spark Structured Streaming, as the title describes.

@danny0405 danny0405 merged commit 6f43ffa into apache:master Jan 16, 2025
42 checks passed