[HUDI-5234] streaming read skip clustering#7231

Closed
zhuanshenbsj1 wants to merge 4 commits into apache:master from zhuanshenbsj1:streaming-read-skip-clustering

@zhuanshenbsj1
Contributor

Change Logs

Related to HUDI-5234

Adds a configurable option for streaming read to skip clustering instants, to avoid reading duplicates.

Impact

Describe any public API or user-facing feature change or any performance impact.

Risk level (write none, low medium or high below)

If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@zhuanshenbsj1 zhuanshenbsj1 changed the title streaming read skip clustering [HUDI-5234] streaming read skip clustering Nov 17, 2022
private Stream<HoodieInstant> maySkipOverwriteInstants(Stream<HoodieInstant> instants) {
  return instants.filter(instant -> !this.skipCompaction || !instant.getAction().equals(HoodieTimeline.COMPACTION_ACTION))
      .filter(instant -> !this.skipClustering|| !instant.getAction().equals(HoodieTimeline.REPLACE_COMMIT_ACTION));
}
Contributor

!this.skipClustering|| -> !this.skipClustering ||
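
For readers without the Hudi code base at hand, the filter under review can be sketched in isolation. This is only an illustrative approximation: the `Instant` class and the `keptTimestamps` helper are hypothetical stand-ins (not Hudi types), the two flags are passed as parameters rather than read from fields, and the action strings are assumed to match the values of the corresponding `HoodieTimeline` constants.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SkipInstantsSketch {
    // Assumed to mirror the HoodieTimeline action constants; hard-coded so the
    // sketch has no Hudi dependency.
    static final String DELTA_COMMIT_ACTION = "deltacommit";
    static final String COMPACTION_ACTION = "compaction";
    static final String REPLACE_COMMIT_ACTION = "replacecommit";

    // Hypothetical stand-in for HoodieInstant: only an action and a timestamp.
    static final class Instant {
        final String action;
        final String timestamp;

        Instant(String action, String timestamp) {
            this.action = action;
            this.timestamp = timestamp;
        }

        String getAction() {
            return action;
        }

        String getTimestamp() {
            return timestamp;
        }
    }

    // Same shape as the snippet under review: each flag, when set, drops
    // instants whose action matches the corresponding table service.
    static Stream<Instant> maySkipOverwriteInstants(
            Stream<Instant> instants, boolean skipCompaction, boolean skipClustering) {
        return instants
            .filter(i -> !skipCompaction || !i.getAction().equals(COMPACTION_ACTION))
            .filter(i -> !skipClustering || !i.getAction().equals(REPLACE_COMMIT_ACTION));
    }

    // Hypothetical helper that returns the timestamps surviving the filter.
    static List<String> keptTimestamps(
            List<Instant> timeline, boolean skipCompaction, boolean skipClustering) {
        return maySkipOverwriteInstants(timeline.stream(), skipCompaction, skipClustering)
            .map(Instant::getTimestamp)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instant> timeline = Arrays.asList(
            new Instant(DELTA_COMMIT_ACTION, "001"),
            new Instant(REPLACE_COMMIT_ACTION, "002"),
            new Instant(COMPACTION_ACTION, "003"),
            new Instant(DELTA_COMMIT_ACTION, "004"));

        // Skip clustering only: the replacecommit instant is dropped.
        System.out.println(keptTimestamps(timeline, false, true)); // prints [001, 003, 004]
    }
}
```

Like the snippet under review, the sketch matches on the action name alone, so every instant carrying the replace-commit action is dropped when the clustering flag is set.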

@danny0405
Contributor

Hi, can you take a look at the errors?

@zhuanshenbsj1
Contributor Author

Hi, can you take a look at the errors?

Solved. Removed the unused imports.

@danny0405
Contributor

Thanks for the contribution. I have reviewed it and applied a patch:
5234.zip

And the following tests are failing:
ITTestHoodieDataSource#testStreamWriteReadSkippingClustering
TestInputFormat.testReadSkipCompaction

@danny0405 danny0405 self-assigned this Nov 23, 2022
@danny0405 danny0405 added engine:flink Flink integration area:streaming Streaming operations reader-core labels Nov 23, 2022
@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405
Contributor

Closing this one because I am already working on this, with credit, in another PR: #7296

@danny0405 danny0405 closed this Nov 24, 2022
@zhuanshenbsj1 zhuanshenbsj1 deleted the streaming-read-skip-clustering branch January 8, 2024 07:42
Labels

area:streaming Streaming operations engine:flink Flink integration

3 participants