
[cdc] Unaware bucket cdc sink should not chain#4203

Merged
leaves12138 merged 2 commits into
apache:masterfrom
JingsongLi:cdc_unaware
Sep 19, 2024


@JingsongLi

Purpose

Rebalance the input to the unaware-bucket CDC sink so that it does not chain with the upstream operator. This ensures schema changes work and avoids an infinite loop.

Tests

No tests; the issue is hard to reproduce in a unit test.

API and Format

Documentation

@JingsongLi JingsongLi changed the title [WIP][cdc] Unaware bucket cdc sink should not chain [cdc] Unaware bucket cdc sink should not chain Sep 18, 2024

@leaves12138 leaves12138 left a comment


+1

@leaves12138 leaves12138 merged commit c55887c into apache:master Sep 19, 2024
JingsongLi pushed a commit that referenced this pull request May 7, 2026
)

Bucket-unaware append tables [1] are a great choice for streaming events
into Paimon format for batch consumers. These streams, such as clickstream
data, can be very high-throughput. Currently there is a shuffle in the
writer, introduced in #4203, and in my production use cases
(Kafka --> Paimon) this shuffles a _lot_ of data.

`rebalance` was added to avoid chaining; instead we can use
[startNewChain](https://nightlies.apache.org/flink/flink-docs-stable/docs/dev/datastream/operators/overview/#task-chaining-and-resource-groups),
which avoids the deadlock issue described in that PR without a shuffle.
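
To make the difference concrete, here is a minimal sketch, assuming the Flink 1.x DataStream API on the classpath. It is not Paimon's actual sink code: the class name `ChainBreakSketch`, the `Mode` enum, and the two `map` stages standing in for the CDC parser and the writer are all hypothetical. It builds the same toy pipeline three ways and counts the resulting job vertices (tasks), using `getStreamGraph()`, which Flink marks as internal API.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainBreakSketch {

    // Hypothetical switch for the three wirings being compared.
    enum Mode { CHAINED, REBALANCE, START_NEW_CHAIN }

    // Builds a toy "parse -> write" pipeline and returns how many job
    // vertices (tasks) Flink creates for it after operator chaining.
    static int jobVertices(Mode mode) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // Equal parallelism everywhere, so chaining is possible by default.
        env.setParallelism(1);

        // Stand-in for the upstream CDC parsing operator (assumption).
        DataStream<String> parsed =
                env.fromElements("a", "b", "c").map(s -> "parsed:" + s);

        DataStream<String> written;
        switch (mode) {
            case REBALANCE:
                // Round-robin redistribution: breaks the chain, but records
                // are shuffled across the writer subtasks.
                written = parsed.rebalance().map(s -> "written:" + s);
                break;
            case START_NEW_CHAIN:
                // Writer starts its own chain; the edge stays a forward
                // connection, so no redistribution is introduced.
                written = parsed.map(s -> "written:" + s).startNewChain();
                break;
            default:
                // Writer is free to chain with the parser.
                written = parsed.map(s -> "written:" + s);
        }
        written.print();

        // Internal API, used here only to inspect the chained job graph.
        return env.getStreamGraph().getJobGraph().getNumberOfVertices();
    }
}
```

In this sketch both `REBALANCE` and `START_NEW_CHAIN` should place the writer in a separate task from the parser. The difference is the edge between them: `rebalance()` replaces the forward connection with a round-robin partitioner, while `startNewChain()` leaves the forward connection in place.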
