[flink] Fix stability of Flink 2.0 test workflow #5877

Merged
JingsongLi merged 4 commits into apache:master from yunfengzhou-hub:fix-flink-2.0-cdc
Jul 11, 2025

yunfengzhou-hub commented Jul 11, 2025

Purpose

Linked issue: close #5876

This PR fixes the problems that affect the stability of the Flink 2.0 test workflow and adds back the workflow script, which was previously removed in a6d2e83.

I examined the 100 most recent Action runs before the script was removed. Their outcomes break down as follows.

  1. (79%) The Action ran successfully.
  2. (11%) Download timeout for the paimon-flink-cdc module.
  3. (7%) CatalogTableITCase#testConsumersTable failed. It is a flaky test caused by a race condition.
  4. (3%) Miscellaneous failures that each appeared only once.

Problems 2 and 3 therefore need to be fixed before the Action is enabled again. The fixes are as follows.

2. Build paimon-flink-cdc under Flink 2.0

Example failure Action log: https://github.com/apache/paimon/actions/runs/15804130573/job/44546474506

The failure occurred because paimon-e2e-tests and paimon-docs depend on the paimon-flink-cdc module, but this module is not compiled when the flink2 profile is enabled. The build therefore tries to download a snapshot of the module instead, and the network connection may time out during that download.

This PR fixes the failure by compiling paimon-flink-cdc and the module it depends on (paimon-flink1-common) under -Pflink2. This change is not expected to affect the production code or the release process.

3. Fix race condition in CatalogTableITCase#testConsumersTable

Example failure Action log: https://github.com/apache/paimon/actions/runs/15822438299/job/44594579766

This failure occurred because the Flink job with the consumer source might not have finished initializing before the second batch of data was written to the Paimon table. To fix it, a blocking operation is added so that the test waits for the source to initialize before the next write.
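The fix follows a common pattern for streaming tests: the consumer signals when it is ready, and the writer blocks on that signal. The Java sketch below illustrates the pattern with a `CountDownLatch`. It is not the code from this PR; the class `ConsumerStartupRaceDemo`, the simulated snapshot counter, and the 50 ms startup delay are hypothetical stand-ins for the Flink job and the Paimon table.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the "wait for source initialization" pattern.
public class ConsumerStartupRaceDemo {

    /**
     * Writes two batches around a simulated streaming source and returns the
     * snapshot id the source observed when it finished initializing.
     */
    static long runWithStartupBarrier() {
        // Hypothetical stand-in for the table's latest snapshot id.
        AtomicLong latestSnapshot = new AtomicLong(0);
        // Snapshot id seen by the source at initialization; -1 means "not yet".
        AtomicLong sourceStartSnapshot = new AtomicLong(-1);
        // Released by the source once its initialization is complete.
        CountDownLatch sourceInitialized = new CountDownLatch(1);

        // First batch of data produces snapshot 1.
        latestSnapshot.incrementAndGet();

        Thread source = new Thread(() -> {
            try {
                // Simulated (hypothetical) initialization delay of the source.
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            sourceStartSnapshot.set(latestSnapshot.get());
            sourceInitialized.countDown();
        });
        source.start();

        try {
            // Blocking step: wait for the source before writing the second batch.
            if (!sourceInitialized.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("source did not initialize in time");
            }
            // Second batch of data produces snapshot 2.
            latestSnapshot.incrementAndGet();
            source.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the source", e);
        }
        return sourceStartSnapshot.get();
    }

    public static void main(String[] args) {
        // Without the latch, the slow source could observe snapshot 2 instead.
        System.out.println("source started from snapshot " + runWithStartupBarrier());
    }
}
```

Running `main` prints `source started from snapshot 1` on every run. Removing the `await` call lets the second write race ahead of the simulated startup delay, which reproduces the kind of ordering problem described above.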

Tests

API and Format

Documentation

yunfengzhou-hub marked this pull request as ready for review July 11, 2025 02:26

Sxnan left a comment


Thanks for the PR! Just left one minor comment


Sxnan left a comment


Thanks for the update. LGTM

JingsongLi commented

+1

JingsongLi merged commit 370b1ab into apache:master Jul 11, 2025
21 checks passed

Development: merging this pull request closes the issue "[Bug] Fix stability of Flink 2.0 test workflow".