Contributor

@Zakelly Zakelly commented Jul 10, 2024

What is the purpose of the change

This PR fixes the unstable test SnapshotFileMergingCompatibilityITCase. The problem is in the test code itself. The test verifies that the checkpoint (cp) files are cleaned up as expected, which depends on an async cleaner (the job IO thread). When the mini-cluster shuts down, the async threads may not have finished their work before termination, so the test can observe files that have not been deleted yet.

It would be best to run several rounds of AZP to verify this fix.
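The race described above can be illustrated outside Flink. The following is a minimal sketch, not the test's actual code: the class name `AsyncCleanupRace`, the single-thread executor standing in for the job IO thread, and the latch used to make the timing deterministic are all assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from Flink.
class AsyncCleanupRace {

    /** Returns {file exists when checked without waiting, file exists after waiting}. */
    static boolean[] demonstrate() throws Exception {
        Path file = Files.createTempFile("cp-", ".tmp");
        // Stand-in for the async cleaner thread.
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        // The latch holds the cleanup back so the "too early" check is deterministic.
        CountDownLatch release = new CountDownLatch(1);

        Future<?> cleanup = ioExecutor.submit(() -> {
            release.await();
            Files.deleteIfExists(file);
            return null;
        });

        // Checking right away, as a test would if it did not wait for the cleaner.
        boolean existsWithoutWaiting = Files.exists(file);

        // Let the cleaner run and wait for it before checking again.
        release.countDown();
        cleanup.get(5, TimeUnit.SECONDS);
        boolean existsAfterWaiting = Files.exists(file);

        ioExecutor.shutdown();
        return new boolean[] {existsWithoutWaiting, existsAfterWaiting};
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = demonstrate();
        System.out.println("exists without waiting: " + result[0]);
        System.out.println("exists after waiting:   " + result[1]);
    }
}
```

In a real run the latch does not exist, so whether the early check sees the file depends on thread scheduling, which is what makes such a test flaky.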

Brief change log

Only SnapshotFileMergingCompatibilityITCase is changed: the test now waits for file deletion before cluster termination.
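A wait of this kind could be sketched as a polling loop with a timeout. This is a hedged sketch rather than the code in this PR: the class `WaitForDeletion`, both method names, and the default constant values are hypothetical, and the actual test may check for files differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

// Hypothetical helper; names and default values are assumptions.
class WaitForDeletion {
    static final long DELETE_TIMEOUT_MILLIS = 10_000L;
    static final long POLL_INTERVAL_MILLIS = 500L;

    /** True if the directory is missing or contains no regular files. */
    static boolean noFilesLeft(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return true;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.noneMatch(Files::isRegularFile);
        } catch (UncheckedIOException | NoSuchFileException e) {
            // Entries vanished while walking (concurrent deletion); check again next round.
            return false;
        }
    }

    /** Polls until no files are left, failing with an AssertionError on timeout. */
    static void waitUntilNoFilesLeft(Path dir, long timeoutMillis, long pollMillis)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!noFilesLeft(dir)) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(
                        "Files under " + dir + " were not deleted within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }
}
```

A test would call `WaitForDeletion.waitUntilNoFilesLeft(checkpointDir, DELETE_TIMEOUT_MILLIS, POLL_INTERVAL_MILLIS)` before shutting the cluster down, where `checkpointDir` is whatever directory the test inspects. Using a deadline instead of accumulating sleep time keeps the timeout accurate even when each check is slow.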

Verifying this change

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

Collaborator

flinkbot commented Jul 10, 2024

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

Contributor

@fredia fredia left a comment

@Zakelly Thanks for the quick fix, overall LGTM.

Thread.sleep(500L);
waited += 500L;
// Or timeout
assertThat(waited).isLessThan(DELETE_TIMEOUT_MILLS);
Contributor

If the timeout is reached and the files are not deleted, will this check fail?
If restoring from file-merging disabled to file-merging disabled, is it okay to skip this check?

Contributor Author

@Zakelly Zakelly Jul 10, 2024

If the timeout is reached and the files are not deleted, will this check fail?

Yes.

If restoring from file-merging disabled to file-merging disabled, is it okay to skip this check?

In this case we are not testing file-merging, but we can still check file existence along the way.

Contributor Author

Zakelly commented Jul 11, 2024

I don't think the CI failure is related to file-merging. It happens when file-merging is disabled. I'm investigating the potential file leak.

Update: Well, there is an issue with the test code.

Contributor Author

Zakelly commented Jul 11, 2024

I have run the test with the latest commit 200 times, with no failures.

Contributor

@fredia fredia left a comment

Thanks for investigating and updating. Let's wait for CI to go green.

@Zakelly Zakelly merged commit d31c450 into apache:master Jul 11, 2024
@Zakelly Zakelly deleted the f35801 branch July 11, 2024 09:35
snuyanzin pushed a commit to snuyanzin/flink that referenced this pull request Jul 21, 2024