[FLINK-26993][tests] Wait until checkpoint was actually triggered #19865

Merged
merged 1 commit into from Jun 16, 2022

Conversation

akalash
Contributor

@akalash akalash commented Jun 1, 2022

What is the purpose of the change

Backport to release-1.15 of #19356

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Jun 1, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@pnowojski
Contributor

Change LGTM, but I couldn't figure out what happened in this test failure. I cannot tell which test failed; am I missing something obvious? I see this being logged:

Jun 01 22:02:55 [ERROR] pure virtual method called
Jun 01 22:02:55 [ERROR] terminate called without an active exception
Jun 01 22:02:59 [ERROR] Aborted (core dumped)

@akalash
Contributor Author

akalash commented Jun 9, 2022

@pnowojski, you can check the logs from this ticket, for example: https://issues.apache.org/jira/browse/FLINK-27216 (it is a fresh duplicate of FLINK-26993). There is a pretty clear NPE (https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=34535&view=logs&j=0da23115-68bb-5dcd-192c-bd4c8adebde1&t=24c3384f-1bcb-57b3-224f-51bf973bbee8&l=9180), which happens because we call receiveAcknowledgeMessage before the PendingCheckpoint has been initialized. This happens only in the test, since we trigger the acknowledgement manually; it is an impossible situation in reality. The test broke when we separated setting the checkpoint storage from creating the PendingCheckpoint.
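
A minimal sketch of the ordering problem described above, assuming a toy coordinator whose pending entry is created asynchronously. ToyCoordinator, triggerCheckpoint, acknowledge and triggerThenAcknowledge are hypothetical names used only for illustration; this is not Flink's CheckpointCoordinator and not the code changed in this PR. It shows only the general idea of waiting for the trigger to complete before sending the acknowledgement.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class WaitForTriggerSketch {

    /** Hypothetical toy coordinator; it mimics the ordering issue only, not Flink's real classes. */
    static class ToyCoordinator {
        private final Map<Long, String> pending = new ConcurrentHashMap<>();

        /** Triggering is asynchronous: the pending entry exists only once the future completes. */
        CompletableFuture<Long> triggerCheckpoint(long checkpointId) {
            return CompletableFuture.supplyAsync(() -> {
                pause(50); // stands in for slow initialization (an assumption of this sketch)
                pending.put(checkpointId, "PENDING");
                return checkpointId;
            });
        }

        /** Returns false when no pending entry exists, i.e. when the ack arrived too early. */
        boolean acknowledge(long checkpointId) {
            return pending.replace(checkpointId, "PENDING", "ACKNOWLEDGED");
        }
    }

    /** Waits until the checkpoint was actually triggered, then acknowledges it. */
    static boolean triggerThenAcknowledge(ToyCoordinator coordinator, long checkpointId) {
        long triggeredId = coordinator.triggerCheckpoint(checkpointId).join();
        return coordinator.acknowledge(triggeredId);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ToyCoordinator coordinator = new ToyCoordinator();
        System.out.println(triggerThenAcknowledge(coordinator, 1L));
    }
}
```

In this toy version an early acknowledgement is simply rejected, whereas in the failing test it surfaced as an NPE. Joining on the trigger future removes the race in both cases.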

@akalash
Contributor Author

akalash commented Jun 14, 2022

@flinkbot run azure

@pnowojski pnowojski merged commit 3e86cf2 into apache:release-1.15 Jun 16, 2022