[FLINK-25096][runtime] Fixes empty exception history for JobInitializationException #17967

Merged
merged 1 commit into from
Dec 10, 2021

@XComp XComp commented Nov 30, 2021

What is the purpose of the change

The ExecutionGraphInfo was not initialized properly if a failure happened during the initialization phase. The failure was stored in the ArchivedExecutionGraph but not in the Collection of RootExceptionHistoryEntry elements.

Brief change log

Instead of providing an empty list, a RootExceptionHistoryEntry is added if the passed ExecutionGraph contains a failure.
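The change described above can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `ArchivedGraphSketch`, `HistoryEntrySketch`, and `ExecutionGraphInfoSketch` are hypothetical, simplified stand-ins for `ArchivedExecutionGraph`, `RootExceptionHistoryEntry`, and `ExecutionGraphInfo`, and their constructors and accessors are assumptions made for the example.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for ArchivedExecutionGraph; only carries an optional failure. */
final class ArchivedGraphSketch {
    private final Throwable failureCause; // null if the job did not fail
    private final long failureTimestamp;

    ArchivedGraphSketch(Throwable failureCause, long failureTimestamp) {
        this.failureCause = failureCause;
        this.failureTimestamp = failureTimestamp;
    }

    Optional<Throwable> getFailure() {
        return Optional.ofNullable(failureCause);
    }

    long getFailureTimestamp() {
        return failureTimestamp;
    }
}

/** Hypothetical stand-in for RootExceptionHistoryEntry. */
final class HistoryEntrySketch {
    final Throwable cause;
    final long timestamp;

    HistoryEntrySketch(Throwable cause, long timestamp) {
        this.cause = cause;
        this.timestamp = timestamp;
    }
}

/** Hypothetical stand-in for ExecutionGraphInfo. */
final class ExecutionGraphInfoSketch {
    private final ArchivedGraphSketch graph;
    private final List<HistoryEntrySketch> exceptionHistory;

    ExecutionGraphInfoSketch(ArchivedGraphSketch graph) {
        this.graph = graph;
        // Derive the history from the graph instead of always starting with an empty list.
        this.exceptionHistory = deriveHistory(graph);
    }

    private static List<HistoryEntrySketch> deriveHistory(ArchivedGraphSketch graph) {
        Optional<Throwable> failure = graph.getFailure();
        if (!failure.isPresent()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(
                new HistoryEntrySketch(failure.get(), graph.getFailureTimestamp()));
    }

    ArchivedGraphSketch getGraph() {
        return graph;
    }

    List<HistoryEntrySketch> getExceptionHistory() {
        return exceptionHistory;
    }
}
```

With this shape, a graph archived after an initialization failure yields a history with exactly one root entry, while a graph without a failure still yields an empty history.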

Verifying this change

I extended the DefaultJobMasterServiceProcessTest to cover the JobInitializationException case and introduced ExecutionGraphInfoTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@flinkbot

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit b456694 (Tue Nov 30 15:59:42 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

flinkbot commented Nov 30, 2021

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@dmvk dmvk left a comment

This fix LGTM overall, great job! 🎉 I have a few minor comments on test cosmetics.

@dmvk dmvk left a comment

This is nice progress, I'm really starting to like the new assertj ecosystem.

In general I'm in favor of more sophisticated assertions. However, I think we should only provide custom InstanceOfAssertFactory factories that can be chained nicely into the existing assertion ecosystem.

Also, I'm not convinced about the need for custom asserts for completable futures; the ones provided by assertj are IMO sufficient.

I've added a few examples of how the current asserts could be simplified / implemented using existing asserts.

WDYT?

@dmvk dmvk left a comment

LGTM overall 👍 I have a few more questions / comments on the new asserts for exception testing.

Would it make sense (especially for assertj newcomers) to add an example to the documentation on how to use / discover / extend these asserts?

XComp commented Dec 3, 2021

Although I'm no longer sure that FlinkThrowableAssert wouldn't end up being the only implementation. FlinkMatchers covers only CompletableFuture and the cause chain use case.

XComp commented Dec 3, 2021

@slinkydeveloper I refactored your code after I realized that there was some concurrent development on the same issue. I moved the code into flink-test-utils-junit to align it with your approach. I think having this class in flink-test-utils-junit instead of flink-core makes sense because a few modules benefit from it (even though it means duplicating the ExceptionUtils code for traversing the cause chain). Additionally, I realized that this duplication already existed beforehand (see FlinkMatchers). WDYT?
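For readers unfamiliar with the cause chain traversal mentioned here, a minimal sketch of the idea follows. It is not the code from ExceptionUtils or from this pull request; the class name `CauseChainSketch` is hypothetical, and the identity-based cycle guard is an assumption added for safety.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper that walks a Throwable's cause chain. */
final class CauseChainSketch {

    private CauseChainSketch() {}

    /**
     * Returns the first throwable in the chain (root first, then its causes)
     * that is an instance of the given type, or empty if none matches.
     */
    static <T extends Throwable> Optional<T> findThrowable(Throwable root, Class<T> type) {
        // Track visited throwables by identity so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = root;
        while (current != null && seen.add(current)) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }
}
```

A test utility can use such a helper to assert that a wrapped exception, for example one surfaced through a failed CompletableFuture, contains an expected cause somewhere in its chain.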

@XComp XComp force-pushed the FLINK-25096 branch 3 times, most recently from 172e638 to 105b37d Compare December 6, 2021 13:00

@dmvk dmvk left a comment

LGTM, good job 🎉 👍 Can you please squash the commit history before merging?
