[FLINK-38977][runtime] Expose exceptions for applications#27655

Open
eemario wants to merge 2 commits into apache:master from eemario:FLIP560-5

eemario commented Feb 24, 2026

What is the purpose of the change

This pull request records the exceptions that cause an application to fail and exposes them via the REST API and the UI.

Brief change log

  • Record the exceptions that cause an application to fail
  • Add the REST API /applications/:applicationid/exceptions
  • Update the application page in the UI to show exceptions
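
For illustration, a client of the new endpoint might look roughly like the sketch below. The path comes from the change log above; the response shape (`exceptionHistory.entries`, `timestamp`) is inferred from the UI diff in this PR, and the interface and function names are hypothetical, not part of the change.

```typescript
// Hypothetical response shape, inferred only from field names visible in the UI diff.
interface ExceptionEntry {
  timestamp: number; // assumed to be epoch milliseconds
}

interface ApplicationExceptions {
  exceptionHistory: { entries: ExceptionEntry[] };
}

// Builds the REST path for an application's exceptions (helper name is an assumption).
function exceptionsPath(applicationId: string): string {
  return `/applications/${encodeURIComponent(applicationId)}/exceptions`;
}

// Returns the first history entry, which the UI diff treats as the root exception,
// or undefined when the history is empty.
function rootEntry(data: ApplicationExceptions): ExceptionEntry | undefined {
  const entries = data.exceptionHistory.entries;
  return entries.length > 0 ? entries[0] : undefined;
}
```

Keeping the path builder and the entry selection as pure functions makes them easy to unit-test without a running cluster.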

Verifying this change

This change added tests and can be verified as follows:

  • Added tests that validate the exception history when an application fails
  • Added tests that validate the REST handler's behavior
  • Manually verified the change by running a standalone cluster, submitting an application that throws an expected exception, and confirming the REST response and the UI display

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (docs)

eemario changed the title from "Expose exceptions for applications" to "[FLINK-38977][runtime] Expose exceptions for applications" on Feb 24, 2026

flinkbot commented Feb 24, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

eemario marked this pull request as ready for review on February 24, 2026 08:55

const exceptionHistory = data.exceptionHistory;
if (exceptionHistory.entries.length > 0) {
  const exceptionInfo = exceptionHistory.entries[0];
  this.rootException = `${formatDate(exceptionInfo.timestamp, 'yyyy-MM-dd HH:mm:ss', 'en')}\n${

It's better to also print the related job if the root cause is a job failure.
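
One way to read this suggestion, as a rough sketch only: the `jobId` field, the "Related job" label, and all names below are assumptions for illustration and do not appear in the PR. The UTC formatter is a dependency-free stand-in for the Angular `formatDate` call used in the diff.

```typescript
// Hypothetical entry shape; `jobId` is an assumed optional field, not confirmed by the PR.
interface RootExceptionInfo {
  timestamp: number; // assumed epoch milliseconds
  stacktrace: string;
  jobId?: string;
}

// Zero-pads a number to two digits.
function pad2(n: number): string {
  return (n < 10 ? '0' : '') + n;
}

// Formats epoch milliseconds as 'yyyy-MM-dd HH:mm:ss' in UTC.
function formatUtc(ms: number): string {
  const d = new Date(ms);
  const date = `${d.getUTCFullYear()}-${pad2(d.getUTCMonth() + 1)}-${pad2(d.getUTCDate())}`;
  const time = `${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}:${pad2(d.getUTCSeconds())}`;
  return `${date} ${time}`;
}

// Builds the root-exception text, adding a job line only when a job id is present.
function describeRootException(info: RootExceptionInfo): string {
  const jobLine = info.jobId ? `\nRelated job: ${info.jobId}` : '';
  return `${formatUtc(info.timestamp)}${jobLine}\n${info.stacktrace}`;
}
```

With this shape, application-level failures that have no associated job render exactly as before, and only job-caused failures gain the extra line.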

// } catch (Exception e) {
// log.warn("Failed to get job result for job {}", jobId);
// }
// }

Unwanted changes?

final Optional<UnsuccessfulExecutionException> maybeJobFailure = extractJobFailure(t);
if (maybeJobFailure.isPresent()
        && isCanceledOrFailed(

In what case does a job fail, but the status is neither "Failed" nor "Canceled"?

3 participants