Skip to content

return task status reported by peon#14040

Merged
clintropolis merged 11 commits intoapache:masterfrom
nlippis:report-status-from-peon
Apr 24, 2023
Merged

return task status reported by peon#14040
clintropolis merged 11 commits intoapache:masterfrom
nlippis:report-status-from-peon

Conversation

@nlippis
Copy link
Contributor

@nlippis nlippis commented Apr 6, 2023

Return task status generated by Peon in KuberentesTaskRunner

The KubernetesTaskRunner determines the status of a completed task by looking at the status of the containers the job ran (Kubernetes marks a container as failed if it exits with a non zero exit code). If any of the containers succeeded, the task is reported as successful. If any of the containers failed, the task is reported as failed.

This way of determining final task status is incorrect for the following reasons

  1. A task that fails gracefully will result in the Peon having an exit code of zero (for example cancelled tasks)
  2. Granular information on task success or failure are lost since the KubernetesTaskRunner creates a new task status object with simple messages.

Adds a method to push and stream task status from object storage

Peon pushes task status to object storage

KubernetesTaskRunner returns task status read from object storage

Release note

If the Druid cluster is upgraded from a version that does not have this change to a version that does, peons running the old version of the code will not write their task status to object storage. When the task completes the KubernetesTaskRunner will use the previous logic in order to preserve backwards compatibility.


Key changed/added classes in this PR
  • KubernetesTaskRunner
  • AbstractTask

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.


Optional<InputStream> inputStreamOptional = s3TaskLogs.streamTaskStatus(KEY_1);
String report = new BufferedReader(
new InputStreamReader(inputStreamOptional.get(), StandardCharsets.UTF_8))

Check warning

Code scanning / CodeQL

Potential input resource leak

This InputStreamReader is not always closed on method exit.
@cryptoe cryptoe added the Bug label Apr 7, 2023
@clintropolis clintropolis added this to the 26.0 milestone Apr 17, 2023
Copy link
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems fine to me 👍, and while the interface changes are to things marked as @ExtensionPoint the default implementation makes it not too disruptive.

{
}

default void pushTaskStatus(String taskid, File reportFile) throws IOException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these new methods need javadocs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put together a follow up PR that adds javadocs to all of these methods.

return Optional.absent();
}

default Optional<InputStream> streamTaskStatus(final String taskid) throws IOException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadocs for devs who would want to implement this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put together a follow up PR that adds javadocs to all of these methods.

Comment on lines +151 to +157
@Override
public void pushTaskStatus(String taskid, File statusFile) throws IOException
{
final String taskKey = getTaskLogKey(taskid, "status.json");
log.info("Pushing task status %s to: %s", statusFile, taskKey);
pushTaskFile(statusFile, taskKey);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could this be the default implementation?

log.debug("No task reports file exists to push");
}
} else {
if (!toolbox.getConfig().isEncapsulatedTask()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@churromorales @nlippis - does the druid.indexer.task.encapsulatedTask has to be a user-facing setting? seems like we could just set it to true always if the task runner is k8s.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to clarify, it's not a blocker comment for this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, especially since the peon depends on the druid.indexer.runner.type setting.

@clintropolis clintropolis merged commit 9d4cc50 into apache:master Apr 24, 2023
clintropolis pushed a commit to clintropolis/druid that referenced this pull request Apr 25, 2023
* return task status reported by peon

* Write TaskStatus to file in AbstractTask.cleanUp

* Get TaskStatus from task log

* Fix merge conflicts in AbstractTaskTest

* Add unit tests for TaskLogPusher, TaskLogStreamer, NoopTaskLogs to satisfy code coverage

* Add license headerss

* Fix style

* Remove unknown exception declarations
abhishekagarwal87 pushed a commit that referenced this pull request Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants