(WIP) Wait for initialization during KubernetesTaskRunner startup#15041

Closed
georgew5656 wants to merge 2 commits into apache:master from georgew5656:waitForinitialization


@georgew5656
Contributor

This fix attempts to bring the KubernetesTaskRunner more in line with the HttpRemoteTaskRunner (https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/overlord/hrtr/HttpRemoteTaskRunner.java#L560) with respect to startup initialization.

Right now, when the overlord becomes leader while using the KubernetesTaskRunner, it adds all of the running tasks to its tasks map but doesn't wait for the underlying thread pool to finish syncing state from Kubernetes. This change attempts to wait for that sync, although startup does not fail if the sync cannot finish completely.

Description

Best-effort attempt to fully sync state from Kubernetes before becoming the overlord leader when running mm-less ingestion.

In the start() method, after adding all the jobs in Kubernetes to the tasks map, try to wait for the underlying thread pool to finish syncing state from Kubernetes.
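
The wait described above could look roughly like the sketch below. This is an illustration only, not the code in this PR: the class name `BestEffortStartupSync`, the method `awaitAll`, and the timeout values are hypothetical, and plain `java.util.concurrent` futures stand in for whatever the runner's thread pool actually returns. The idea shown is a shared deadline across all pending sync futures, where a timeout or a failed task is recorded but never thrown, so startup can continue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Druid: waits for startup sync work on a best-effort basis.
public class BestEffortStartupSync {

    /**
     * Waits for each future until a single shared deadline expires.
     * Returns true only if every future completed normally in time.
     * Timeouts and task failures are swallowed so the caller can still start.
     */
    static boolean awaitAll(List<Future<?>> futures, long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        boolean allSynced = true;
        for (Future<?> future : futures) {
            // Time left on the shared deadline; a zero wait only succeeds if the future is already done.
            long remainingNanos = Math.max(deadline - System.nanoTime(), 0L);
            try {
                future.get(remainingNanos, TimeUnit.NANOSECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // Best effort: note the incomplete sync and keep going.
                allSynced = false;
            }
        }
        return allSynced;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService exec = Executors.newFixedThreadPool(2);
        try {
            List<Future<?>> futures = new ArrayList<>();
            futures.add(exec.submit(() -> { /* stands in for a task that syncs quickly */ }));
            futures.add(exec.submit(() -> { Thread.sleep(5_000); return null; })); // stands in for a slow sync
            boolean synced = awaitAll(futures, 200);
            System.out.println("fully synced before startup: " + synced);
        } finally {
            exec.shutdownNow(); // interrupt the slow task so the demo exits promptly
        }
    }
}
```

Using one deadline for the whole batch, rather than a per-task timeout, bounds how long becoming leader can be delayed regardless of how many tasks are being restored.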

Release note

Improvements to overlord lifecycle when running mm-less ingestion

Key changed/added classes in this PR
  • KubernetesTaskRunner

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions

github-actions bot commented Mar 6, 2024

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Mar 6, 2024
@github-actions

github-actions bot commented Apr 5, 2024

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Apr 5, 2024