Handle task location fetch from overlord during rolling upgrades #16227
I don't think it is desirable to fall back to another API within the OverlordClientImpl. It would make more sense to have the fallback logic in the specific task service locator class.
Would it be ok to add a parameter within the Overlord client to fall back to the older API? The locations are needed not only in
I feel it is okay to have duplication in this case for the time being.
).get(workerId);

if (taskStatus != null
    && TaskLocation.unknown().equals(taskStatus.getLocation())) {
Typo:
- && TaskLocation.unknown().equals(taskStatus.getLocation())) {
+ && !TaskLocation.unknown().equals(taskStatus.getLocation())) {
server/src/main/java/org/apache/druid/rpc/indexing/SpecificTaskServiceLocator.java
Changes LGTM. @AmatyaAvadhanula, could you please check the CI failures?
…des (apache#16227)" This reverts commit ad6bd62.
Bug:
#15724 introduced a bug where a rolling upgrade would cause all task locations returned by an Overlord on an older version to be unknown.
Prior to #15724:
getTaskStatus for individual tasks fetched a TaskStatusResponse containing the location.
getMultipleTaskStatuses fetched task statuses in a batch from the metadata store. The metadata store doesn't contain the current location of an active task, although it does contain the locations of completed tasks.
After the changes:
getTaskStatus remains unchanged.
getMultipleTaskStatuses fetches task statuses for in-memory tasks from the TaskQueue and enhances them with the location from the task runner. The method fetches task statuses for completed tasks from the db.
The Overlord client was also changed to rely on the 2nd API to fetch the task status and location from memory.
During a rolling upgrade, the task is on a version with the PR's changes and queries the 2nd API. The Overlord is still on the older version and fails to return the correct location for active tasks. This can lead to task failures during rolling upgrades.
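To make the difference between the two batch behaviors concrete, here is a minimal, self-contained sketch. It is not Druid code: the class `BatchLocationSketch`, its methods, and the use of plain strings and maps in place of the metadata store, the task runner, and `TaskLocation` are all hypothetical simplifications.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the batch status API before and after #15724.
// Locations are plain "host:port" strings; "unknown" stands in for TaskLocation.unknown().
class BatchLocationSketch
{
  static final String UNKNOWN = "unknown";

  // Older Overlord (assumed model): the batch is served from the metadata store only,
  // which holds locations for completed tasks but not for active ones.
  static Map<String, String> fromOlderOverlord(List<String> taskIds, Map<String, String> metadataStore)
  {
    Map<String, String> result = new LinkedHashMap<>();
    for (String id : taskIds) {
      result.put(id, metadataStore.getOrDefault(id, UNKNOWN));
    }
    return result;
  }

  // Newer Overlord (assumed model): active tasks are enriched with the location known
  // to the task runner; everything else still comes from the metadata store.
  static Map<String, String> fromNewerOverlord(
      List<String> taskIds,
      Map<String, String> taskRunner,
      Map<String, String> metadataStore
  )
  {
    Map<String, String> result = new LinkedHashMap<>();
    for (String id : taskIds) {
      String runnerLocation = taskRunner.get(id);
      result.put(id, runnerLocation != null ? runnerLocation : metadataStore.getOrDefault(id, UNKNOWN));
    }
    return result;
  }

  public static void main(String[] args)
  {
    Map<String, String> metadataStore = Map.of("done-task", "host1:8100");
    Map<String, String> taskRunner = Map.of("active-task", "host2:8101");
    List<String> ids = List.of("active-task", "done-task");

    System.out.println(fromOlderOverlord(ids, metadataStore));
    System.out.println(fromNewerOverlord(ids, taskRunner, metadataStore));
  }
}
```

In this model, a client that only calls the batch API against the older Overlord never sees a usable location for the active task, which is the failure mode described above.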
Fix
The Overlord client now falls back to the original API, which always returns the task location, if the 2nd API fails to return it during the rolling upgrade. After the rolling upgrade, the active tasks' statuses will be fetched from memory as expected.
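The fallback can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `TaskLocationFallbackSketch`, the nested `Location` class, and the two `Function` parameters are hypothetical stand-ins for `TaskLocation`, the batch API, and the single-task API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

class TaskLocationFallbackSketch
{
  // Hypothetical stand-in for Druid's TaskLocation; not the real class.
  static final class Location
  {
    static final Location UNKNOWN = new Location(null, -1);

    final String host;
    final int port;

    Location(String host, int port)
    {
      this.host = host;
      this.port = port;
    }

    @Override
    public boolean equals(Object o)
    {
      if (!(o instanceof Location)) {
        return false;
      }
      Location other = (Location) o;
      return port == other.port && Objects.equals(host, other.host);
    }

    @Override
    public int hashCode()
    {
      return Objects.hash(host, port);
    }
  }

  // Try the batch (2nd) API first. If it returns no entry or an unknown location,
  // as an older Overlord would for an active task, fall back to the single-task API.
  static Location resolveLocation(
      String taskId,
      Function<Set<String>, Map<String, Location>> batchApi,
      Function<String, Location> singleTaskApi
  )
  {
    Location fromBatch = batchApi.apply(Set.of(taskId)).get(taskId);
    if (fromBatch != null && !Location.UNKNOWN.equals(fromBatch)) {
      return fromBatch;
    }
    Location fromSingle = singleTaskApi.apply(taskId);
    return fromSingle != null ? fromSingle : Location.UNKNOWN;
  }

  public static void main(String[] args)
  {
    Location real = new Location("mm-host", 8100);
    // Simulates an older Overlord: the batch API only knows an unknown location.
    Location resolved = resolveLocation(
        "task-1",
        ids -> Map.of("task-1", Location.UNKNOWN),
        id -> real
    );
    System.out.println(resolved.host + ":" + resolved.port);
  }
}
```

Note the negated check on the unknown location: the fallback should be skipped only when the batch API returned a known location, which is the condition the review suggestion above corrects.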
Testing
The new Overlord client was used on an upgraded Indexer / MM while the Overlord was on a version prior to #15724. The tasks succeeded as expected. (Without this patch, they would fail with a newer Indexer.)
This PR has: