Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retrying with a backward compatible task type on unknown task type error in parallel indexing #8905

Merged
merged 2 commits into from
Nov 20, 2019

Conversation

jihoonson
Copy link
Contributor

@jihoonson jihoonson commented Nov 19, 2019

Fixes #8836.

Description

I think I don't like this patch much, but don't see a better solution. Please leave comments if anyone has a better idea.

TaskMonitor.submit() creates a sub task for a given spec submits it to the overlord. Here, if the task type was unknown to the overlord (can happen during a rolling update), the overlord would return an HTTP error to TaskMonitor. HttpIndexingServiceClient would throw an IllegalStateException if the HTTP response was not 200.

To handle this, I added a new method SubTaskSpec.newSubTaskWithBackwardCompatibleType() which will be called if SubTaskSpec.newSubTask() fails with an IllegalStateExceptoin with a message starting with "Could not resolve type id".


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.


This change is Reviewable

@Override
public String getType()
{
return SinglePhaseSubTask.OLD_TYPE_NAME;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this really affect how the task is serialized? I thought the type field would end up based solely on whatever Jackson thinks the type code for that class is, based on @JsonTypeName or @JsonSubTypes annotations.

Could you include a test that makes sure it serializes correctly? (Maybe serialize it, then deserialize as a Map and check the type field.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning a different type works, but the class should be registered on Jackson properly. I fixed it and added a unit test.

{
T task = spec.newSubTask(numAttempts);
try {
indexingServiceClient.runTask(task);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this approach be able to retry immediately, or will it have to exhaust retries in the indexingServiceClient first? (The loop in DruidLeaderClient)

Ideally, this detects a problem on the first submission, and doesn't need to exhaust retries before it moves on to trying the backwards-compatible type.

Could you check it, if you haven't already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't retry immediately but will exhaust retries in DruidLeaderClient. Yes, ideally it should be able to detect the problem before it retries, but I'm not sure whether that refactoring is worth to do because 1) it's not easy to teach the logic of the caller to DruidLeaderClient, 2) it will happen only during a particular type of rolling update, 3) and retries won't take much time compared to the total indexing time.

@jihoonson
Copy link
Contributor Author

@gianm thanks for the review. I also tested this patch with a cluster of an overlord of 0.15.0 and middleManagers of this patch.

Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 with the latest changes, thanks @jihoonson

@gianm gianm merged commit baefc65 into apache:master Nov 20, 2019
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Nov 26, 2019
…ror in parallel indexing (apache#8905)

* Retrying with a backward compatible task type on unknown task type error in parallel indexing

* Register legacy class; add a serde test
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Nov 27, 2019
…ror in parallel indexing (apache#8905)

* Retrying with a backward compatible task type on unknown task type error in parallel indexing

* Register legacy class; add a serde test
jon-wei added a commit that referenced this pull request Nov 27, 2019
…ask type error in parallel indexing (#8905) (#8949)

* Retrying with a backward compatible task type on unknown task type error in parallel indexing (#8905)

* Retrying with a backward compatible task type on unknown task type error in parallel indexing

* Register legacy class; add a serde test

* Backport fix, use firehoses
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallel indexing task fails during the rolling update
2 participants