Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix TaskQueue-HRTR deadlock #6212

Merged
merged 2 commits into from
Aug 25, 2018
Merged

fix TaskQueue-HRTR deadlock #6212

merged 2 commits into from
Aug 25, 2018

Conversation

himanshug
Copy link
Contributor

Fixes #6201

FWIW: RTR has same bug https://github.com/apache/incubator-druid/blob/master/indexing-service/src/main/java/io/druid/indexing/overlord/RemoteTaskRunner.java#L908 but rarity of that happening causes getting into the deadlock very unlikely.

from the design perspective, to guarantee safety, I think it would eventually make more sense to make sure that all kinds of listeners are always executed in something other than SameThreadExecutor or that listeners are never called with internal locks being held as you don't know what listeners might end up doing.

private void taskComplete(
HttpRemoteTaskRunnerWorkItem taskRunnerWorkItem,
WorkerHolder workerHolder,
TaskStatus taskStatus
)
{
Preconditions.checkArgument(!Thread.holdsLock(statusLock), "Current thread must not hold statusLock.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: checkState is more appropriate here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, fixed

@@ -376,36 +376,46 @@ private boolean runTaskOnWorker(
// on a worker - this avoids overflowing a worker with tasks
long waitMs = config.getTaskAssignmentTimeout().toStandardDuration().getMillis();
long waitStart = System.currentTimeMillis();
boolean isTaskAssignmentTimedout = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: isTaskAssignmentTimedOut (spelling: timed out is two words.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed :)

@gianm
Copy link
Contributor

gianm commented Aug 23, 2018

@himanshug,

from the design perspective, to guarantee safety, I think it would eventually make more sense to make sure that all kinds of listeners are always executed in something other than SameThreadExecutor or that listeners are never called with internal locks being held as you don't know what listeners might end up doing.

Probably the right thing to do is have the caller that sets a callback also pass an ExecutorService to use for that callback. That way the caller has some control over what thread pool to use, and the callee doesn't need to have its own callbacks thread pool that might never get used.

If it doesn't pass one in, then use a same-thread executor.

@gianm
Copy link
Contributor

gianm commented Aug 25, 2018

LGTM, thanks @himanshug

@gianm gianm merged commit c3aaf81 into apache:master Aug 25, 2018
@gianm gianm modified the milestones: 0.12.3, 0.13.0 Aug 25, 2018
@gianm
Copy link
Contributor

gianm commented Aug 25, 2018

@himanshug, I moved this to 0.13, since 0.12 doesn't have the HTTP remote task runner.

gianm pushed a commit to implydata/druid-public that referenced this pull request Aug 25, 2018
* fix TaskQueue-HRTR deadlock causing apache#6201

* address review comments
@himanshug
Copy link
Contributor Author

@gianm sure, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants