fix worker thread pool exhaustion bug #3760

Merged 3 commits into apache:master on Dec 9, 2016

@dclim (Contributor) commented Dec 8, 2016

Fixed a bug where the supervisor could hang while waiting for a thread from the worker thread pool that never becomes available. Also added a timeout for async operations as a safeguard against this kind of issue, and fixed the unit test, which should have caught this issue but was broken.
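
For readers unfamiliar with this failure mode, here is a minimal sketch of one common way a bounded pool can starve itself, and of how a timed wait turns the hang into a recoverable error. It is not taken from the Druid code: the names `PoolExhaustionDemo` and `innerWaitTimesOut` and the single-thread pool are assumptions chosen for illustration, and it uses plain `java.util.concurrent` rather than the Guava futures the supervisor uses.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolExhaustionDemo {

    /**
     * Runs an outer task on a one-thread pool. The outer task submits an inner
     * task to the same pool and waits for it with a timeout.
     * Returns true if that inner wait timed out.
     */
    public static boolean innerWaitTimesOut(long timeoutMillis) {
        // Hypothetical pool size of 1, so the starvation is easy to reproduce.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<Boolean> outer = pool.submit(() -> {
                // The only worker thread is busy running this task, so the
                // inner task stays queued until this task returns.
                Future<String> inner = pool.submit(() -> "done");
                try {
                    // An untimed inner.get() would block forever here.
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    inner.cancel(true);
                    return true;
                }
            });
            return outer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("inner wait timed out: " + innerWaitTimesOut(200));
    }
}
```

The timeout does not remove the underlying starvation; it only bounds how long a caller can be stuck, which is why the description calls it a safeguard.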

@dclim added the Bug label Dec 8, 2016
@dclim added this to the 0.9.3 milestone Dec 8, 2016
@gianm self-assigned this Dec 8, 2016

@gianm (Contributor) left a comment

LGTM, just some minor comments.

@@ -255,6 +257,12 @@ public boolean apply(TaskRunnerWorkItem taskRunnerWorkItem)
}
};

this.futureTimeout = Math.max(
120,

Contributor

Might as well make this a constant too, along with the other stuff you made constants.

@@ -174,6 +175,7 @@ public TaskGroup(ImmutableMap<Integer, Long> partitionOffsets, Optional<DateTime
private final KafkaTuningConfig taskTuningConfig;
private final String supervisorId;
private final TaskInfoProvider taskInfoProvider;
private final long futureTimeout; // how long to wait for async operations to complete

Contributor

include "in seconds"; millis is more common so people will assume that if you leave the units off.
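
A small sketch of how the two suggestions above (a named constant for the 120 floor, and units spelled out) might look together. The identifiers `TimeoutUnitsDemo`, `MINIMUM_FUTURE_TIMEOUT_IN_SECONDS`, `futureTimeoutInSeconds`, and the `configuredSeconds` parameter are hypothetical; the second argument to `Math.max` is not visible in the hunk, so it is left as a plain parameter here rather than guessed.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutUnitsDemo {

    // Hypothetical constant name; 120 is the floor visible in the diff hunk.
    static final long MINIMUM_FUTURE_TIMEOUT_IN_SECONDS = 120;

    /** Returns a timeout in seconds that is never below the minimum. */
    public static long futureTimeoutInSeconds(long configuredSeconds) {
        return Math.max(MINIMUM_FUTURE_TIMEOUT_IN_SECONDS, configuredSeconds);
    }

    /** Converts explicitly at the boundary, so no caller has to guess the unit. */
    public static long toMillis(long seconds) {
        return TimeUnit.SECONDS.toMillis(seconds);
    }
}
```

Putting the unit in the name as well as the comment means a later call such as `future.get(futureTimeoutInSeconds, TimeUnit.SECONDS)` can be checked at a glance.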

@@ -1420,7 +1432,7 @@ public Void apply(@Nullable List<Void> input)
{
return null;
}
}, workerExec
}

Contributor

Minor comment: this code would be more readable if this pattern was expressed in a function like ListenableFuture<Void> asVoidFuture(ListenableFuture<?> future)

@gianm (Contributor), Dec 8, 2016

It may also be fine in a lot of situations to just cast it, since presumably nobody is actually trying to cast the returned Void value to a Void, so there should never actually be a bad cast at runtime. But, eh.
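
To make the two options in this thread concrete, here is a rough sketch of both the helper and the cast. The PR itself works with Guava's `ListenableFuture`; to keep the example dependency-free it uses the JDK's `CompletableFuture` as a stand-in, so treat the types and the class name `AsVoidFutureDemo` as assumptions rather than the code that was merged.

```java
import java.util.concurrent.CompletableFuture;

public class AsVoidFutureDemo {

    /**
     * Helper option: returns a new future that completes with null once the
     * input completes, discarding whatever value the input produced.
     */
    public static CompletableFuture<Void> asVoidFuture(CompletableFuture<?> future) {
        return future.<Void>thenApply(ignored -> null);
    }

    /**
     * Cast option: no new future is created. The cast is unchecked, so it is
     * only safe while no caller reads the result as a Void value.
     */
    @SuppressWarnings("unchecked")
    public static CompletableFuture<Void> castToVoidFuture(CompletableFuture<?> future) {
        return (CompletableFuture<Void>) future;
    }

    public static void main(String[] args) {
        CompletableFuture<String> source = CompletableFuture.completedFuture("ignored");
        System.out.println(asVoidFuture(source).join());
    }
}
```

The helper costs one extra future per call but is type-safe; the cast is free but relies on callers only waiting for completion and never touching the value.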

@gianm (Contributor) commented Dec 8, 2016

Tests failed due to a checkstyle issue in #3567; @jon-wei, could you take a look at that?

@gianm closed this Dec 8, 2016
@gianm reopened this Dec 8, 2016
}
}, workerExec
);
return asVoidFuture(Futures.successfulAsList(futures));

Member

Maybe stopTaskInGroup() could return ListenableFuture<?> and this wrapping asVoidFuture is not needed?

Contributor Author

@leventov I like this refactor, thank you

@nishantmonu51 (Member) left a comment

👍

@leventov (Member) left a comment

LGTM

@fjy (Contributor) commented Dec 9, 2016

👍

@fjy merged commit 0b9dff0 into apache:master Dec 9, 2016
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
* fix worker thread pool exhaustion bug

* code review changes

* code review changes
seoeun25 added a commit to seoeun25/incubator-druid that referenced this pull request Jan 10, 2020

5 participants