Fix multiple concurrency bugs in Master.gatherTableInformation() #546
Master.gatherTableInformation() had the following problems:
* If Property.MASTER_STATUS_THREAD_POOL_SIZE was set > 1, then multiple threads would put into a TreeMap.
* It created a thread pool and never shut it down.
* It returned a reference to a TreeMap that threads in the thread pool may still be adding to.

This patch also attempts to address the issues brought up in apache#402 by switching to a cached thread pool. This will allow the thread pool to expand so that unresponsive tservers do not prevent gathering status from responsive tservers.
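To make the three fixes concrete, here is a minimal sketch of the pattern described above. It is not the code from the patch: the class `StatusGatherSketch`, the method `gather`, the `fetch` function and the use of plain strings for servers and statuses are all hypothetical stand-ins, and it uses only JDK collections rather than Guava.

```java
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical stand-in for the gathering logic; servers and statuses are plain strings.
class StatusGatherSketch {
  static SortedMap<String,String> gather(List<String> servers, Function<String,String> fetch,
      long waitMillis) {
    // Thread-safe sorted map, so several pool threads may put concurrently.
    ConcurrentSkipListMap<String,String> result = new ConcurrentSkipListMap<>();
    // A cached pool grows on demand, so one slow server does not block the others.
    ExecutorService pool = Executors.newCachedThreadPool();
    try {
      for (String server : servers) {
        pool.execute(() -> result.put(server, fetch.apply(server)));
      }
      pool.shutdown(); // stop accepting new tasks
      pool.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    } finally {
      pool.shutdownNow(); // always release the pool, interrupting any stragglers
    }
    // Return a snapshot, never the map that background threads may still write to.
    return Collections.unmodifiableSortedMap(new TreeMap<>(result));
  }
}
```

A caller would get back only the entries that arrived before the wait expired; anything a straggler adds later goes into the internal map, not the returned copy.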
@@ -340,6 +340,7 @@
MASTER_REPLICATION_COORDINATOR_THREADCHECK("master.replication.coordinator.threadcheck.time",
"5s", PropertyType.TIMEDURATION,
"The time between adjustments of the coordinator thread pool"),
@Deprecated
Description should be updated to reflect when it was deprecated. I noticed while reviewing deprecated items that we hadn't been doing this for configuration properties, and it's very useful.
// Since an unbounded thread pool is being used, rate limit how fast task are added to the
// executor. This prevents the threads from growing large unless there are lots of
// unresponsive tservers.
sleepUninterruptibly(5, TimeUnit.MILLISECONDS);
I think maybe this should be some fraction of the client timeout, because the client timeout indicates some user awareness of how long the RPCs should take.
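One way this suggestion could look, as a rough sketch only: derive the per-submission delay from the RPC timeout instead of hard-coding 5 ms. The class `SubmitDelaySketch`, the method `submitDelayMillis` and the `divisor` parameter are hypothetical; the thread does not say which fraction would be appropriate.

```java
// Hypothetical helper: pick the pause between task submissions as a fraction of the RPC timeout.
class SubmitDelaySketch {
  static long submitDelayMillis(long rpcTimeoutMillis, long divisor) {
    if (divisor <= 0) {
      throw new IllegalArgumentException("divisor must be positive");
    }
    // Never go below 1 ms, so submissions are still rate limited for small timeouts.
    return Math.max(1L, rpcTimeoutMillis / divisor);
  }
}
```

With a divisor of 1000, a 120000 ms timeout gives a 120 ms pause and a 500 ms timeout falls back to the 1 ms floor.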
// Because result is a ConcurrentSkipListMap will not see a concurrent modification exception,
// even though background threads may still try to put.
SortedMap<TServerInstance,TabletServerStatus> info = ImmutableSortedMap.copyOf(result);
If something isn't in this map by the time we get here, we could just add the remaining directly to the badServers.
The bad server processing needs some attention; I will open a follow-on issue.
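For reference, the suggestion above amounts to a set difference between the servers that were asked and the servers that answered. A minimal sketch, assuming a hypothetical helper `MissingServersSketch.missing` and leaving out what the master would then do with the bad servers:

```java
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeSet;

// Hypothetical helper: servers that were queried but have no entry in the snapshot.
class MissingServersSketch {
  static <K extends Comparable<K>,V> Set<K> missing(Set<K> currentServers,
      SortedMap<K,V> info) {
    // Copy first so the caller's set is left untouched.
    Set<K> missing = new TreeSet<>(currentServers);
    missing.removeAll(info.keySet());
    return missing;
  }
}
```

The returned set is the list of candidates that could be added to the bad servers.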
All ITs passed running against 8e29faa
Reminder: the deprecated configuration property description should reflect which version it was deprecated since.
@@ -340,6 +340,10 @@
MASTER_REPLICATION_COORDINATOR_THREADCHECK("master.replication.coordinator.threadcheck.time",
"5s", PropertyType.TIMEDURATION,
"The time between adjustments of the coordinator thread pool"),
/**
* @deprecated since 1.9.1
I actually meant in the text in the property description. The javadoc comment here won't really be useful. Also, it should be 1.9.2.
Oh yeah, it should be 1.9.2. I'm not exactly sure what you are looking for; can you give a code example?
See line 348 below. s/The number of threads/Deprecated since 1.9.2. The number of threads/
The @Deprecated annotation is sufficient for devs, but these String descriptions get published in our docs for users, where they are useful for both users and devs.
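As an illustration of that substitution, here is a self-contained sketch with a made-up enum. `DeprecatedPropertySketch`, `Prop`, the key `example.threadpool.size` and the tail of the description are hypothetical and not the real Accumulo property; only the "Deprecated since 1.9.2." prefix comes from the comment above.

```java
// Hypothetical property enum showing both the annotation and the user-facing description text.
class DeprecatedPropertySketch {
  enum Prop {
    @Deprecated // enough for developers and compilers
    EXAMPLE_THREAD_POOL_SIZE("example.threadpool.size", "1",
        // The prefix below is what ends up in the generated user documentation.
        "Deprecated since 1.9.2. The number of threads used by a hypothetical example task.");

    private final String key;
    private final String defaultValue;
    private final String description;

    Prop(String key, String defaultValue, String description) {
      this.key = key;
      this.defaultValue = defaultValue;
      this.description = description;
    }

    String getKey() {
      return key;
    }

    String getDefaultValue() {
      return defaultValue;
    }

    String getDescription() {
      return description;
    }
  }

  // Renders one line the way a documentation generator might (format is an assumption).
  static String describeExample() {
    Prop p = Prop.EXAMPLE_THREAD_POOL_SIZE;
    return p.getKey() + " (default " + p.getDefaultValue() + "): " + p.getDescription();
  }
}
```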
I updated the docs. I looked into the code that generates prop docs, and it generates documentation for deprecated props. So users looking at the docs will know which props are deprecated, but they will not know when. If ACCUMULO-4592 were done, it could also add a deprecated version.
I was a bit uneasy with the unlimited thread pool, so I made the number of threads configurable again in 476db84. It defaults to an unlimited thread pool, but if that causes problems there is a config option to limit the thread pool size.
@@ -1146,12 +1148,20 @@ private long balanceTablets() {

private SortedMap<TServerInstance,TabletServerStatus> gatherTableInformation(
Set<TServerInstance> currentServers) {
final long rpcTimeout = getConfiguration().getTimeInMillis(Property.GENERAL_RPC_TIMEOUT);
int threads = Math.max(getConfiguration().getCount(Property.MASTER_STATUS_THREAD_POOL_SIZE), 0);
This line seems to imply that negative values will result in unlimited threads. However, the documentation only describes zero threads behaving this way. The docs should say "non-positive" or "zero and negative" just to leave no ambiguity.
That was not my intent. I will remove the max; then the following line will fail if the value is negative.
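A small sketch of the behavior being discussed, under the assumption that zero selects the unlimited (cached) pool and any positive value selects a fixed pool. `StatusPoolSketch` and `newStatusPool` are hypothetical names, and the patch's actual "following line" is not shown in this thread, so this may differ from it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical factory: 0 means unlimited threads, positive means a fixed-size pool.
class StatusPoolSketch {
  static ExecutorService newStatusPool(int threads) {
    if (threads == 0) {
      return Executors.newCachedThreadPool();
    }
    // Without a Math.max clamp, a negative value reaches this call and
    // Executors.newFixedThreadPool throws IllegalArgumentException.
    return Executors.newFixedThreadPool(threads);
  }
}
```

Failing fast on a negative setting keeps the behavior in line with documentation that only describes zero as unlimited.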