Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multiple concurrency bugs in Master.gatherTableInformation() #546

Merged
merged 7 commits into from Jul 10, 2018

Conversation

Projects
None yet
2 participants
@keith-turner
Copy link
Contributor

keith-turner commented Jul 2, 2018

Master.gatherTableInformation() had the following problems :

  • If Property.MASTER_STATUS_THREAD_POOL_SIZE set > 1, then multiple threads
    would put into a TreeMap
  • Create a thread pool and never called shutdown now
  • Returns a reference to a treemap that threads in thread pool may still
    be adding to.

This patch also attempts to address the issues brought up in #402 by switching
to a cached thread pool. This will allow the thread pool to expand so that
unresponsive tservers do not prevent gathering status from responsive tservers.

keith-turner added some commits Jul 2, 2018

Fix multiple concurrency bugs in Master.gatherTableInformation()
Master.gatherTableInformation() had the following problems :

 * If Property.MASTER_STATUS_THREAD_POOL_SIZE set > 1, then multiple threads
   would put into a TreeMap
 * Create a thread pool and never called shutdown now
 * Returns a reference to a treemap that threads in thread pool may still
   be adding to.

This patch also attempts to address the issues brought up in #402 by switching
to a cached thread pool.  This will allow the thread pool to expand so that
unresponsive tservers do not prevent gathering status from responsive tservers.
@@ -340,6 +340,7 @@
MASTER_REPLICATION_COORDINATOR_THREADCHECK("master.replication.coordinator.threadcheck.time",
"5s", PropertyType.TIMEDURATION,
"The time between adjustments of the coordinator thread pool"),
@Deprecated

This comment has been minimized.

@ctubbsii

ctubbsii Jul 2, 2018

Member

Description should be updated to reflect when it was deprecated. I noticed while reviewing deprecated items that we hadn't been doing this for configuration properties, and it's very useful.

// Since an unbounded thread pool is being used, rate limit how fast task are added to the
// executor. This prevents the threads from growing large unless there are lots of
// unresponsive tservers.
sleepUninterruptibly(5, TimeUnit.MILLISECONDS);

This comment has been minimized.

@ctubbsii

ctubbsii Jul 2, 2018

Member

I think maybe this should be some fraction of the client timeout, because the client timeout indicates some user awareness of how long the RPCs should take.


// Because result is a ConcurrentSkipListMap will not see a concurrent modification exception,
// even though background threads may still try to put.
SortedMap<TServerInstance,TabletServerStatus> info = ImmutableSortedMap.copyOf(result);

This comment has been minimized.

@ctubbsii

ctubbsii Jul 2, 2018

Member

If something isn't in this map by the time we get here, we could just add the remaining directly to the badServers.

This comment has been minimized.

@keith-turner

keith-turner Jul 2, 2018

Author Contributor

The bad server processing needs some attention, I will open up a follow on issue.

@keith-turner

This comment has been minimized.

Copy link
Contributor Author

keith-turner commented Jul 3, 2018

All ITs passed running against 8e29faa

@ctubbsii
Copy link
Member

ctubbsii left a comment

Reminder: the deprecated configuration property description should reflect which version it was deprecated since.

@@ -340,6 +340,10 @@
MASTER_REPLICATION_COORDINATOR_THREADCHECK("master.replication.coordinator.threadcheck.time",
"5s", PropertyType.TIMEDURATION,
"The time between adjustments of the coordinator thread pool"),
/**
* @deprecated since 1.9.1

This comment has been minimized.

@ctubbsii

ctubbsii Jul 4, 2018

Member

I actually meant in the text in the property description. The javadoc comment here won't really be useful. Also, it should be 1.9.2.

This comment has been minimized.

@keith-turner

keith-turner Jul 4, 2018

Author Contributor

oh yeah, should be be 1.9.2. Not exactly sure what you are looking for, can you give a code example?

This comment has been minimized.

@ctubbsii

ctubbsii Jul 4, 2018

Member

See line 348 below. s/The number of threads/Deprecated since 1.9.2. The number of threads/
The @Deprecated annotation is sufficient for devs, but these String descriptions get published in our docs for users, where they are useful for both users and devs.

This comment has been minimized.

@keith-turner

keith-turner Jul 5, 2018

Author Contributor

I updated the docs. I looked into the the code that generates prop docs and it generates documentation for deprecated props. So users looking at the docs will know which props are deprecated, but they will not know when. If ACCUMULO-4592 were done, could also add a deprecated version.

@keith-turner

This comment has been minimized.

Copy link
Contributor Author

keith-turner commented Jul 9, 2018

I was a bit uneasy with the unlimited thread pool, so I made the number of threads configurable again in 476db84. It defaults to an unlimited threads pool, but if that causes problems there is a config option to limit the thread pool size.

@@ -1146,12 +1148,20 @@ private long balanceTablets() {

private SortedMap<TServerInstance,TabletServerStatus> gatherTableInformation(
Set<TServerInstance> currentServers) {
final long rpcTimeout = getConfiguration().getTimeInMillis(Property.GENERAL_RPC_TIMEOUT);
int threads = Math.max(getConfiguration().getCount(Property.MASTER_STATUS_THREAD_POOL_SIZE), 0);

This comment has been minimized.

@ctubbsii

ctubbsii Jul 9, 2018

Member

This line seems to imply that negative values will result in unlimited threads. However, the documentation only describes zero threads behaving this way. The docs should say "non-positive" or "zero and negative" just to leave no ambiguity.

This comment has been minimized.

@keith-turner

keith-turner Jul 10, 2018

Author Contributor

That was not my intent. I will removed the max, then the following line would fail if negative.

@keith-turner keith-turner merged commit 42a3534 into apache:1.9 Jul 10, 2018

2 checks passed

Jenkins This pull request looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@keith-turner keith-turner deleted the keith-turner:master-concurrent-bug branch Dec 6, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.