Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multiple concurrency bugs in Master.gatherTableInformation() #546

Merged
merged 7 commits into from
Jul 10, 2018

Conversation

keith-turner
Copy link
Contributor

Master.gatherTableInformation() had the following problems :

  • If Property.MASTER_STATUS_THREAD_POOL_SIZE set > 1, then multiple threads
    would put into a TreeMap
  • Create a thread pool and never called shutdown now
  • Returns a reference to a treemap that threads in thread pool may still
    be adding to.

This patch also attempts to address the issues brought up in #402 by switching
to a cached thread pool. This will allow the thread pool to expand so that
unresponsive tservers do not prevent gathering status from responsive tservers.

Master.gatherTableInformation() had the following problems :

 * If Property.MASTER_STATUS_THREAD_POOL_SIZE set > 1, then multiple threads
   would put into a TreeMap
 * Create a thread pool and never called shutdown now
 * Returns a reference to a treemap that threads in thread pool may still
   be adding to.

This patch also attempts to address the issues brought up in apache#402 by switching
to a cached thread pool.  This will allow the thread pool to expand so that
unresponsive tservers do not prevent gathering status from responsive tservers.
@@ -340,6 +340,7 @@
MASTER_REPLICATION_COORDINATOR_THREADCHECK("master.replication.coordinator.threadcheck.time",
"5s", PropertyType.TIMEDURATION,
"The time between adjustments of the coordinator thread pool"),
@Deprecated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description should be updated to reflect when it was deprecated. I noticed while reviewing deprecated items that we hadn't been doing this for configuration properties, and it's very useful.

// Since an unbounded thread pool is being used, rate limit how fast task are added to the
// executor. This prevents the threads from growing large unless there are lots of
// unresponsive tservers.
sleepUninterruptibly(5, TimeUnit.MILLISECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe this should be some fraction of the client timeout, because the client timeout indicates some user awareness of how long the RPCs should take.


// Because result is a ConcurrentSkipListMap will not see a concurrent modification exception,
// even though background threads may still try to put.
SortedMap<TServerInstance,TabletServerStatus> info = ImmutableSortedMap.copyOf(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If something isn't in this map by the time we get here, we could just add the remaining directly to the badServers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bad server processing needs some attention, I will open up a follow on issue.

@keith-turner
Copy link
Contributor Author

All ITs passed running against 8e29faa

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reminder: the deprecated configuration property description should reflect which version it was deprecated since.

@@ -340,6 +340,10 @@
MASTER_REPLICATION_COORDINATOR_THREADCHECK("master.replication.coordinator.threadcheck.time",
"5s", PropertyType.TIMEDURATION,
"The time between adjustments of the coordinator thread pool"),
/**
* @deprecated since 1.9.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually meant in the text in the property description. The javadoc comment here won't really be useful. Also, it should be 1.9.2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, should be be 1.9.2. Not exactly sure what you are looking for, can you give a code example?

Copy link
Member

@ctubbsii ctubbsii Jul 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 348 below. s/The number of threads/Deprecated since 1.9.2. The number of threads/
The @Deprecated annotation is sufficient for devs, but these String descriptions get published in our docs for users, where they are useful for both users and devs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the docs. I looked into the the code that generates prop docs and it generates documentation for deprecated props. So users looking at the docs will know which props are deprecated, but they will not know when. If ACCUMULO-4592 were done, could also add a deprecated version.

@keith-turner
Copy link
Contributor Author

I was a bit uneasy with the unlimited thread pool, so I made the number of threads configurable again in 476db84. It defaults to an unlimited threads pool, but if that causes problems there is a config option to limit the thread pool size.

@@ -1146,12 +1148,20 @@ private long balanceTablets() {

private SortedMap<TServerInstance,TabletServerStatus> gatherTableInformation(
Set<TServerInstance> currentServers) {
final long rpcTimeout = getConfiguration().getTimeInMillis(Property.GENERAL_RPC_TIMEOUT);
int threads = Math.max(getConfiguration().getCount(Property.MASTER_STATUS_THREAD_POOL_SIZE), 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line seems to imply that negative values will result in unlimited threads. However, the documentation only describes zero threads behaving this way. The docs should say "non-positive" or "zero and negative" just to leave no ambiguity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was not my intent. I will removed the max, then the following line would fail if negative.

@keith-turner keith-turner added v2.0.0 blocker This issue blocks any release version labeled on it. bug This issue has been verified to be a bug. labels Jul 10, 2018
@keith-turner keith-turner merged commit 42a3534 into apache:1.9 Jul 10, 2018
@keith-turner keith-turner deleted the master-concurrent-bug branch December 6, 2018 15:16
@ctubbsii ctubbsii added this to Done in 1.9.2 Jun 14, 2019
@ctubbsii ctubbsii added this to Done in 2.0.0 Jun 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This issue blocks any release version labeled on it. bug This issue has been verified to be a bug.
Projects
No open projects
1.9.2
  
Done
2.0.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants