Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assign all TabletServers to default pool if empty #691

Merged
merged 5 commits into from Oct 18, 2018

Conversation

@adamjshook
Copy link
Contributor

commented Oct 11, 2018

This addresses an issue when using the HostRegexTableLoadBalancer when the default pool is empty. The load balancer will not assign the tablets at all. Here, we select a random pool to assign the tablets to. This behavior is on by default but can be disabled via property.

@adamjshook adamjshook requested a review from ctubbsii Oct 11, 2018

@@ -443,15 +443,95 @@ public void testUnassignedWithNoTServers() {
for (TServerInstance r : removals) {
current.remove(r);
}
this.getAssignments(Collections.unmodifiableSortedMap(allTabletServers),
this.getAssignments(Collections.unmodifiableSortedMap(current),

This comment has been minimized.

Copy link
@adamjshook

adamjshook Oct 11, 2018

Author Contributor

I think this is an issue with the existing test? My understanding is we want to pass in a partial set of the TServers.

@adamjshook adamjshook force-pushed the adamjshook:regex branch from 62f0c12 to f7ab314 Oct 11, 2018

@ctubbsii

This comment has been minimized.

Copy link
Member

commented Oct 11, 2018

I see you tagged me as a requested reviewer. Maybe @keith-turner can review this. I'm not that familiar with the regex balancers. However, I will note that it is likely the new Random() reference will trigger spotbugs as a potential security bug and fail the build. A @SuppressFBWarning(value = "...", justification = "...") will likely be needed to be added to the method. You can verify with mvn clean verify -Psec-bugs -DskipTests locally.

@adamjshook

This comment has been minimized.

Copy link
Contributor Author

commented Oct 11, 2018

Sure @ctubbsii Thanks for the feedback!

@adamjshook adamjshook requested review from keith-turner and removed request for ctubbsii Oct 11, 2018

@adamjshook

This comment has been minimized.

Copy link
Contributor Author

commented Oct 16, 2018

@dlmarion Could you take a look at this as well?

@dlmarion

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2018

@adamjshook - interesting. just so I understand the use-case here. All tservers are involved in defined groups such that the default group is empty?

@adamjshook

This comment has been minimized.

Copy link
Contributor Author

commented Oct 16, 2018

@dlmarion That'd either be the case, or the tablet servers in the default pool are unavailable. We use the regex balancer to isolate a few larger tables and every other table (including the accumulo.* system tables) are in the default pool. If all tservers in the default pool were to go down, then clients won't be able to access these tables (and in turn the system tables, which would cripple everything as you'd expect). This ensures that they'll be assigned to some other (random) pool.

@dlmarion

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2018

@adamjshook IIRC you can have regex's that overlap, for example you could spread table1 over rack1, table2 over rack2, and table3 over racks 1 and 2. I'm still looking at this PR, but I wonder if it would be simpler to just add all tservers to the default pool if it ends up being empty.

@dlmarion

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2018

@adamjshook - for example, at, check to see if the DEFAULT_POOL is empty, if so, add all of the current tservers to it.

@dlmarion

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2018

Looked over code and test. Code looks correct given approach taken, but I wonder if there is a hidden danger there. For example you may have a small table distributed over just a handful of tablet servers (say for example you pin the accumulo.metadata table to a handful of nodes). Given the current approach its probable that you could balance an entire table within that small group of servers.

@adamjshook

This comment has been minimized.

Copy link
Contributor Author

commented Oct 16, 2018

@dlmarion I see, so rather than selecting a random pool, we'd be better off just assigning all tablet servers into the default pool. I like this approach better. I'll make the adjustments today and update this. Thanks for the feedback!

@adamjshook adamjshook force-pushed the adamjshook:regex branch from 58e6121 to 7086e6e Oct 16, 2018

@@ -143,7 +143,21 @@
np.put(e.getKey(), e.getValue());
}
}

if (newPools.get(DEFAULT_POOL) == null) {
LOG.info("Default pool is empty; assigning all tablet servers to the default pool");

This comment has been minimized.

Copy link
@dlmarion

dlmarion Oct 16, 2018

Contributor
Suggested change
LOG.info("Default pool is empty; assigning all tablet servers to the default pool");
LOG.warn("Default pool is empty; assigning all tablet servers to the default pool");

if (newPools.get(DEFAULT_POOL) == null) {
LOG.info("Default pool is empty; assigning all tablet servers to the default pool");
for (Entry<TServerInstance,TabletServerStatus> e : current.entrySet()) {

This comment has been minimized.

Copy link
@dlmarion

dlmarion Oct 16, 2018

Contributor

You should be able to replace the loop I think:

SortedMap<TServerInstance,TabletServerStatus> dp = new TreeMap<>(current.comparator());
dp.putAll(current);
newPools.put(DEFAULT_POOL, dp);

@adamjshook adamjshook changed the title Assign tablets to random pool if no default Assign all TabletServers to default pool if empty Oct 16, 2018

@dlmarion

This comment has been minimized.

Copy link
Contributor

commented Oct 16, 2018

+1

@adamjshook

This comment has been minimized.

Copy link
Contributor Author

commented Oct 17, 2018

Thanks @dlmarion! I'll merge this later today and backport it to 1.9.

@adamjshook adamjshook merged commit b76d82c into apache:master Oct 18, 2018

2 checks passed

Jenkins This pull request looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@adamjshook adamjshook deleted the adamjshook:regex branch Oct 18, 2018

adamjshook added a commit to adamjshook/accumulo that referenced this pull request Oct 18, 2018

adamjshook added a commit to adamjshook/accumulo that referenced this pull request Oct 18, 2018

@ctubbsii ctubbsii removed the v1.9.3 label Oct 18, 2018

@ctubbsii ctubbsii added this to Done in 2.0.0 Jun 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
3 participants
You can’t perform that action at this time.