Skip to content

Conversation

@keith-turner
Copy link
Contributor

There was code in the manager that seemed to have the intent of clearing suspension markers in tablets for an offline table. This code was not working or tested. This commit fixes this code and adds a tests that validates that suspension markers are removed when a table is taken offline.

fixes #3314

There was code in the manager that seemed to have the intent of clearing
suspension markers in tablets for an offline table.  This code was not
working or tested.  This commit fixes this code and adds a tests that
validates that suspension markers are removed when a table is taken
offline.

fixes apache#3314
@EdColeman
Copy link
Contributor

With these changes. What is the suspend state when:

  1. A tablet server is stopped.
  2. A tablet server crashes (killed)

In both of those cases when the tablet(s) are reloaded, is the suspend state still set and subject to the timeout?

@keith-turner
Copy link
Contributor Author

In both of those cases when the tablet(s) are reloaded, is the suspend state still set and subject to the timeout?

Yeah I think so and the existing test was checking for this. I added the following line to the test after tservers were killed to be sure that tablets were in the suspended state.

assertTrue(ds.suspendedCount > 0);

ds = TabletLocations.retrieve(ctx, tableName);
log.info("Waiting for suspended {}", ds.suspended);
}
} else if (action == AfterSuspendAction.RESUME) {
Copy link
Contributor Author

@keith-turner keith-turner Feb 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in this else block was not changed except for indentation and being placed in an else block.

private class ShutdownTserverKiller implements TServerKiller {

@Override
public void eliminateTabletServers(ClientContext ctx, TabletLocations locs, int count)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in the method was copied w/o change from an existing lambda in the test. This copy was done for reuse.

private class CrashTserverKiller implements TServerKiller {

@Override
public void eliminateTabletServers(ClientContext ctx, TabletLocations locs, int count)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in this methos was also copied w/o change from an existing lambda in the test

Copy link
Contributor

@EdColeman EdColeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look okay. As long as this preserves the behavior that the suspend duration can be used for rolling restarts with tservers either stopped by command or by a kill (crash).

Mainly, with the suspend delay set to some value and when a tserver is stopped by command or is killed, the tablets assigned will not be reassigned until the expiration of the suspend duration. When a tserver with suspended tablets comes back online, the suspended tablets are reassigned to that same server.

As long as this holds, I approve the changes.

@keith-turner
Copy link
Contributor Author

As long as this preserves the behavior that the suspend duration can be used for rolling restarts with tservers either stopped by command or by a kill (crash).

I did not change that behavior in this PR.

@keith-turner keith-turner merged commit bdec3a7 into apache:2.1 Feb 22, 2024
@keith-turner keith-turner deleted the accumulo-3314 branch February 22, 2024 23:35
@ctubbsii ctubbsii modified the milestones: 3.1.0, 2.1.3 Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants