Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to prevent half-closed Tablet due to failing minc #3677

Merged

Conversation

dlmarion
Copy link
Contributor

@dlmarion dlmarion commented Aug 3, 2023

Prior to this change Tablet.initiateClose would close the compactable, wait for the current minor compaction to finish, and then kick off another minor compaction. In the case where minor compactions are failing, the compactable will get closed and the call to wait for the current minor compaction to finish will hang indefinitely leaving the Tablet in a half-closed state. Tablet.initiateClose is called when either the TabletServer is closing the Tablet (due to migration or shutdown) or on a Tablet split. Therefore, a failing minor compaction will prevent Tablet migration and split or normal TabletServer shutdown.

This change adds some logic at the start of Tablet.initiateClose to try and detect a currently or previously failing minor compaction. In these cases an exception is thrown before the compactable is closed to prevent the Tablet from being in a half-closed state. This prevents the Tablet from being closed until the cause for the failing minor compaction is corrected. Causes could be a bad iterator applied at minor compaction or a bad table configuration option (like classloader context). When the cause of the failing minor compaction is corrected, then the Tablet should be able to be closed normally allowing normal migration, split, and TabletServer shutdown.

Fixes #3674

Prior to this change Tablet.initiateClose would close the
compactable, wait for the current minor compaction to finish,
and then kick off another minor compaction. In the case where
minor compactions are failing, the compactable will get closed
and the call to wait for the current minor compaction to finish
will hang indefinitely leaving the Tablet in a half-closed
state. Tablet.initiateClose is called when either the
TabletServer is closing the Tablet (due to migration or
shutdown) or on a Tablet split. Therefore, a failing minor
compaction will prevent Tablet migration and split or normal
TabletServer shutdown.

This change adds some logic at the start of Tablet.initiateClose
to try and detect a currently or previously failing minor
compaction. In these cases an exception is thrown before the
compactable is closed to prevent the Tablet from being in a
half-closed state. This prevents the Tablet from being closed
until the cause for the failing minor compaction is corrected.
Causes could be a bad iterator applied at minor compaction or a
bad table configuration option (like classloader context). When
the cause of the failing minor compaction is corrected, then the
Tablet should be able to be closed normally allowing normal
migration, split, and TabletServer shutdown.

Fixes apache#3674

Co-authored-by: Ed Coleman <edcoleman@apache.org>
Co-authored-by: dtspence <dtspence@users.noreply.github.com>
@dlmarion dlmarion added this to In progress in 2.1.2 via automation Aug 3, 2023
@dlmarion dlmarion self-assigned this Aug 3, 2023
@dlmarion
Copy link
Contributor Author

dlmarion commented Aug 3, 2023

Kicked off full IT build

@dtspence
Copy link
Contributor

dtspence commented Aug 3, 2023

@dlmarion We believe we found a code path that produces a test timeout. It appeared to be related to a minc thread exiting due to not being able to obtain a mapfile (i.e. volume chooser seeing invalid context). The code-path in the integration test saw an error within the MinorCompactor and logged a retrying minc operation (possibly leaving minc thread active).

The calling path that causes the thread to end (i.e. context error w/volume-chooser) logged the tablets were unable to unload.

The logs report the following (w/minc thread exit path) when attempting to shutdown:

2023-08-03T16:03:32,503 [tserver.UnloadTabletHandler] INFO : Tablet unload for extent 1<< requested.
2023-08-03T16:03:32,503 [tablet.Tablet] DEBUG: 1<< closeState OPEN tabletMemory.memoryReservedForMinC() true tabletMemory.getMemTable().getNumEntries() 0 updatingFlushID false
2023-08-03T16:03:32,503 [tablet.Tablet] WARN : Unable to initiate minc for close on 1<<. Tablet might be closed or deleting.
2023-08-03T16:03:32,503 [tserver.UnloadTabletHandler] ERROR: Failed to close tablet 1<<... Aborting migration
java.lang.RuntimeException: Unable to initiate minc for close on 1<<. Tablet might be closed or deleting.
        at org.apache.accumulo.tserver.tablet.Tablet.initiateClose(Tablet.java:925) ~[classes/:?]
        at org.apache.accumulo.tserver.tablet.Tablet.close(Tablet.java:898) ~[classes/:?]
        at org.apache.accumulo.tserver.UnloadTabletHandler.run(UnloadTabletHandler.java:92) [classes/:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at org.apache.accumulo.core.trace.TraceWrappedRunnable.run(TraceWrappedRunnable.java:52) [classes/:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]

To replicate, the following is needed as base configuration:

    @Override
    public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration coreSite) {
      cfg.setNumTservers(1);
      cfg.setProperty("general.volume.chooser", "org.apache.accumulo.core.spi.fs.DelegatingChooser");
      cfg.setProperty("general.custom.volume.chooser.default",
              "org.apache.accumulo.core.spi.fs.PreferredVolumeChooser");
      cfg.setProperty("general.custom.volume.preferred.default", "file:/home/dtspen2/dev/git/dlmarion/accumulo/test/target/mini-tests/org.apache.accumulo.test.functional.HalfClosedTabletIT_SharedMiniClusterBase/accumulo");
    }

Then a flush() operation which caused the minc thread to end:

      tops.setProperty(tableName, Property.TABLE_CLASSLOADER_CONTEXT.getKey(), "invalid");

      tops.flush(tableName);

      Thread.sleep(500);

      // This should fail to split, but not leave the tablets in a state where they can't
      // be unloaded
      assertThrows(AccumuloServerException.class,
          () -> tops.addSplits(tableName, Sets.newTreeSet(List.of(new Text("b")))));

      tops.removeProperty(tableName, Property.TABLE_CLASSLOADER_CONTEXT.getKey());

@dlmarion
Copy link
Contributor Author

dlmarion commented Aug 4, 2023

I looked into this path where the minor compaction thread was dying because an invalid classloader context could not load the volume chooser to create the minor compaction output file. The issue is more serious that what I initially thought, but thankfully the fix is easy.

TLDR - A failed minor compaction thread might likely mean that subsequent minor compactions for a Tablet will not occur. The fix is to catch Exception rather than IOException in MinorCompactionTask when creating the output file as in this scenario a RuntimeException is thrown.

Longer analysis

TableOperationsImpl.flush calls ManagerClientServiceHandler.initiateFlush which increments
the value of the tables ZTABLE_FLUSH_ID node in ZooKeeper, and returns the new value. Then
TableOperationsImpl.flush calls ManagerClientServiceHandler.waitForFlush passing in the flushId.
ManagerClientServiceHandler.waitForFlush initially calls TabletClientHandler.flush on every tablet server in the cluster. If the client is not waiting for the flush to return then this returns and TableOperationsImpl.flush is done. If the client is waiting, then ManagerClientServiceHandler looks for tablets that are hosted or has walogs and whose flushId in the tablet metadata is less than the flushId returned from initiateFlush. Then waitForFlush continues to call TabletClientHandler.flush on the hosts that have tablets that meet this criteria for Long.MAX_VALUE iterations.

TabletClientHandler.flush does nothing if the TabletServer does not have any tablets for the table. Otherwise it gets the current value of the tables ZTABLE_FLUSH_ID, which should be the same as the flushId returned at the beginning of this process. Tablet.flush is called on each of the Tablets for the Table on this TabletServer.

If the Tablet has no entries in memory, then the tablet metadata is updated with the flushId and the variable lastFlushID is set to the flushId. If Tablet.flush() was called again on this Tablet with the same flushID, then it would do nothing.

If the Tablet has entries in memory, then Tablet.initiateMinorCompaction is called, which may or may not create a MinorCompactionTask. If one is created, then it is executed in the minorCompactionThreadPool. Tablet.initiateMinorCompaction will not return a MinorCompactionTask if the tablet is closing, is closed, if there are no entries in memory, if there is another thread updating lastFlushID, or if the Tablets memory is already reserved for a minor compaction (meaning if one is already running).

MinorCompactionTask when executed will create a new output file and call TabletServer.minorCompactionStarted, then it calls Tablet.minorCompact(). Tablet.minorCompact performs the minor compaction, then brings the minor compaction online by calling DatafileManager.bringMinorCompactionOnline, then calls TabletMemory.finalizeMinC. DatafileManager.bringMinorCompactionOnline sets lastFlushID and calls Tablet.flushComplete which sets lastFlushID and calls TabletMemory.finishedMinC.

If TabletMemory.finalizeMinC and TabletMemory.finishedMinC are not called, then it appears that the tablet memory is already reserved for a minor compaction and a subsequent minor compaction will never start. This appears to be what is happening when there is an invalid context on the table as the call to create a new output file returns a RuntimeException. Only IOExceptions are caught at this location and the Thread dies due to the uncaught exception.

@dlmarion dlmarion requested a review from ivakegg August 4, 2023 15:25
Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach here looks good to me.

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small style and wording suggestion, but otherwise, seems fine to me.

Co-authored-by: Christopher Tubbs <ctubbsii@apache.org>
Comment on lines 114 to 120
// This should fail to split, but not leave the tablets in a state where they can't
// be unloaded
assertThrows(AccumuloServerException.class,
() -> tops.addSplits(tableName, Sets.newTreeSet(List.of(new Text("b")))));

removeInvalidClassLoaderContextPropertyWithoutValidation(getCluster().getServerContext(),
tableId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried removing the code in initiateClose and running the test locally. Made the following changes to this test to get it running.

Suggested change
// This should fail to split, but not leave the tablets in a state where they can't
// be unloaded
assertThrows(AccumuloServerException.class,
() -> tops.addSplits(tableName, Sets.newTreeSet(List.of(new Text("b")))));
removeInvalidClassLoaderContextPropertyWithoutValidation(getCluster().getServerContext(),
tableId);
Thread configFixer = new Thread(() -> {
UtilWaitThread.sleep(3000);
removeInvalidClassLoaderContextPropertyWithoutValidation(getCluster().getServerContext(),
tableId);
});
// grab this time before starting the thread starts and before it sleeps.
long t1 = System.nanoTime();
configFixer.start();
// The split will probably start running w/ bad config that will cause it to get stuck. However once the config is fixed by the background thread it should continue.
tops.addSplits(tableName, Sets.newTreeSet(List.of(new Text("b"))));
long t2 = System.nanoTime();
// expect that split took at least 3 seconds because that is the time it takes to fix the config
assertTrue(TimeUnit.NANOSECONDS.toMillis(t2-t1) > 3000);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that it works because I changed the catch clause in MinorCompactionTask from IOException to Exception. If you were to revert that single change, then I think this would fail and leave the tablet in a bad state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made these changes in 9f84ed4, however 2/3 of these new tests no longer work. I believe that is because I merged 3678 into 2.1 and into this branch. The tests that are failing are setting an invalid context, and when ClassLoaderUtil.getClassLoader is called, the context is not valid, and the non-context aware classloader is returned instead of the code returning null or throwing an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keith-turner - If #3683 is merged, then I think that these tests will work again.

@keith-turner
Copy link
Contributor

#3685 moves the first minor compaction in tablet close to happen before any variables are set that begin the close process.

@dlmarion
Copy link
Contributor Author

I re-ran the tests after #3683 and #3685 were merged into 2.1. Everything appears to be working. However, it's possible that the tests could pass without the minor compactions failing. This could happen if, for example, the minor compaction kicks off before the tablet server gets the property update to set an invalid context. I verified via log inspection that failures were happening and that it recovered. It would be good in the future to add some metrics or something that can be checked in the test to confirm that failures do happen on the minc before the configuration is corrected.

…sedTabletIT.java

Co-authored-by: Keith Turner <kturner@apache.org>
…sedTabletIT.java

Co-authored-by: Keith Turner <kturner@apache.org>
dlmarion and others added 4 commits August 11, 2023 16:20
…sedTabletIT.java

Co-authored-by: Keith Turner <kturner@apache.org>
…sedTabletIT.java

Co-authored-by: Keith Turner <kturner@apache.org>
…sedTabletIT.java

Co-authored-by: Keith Turner <kturner@apache.org>
@keith-turner
Copy link
Contributor

It would be good in the future to add some metrics or something that can be checked in the test to confirm that failures do happen on the minc before the configuration is corrected.

Wondering if a metric that counts failed minor compactions would be useful. Should normally be zero.

@dlmarion
Copy link
Contributor Author

It would be good in the future to add some metrics or something that can be checked in the test to confirm that failures do happen on the minc before the configuration is corrected.

Wondering if a metric that counts failed minor compactions would be useful. Should normally be zero.

Yeah, that's what I'm thinking.

@keith-turner
Copy link
Contributor

Yeah, that's what I'm thinking.

Ok, that is a good idea. Are you going to open an issue?

@dlmarion dlmarion merged commit e3e4c26 into apache:2.1 Aug 11, 2023
8 checks passed
2.1.2 automation moved this from In progress to Done Aug 11, 2023
@dlmarion dlmarion deleted the 3674-attempt-to-prevent-half-closed-tablet branch August 11, 2023 21:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
2.1.2
Done
Development

Successfully merging this pull request may close these issues.

Invalid context causes minc to fail, leaving Tablet in CLOSED state on split
6 participants