
Fixed a number of problems with forced recoveries #801

Merged
merged 24 commits into apple:release-6.0 on Oct 3, 2018

Conversation

@etschannen
Contributor

etschannen commented Oct 3, 2018

No description provided.

etschannen added some commits Sep 18, 2018

added a workload which tests killing an entire region, and recovering from the failure with data loss.

fix: we cannot pop the txs tag from remote logs until they have a full copy of the txnStateStore
fix: we have to modify all of history, we cannot stop after finding a local remote
fix: we have to use durableKnownCommittedVersion, because that is the true lower bound on the recovery version of the remote logs

fixed a compiler error
fix: do not force a recovery if the master was already in the other region (and therefore already recovered)

fix: reboot the remaining DC, because any storage server rejoins that were rolled back will cause that server to be unusable
fix: the cluster controller did not pass in its own locality when creating its database object, therefore it was not using locality aware load balancing
fix: during forced recovery logs can be removed from the logSystemConfig. We need to avoid killing the removed logs as unneeded until we actually complete the recovery
fix: data distribution would use the wrong priority sometimes when fixing an incomplete movement; this led to the cluster thinking the data was replicated in all regions before it actually was
fix: for remote logs, their known committed version cannot be set to 1, because they can be used when their durable version is 0, leading to a known committed version being greater than a queue committed version
fix: forced recovery should remove all references to the old primary tlogs in all generations of logs to help the peek logic avoid attempting to read from them
fix: the latest generation of remote transaction logs might have less data than a previous generation, because they take over at the known committed version. Detect this case and end at the version that has the most data
fix: the storage server must always keep MAX_READ_TRANSACTION_LIFE_VERSIONS of history in memory, because forced recovery could roll back an epoch end.

fix: rollbacks were triggered unnecessarily

@etschannen etschannen requested a review from ajbeamon Oct 3, 2018

@etschannen etschannen merged commit 598788f into apple:release-6.0 Oct 3, 2018

for(auto &it : remoteLogs.get()) {
replies.push_back(brokenPromiseToNever( it.interf().getQueuingMetrics.getReply( TLogQueuingMetricsRequest() ) ));
}
Void _ = wait( waitForAll(replies) );


@ajbeamon

ajbeamon Oct 8, 2018

Contributor

Is there any value in also waiting for onChange here?
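
For context, a minimal sketch of one way to do that: race the metrics replies against a change signal so the actor does not keep waiting on stale tlog interfaces. This is an assumption-heavy illustration, not the PR's code; `db` stands in for whatever AsyncVar the enclosing actor watches (its real name and type are not visible in this hunk), while choose/when, waitForAll, and onChange are standard Flow constructs.

state std::vector<Future<TLogQueuingMetricsReply>> replies;
loop {
	replies.clear();
	for(auto &it : remoteLogs.get()) {
		replies.push_back( brokenPromiseToNever( it.interf().getQueuingMetrics.getReply( TLogQueuingMetricsRequest() ) ) );
	}
	choose {
		when( Void _ = wait( waitForAll(replies) ) ) { break; }  // every remote log replied
		when( Void _ = wait( db->onChange() ) ) {}               // hypothetical change signal; retry with fresh interfaces
	}
}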

@@ -837,6 +839,14 @@ ACTOR Future<Void> commitBatch(
}
Void _ = wait(yield());

if(!self->txsPopVersions.size() || msg.popTo > self->txsPopVersions.back().second) {
if(self->txsPopVersions.size() > SERVER_KNOBS->MAX_TXS_POP_VERSION_HISTORY) {


@ajbeamon

ajbeamon Oct 8, 2018

Contributor

Probably not a big deal, but I think this is an off-by-one. Using >= would limit txsPopVersions to the correct size.
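
Illustrative sketch of the suggested bound (not the PR's code): with >, the history can reach MAX_TXS_POP_VERSION_HISTORY + 1 entries before anything is evicted, while >= caps it at the knob value. The pop_front below is an assumption about how the oldest entry would be dropped.

if(self->txsPopVersions.size() >= SERVER_KNOBS->MAX_TXS_POP_VERSION_HISTORY) {
	self->txsPopVersions.pop_front(); // evict the oldest entry so the history never exceeds the knob
}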

TLogQueuingMetricsReply reply;
reply.localTime = now();
reply.instanceID = self->instanceID;
reply.bytesInput = self->bytesInput;
reply.bytesDurable = self->bytesDurable;
reply.storageBytes = self->persistentData->getStorageBytes();
-	reply.v = self->prevVersion;
+	reply.v = logData->durableKnownCommittedVersion;


@ajbeamon

ajbeamon Oct 8, 2018

Contributor

I can't remember if this is intended to be temporary, but if so, should we add a comment saying so?
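
If it is temporary, the note being asked for could be as small as the following (purely illustrative; whether the change is in fact temporary is exactly what is in question here):

// TODO: reporting durableKnownCommittedVersion instead of prevVersion is tied to the
// forced-recovery changes; revisit whether this should be reverted.
reply.v = logData->durableKnownCommittedVersion;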

@@ -2571,17 +2572,26 @@ ACTOR Future<Void> update( StorageServer* data, bool* pReceivedUpdate )
data->version.set( ver ); // Triggers replies to waiting gets for new version(s)
if (data->otherError.getFuture().isReady()) data->otherError.getFuture().get();

Version maxVersionInMemory = SERVER_KNOBS->MAX_READ_TRANSACTION_LIFE_VERSIONS;


@ajbeamon

ajbeamon Oct 8, 2018

Contributor

I think renaming this maxVersionsInMemory may be a little more clear that it's a number of versions rather than a specific version.
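
The suggested rename, shown for clarity (same knob and value, only the variable name changes):

Version maxVersionsInMemory = SERVER_KNOBS->MAX_READ_TRANSACTION_LIFE_VERSIONS;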

@@ -2344,9 +2344,10 @@ class StorageUpdater {
}
// Don't let oldestVersion (and thus storageVersion) go into the rolled back range of versions
// Since we currently don't read from uncommitted log systems, seeing the lastEpochEnd implies that currentVersion is fully committed, so we can safely make it durable
newOldestVersion = currentVersion;
-	if ( rollbackVersion < fromVersion )
+	if ( rollbackVersion < fromVersion && rollbackVersion > restoredVersion )


@ajbeamon

ajbeamon Oct 8, 2018

Contributor

Move to the previous if block

@@ -2344,9 +2344,10 @@ class StorageUpdater {
}
// Don't let oldestVersion (and thus storageVersion) go into the rolled back range of versions
// Since we currently don't read from uncommitted log systems, seeing the lastEpochEnd implies that currentVersion is fully committed, so we can safely make it durable


@ajbeamon

ajbeamon Oct 8, 2018

Contributor

Comment can be deleted

}

ErrorOr<Void> _ = wait(clusterInterface->get().get().forceRecovery.tryGetReply( ForceRecoveryRequest() ));


@ajbeamon

ajbeamon Oct 8, 2018

Contributor

One more thing -- I just want to make sure there's no need to interrupt this wait if the cluster interface changes while it's in flight.
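
If interrupting does turn out to be necessary, one possible shape (a sketch, not the PR's code) is to race the request against the interface change, assuming clusterInterface is a Reference<AsyncVar<Optional<ClusterInterface>>> as the double get() suggests:

loop {
	choose {
		when( ErrorOr<Void> _ = wait( clusterInterface->get().get().forceRecovery.tryGetReply( ForceRecoveryRequest() ) ) ) { break; }
		when( Void _ = wait( clusterInterface->onChange() ) ) {}  // interface changed in flight; retry against the new one
	}
}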
