New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not trigger better master exists if the cluster controller is excluded #784

Merged
merged 3 commits into from Sep 21, 2018

Conversation

Projects
None yet
2 participants
@etschannen
Copy link
Contributor

etschannen commented Sep 20, 2018

No description provided.

do not trigger better master exists if the cluster controller is excl…
…uded, since the master will change anyways once the cluster controller is moved

@etschannen etschannen requested a review from ajbeamon Sep 20, 2018

}

auto ccWorker = id_worker.find(clusterControllerProcessId.get());
if(ccWorker == id_worker.end()) {

This comment has been minimized.

@ajbeamon

ajbeamon Sep 20, 2018

Contributor

If I understand correctly, the only way this can happen is if the cluster controller worker is detected as failed in workerAvailabilityWatch. If that is possible for the cluster controller process, is there a reason we aren't clearing clusterControllerProcessId at the same time (which would make this check unnecessary)?

There are some places that access id_worker[clusterControllerProcessId] without checking for its presence as far as I can tell, and if the worker has already been erased this is going to recreate it. That could result in some memory being lost to these entries, and it may also result in wrong behavior (I haven't really checked).

return false;
}

auto ccWorker = id_worker.find(clusterControllerProcessId.get());

This comment has been minimized.

@ajbeamon

ajbeamon Sep 20, 2018

Contributor

It looks like id_worker has Optional<Standalone<StringRef>> as its key, so I don't think we need to call .get().

etschannen added some commits Sep 20, 2018

fix: tlog spill policy would spill everything when it wanted to spill…
… nothing

use a flow lock to protect updatePersistData and initPersistentState from committing simultaneously
The cluster controller should never consider itself as failed (that w…
…ill be handled by the coordinators)

Simplified the check that the cluster controller is excluded
@@ -802,6 +802,11 @@ class ClusterControllerData {
return false;
}

// Do not trigger better master exists if the cluster controller is excluded, since the master will change anyways once the cluster controller is moved
if(id_worker[clusterControllerProcessId].priorityInfo.isExcluded) {

This comment has been minimized.

@ajbeamon

ajbeamon Sep 21, 2018

Contributor

If clusterControllerProcessId is not set, then I think we end up with isExcluded being set to false (because we presumably get a default constructed WorkerInfo), and we proceed with running betterMasterExists. I haven't determined whether such a thing is possible, but if it is, I just want to confirm this is the intended behavior.

This comment has been minimized.

@etschannen

etschannen Sep 21, 2018

Contributor

The cluster controller cannot recruit a master before clusterControllerProcessId is set, and once it is set it (and its worker in id_worker) can never be removed, Therefore all the places that use it blindly are safe because all of that code (including this one) can only ever be run after the first master is recruited

@ajbeamon ajbeamon merged commit 14ab620 into apple:release-6.0 Sep 21, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment