Salt updates #1126
Working on that.
I'm also adding a warning for stage 0/3 when updates are pending that would cause the daemons to restart.
Could we add logging and tracking of states in the orchestration so that we can "continue" at the point that failed or where the orchestration was stopped? The problem is not only updating a salt-master (which IMO should not restart the salt-master automatically); the problem is also a loss of the ssh session, or a reboot that might be initiated during a long-running orchestration job - i.e. while doing a migration from filestore to bluestore.
Either we suppress the salt-master/salt-minion restart during the upgrade until we can reboot the servers (similar to the kernel-update process), or we need to find a way to detect the hanging/broken orchestration and exit on that. I also like the idea of detecting an update for the salt-master/salt-minion, but in that case we do not update these services as part of the orchestration - and updating all of them manually is another big "manual" task. It would be great to have a salt-master/salt-minion self-update as part of the orchestration, but I am not sure if that is possible.
Maybe setting DISABLE_RESTART_ON_UPDATE = 1 in /etc/sysconfig/services would be a solution: set this, run zypper up salt-master salt-minion, restart the services manually ("async") right after exiting the orchestration, with a message to the admin: "please restart stage 0".
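A minimal sketch of that idea - assuming the SUSE /etc/sysconfig/services mechanism is honored by the salt packages' rpm scriptlets, and noting that the exact value ("1" vs "yes") may differ per distribution:

```shell
# Suppress the automatic daemon restart that the rpm scriptlets would
# otherwise trigger during the package update (SUSE-specific mechanism).
echo 'DISABLE_RESTART_ON_UPDATE="yes"' >> /etc/sysconfig/services

# Update the packages without the services being bounced mid-transaction.
zypper --non-interactive up salt-master salt-minion

# Later, outside the orchestration, restart the daemons manually.
systemctl restart salt-master
systemctl restart salt-minion
```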
probably possible with the
Not possible, because you lose the connection to the bus.
That's what the posted patch does. It's not ready though.
What do you mean by that?
We also gave that some thought. It's definitely doable with the tools we have.
Even if the orchestration cannot update salt-master/minions automatically and orchestrated, we need to find a way to do that with a single command, in the right way, for all the masters and minions. Not sure if this helps, but would https://docs.saltstack.com/en/2017.7/topics/orchestrate/orchestrate_runner.html#masterless-orchestration help to update the master, and something like https://www.shellhacks.com/upgrade-salt-minions/ work for the minions? Maybe we can just update the Salt stack (master) always before running stage.0 tasks?
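A sketch of what the two linked approaches could look like from the CLI; the orchestration file name is a placeholder, and the minion upgrade follows the pattern from the shellhacks link:

```shell
# Masterless orchestration: salt-run executes the orchestration locally
# on the master, so it does not depend on minion returns surviving.
salt-run state.orchestrate orch.update_salt   # 'orch.update_salt' is hypothetical

# Upgrading the minions from the master, refreshing repos first:
salt '*' pkg.install salt-minion refresh=True
```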
In our meeting we agreed on the least aggressive solution, which would be notifying the user that DeepSea detected a Salt version that needs to be updated before stage.0 should be run. For the CI this needs to be handled differently.
In case we follow this road, we have to make sure to tell the admin how to update salt-master and all salt-minions in a proper way. In a large cluster we also have to provide a tool that does it, as manually connecting to all cluster nodes and updating the salt-master and salt-minion packages without using Salt might not be acceptable for DeepSea users. Maybe this whole topic needs to be escalated to the Salt maintainers, as this "salt-self-update problem" seems to exist for all Salt deployments.
I don't think that we have to go that route. It's sufficient to kick off a master & minion update via a separate (Salt) state/orchestration. You might not get a result back, but you will end up with the new versions. From there on you can just re-run the stage.
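That "fire and don't wait" idea might look like this, using the ceph.updates.master / ceph.updates.salt state names that come up later in this thread (the master target expression is an assumption):

```shell
# Kick off the updates and deliberately don't wait for a return channel;
# --async prints a job ID and exits immediately.
salt --async -C 'I@roles:master' state.apply ceph.updates.master
salt --async '*' state.apply ceph.updates.salt

# The jobs may never report back (the daemons restart underneath them),
# but the packages end up updated; afterwards simply re-run the stage.
```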
On 17.10.2018 at 17:22, Joshua Schmid wrote:
> I don't think that we have to go that route. It's sufficient to kick off a master & minion update via a separate (Salt) state/orchestration. You might not get a result back, but you will end up with the new versions. From there on you can just re-run the stage.
Then we will be at the same place where we are already! When we update the salt-master via orchestration, the orchestration process hangs "forever".
Or how do you want to "kick off" the update and show progress and a result to the admin?
The current advice is to do:
This would be a state which is expected to hang. That's something we can't change and don't need to change, imo. It's just a single command that is being executed, and it can be verified by running a zypper info right after.
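The out-of-band verification mentioned here is just plain zypper, independent of any Salt return:

```shell
# Verify the installed/candidate salt-minion version after the (possibly
# hanging) update state has been kicked off - no salt return needed.
zypper --non-interactive info salt-minion | grep -E '^(Version|Status)'
```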
Just trying to understand this ;-). To rephrase a bit and to ensure the actions are understood for sure - please correct me if this is not correct (and maybe this can be used as the text for exiting the orchestration in case Salt updates are detected)?
Variant 1) In case you do have deepsea_minions defined using minion targeting, i.e. via execute
Variant 2) In case you have deepsea_minions defined using the deepsea grain, via execute
Basically I see some challenges - not sure if they really exist ;-):
stage.0 will be stopped as this is the only stage that does a
No, if the users didn't mess with their cluster, the updates are applied the same way. We do check if a salt version is incompatible with the Ceph release you are running. Secondly, it's not critical to run different minor versions of Salt as long as this is fixed early (it will be, by applying the states).
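The incompatibility check boils down to a version comparison; a minimal sketch using `sort -V`, where the required minimum version is made up for illustration:

```shell
# Returns success if version $1 >= version $2, using GNU sort's
# version ordering to compare the dotted version strings.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required="2019.2.0"   # hypothetical minimum for the running Ceph release
installed="2018.3.0"  # would come from: rpm -q --qf '%{VERSION}' salt-minion

if ver_ge "$installed" "$required"; then
    echo "salt version ok"
else
    echo "salt $installed is older than required $required - update salt first"
fi
```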
Manual, in the sense that we call the states ourselves as opposed to the orchestrations. Yes, master first, then the minions in parallel.
This is independent of the 'variant'. It's only the targeting for the minions that changes.
If this were a problem, you couldn't even call the state. If it is not a problem, you'd apply the states and end up with the same patch level across all servers.
Yes.
No.
No, see the comment about the variants. The message doesn't seem clear enough. How would you rephrase it to make it clearer?
Yes
At least that there are no jobs running that would interfere - in this case, no 'pkg.installed'.
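Checking for interfering jobs before touching Salt itself could be done with the jobs runner:

```shell
# List currently running jobs across the cluster; an empty result means
# nothing (e.g. a long pkg.installed run) would be interrupted.
salt-run jobs.active
```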
Not sure if I understand this reply to my question correctly - why are you using ceph.updates.master sometimes and ceph.updates.salt at other times? OK, somewhat further in the thread it becomes clear with the following:
Now that the states are clear to me this is the next open point:
"Yes" on an "either or" question does not seem to be the right answer ;-) We have to make sure that the salt command we use will work and give a proper status back on the update (done, successful, etc.). In the past I have seen that when restarting the salt-minion or salt-master during the upgrade, the minion or master does NOT reply anymore. Maybe this problem is gone, and then I am fine with the above as "command finishes with complete and $?=0". So basically, could we simplify the whole salt update with this process:
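The "run async, check the status later" process could be sketched like this; the parsing of the job ID out of the --async output is an assumption about its exact format:

```shell
# 1. Fire the update without blocking on a return channel:
jid=$(salt --async '*' state.apply ceph.updates.salt | awk '/job ID/ {print $NF}')

# 2. Give the daemons time to restart (the crude 'sleep 10' from this thread):
sleep 10

# 3. Ask the master's job cache for the result after the fact:
salt-run jobs.lookup_jid "$jid"

# 4. Independently verify the package version on every minion:
salt '*' pkg.version salt-minion
```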
(In case we update deepsea as part of ceph.updates.master, we have to sync the modules too - just in case ceph.updates.salt relies on one of the modules we deliver.) I am also not sure how this deals with someone working in openATTIC, using the salt-api to run actions at the same point in time. Maybe we have to stop openATTIC before updating the salt-master, too?
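Both concerns could be handled with two extra steps; the openATTIC unit name here is an assumption about that deployment:

```shell
# Resync custom modules/states after a deepsea package update,
# so ceph.updates.salt sees the freshly delivered modules:
salt '*' saltutil.sync_all

# Quiesce salt-api consumers before the master update (hypothetical unit name):
systemctl stop openattic-systemd
```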
Yes, the state hangs for a salt-master update, and yes, it also hangs on a minion update (I'm not 100% sure on that anymore; in the meanwhile they might reconnect to the master and report back).
It can't. It's just technically (at least as far as I know) not possible. The salt-master is connected to a bus from which it reads messages (what jobs got triggered, what the status of those jobs is, etc). If you restart this service, it doesn't automatically reconnect to the bus. The technicality behind this is probably connected to session keys expiring or similar.
Wouldn't that be up to the operator?
That means we have the same problem with the state execution for the salt update as with running the orchestration of stage.0 for updating Salt. So basically "updating Salt using Salt" is not possible, and we need a second way to work around this, or we run the upgrade "async" and check the status later. Whenever we run an upgrade of anything, we must have a return for the admin - "finished successfully" or "error" - and just "hanging" is not acceptable. This is how the whole discussion started.
Today this is true - for the future - we never know ;-) |
If there are multiple people - one managing stuff in openATTIC and another applying updates - it might not be one single person. Think about an admin configuring a new iSCSI LUN via openATTIC while another one is applying a Salt update. It is not obvious to the two that this will break the iSCSI administration - and while the iSCSI change is in progress, it would not be the best thing to update the salt-master and salt-minion at the same point in time.
That's another question, and it's a more fundamental issue. We tracked that in issue #219.
Right, but as stated, it is not clear to me how to do this.
Check this out: https://github.com/saltstack/salt/pull/39952/files (from saltstack/salt#39952) - I did not do that myself, and it looks complicated. Basically it is a hack, because I believe a systems-management toolkit fundamentally needs a second daemon that is used to manage the systems-management agents during updates.
This is basically what we discussed earlier wrt (DISABLE_RESTART_ON_UPDATE = 1 in /etc/sysconfig/services), just adapted to the Debian world. At first glance this example does:
But since we just call a state that does one thing, namely updating the salt package, it's more or less the same. Am I missing something?
Basically I believe this is the trick: "one way of handling this (on Linux and UNIX-based operating systems) is to use" --> update the rpm with a job that runs AFTER the salt state is finished. This way the salt state can complete, and the real action happens later. We "just" need to run this state, then wait for some time (sleep 10 ;-)), and then check the salt-minion version.
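A sketch of that detached-job trick using a transient systemd timer instead of the 'at' job from the linked PR (the 5-second delay and the polling are assumptions, not a tested procedure):

```shell
# The state's only job: schedule the real update to run AFTER the state
# has returned, via a transient unit that outlives the salt-minion process.
salt '*' cmd.run 'systemd-run --on-active=5 zypper --non-interactive up salt-minion'

# The state itself completes cleanly; a bit later we poll the result:
sleep 10
salt '*' pkg.version salt-minion
```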
And this is what we're doing now. Run one single command. 'salt update'. It's the last.
The next problem with this is that we can't order in orchestrations, and we structure our stages in orchestrations. So that's not an option. I'll play around with the mentioned concept and come back with this later. However, this should not block #1431.
One thing to keep in mind is that the default behavior of the salt-master and salt-minion rpm update is that the rpm itself restarts the corresponding daemon, as long as
Description of Issue/Question
DeepSea cannot gracefully upgrade Salt. This is simply a function of how orchestrations work. When the Salt master restarts, the orchestration is terminated. Shoehorning this into the reactor would likely cause more issues than it solves.
Salt updates are not terribly frequent and DeepSea is completely dependent on a working Salt cluster. I believe Stage 0 needs to fail with instructions on installing and upgrading Salt. Some users may be using Salt for other purposes and bringing these changes to their attention should prevent any confusion. Also, updating and not restarting Salt daemons will likely end in heartache and confusion when a minion is upgraded prior to the master restarting.
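The instructions a failing Stage 0 prints could boil down to something like this - a hedged sketch of the "master first, then all minions" order argued for above, not a tested procedure:

```shell
# On the master node (losing the orchestration/session here is expected):
zypper --non-interactive up salt-master
systemctl restart salt-master

# Then all minions in parallel, fired async since returns may be lost
# when the minion daemons restart:
salt --async '*' pkg.install salt-minion refresh=True
salt --async '*' service.restart salt-minion

# Verify connectivity and versions afterwards, then re-run Stage 0:
salt '*' test.version
```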