Rolling update & upgrade #43
Conversation
(force-pushed from 8fa21f3 to cfad9b7)
So we will most likely end up with something like this:

Phase 1) Install only the kernel update on the monitors and reboot them, then repeat for all other roles -> osd -> rgw -> mds -> igw. You have now applied the kernel updates without downtime.

Phase 2) Install the remaining updates on the monitors and restart their services, then repeat for all other roles -> osd -> rgw -> mds -> igw. You are now running the latest version of ceph & kernel without downtime.

In the grand scheme of things, admins want to run this periodically and automated as part of stage 0-5. It might be suitable to add this to the existing stages, although due to the need for these two phases I separated it. The said sls files are not adapted yet and don't do what they are supposed to do (WIP). Installing everything BUT the kernel might be parsed inside jinja (a rough sketch of the idea follows below), but this might actually be a good candidate for a module, as I can see it failing already. Zypper parsing might also be very helpful for the service.restart decision.
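As an illustration of the zypper-parsing idea only (not the jinja snippet from the original comment), here is a rough Python sketch that lists pending updates and separates kernel packages (reboot-worthy) from everything else; the output format of zypper list-updates and the kernel-* package naming are assumptions.

import subprocess

def pending_updates():
    # illustration only: parse the table printed by "zypper --non-interactive list-updates"
    out = subprocess.check_output(
        ['zypper', '--non-interactive', 'list-updates']).decode()
    packages = []
    for line in out.splitlines():
        fields = [f.strip() for f in line.split('|')]
        # data rows look roughly like: S | Repository | Name | Current | Available | Arch
        if len(fields) >= 6 and fields[2] not in ('Name', ''):
            packages.append(fields[2])
    return packages

def split_kernel_updates(packages):
    # packages whose update should trigger a reboot vs. everything else
    kernel = [p for p in packages if p.startswith('kernel')]
    other = [p for p in packages if not p.startswith('kernel')]
    return kernel, other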
(force-pushed from 19e5f55 to cfad9b7)
The restarting condition for each service might look like a per-service restart state with an according unless block, so the restart only runs when it is actually needed (see the sketch below).
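As a hedged illustration of that "restart only when needed" condition (not the original sls snippet), a Python sketch that compares the version a running Ceph daemon reports over its admin socket with the version of the installed binary, and restarts only on a mismatch; the daemon and unit names are placeholders.

import json
import subprocess

def needs_restart(daemon):
    # version of the binary installed on disk, e.g. "ceph version 12.2.5 (...)"
    installed = subprocess.check_output(['ceph', '--version']).decode().split()[2]
    # version the running daemon was started with, queried over the admin socket
    running = json.loads(subprocess.check_output(
        ['ceph', 'daemon', daemon, 'version']).decode())['version']
    return installed != running

def restart_if_needed(daemon, unit):
    # the "unless" part: skip the restart when the versions already match
    if needs_restart(daemon):
        subprocess.check_call(['systemctl', 'restart', unit])

# e.g. restart_if_needed('osd.0', 'ceph-osd@0.service')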
(force-pushed from ceb8569 to cfad9b7)
Any comments on that?
I'm not sure I agree with this particular approach - precisely because you want to reduce the number of reboots. Also, the updates might depend on each other - e.g., a new ceph-* package depends on a glibc update, iSCSI requires a new kernel version, etc. I believe iterating over the nodes in the right sequence and upgrading them entirely in one step (stop, dup, restart, reboot if necessary) is the saner choice. 15 minutes really is not much. Two reboots are worse. (We'll have users upgrading the nodes one by one over days, I guarantee it.)
Thanks for the feedback! I think Eric's initial plan was to run Stage 0-5 in a regular fashion (0..10 times a day). If you follow that path my proposed solution makes sense, because minor updates won't affect ceph's ability to function properly, hence no action is needed. That path actually reduces the number of reboots. And due to that assumption we decided to go for a differentiated approach of only applying a restart or a reboot if it's actually needed. Your point on dependencies is correct, though. We have to agree on packages that will cause a reboot - currently it's only a kernel update. If you, for example, start the update process at the monitors, the following scenario might eventuate: if you followed the solution of just updating and rebooting everything in one step, the monitors would run a newer ceph version than the rest of the cluster until the whole sequence is finished. If you only installed - if there is one - the latest kernel & rebooted the machine, nothing serious happens and the ceph versions are still in sync. Now you can safely apply the remaining updates, which won't cause any reboots, and sequentially restart the ceph services.
Services might fail at any time anyway and then be restarted on the newer version. That's something that must work, otherwise we've got a potential problem. Or they could reboot for whatever reason. Restarting individual services on a node is pretty much a question of luck (unless containers were to be used). I don't think we need to over-optimize for this case. As for the reboot detection, zypper actually knows after the update has been applied. On other distros, this causes /var/run/reboot-required; not sure if this also happens on SUSE.
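A hedged sketch of that reboot-detection idea: run the update, then consult both zypper's informational return code and the Debian/Ubuntu-style /var/run/reboot-required file. Whether SUSE creates that file, and the exact exit-code value, are precisely the open questions here, so treat both as assumptions.

import os
import subprocess

def update_needs_reboot():
    proc = subprocess.run(['zypper', '--non-interactive', 'patch'])
    # zypper documents informational exit codes above 100 (one of them meaning
    # "reboot needed"); verify the exact value for the zypper version in use
    if proc.returncode > 100:
        return True
    # Debian/Ubuntu convention; presence on SUSE is the open question above
    return os.path.isfile('/var/run/reboot-required')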
If that's the case I'd support your approach as well.
That is indeed pretty helpful, thanks for that valuable information. So what you propose is splitting the 'restart' and the 'update' part, right? If you as a user decide to update, we iterate over the roles/nodes and let zypper decide if a reboot is necessary.
Perhaps. What I'd do would be to:
One thing I simply don't know is what could possibly happen if ceph runs on different versions for a longer period of time. Going with the all-in-one update and reboot strategy could theoretically cause a cluster to be out of sync (version-wise) for some hours. Secondly, I'd like to know what the advantage is
over:
That would potentially save one round of restarts + a bit of cluster shakiness.. But if you know of any potential problems that might occur with updating while the process is running, this might be a tradeoff we have to accept.
Well, true. Going with the update-reboot solution we lose the ability to control the order in which we restart services to some extent.. We might need some practical feedback/testing/evaluation to see if that causes any problems..
That should not be any problem at all. Ceph undergoes extensive upgrade testing upstream. Another question is whether the users will be prepared (psychologically speaking) for the update to take several hours.
That's good to hear.
I guess we need to document the process very precisely to make sure they understand why it might take so long. @smithfarm Does upstream also test a random order? That is, not starting as recommended with the MON->OSD .. ?
No, it's not random. But they do test different orders. See e.g. https://github.com/ceph/ceph/blob/master/qa/suites/upgrade/jewel-x/parallel/3-upgrade-sequence/upgrade-mon-osd-mds.yaml
Great, thanks for the hint. So the last question is whether to:
(force-pushed from 29fea7f to 25e24e6)
@l-mb @smithfarm Just for clarification: when either of you says "random" order, are both of you okay with a standard approach of monitors, then storage, then remaining in general? For the Ceph cluster with dedicated nodes, there's no real issue even with reboots. My concern is when things go bad, how does that leave the cluster and what is the admin left holding? I guess six partial upgrades leaves me edgy, whereas two complete upgrades followed by a failure is easy to describe.
@swiftgist Trying to order the nodes is fine, but ordering them is probably not a hard mandatory aspect. If the update fails for any node, flag that node and abort the update sequence? And once the admin has manually resolved whatever problem occurred, they can just restart the update process? We don't even need to start in the middle, since upgrading an already upgraded node is effectively a no-op and would just be skipped anyway.
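A minimal sketch of that flow - ordered roles, flag and abort on the first failure, and a re-run that simply skips nodes that are already done. The role order is taken from this thread; the two helper callables are hypothetical, not DeepSea code.

ROLE_ORDER = ['mon', 'osd', 'rgw', 'mds', 'igw']

def rolling_upgrade(nodes_by_role, upgrade_node, already_upgraded):
    for role in ROLE_ORDER:
        for node in nodes_by_role.get(role, []):
            if already_upgraded(node):
                # re-running the sequence is cheap: finished nodes are no-ops
                continue
            if not upgrade_node(node):
                # flag the node and abort; the admin fixes it and re-runs
                raise RuntimeError("upgrade failed on {}, aborting".format(node))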
(force-pushed from 7c2c089 to fbfc57b)
That pull request tries to achieve:
(force-pushed from e4fd7b6 to bc2be5d)
Signed-off-by: Eric Jackson <ejackson@suse.com>
(force-pushed from f697b0f to e48e47a)
log.info("returncode: {}".format(proc.returncode))
if proc.returncode == 0:
    if os.path.isfile('/var/run/reboot-required'):
        self._reboot()
I don't have a total grasp of how all these pieces are fitting together, so I want to make sure I'm understanding it. From my understanding, I'm seeing a possibility for issuing multiple reboots between this code (the PackageManager classes) and the sls file that does upgrades.

On this line, I see that when up or dup is called, the package manager (Apt in this case, but a similar line exists for Zypper patch) issues a reboot automatically.

In the SLS file, there is also a reboot line. When PackageManager issues a reboot, does it set auto_reboot in the pillar to False so there won't be a double reboot?

I know @jschmid1 also mentioned needing to check old vs new kernel version since zypper has some bugs/lack of reboot reporting to work around. Would it make sense, instead of issuing a reboot on this line, to set auto_reboot to True and let that be handled in the SLS?
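As a hypothetical illustration of the gate being discussed (names and structure are assumptions, not the actual DeepSea code), the reboot helper could consult the user-facing auto_reboot setting before doing anything, so the package manager and the SLS cannot both trigger a reboot.

import subprocess

def maybe_reboot(pillar, log):
    # auto_reboot is a user-controlled gate, not an internal flag
    if not pillar.get('auto_reboot', False):
        log.info("reboot required, but auto_reboot is disabled - skipping")
        return False
    subprocess.check_call(['shutdown', '-r', 'now'])
    return True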
I added autoreboot: False|True to globally set the default behaviour on whether deepsea has the right to 'automagically reboot' in case the system requests it. So see it as a 'user variable' rather than an internal one.

My intention was to do it the other way around. I mentioned in another comment that changing to 'zypper patch' will get rid of the ceph/update/reboot sls file, and so technically double reboots won't be possible in zypper right now, as the 107 will never be received (due to zypper up's missing return-code implementation). They may be possible with Apt, but as PackageManager will cause a reboot, the next step in the orchestrate file (after the reboot) is ceph.upgrade.reboot, which compares the 'installed' vs 'used' kernel, which is the same after the reboot.
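A rough sketch of that 'installed vs used kernel' comparison; using /lib/modules as the list of installed kernels and a naive lexical sort are simplifying assumptions for illustration.

import os
import subprocess

def kernel_reboot_pending():
    running = subprocess.check_output(['uname', '-r']).decode().strip()
    installed = sorted(os.listdir('/lib/modules'))  # one directory per installed kernel
    # naive lexical sort - a real implementation would compare versions properly;
    # right after a reboot the newest installed kernel is the running one,
    # so this returns False and no second reboot is triggered
    return bool(installed) and installed[-1] != running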
OH! Okay. I am following now. I didn't realize the auto_reboot was a gate to allow that to happen at all. #samepage

Sounds all good. Zypper makes it non-ideal it seems, but I think I have a good idea of what's happening now. Thanks!
* allow to globally choose the update_method_init which can enable you to use zypper.patch
* add zypper.patch handle register
merged into wip-updates-and-restart
merged with #222, closing
This serves the purpose of solving the problem of updating & rebooting. [0] -> detailed explanation

With this decoupled approach we update the kernel & reboot per-role sequentially and in the correct order (mon -> osd .. etc) and hardfail if something goes wrong.

Now you can safely run the second step (for now called maintenance - naming needs to be straightened out, though), which installs everything BUT the kernel (not implemented) and does a graceful restart, which was previously implemented with ceph.restart.

[0]: If you currently run stage 0, it will first install all updates and subsequently check whether a kernel update was applied. If so, DeepSea initiates a reboot. The issue with this is that you don't know whether a new ceph-* binary was added to your system. In a larger-scale cluster you want to run on different versions of ceph for as short a time as possible. Reboots might take up to 15 minutes per node, which means you end up with different versions of ceph-* for a rather long time.
Signed-off-by: Joshua Schmid jschmid@suse.de