
Make sure upgrades are sequential #48

Merged
merged 4 commits into amimof:master from sequential-upgrades on Jul 12, 2019

Conversation

anton-johansson
Collaborator

@anton-johansson commented Jun 17, 2019

This way, we won't stop, for example, the API server or kube-proxy on all
masters or nodes at the same time, which would cause downtime.

To do:

  • Wait for containerd to start (see Metrics for containerd #47)
  • Verify that running the playbook works without downtime
  • Verify that actual upgrades work without downtime (I guess we can try this with 1.14.3 or 1.15.0-rc.1)

Discussions:

  • Are the delays and timeouts OK? 5 seconds before we start checking, then a check every second, with a timeout after 30 seconds (see the sketch after this list).
  • Should these be configurable with Ansible parameters?
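
A minimal sketch of the kind of wait task being described, assuming kube-apiserver's default secure port 6443 as the thing being checked (wait_for polls once per second by default):

- name: Wait for kube-apiserver to come back up
  wait_for:
    host: 127.0.0.1
    port: 6443
    delay: 5      # wait before the first check
    sleep: 1      # check every second
    timeout: 30   # give up after 30 seconds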

@anton-johansson
Collaborator Author

anton-johansson commented Jun 19, 2019

Testing right now, noticed an issue. With serial: 1, facts aren't gathered for all hosts, which is necessary for etcd (and maybe more). Looking into it.

Edit: Solved by forcing fact gathering before running the etcd tasks, as per 75785de.

@anton-johansson
Collaborator Author

I've tested running the playbook on an existing cluster without upgrading, and it looks like it does what it's supposed to. One task on one host at a time.

I'll probably test this again when 1.15.0 is released (or if I'm bold enough to try a release candidate).

It would be nice to get the wait-for-containerd step up and running. Any ideas for how we can do that, @amimof? Is #47 an alternative?

@amimof
Owner

amimof commented Jun 20, 2019

@anton-johansson I'm not sure if this is the right approach, running all tasks in serial. The serial property is not supported in Ansible on a per-task basis, which means that all tasks under a play would run in serial. In large clusters, that would increase the execution time significantly.

I don't believe that it's necessary to install masters with a rolling strategy. Maybe that even applies to the kubelet and kube-proxy as well, and only containerd should be restarted one host at a time. It all depends on the type of downtime. Is it workload or APIs?

What is your experience of upgrading a production cluster with KTRW?

@anton-johansson
Collaborator Author

Hmm, okay.

I did lose connectivity to my services for a period when I upgraded from 1.13 to 1.14. I'm not sure what caused it; my guess was kube-proxy, but it could of course be containerd. I don't see how, though, because containers don't die just because containerd dies, right? kube-proxy, however, handles the network proxying to the correct container, which feels super dangerous if all instances are restarted at the same time.

I understand that masters don't need to be in serial, but I would still feel safer if I could reach my API server during the upgrade, which I can't if I restart them all at the same time.

which means that all tasks under a play would run in serial

This is pretty much what I wanted to accomplish. :) Do you know if this setting can be controlled with a variable somehow? I guess it would be nice if it defaulted to 100%, but you could override it using some parameter. I'd set it to 1 (or maybe 50%, which would effectively be 1/3 or 2/5, etc.) in my case.

Or maybe a constant 50% would suffice? It would of course increase the execution time for large clusters, but it wouldn't be that bad with 50%?
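
For reference, when serial is given as a percentage, Ansible rounds the batch size down but never below one host, which is where the 1-of-3 and 2-of-5 figures above come from. A play-level setting might look like:

- hosts: masters
  serial: "50%"   # e.g. 1 host per batch for 3 masters, 2 per batch for 5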

@amimof
Owner

amimof commented Jun 20, 2019

@anton-johansson Yes, kube-proxy downtime will cause network connectivity to containers to be interrupted. And I'm pretty sure that containers are restarted when containerd is restarted. Here is an example of how we could restart a service per host without the use of serial. But I still haven't figured out how to check that the service is up before moving on to the next host.

# Runs once and delegates a restart to each node in the group, one at a time.
- name: Restart and enable kubelet
  systemd:
    name: kubelet
    state: restarted
    enabled: True
  with_items: "{{ groups['nodes'] }}"
  run_once: True
  delegate_to: "{{ item }}"
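
One way to add the missing check might be to move the restart into an included task file so that each delegated host is restarted and health-checked before the loop continues. The file name below is hypothetical, and the port assumes the kubelet's default healthz endpoint on 127.0.0.1:10248:

- name: Restart nodes one at a time
  include_tasks: restart-node.yml   # hypothetical task file
  with_items: "{{ groups['nodes'] }}"
  run_once: True
  loop_control:
    loop_var: target_host

# restart-node.yml (hypothetical):
- name: Restart and enable kubelet
  systemd:
    name: kubelet
    state: restarted
    enabled: True
  delegate_to: "{{ target_host }}"

- name: Wait for the kubelet healthz port before moving on
  wait_for:
    host: 127.0.0.1
    port: 10248   # default kubelet healthz port (assumption)
    timeout: 30
  delegate_to: "{{ target_host }}"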

Another solution would be to restart everything at the very end, by putting a section in install.yml, something like below. However, this would make the individual roles a bit less independent in my opinion. So my idea is to pass in a state variable to the roles that can control whether the component should be restarted after it is installed. The default would be true, since we want the roles to be self-contained. The resulting install.yml could look something like this:

- hosts: localhost
  connection: local
  roles:
    - certificates

- hosts: etcd
  roles:
    - { role: etcd, vars: { restart: false }}

- hosts: masters
  roles:
    - { role: kube-apiserver, vars: { restart: false }}
    - { role: kube-controller-manager, vars: { restart: false }}
    - { role: kube-scheduler, vars: { restart: false }}

- hosts: nodes
  roles:
    - cni
    - { role: containerd, vars: { restart: false }}
    - runc
    - { role: kube-proxy, vars: { restart: false }}
    - { role: kubelet, vars: { restart: false }}

# Restart components in serial
- hosts: nodes
  serial: 1
  roles:
    - { role: containerd, vars: { restart: true }}
    - { role: kube-proxy, vars: { restart: true }}
    - { role: kubelet, vars: { restart: true }}

- hosts: etcd
  serial: 1
  roles:
    - { role: etcd, vars: { restart: true }}

- hosts: masters
  serial: 1
  roles:
    - { role: kube-apiserver, vars: { restart: true }}
    - { role: kube-controller-manager, vars: { restart: true }}
    - { role: kube-scheduler, vars: { restart: true }}

@anton-johansson
Collaborator Author

anton-johansson commented Jun 20, 2019

I like the last suggestion, where we only restart the services in serial.

But wouldn't we need two variables? One that indicates that we should prepare everything (download binaries, update configuration files, etc.) and one that indicates that the service should be restarted. Both could default to true, so the role stays independent and can be executed standalone. But in the install playbook, the first part (without serial) would only prepare, while the last part (in serial) would only restart (and wait for the service to come back up)?

EDIT: Good to know that stopping containerd stops containers; I wasn't aware of that.

@amimof
Owner

amimof commented Jun 20, 2019

Yeah, you are right, we need two variables: state: present|absent and restart: true|false.
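
A rough sketch of how a role's tasks could honour these two variables; the download URL variable and task details are placeholders, not the actual role contents:

# Install only when state is "present" (the default).
- name: Download kube-proxy binary
  get_url:
    url: "{{ kube_proxy_download_url }}"   # hypothetical variable
    dest: /usr/local/bin/kube-proxy
    mode: '0755'
  when: (state | default('present')) == 'present'

# Restart only when the caller hasn't opted out.
- name: Restart kube-proxy
  systemd:
    name: kube-proxy
    state: restarted
    enabled: True
  when: restart | default(True) | bool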

@anton-johansson
Collaborator Author

anton-johansson commented Jun 20, 2019

I'll see if I can get something like that up and running.

@amimof
Owner

amimof commented Jul 12, 2019

@anton-johansson So I've been doing some experimenting with my initial suggestion, where I proposed putting all restarts at the end of the playbook. I have come to the conclusion that this is an anti-pattern in terms of Ansible best practices, because everything in Ansible revolves around plays, not playbooks.

The "best" possible implementation (Ansible-wise) is to use handlers. But handlers alone don't solve the sequential-restart problem. However, by combining handlers with rolling updates (serial), we can achieve what we want in a good way.

So I propose that we let users set the serial value for all plays (with a default). That way it would be up to the user to determine how updates are applied. For example, in large clusters (>50 nodes) it may be perfectly fine to have 2 or more nodes down for maintenance, which would translate to serial: 2. What is cool is that Ansible supports passing a percentage value to serial, so one could set serial: 5%.

I've opened a PR that adds restart handlers to all appropriate roles. Have a look and let me know what you think.
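
A minimal sketch of the handler-plus-serial combination, assuming illustrative file and variable names rather than the actual role contents:

- hosts: nodes
  serial: "{{ serial_nodes | default('100%') }}"   # user-overridable batch size
  tasks:
    - name: Install kubelet systemd unit
      template:
        src: kubelet.service.j2
        dest: /etc/systemd/system/kubelet.service
      notify: restart kubelet
  handlers:
    - name: restart kubelet
      systemd:
        name: kubelet
        state: restarted
        daemon_reload: True

Handlers run at the end of each serial batch, so the restarts roll through the hosts batch by batch instead of hitting them all at once.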

@anton-johansson
Collaborator Author

Yeah, that sounds good. What would be a good default for this value, though? I mentioned earlier in this PR that if we use 50% as the default, it would effectively be 1/3 or 2/5, etc., which feels like a good default for high availability.

Or would you want it to be 100% by default, indicating pure parallelism?

@anton-johansson
Collaborator Author

We can move the discussion over to #52. Closing this!

@anton-johansson
Collaborator Author

My mistake, this PR is not really related to #52. Re-opening to see if I can make the appropriate changes.

@anton-johansson
Collaborator Author

I've made some changes @amimof, but I have yet to test this thoroughly. I'll remove [WIP] from the title when tested.

@anton-johansson
Collaborator Author

It appears that host variables cannot be used in the scope of the play itself (e.g. for serial), only in tasks within the play.

See ansible/ansible#18131

How about using --extra-vars?

For example:

$ ansible-playbook --inventory ansible-inventory --extra-vars "serial_masters=50%" ~/projects/kubernetes-the-right-way/install.yml
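
The play header consuming that extra variable might then look roughly like this, with 100% as the fallback so the current all-at-once behaviour is kept when the flag isn't passed:

- hosts: masters
  serial: "{{ serial_masters | default('100%') }}"
  roles:
    - kube-apiserver
    - kube-controller-manager
    - kube-scheduler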

@anton-johansson force-pushed the sequential-upgrades branch 3 times, most recently from b554eca to 953e0c6 on July 12, 2019 at 11:55
@anton-johansson
Collaborator Author

I've made some changes to utilize the command-line argument --extra-vars. This works well. As you can see, I had to gather facts for all etcd hosts, because when using serial, facts aren't gathered for all of them immediately.

I think this needs to be done for kube-apiserver as well, if we want plays to be independent of other plays. Could you confirm this?

Also, this means that we'll run the fact gathering even if we don't need to (i.e. if we don't use serial). Do you know if there's a way to prevent this? Maybe use when? Something like:

when: hostvars[item]['ansible_hostname'] is not defined
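
Something along these lines, perhaps, using the setup module with delegate_facts so the gathered facts are stored for the delegated hosts and the when skips hosts whose facts are already cached:

- name: Gather facts about etcd hosts
  setup:
  delegate_to: "{{ item }}"
  delegate_facts: True
  with_items: "{{ groups['etcd'] }}"
  when: hostvars[item]['ansible_hostname'] is not defined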

@anton-johansson changed the title from "[WIP] Make sure upgrades are sequential" to "Make sure upgrades are sequential" on Jul 12, 2019
@amimof
Owner

amimof commented Jul 12, 2019

I've made some changes to utilize the command-line argument --extra-vars. This works well. As you can see, I had to gather facts for all etcd hosts, because when using serial, facts aren't gathered for all of them immediately.

I think this needs to be done for kube-apiserver as well, if we want plays to be independent of other plays. Could you confirm this?

Also, this means that we'll run the fact gathering even if we don't need to (i.e. if we don't use serial). Do you know if there's a way to prevent this? Maybe use when? Something like:

when: hostvars[item]['ansible_hostname'] is not defined

Nice job! I also found that serial_* needs to be passed as an extra variable, but I think it's fine for now. I can confirm that gathering facts is also required for kube-apiserver. See:

--etcd-servers={% for h in groups['etcd'] %}https://{{ hostvars[h].ansible_default_ipv4.address }}:2379{% if not loop.last %},{% endif %}{% endfor %} \

Also, I don't mind gathering facts even if it isn't needed. It should be possible with a simple when, but this needs to be tested.

@anton-johansson
Collaborator Author

anton-johansson commented Jul 12, 2019

Fixed kube-apiserver.

Also squashed some commits together to avoid redundant messages. I kept some commits separate because they're actually informational.

This is useful for high availability cluster upgrades or modifications.
If we use `serial_all` or `serial_etcd`, not all hosts will have their
facts gathered and therefore the templates won't be generated properly.
…server`

These facts usually already exist because you run the `etcd` play before `kube-apiserver`. However,
to make the plays independent of each other, we must make sure that the `kube-apiserver` play also
gathers facts about the `etcd` hosts, since it needs them in its systemd unit file.
@amimof
Owner

amimof commented Jul 12, 2019

@anton-johansson It might be a good idea to also make use of serial in the CI pipeline by passing -e "serial_all=50%" to

- ansible-playbook -b test/main.yml
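
With the suggested flag added, that line would presumably become:

- ansible-playbook -b -e "serial_all=50%" test/main.yml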

Other than that, LGTM and it should be good to merge.

This way, we'll catch errors caused by missing facts due to limits from serial executions
@anton-johansson
Collaborator Author

Good idea! Done, let's see what Travis says. :)

@amimof merged commit 9283797 into amimof:master on Jul 12, 2019
@anton-johansson deleted the sequential-upgrades branch on July 12, 2019 at 15:28