
Make sure upgrades are sequential #48

Merged
merged 4 commits into amimof:master from sequential-upgrades on Jul 12, 2019

Conversation

anton-johansson
Collaborator

@anton-johansson commented Jun 17, 2019

This way, we won't stop, for example, the API server or kube-proxy on all
masters or nodes at the same time, which would cause downtime.

To do:

  • Wait for containerd to start (see Metrics for containerd #47)
  • Verify that running the playbook works without downtime
  • Verify that actual upgrades work without downtime (I guess we can try this with 1.14.3 or 1.15.0-rc.1)

Discussions:

  • Are the delays and timeouts OK? 5 seconds before we start checking, then a check every second, with a timeout after 30 seconds (see the sketch after this list).
  • Should these be configurable with Ansible parameters?
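
A minimal sketch of the kind of wait task being described, assuming kube-apiserver's default secure port 6443 as the thing being checked (wait_for polls once per second by default):

- name: Wait for kube-apiserver to come back up
  wait_for:
    host: 127.0.0.1
    port: 6443
    delay: 5      # wait before the first check
    sleep: 1      # check every second
    timeout: 30   # give up after 30 seconds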

@anton-johansson
Collaborator Author

anton-johansson commented Jun 19, 2019

Testing right now, noticed an issue. With serial: 1, facts aren't gathered for all hosts, which is necessary for etcd (and maybe more). Looking into it.

Edit: Solved by forcing fact gathering before running the etcd tasks, as per 75785de.

@anton-johansson
Collaborator Author

I've tested running the playbook on an existing cluster without upgrading, and it looks like it does what it's supposed to. One task on one host at a time.

I'll probably test this again when 1.15.0 is released (or if I'm bold enough to try a release candidate).

It would be nice to get the wait-for-containerd step up and running. Any ideas for how we can do that, @amimof? Is #47 an alternative?

@amimof
Owner

amimof commented Jun 20, 2019

@anton-johansson I'm not sure if this is the right approach, running all tasks in serial. The serial property is not supported in Ansible on a per-task basis, which means that all tasks under a play would run in serial. In large clusters, that would increase the execution time significantly.

I don't believe that it's necessary to install masters with a rolling strategy. Maybe that even applies to the kubelet and kube-proxy as well, and only containerd should be restarted one host at a time. It all depends on the type of downtime. Is it workload or APIs?

What is your experience of upgrading a production cluster with KTRW?

@anton-johansson
Collaborator Author

Hmm, okay.

I did lose connectivity to my services for a period when I upgraded from 1.13 to 1.14. I'm not sure what caused it; my guess was kube-proxy, but it could of course be containerd. I don't see how, though, because containers don't die just because containerd dies, right? kube-proxy, however, handles the network proxying to the correct container, which feels super dangerous if all instances are restarted at the same time.

I understand that masters don't need to be in serial, but I would still feel safer if I could reach my API server during the upgrade, which I can't if I restart them all at the same time.

which means that all tasks under a play would run in serial

This is pretty much what I wanted to accomplish. :) Do you know if this setting can be controlled with a variable somehow? I guess it would be nice if it defaulted to 100%, but you could override it using some parameter. I'd set it to 1 (or maybe 50%, which would effectively be 1/3 or 2/5, etc.) in my case.

Or maybe a constant 50% would suffice? It would of course increase the execution time for large clusters, but it wouldn't be that bad with 50%?
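
For reference, when serial is given as a percentage, Ansible rounds the batch size down but never below one host, which is where the 1-of-3 and 2-of-5 figures above come from. A play-level setting might look like:

- hosts: masters
  serial: "50%"   # e.g. 1 host per batch for 3 masters, 2 per batch for 5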

@amimof
Owner

amimof commented Jun 20, 2019

@anton-johansson Yes, kube-proxy downtime will cause network connectivity to containers to be interrupted. And I'm pretty sure that containers are restarted when containerd is restarted. Here is an example of how we could restart a service per host without the use of serial. But I still haven't figured out how to check that the service is up before moving on to the next host.

# Runs once and delegates a restart to each node in the group, one at a time.
- name: Restart and enable kubelet
  systemd:
    name: kubelet
    state: restarted
    enabled: True
  with_items: "{{ groups['nodes'] }}"
  run_once: True
  delegate_to: "{{ item }}"
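
One way to add the missing check might be to move the restart into an included task file so that each delegated host is restarted and health-checked before the loop continues. The file name below is hypothetical, and the port assumes the kubelet's default healthz endpoint on 127.0.0.1:10248:

- name: Restart nodes one at a time
  include_tasks: restart-node.yml   # hypothetical task file
  with_items: "{{ groups['nodes'] }}"
  run_once: True
  loop_control:
    loop_var: target_host

# restart-node.yml (hypothetical):
- name: Restart and enable kubelet
  systemd:
    name: kubelet
    state: restarted
    enabled: True
  delegate_to: "{{ target_host }}"

- name: Wait for the kubelet healthz port before moving on
  wait_for:
    host: 127.0.0.1
    port: 10248   # default kubelet healthz port (assumption)
    timeout: 30
  delegate_to: "{{ target_host }}"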

Another solution would be to restart everything at the very end, by putting a section in install.yml, something like below. However, this would make the individual roles a bit less independent in my opinion. So my idea is to pass in a state variable to the roles that can control whether the component should be restarted after it is installed. The default would be true, since we want the roles to be self-contained. The resulting install.yml could look something like this:

- hosts: localhost
  connection: local
  roles:
    - certificates

- hosts: etcd
  roles:
    - { role: etcd, vars: { restart: false }}

- hosts: masters
  roles:
    - { role: kube-apiserver, vars: { restart: false }}
    - { role: kube-controller-manager, vars: { restart: false }}
    - { role: kube-scheduler, vars: { restart: false }}

- hosts: nodes
  roles:
    - cni
    - { role: containerd, vars: { restart: false }}
    - runc
    - { role: kube-proxy, vars: { restart: false }}
    - { role: kubelet, vars: { restart: false }}

# Restart components in serial
- hosts: nodes
  serial: 1
  roles:
    - { role: containerd, vars: { restart: true }}
    - { role: kube-proxy, vars: { restart: true }}
    - { role: kubelet, vars: { restart: true }}

- hosts: etcd
  serial: 1
  roles:
    - { role: etcd, vars: { restart: true }}

- hosts: masters
  serial: 1
  roles:
    - { role: kube-apiserver, vars: { restart: true }}
    - { role: kube-controller-manager, vars: { restart: true }}
    - { role: kube-scheduler, vars: { restart: true }}

@anton-johansson
Collaborator Author

anton-johansson commented Jun 20, 2019

I like the last suggestion, where we only restart the services in serial.

But wouldn't we need two variables? One that indicates that we should prepare everything (download binaries, update configuration files, etc.) and one that indicates that the service should be restarted. Both could default to true, so the role stays independent and can be executed standalone. But in the install playbook, the first part (without serial) would only prepare, while the last part (in serial) would only restart (and wait for the service to come back up)?

EDIT: Good to know that stopping containerd stops containers; I wasn't aware of that.

@amimof
Owner

amimof commented Jun 20, 2019

Yeah, you are right, we need two variables: state: present|absent and restart: true|false.
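
A rough sketch of how a role's tasks could honour these two variables; the download URL variable and task details are placeholders, not the actual role contents:

# Install only when state is "present" (the default).
- name: Download kube-proxy binary
  get_url:
    url: "{{ kube_proxy_download_url }}"   # hypothetical variable
    dest: /usr/local/bin/kube-proxy
    mode: '0755'
  when: (state | default('present')) == 'present'

# Restart only when the caller hasn't opted out.
- name: Restart kube-proxy
  systemd:
    name: kube-proxy
    state: restarted
    enabled: True
  when: restart | default(True) | bool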

@anton-johansson
Collaborator Author

anton-johansson commented Jun 20, 2019

I'll see if I can get something like that up and running.

@amimof
Owner

amimof commented Jul 12, 2019

@anton-johansson So I've been doing some experimenting with my initial suggestion, where I proposed putting all restarts at the end of the playbook. I have come to the conclusion that this is an anti-pattern in terms of Ansible best practices, because everything in Ansible revolves around plays, not playbooks.

The "best" possible implementation (Ansible-wise) is to use handlers. But handlers alone don't solve the sequential-restart problem. However, by combining handlers with rolling updates (serial), we can achieve what we want in a good way.

So I propose that we let users set the serial value for all plays (with a default). That way it would be up to the user to determine how updates are applied. For example, in large clusters (>50 nodes) it may be perfectly fine to have 2 or more nodes down for maintenance, which would translate to serial: 2. What is cool is that Ansible supports passing a percentage value to serial, so one could set serial: 5%.

I've opened a PR that adds restart handlers to all appropriate roles. Have a look and let me know what you think.
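
A minimal sketch of the handler-plus-serial combination, assuming illustrative file and variable names rather than the actual role contents:

- hosts: nodes
  serial: "{{ serial_nodes | default('100%') }}"   # user-overridable batch size
  tasks:
    - name: Install kubelet systemd unit
      template:
        src: kubelet.service.j2
        dest: /etc/systemd/system/kubelet.service
      notify: restart kubelet
  handlers:
    - name: restart kubelet
      systemd:
        name: kubelet
        state: restarted
        daemon_reload: True

Handlers run at the end of each serial batch, so the restarts roll through the hosts batch by batch instead of hitting them all at once.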

@anton-johansson
Collaborator Author

Yeah, that sounds good. What would be a good default for this value, though? I mentioned earlier in this PR that if we use 50% as the default, it would effectively be 1/3 or 2/5, etc., which feels like a good default for high availability.

Or would you want it to be 100% by default, indicating pure parallelism?

@anton-johansson
Collaborator Author

We can move the discussion over to #52. Closing this!

@anton-johansson
Collaborator Author

My mistake, this PR is not really related to #52. Re-opening to see if I can make the appropriate changes.

@anton-johansson
Collaborator Author

I've made some changes @amimof, but I have yet to test this thoroughly. I'll remove [WIP] from the title when tested.

@anton-johansson
Collaborator Author

It appears that host variables cannot be used in the scope of the play itself (e.g. for serial), only in tasks within the play.

See ansible/ansible#18131

How about using --extra-vars?

For example:

$ ansible-playbook --inventory ansible-inventory --extra-vars "serial_masters=50%" ~/projects/kubernetes-the-right-way/install.yml
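
The play header consuming that extra variable might then look roughly like this, with 100% as the fallback so the current all-at-once behaviour is kept when the flag isn't passed:

- hosts: masters
  serial: "{{ serial_masters | default('100%') }}"
  roles:
    - kube-apiserver
    - kube-controller-manager
    - kube-scheduler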

@anton-johansson force-pushed the sequential-upgrades branch 3 times, most recently from b554eca to 953e0c6 on July 12, 2019 at 11:55
@anton-johansson
Collaborator Author

I've made some changes to utilize the command-line argument --extra-vars. This works well. As you can see, I had to gather facts for all etcd hosts, because when using serial, facts aren't gathered for all of them immediately.

I think this needs to be done for kube-apiserver as well, if we want plays to be independent of other plays. Could you confirm this?

Also, this means that we'll run the fact gathering even if we don't need to (i.e. if we don't use serial). Do you know if there's a way to prevent this? Maybe use when? Something like:

when: hostvars[item]['ansible_hostname'] is not defined
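
Something along these lines, perhaps, using the setup module with delegate_facts so the gathered facts are stored for the delegated hosts and the when skips hosts whose facts are already cached:

- name: Gather facts about etcd hosts
  setup:
  delegate_to: "{{ item }}"
  delegate_facts: True
  with_items: "{{ groups['etcd'] }}"
  when: hostvars[item]['ansible_hostname'] is not defined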

@anton-johansson changed the title from "[WIP] Make sure upgrades are sequential" to "Make sure upgrades are sequential" on Jul 12, 2019
@amimof
Owner

amimof commented Jul 12, 2019

I've made some changes to utilize the command-line argument --extra-vars. This works well. As you can see, I had to gather facts for all etcd hosts, because when using serial, facts aren't gathered for all of them immediately.

I think this needs to be done for kube-apiserver as well, if we want plays to be independent of other plays. Could you confirm this?

Also, this means that we'll run the fact gathering even if we don't need to (i.e. if we don't use serial). Do you know if there's a way to prevent this? Maybe use when? Something like:

when: hostvars[item]['ansible_hostname'] is not defined

Nice job! I also found that serial_* needs to be passed as an extra variable, but I think it's fine for now. I can confirm that gathering facts is also required for kube-apiserver. See:

--etcd-servers={% for h in groups['etcd'] %}https://{{ hostvars[h].ansible_default_ipv4.address }}:2379{% if not loop.last %},{% endif %}{% endfor %} \

Also, I don't mind gathering facts even if it isn't needed. It should be possible with a simple when, but this needs to be tested.

@anton-johansson
Collaborator Author

anton-johansson commented Jul 12, 2019

Fixed kube-apiserver.

Also squashed some commits together to avoid redundant messages. I kept some commits separate because they're actually informational.

This is useful for high availability cluster upgrades or modifications.
If we use `serial_all` or `serial_etcd`, not all hosts will have their
facts gathered and therefore the templates won't be generated properly.
…server`

These facts usually already exist because you run the `etcd` play before `kube-apiserver`. However,
to make the plays independent of each other, we must make sure that the `kube-apiserver` play also
gathers facts about the `etcd` hosts, since it needs them in its systemd unit file.
@amimof
Owner

amimof commented Jul 12, 2019

@anton-johansson It might be a good idea to also make use of serial in the CI pipeline by passing -e "serial_all=50%" to

- ansible-playbook -b test/main.yml
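
With the suggested flag added, that line would presumably become:

- ansible-playbook -b -e "serial_all=50%" test/main.yml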

Other than that, LGTM and it should be good to merge.

This way, we'll catch errors caused by missing facts due to limits from serial executions
@anton-johansson
Collaborator Author

Good idea! Done, let's see what Travis says. :)

@amimof merged commit 9283797 into amimof:master on Jul 12, 2019
@anton-johansson deleted the sequential-upgrades branch on July 12, 2019 at 15:28