chore: ADR Multi-Node Kubernetes Clusters on init #2243

**adr/0023-multi-node-kubernetes.md** — 75 additions, 0 deletions
**@lucasrod16** (Member) commented on Jun 3, 2024:
Based on the k3s documentation, I believe the minimum steps to create a highly available control plane would be as follows:

1. Launch a server node with the `--cluster-init` flag to enable clustering and a token that will be used as a shared secret to join additional servers to the cluster.
2. After launching the first server, join the second and third servers to the cluster using the shared secret and pointing to the first launched server node with the `--server` flag.
   * The docs state:
     > An HA K3s cluster with embedded etcd is composed of:
     > Three or more server nodes that will serve the Kubernetes API and run other control plane services, as well as host the embedded etcd datastore.
3. Optionally, add agent nodes to the cluster to run workloads on.

Could you update the ADR to reflect this?

https://docs.k3s.io/datastore/ha-embedded
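
For reference, a minimal sketch of the raw k3s commands behind those three steps (the token `foo` and the IP `10.0.0.1` are placeholders; under this ADR these flags would be passed through `K3S_ARGS` rather than run by hand):

```shell
# Node 1: start the first server with clustering (embedded etcd) enabled
# and a shared token that the other nodes will use to join.
k3s server --cluster-init --token=foo

# Nodes 2 and 3: join as additional servers by pointing at node 1.
k3s server --server https://10.0.0.1:6443 --token=foo

# Optional: join worker-only nodes as agents.
k3s agent --server https://10.0.0.1:6443 --token=foo
```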

# 22. Multi-Node Kubernetes with `zarf init`

Date: 2024-01-23

## Status

Pending

## Context

Today, when `zarf init` handles cluster creation, Zarf has no ability to automatically or semi-automatically provision itself across multiple nodes. The idea here is to allow horizontal scaling across multiple virtual or physical nodes for site reliability and automatic failover.

The first pass would consider scaling horizontally with the [High Availability Embedded etcd](https://docs.k3s.io/datastore/ha-embedded) model. Only minimal changes to the current k3s.service config are required: the main change is to include a shared token. By default, if K3s doesn't receive a token it will auto-generate one. The token can be reset on an existing cluster by running `k3s token rotate --new-token=foo`.
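
As a sketch, assuming the default k3s data directory, reading and rotating that token on an existing server node would look like (`foo` is a placeholder secret):

```shell
# The auto-generated join token can be read on the server node
# (default data dir, per the k3s docs):
sudo cat /var/lib/rancher/k3s/server/node-token

# Replace it with a known shared secret so other nodes can be configured with it:
sudo k3s token rotate --new-token=foo
```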

If one wanted to specify a token in advance, they could simply modify the `--set K3S_ARGS` value of their existing `zarf init`/`zarf deploy` command to include `--token=foo`.

For example:
```shell
zarf init --components=k3s,git-server --confirm --set K3S_ARGS="--disable=traefik --disable=metrics-server --disable=servicelb --tls-san=1.2.3.4 --token=foo"
```

For example, this results in the following line in k3s.service:

```ini
ExecStart=/usr/sbin/k3s server --write-kubeconfig-mode=700 --write-kubeconfig /root/.kube/config --disable=traefik --disable=metrics-server --disable=servicelb --tls-san=1.2.3.4 --token=foo
```

> **Member** commented: I believe we will need to use the `--cluster-init` flag to enable clustering in HA embedded etcd mode:
>
> > To get started, first launch a server node with the cluster-init flag to enable clustering and a token that will be used as a shared secret to join additional servers to the cluster.
>
> https://docs.k3s.io/datastore/ha-embedded
> https://docs.k3s.io/cli/server#cluster-options
>
> This is what tells k3s to use the embedded etcd datastore rather than the default SQLite datastore.

> **Collaborator (Author)** replied: Yes, you are right; I wrote this before testing out multiple master nodes.

The agent side differs in a few ways. We must specify three pieces of information:

* That we want to spin up a K3S agent only, not any other Zarf components.
* The IP of the `server`.
* The shared token specified when creating the `server`.

This would be the resulting k3s.service file:

```ini
ExecStart=/usr/sbin/k3s agent --server=https://1.2.3.4:6443 --token=foo
```

One approach could be to introduce constants into [k3s.service](packages/distros/k3s/common/k3s.service) so that the same unit file can be reused; a new component would essentially set some of those variables.

For example:

| Variable                        | Server                                                               | Agent                                        |
|---------------------------------|----------------------------------------------------------------------|----------------------------------------------|
| `###ZARF_CONST_K3S_MODE###`     | `server`                                                             | `agent`                                      |
| `###ZARF_CONST_K3S_INTERNAL###` | ` --write-kubeconfig-mode=700 --write-kubeconfig /root/.kube/config` | empty                                        |
| `###ZARF_VAR_K3S_ARGS###`       | `--token=foo`                                                        | `--server https://1.2.3.4:6443 --token=foo`  |

The new k3s.service file would look like:

```ini
ExecStart=/usr/sbin/k3s ###ZARF_CONST_K3S_MODE######ZARF_CONST_K3S_INTERNAL### ###ZARF_VAR_K3S_ARGS###
```

If this were the case, adding a new k3s agent would be run as follows (assuming an init package that contained only k3s, with both `k3s` and `k3s-agent` as optional components and nothing else required):

```shell
zarf init --components=k3s-agent --confirm --set K3S_ARGS="--server=https://1.2.3.4:6443 --token=foo"
```
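
Putting the two commands together, the end-to-end flow under this proposal would look something like the following (same placeholder IP and token as the examples above):

```shell
# Node 1: bootstrap the cluster as a k3s server with a known shared token.
zarf init --components=k3s,git-server --confirm \
  --set K3S_ARGS="--disable=traefik --disable=metrics-server --disable=servicelb --tls-san=1.2.3.4 --token=foo"

# Additional nodes: join as k3s agents, pointing at node 1 and reusing the token.
zarf init --components=k3s-agent --confirm \
  --set K3S_ARGS="--server=https://1.2.3.4:6443 --token=foo"
```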

References:

* https://github.com/defenseunicorns/zarf-package-bare-metal
* https://github.com/defenseunicorns/zarf/issues/1002
* https://docs.k3s.io/datastore/ha-embedded
* https://docs.k3s.io/cli/agent

## Decision

TBD

## Consequences

...