Merge branch 'main' into feat/k8s-1.28.3-occm-1.28.1
garloff committed Jan 26, 2024
2 parents 1e4e363 + 5b7e1f9 commit 8e02d4f
Showing 7 changed files with 52 additions and 33 deletions.
30 changes: 17 additions & 13 deletions doc/Upgrade-Guide.md
@@ -10,8 +10,9 @@ state: Draft (v0.7)
This document explains the steps to upgrade the SCS Kubernetes cluster-API
based cluster management solution as follows:
- from the R2 (2022-03) to the R3 (2022-09) state
- from the R3 (2022-09) to the R4 state
- from the R4 (2023-09) to the R5 state
- from the R3 (2022-09) to the R4 (2023-03) state
- from the R4 (2023-03) to the R5 (2023-09) state

The document explains how the management cluster and the workload clusters can be
upgraded without disruption. It is highly recommended to do a step-by-step upgrade
across major releases, i.e. upgrade from R2 to R3 and then to R4 in the case of
@@ -23,22 +24,22 @@ take, and it is advisable that cluster operators get some experience with
this kind of cluster management before applying this to customer clusters
that carry important workloads.

Note that while the detailed steps are tested and targeted to a R2 -> R3 move,
Note that while the detailed steps are tested and targeted to an R2 -> R3 move,
R3 -> R4 move or R4 -> R5 move, many of the steps are a generic approach that will apply also for other
upgrades, so expect a lot of similar steps when moving beyond R5.

Upgrades from cluster management prior to R2 is difficult; many changes before
Upgrades from cluster management prior to R2 are difficult; many changes before
R2 assumed that you would redeploy the management cluster. Redeploying the
management cluster can of course always be done, but it's typically disruptive
to your workload clusters, unless you move your cluster management state into
a new management cluster with `clusterctl move`.
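Such a state move could be sketched as follows; this is a hedged illustration, and the kubeconfig paths are placeholders, not files this repository creates:

```shell
# Hypothetical sketch: hand over Cluster API state to a freshly
# initialized management cluster. Kubeconfig paths are illustrative.
clusterctl init --kubeconfig ~/.kube/new-mgmt.conf --infrastructure openstack
clusterctl move --kubeconfig ~/.kube/old-mgmt.conf \
  --to-kubeconfig ~/.kube/new-mgmt.conf
```

After the move, the workload clusters keep running untouched; only the management objects (Cluster, Machine, etc.) change their home.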

## Management host (cluster) vs. Workload clusters

When you initially deployed the SCS k8s-cluster-api-provider, you create a
When you initially deployed the SCS k8s-cluster-api-provider, you created a
VM with a [kind](https://kind.sigs.k8s.io/) cluster inside and a number of
templates, scripts and binaries that are then used to do the cluster management.
This is your management host (or more precisely you single-host management
This is your management host (or more precisely your single-host management
cluster). Currently, all cluster management including upgrading etc. is done
by connecting to this host via ssh and performing commands there. (You don't
need root privileges to do cluster management there, the normal ubuntu user
@@ -134,8 +135,11 @@ You can now apply the upgrade by executing the following command:
clusterctl upgrade apply --contract v1beta1
```

You can then upgrade the components. You can do them one-by-one or simply do
`clusterctl upgrade apply --contract v1beta1`
You can then upgrade the components. You can do them one-by-one, e.g.:
```bash
clusterctl upgrade apply --infrastructure capo-system/openstack:v0.7.3 --core capi-system/cluster-api:v1.5.1 -b capi-kubeadm-bootstrap-system/kubeadm:v1.5.1 -c capi-kubeadm-control-plane-system/kubeadm:v1.5.1
```
Or simply do `clusterctl upgrade apply --contract v1beta1`.
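Either way, it can help to preview the planned provider versions first; `clusterctl` ships a read-only preview subcommand for this:

```shell
# Shows installed providers and the versions an upgrade would move
# them to; makes no changes by itself.
clusterctl upgrade plan
```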

#### New templates

@@ -337,11 +341,11 @@ If you decide to migrate your existing Kubernetes cluster from R4 to R5 be aware

Follow the steps below if you want to migrate an existing cluster from R4 to R5:
1. Access your management node
2. Checkout R5 tag
2. Checkout R5 branch
```bash
cd k8s-cluster-api-provider
git pull
git checkout tags/v6.0.0
git checkout maintained/v6.x
```
3. Back up the existing cluster configuration files (recommended)
```bash
# … (backup commands collapsed in this view)
```
@@ -357,7 +361,7 @@
and are not directly mentioned in the cluster configuration files, but they are hardcoded
in R5 scripts (e.g. ingress nginx controller, metrics server). Hence, read carefully the
R5 release notes too. Also note that the Kubernetes version was not updated; it is still v1.25.6.
6. Update an existing cluster (expect Kubernetes version)
6. Update an existing cluster (except Kubernetes version)
```bash
create_cluster.sh <CLUSTER_NAME>
```
@@ -380,7 +384,7 @@ Follow the below steps if you want to migrate an existing cluster from R4 to R5:
10. Bump the Kubernetes version to R5 v1.27.5 and increase the generation counter for worker and control plane nodes
```bash
sed -i 's/^KUBERNETES_VERSION: v1.26.8/KUBERNETES_VERSION: v1.27.5/' <CLUSTER_NAME>/clusterctl.yaml
sed -i 's/^OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.26.8 /OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.27.5/' <CLUSTER_NAME>/clusterctl.yaml
sed -i 's/^OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.26.8/OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.27.5/' <CLUSTER_NAME>/clusterctl.yaml
sed -r 's/(^CONTROL_PLANE_MACHINE_GEN: genc)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i <CLUSTER_NAME>/clusterctl.yaml
sed -r 's/(^WORKER_MACHINE_GEN: genw)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i <CLUSTER_NAME>/clusterctl.yaml
```
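The two generation-bump one-liners rely on GNU sed's `e` flag, which executes the substituted line as a shell command (here, a `printf` that increments the counter). A small self-contained demonstration on a throwaway file, not your real `<CLUSTER_NAME>/clusterctl.yaml`:

```shell
# Demonstrate the generation-counter bump on a throwaway fragment.
printf 'CONTROL_PLANE_MACHINE_GEN: genc01\nWORKER_MACHINE_GEN: genw03\n' > /tmp/gen-demo.yaml
sed -r 's/(^CONTROL_PLANE_MACHINE_GEN: genc)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i /tmp/gen-demo.yaml
sed -r 's/(^WORKER_MACHINE_GEN: genw)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i /tmp/gen-demo.yaml
cat /tmp/gen-demo.yaml   # genc01 -> genc02, genw03 -> genw04
# Caveat: counters 08 and 09 would be parsed as invalid octal in $((...)).
```

Changing the generation suffix renames the machine templates, which is what triggers Cluster API to roll the nodes.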
@@ -395,7 +399,7 @@ OCCM, CNI (calico/cilium), CSI

### New versions for optional components

nginx, metrics (nothing to do), cert-manager, flux
nginx, metrics server, cert-manager, flux

### etcd leader changes

4 changes: 2 additions & 2 deletions doc/configuration.md
@@ -21,8 +21,8 @@ Parameters controlling the Cluster-API management server (capi management server)
| `kind_flavor` | | SCS | `SCS-2V-4` | Flavor to be used for the k8s capi mgmt server |
| `image` | | SCS | `Ubuntu 22.04` | Image for the capi mgmt server. Use `Ubuntu 22.04` or `Debian 12`. Check also the `ssh_username` parameter |
| `ssh_username` | | SCS | `ubuntu` | Name of the default user for the `image` |
| `clusterapi_version` | | SCS | `1.5.3` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api --> | Version of the cluster-API incl. `clusterctl` |
| `capi_openstack_version` | | SCS | `0.8.0` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack --> | Version of the cluster-api-provider-openstack (needs to fit the CAPI version) |
| `clusterapi_version` | | SCS | `1.6.1` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api --> | Version of the cluster-API incl. `clusterctl` |
| `capi_openstack_version` | | SCS | `0.9.0` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack --> | Version of the cluster-api-provider-openstack (needs to fit the CAPI version) |
| `cilium_binaries` | | SCS | `v0.15.7;v0.12.0` | Versions of the cilium and hubble CLI in the vA.B.C;vX.Y.Z format |
| `restrict_mgmt_server` | | SCS | `["0.0.0.0/0"]` | Allows restricting access to the management server by the given list of CIDRs. Empty value (default) means public. |
| `mgmt_cidr` | | SCS | `10.0.0.0/24` | IPv4 address range (CIDR notation) for management cluster |
6 changes: 3 additions & 3 deletions terraform/cleanup/cleanup.sh
@@ -209,7 +209,7 @@ while read LB FIP; do
if test -z "$DEBUG"; then $OPENSTACK floating ip delete $FID; fi
fi
done < <(echo "$LBS")
SRV=$(resourcelist server "$CAPIPRE-$CLUSTER\|$CLUSTER-control-plane")
SRV=$(resourcelist server "$CAPIPRE-$CLUSTER\|$CLUSTER-")
SRVVOL=$(server_vols $SRV)
if test -n "$DEBUG"; then echo "### Attached volumes to "${SRV}": $SRVVOL"; fi
#cleanup server $CAPIPRE-$CLUSTER
@@ -219,7 +219,7 @@ if test -n "$NOCASCADE"; then
else
cleanup_list loadbalancer 1 "--cascade" "$LBS"
fi
cleanup port "$CAPIPRE-$CLUSTER\|$CLUSTER-control-plane"
cleanup port "$CAPIPRE-$CLUSTER\|$CLUSTER-"
RTR=$(resourcelist router "$CAPIPRE2ALL")
SUBNETS=$(resourcelist subnet "$CAPIPRE2ALL")
if test -n "$RTR" -a -n "$SUBNETS"; then
@@ -245,7 +245,7 @@ cleanup_list volume "" "" "$SRVVOL"
#cleanup "image" ubuntu-capi-image
cleanup "server group" "$CAPIPRE-$CLUSTER"
# Normally, the volumes should be all gone, but if there's one left, take care of it
cleanup volume "$CAPIPRE-$CLUSTER\|$CLUSTER-control-plane"
cleanup volume "$CAPIPRE-$CLUSTER\|$CLUSTER-"
cleanup "application credential" "$CAPIPRE-$CLUSTER-appcred"
cleanup container "$CAPIPRE-$CLUSTER-harbor-registry"

2 changes: 1 addition & 1 deletion terraform/files/bin/install_k9s.sh
@@ -5,7 +5,7 @@
. /etc/profile.d/proxy.sh

# install k9s
K9S_VERSION=v0.30.6 # renovate: datasource=github-releases depName=derailed/k9s
K9S_VERSION=v0.31.7 # renovate: datasource=github-releases depName=derailed/k9s
echo "# install k9s $K9S_VERSION"
ARCH=$(uname -m | sed 's/x86_64/amd64/')
# TODO: Check signature
3 changes: 1 addition & 2 deletions terraform/files/bin/update-R4-to-R5.sh
@@ -46,8 +46,7 @@ cp -p cluster-template.yaml cluster-template.yaml.backup
cp -p clusterctl.yaml clusterctl.yaml.backup

# Update cluster-template.yaml
# TODO: Update `update-cluster-template.diff` file when R5 release will be stabilized
if grep -q "SERVICE_CIDR\|POD_CIDR\|# Defragment & backup & trim script for SCS k8s-cluster-api-provider etcd cluster.\|# Allow to configure registry hosts in containerd" cluster-template.yaml; then
if grep -q "SERVICE_CIDR\|POD_CIDR\|# Defragment & backup & trim script for SCS k8s-cluster-api-provider etcd cluster.\|# Allow to configure registry hosts in containerd\|tweak-kubeapi-memlimit" cluster-template.yaml; then
echo "cluster-template.yaml already updated"
else
# The default template file `cluster-defaults/cluster-template.yaml` of version R4 still references the old `k8s.gcr.io` container registry.
36 changes: 26 additions & 10 deletions terraform/files/update/R4_to_R5/update-cluster-template.diff
@@ -1,5 +1,5 @@
diff --git a/terraform/files/template/cluster-template.yaml b/terraform/files/template/cluster-template.yaml
index 43560bc..469b829 100644
index 43560bc..c0b6f82 100644
--- a/terraform/files/template/cluster-template.yaml
+++ b/terraform/files/template/cluster-template.yaml
@@ -7,10 +7,12 @@ metadata:
@@ -101,9 +101,9 @@ index 43560bc..469b829 100644
+ echo "Warning: forced defragmentation on non leader!"
+ else
+ echo "Exit on non leader (use --force-nonleader optional argument if you want to force defragmentation on non leader)"
exit 0
+ exit 0
+ fi
fi
+ fi
+
+ # Check health of all etcd members
+ while read MEMBER; do
@@ -130,9 +130,9 @@ index 43560bc..469b829 100644
+ echo "Warning: forced defragmentation on single member etcd!"
+ else
+ echo "Exit on single member etcd (use --force-single optional argument if you want to force defragmentation on single member etcd)"
+ exit 0
exit 0
+ fi
+ fi
fi
+
+ # Skip step-by-step defragmentation if the defragmentation on single member etcd is forced
+ if test -z "$FORCE_SINGLE"; then
@@ -171,15 +171,31 @@ index 43560bc..469b829 100644
chmod 0600 /root/etcd-backup
xz -f /root/etcd-backup
fstrim -v /var/lib/etcd
@@ -127,7 +227,6 @@ spec:
@@ -127,12 +227,22 @@ spec:

[Timer]
OnCalendar=*-*-* 02:30:00
- RandomizedDelaySec=15m

[Install]
WantedBy=timers.target
@@ -144,12 +243,25 @@ spec:
+ - path: /root/tweak-kubeapi-memlimit.sh
+ owner: root:root
+ permissions: "0755"
+ content: |
+ #!/bin/bash
+ grep '^ limits:' /etc/kubernetes/manifests/kube-apiserver.yaml >/dev/null 2>&1 && exit 0
+ MEM=$(free -m | grep '^Mem:' | awk '{print $2;}')
+ CPU=$(grep '^processor' /proc/cpuinfo | wc -l)
+ sed -i "/^ *requests:/i\ limits:\n memory: $((10+3*$MEM/4))M\n cpu: $((750*$CPU))m" /etc/kubernetes/manifests/kube-apiserver.yaml
+ sed -i "/^ *requests:/a\ memory: 512M" /etc/kubernetes/manifests/kube-apiserver.yaml
postKubeadmCommands:
- if test "${ETCD_UNSAFE_FS}" = "true"; then mount -o remount,barrier=0,commit=20 /; sed -i 's@errors=remount-ro@errors=remount-ro,barrier=0,commit=20@' /etc/fstab; fi
+ - /root/tweak-kubeapi-memlimit.sh
- sync; systemctl restart kubelet # We should no longer need this
- while test -z "$EPID"; do sleep 5; EPID=`pgrep etcd`; done; renice -10 $EPID; ionice -c2 -n0 -p $EPID
- systemctl enable etcd-defrag.service
@@ -144,12 +254,25 @@ spec:
- apt-get update -y
- TRIMMED_KUBERNETES_VERSION=$(echo ${KUBERNETES_VERSION} | sed 's/\./\./g' | sed 's/^v//')
- RESOLVED_KUBERNETES_VERSION=$(apt-cache policy kubelet | sed 's/\*\*\*//' | awk -v VERSION=$${TRIMMED_KUBERNETES_VERSION} '$1~ VERSION { print $1 }' | head -n1)
Expand All @@ -194,7 +210,7 @@ index 43560bc..469b829 100644
- systemctl daemon-reload
+ - systemctl restart containerd.service
+ # Install etcdctl
+ - ETCDCTL_VERSION=v3.5.7
+ - ETCDCTL_VERSION=v3.5.9
+ - curl -L https://github.com/coreos/etcd/releases/download/$${ETCDCTL_VERSION}/etcd-$${ETCDCTL_VERSION}-linux-amd64.tar.gz -o /tmp/etcd-$${ETCDCTL_VERSION}-linux-amd64.tar.gz
+ - tar xzvf /tmp/etcd-$${ETCDCTL_VERSION}-linux-amd64.tar.gz -C /tmp/
+ - sudo cp /tmp/etcd-$${ETCDCTL_VERSION}-linux-amd64/etcdctl /usr/local/bin/
@@ -207,7 +223,7 @@
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackMachineTemplate
metadata:
@@ -191,11 +303,11 @@ spec:
@@ -191,11 +314,11 @@ spec:
kind: KubeadmConfigTemplate
infrastructureRef:
name: "${PREFIX}-${CLUSTER_NAME}-md-0-${WORKER_MACHINE_GEN}"
Expand All @@ -221,7 +237,7 @@ index 43560bc..469b829 100644
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackMachineTemplate
metadata:
@@ -236,7 +348,14 @@ spec:
@@ -236,7 +359,14 @@ spec:
- TRIMMED_KUBERNETES_VERSION=$(echo ${KUBERNETES_VERSION} | sed 's/\./\./g' | sed 's/^v//')
- RESOLVED_KUBERNETES_VERSION=$(apt-cache policy kubelet | sed 's/\*\*\*//' | awk -v VERSION=$${TRIMMED_KUBERNETES_VERSION} '$1~ VERSION { print $1 }' | head -n1)
- apt-get install -y ca-certificates socat jq ebtables apt-transport-https cloud-utils prips containerd kubelet=$${RESOLVED_KUBERNETES_VERSION} kubeadm=$${RESOLVED_KUBERNETES_VERSION} kubectl=$${RESOLVED_KUBERNETES_VERSION}
Expand Down
4 changes: 2 additions & 2 deletions terraform/variables.tf
@@ -67,13 +67,13 @@ variable "calico_version" {
variable "clusterapi_version" {
description = "desired version of cluster-api"
type = string
default = "1.5.3" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api
default = "1.6.1" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api
}

variable "capi_openstack_version" {
description = "desired version of the OpenStack cluster-api provider"
type = string
default = "0.8.0" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack
default = "0.9.0" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack
}

variable "kubernetes_version" {
