Merge branch 'main' into feat/k8s-1.28.3-occm-1.28.1
garloff committed Jan 26, 2024
2 parents 1e4e363 + 5b7e1f9 commit 8e02d4f
Showing 7 changed files with 52 additions and 33 deletions.
30 changes: 17 additions & 13 deletions doc/Upgrade-Guide.md
@@ -10,8 +10,9 @@ state: Draft (v0.7)
This document explains the steps to upgrade the SCS Kubernetes cluster-API
based cluster management solution as follows:
- from the R2 (2022-03) to the R3 (2022-09) state
- from the R3 (2022-09) to the R4 state
- from the R4 (2023-09) to the R5 state
- from the R3 (2022-09) to the R4 (2023-03) state
- from the R4 (2023-03) to the R5 (2023-09) state

The document explains how the management cluster and the workload clusters can be
upgraded without disruption. It is highly recommended to do a step-by-step upgrade
across major releases, i.e. upgrade from R2 to R3 and then to R4 in the case of
@@ -23,22 +24,22 @@ take, and it is advisable that cluster operators get some experience with
this kind of cluster management before applying this to customer clusters
that carry important workloads.

Note that while the detailed steps are tested and targeted to a R2 -> R3 move,
Note that while the detailed steps are tested and targeted to an R2 -> R3 move,
R3 -> R4 move or R4 -> R5 move, many of the steps are a generic approach that will apply also for other
upgrades, so expect a lot of similar steps when moving beyond R5.

Upgrades from cluster management prior to R2 is difficult; many changes before
Upgrades from cluster management prior to R2 are difficult; many changes before
R2 assumed that you would redeploy the management cluster. Redeploying the
management cluster can of course always be done, but it's typically disruptive
to your workload clusters, unless you move your cluster management state into
a new management cluster with `clusterctl move`.
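Such a state move could be sketched as follows; this is a hedged illustration, and the kubeconfig paths are placeholders, not files this repository creates:

```shell
# Hypothetical sketch: hand over Cluster API state to a freshly
# initialized management cluster. Kubeconfig paths are illustrative.
clusterctl init --kubeconfig ~/.kube/new-mgmt.conf --infrastructure openstack
clusterctl move --kubeconfig ~/.kube/old-mgmt.conf \
  --to-kubeconfig ~/.kube/new-mgmt.conf
```

After the move, the workload clusters keep running untouched; only the management objects (Cluster, Machine, etc.) change their home.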

## Management host (cluster) vs. Workload clusters

When you initially deployed the SCS k8s-cluster-api-provider, you create a
When you initially deployed the SCS k8s-cluster-api-provider, you created a
VM with a [kind](https://kind.sigs.k8s.io/) cluster inside and a number of
templates, scripts and binaries that are then used to do the cluster management.
This is your management host (or more precisely you single-host management
This is your management host (or more precisely your single-host management
cluster). Currently, all cluster management including upgrading etc. is done
by connecting to this host via ssh and performing commands there. (You don't
need root privileges to do cluster management there, the normal ubuntu user
@@ -134,8 +135,11 @@ You can now apply the upgrade by executing the following command:
clusterctl upgrade apply --contract v1beta1
```

You can then upgrade the components. You can do them one-by-one or simply do
`clusterctl upgrade apply --contract v1beta1`
You can then upgrade the components. You can do them one-by-one, e.g.:
```bash
clusterctl upgrade apply --infrastructure capo-system/openstack:v0.7.3 --core capi-system/cluster-api:v1.5.1 -b capi-kubeadm-bootstrap-system/kubeadm:v1.5.1 -c capi-kubeadm-control-plane-system/kubeadm:v1.5.1
```
Or simply do `clusterctl upgrade apply --contract v1beta1`.
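Either way, it can help to preview the planned provider versions first; `clusterctl` ships a read-only preview subcommand for this:

```shell
# Shows installed providers and the versions an upgrade would move
# them to; makes no changes by itself.
clusterctl upgrade plan
```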

#### New templates

@@ -337,11 +341,11 @@ If you decide to migrate your existing Kubernetes cluster from R4 to R5 be aware

Follow the steps below if you want to migrate an existing cluster from R4 to R5:
1. Access your management node
2. Checkout R5 tag
2. Checkout R5 branch
```bash
cd k8s-cluster-api-provider
git pull
git checkout tags/v6.0.0
git checkout maintained/v6.x
```
3. Back up the existing cluster configuration files (recommended)
```bash
# … (backup commands collapsed in this view)
```
@@ -357,7 +361,7 @@
and are not directly mentioned in the cluster configuration files, but they are hardcoded
in R5 scripts (e.g. ingress nginx controller, metrics server). Hence, read carefully the
R5 release notes too. Also note that the Kubernetes version was not updated; it is still v1.25.6.
6. Update an existing cluster (expect Kubernetes version)
6. Update an existing cluster (except Kubernetes version)
```bash
create_cluster.sh <CLUSTER_NAME>
```
@@ -380,7 +384,7 @@ Follow the below steps if you want to migrate an existing cluster from R4 to R5:
10. Bump the Kubernetes version to R5 v1.27.5 and increase the generation counter for worker and control plane nodes
```bash
sed -i 's/^KUBERNETES_VERSION: v1.26.8/KUBERNETES_VERSION: v1.27.5/' <CLUSTER_NAME>/clusterctl.yaml
sed -i 's/^OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.26.8 /OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.27.5/' <CLUSTER_NAME>/clusterctl.yaml
sed -i 's/^OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.26.8/OPENSTACK_IMAGE_NAME: ubuntu-capi-image-v1.27.5/' <CLUSTER_NAME>/clusterctl.yaml
sed -r 's/(^CONTROL_PLANE_MACHINE_GEN: genc)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i <CLUSTER_NAME>/clusterctl.yaml
sed -r 's/(^WORKER_MACHINE_GEN: genw)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i <CLUSTER_NAME>/clusterctl.yaml
```
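The two generation-bump one-liners rely on GNU sed's `e` flag, which executes the substituted line as a shell command (here, a `printf` that increments the counter). A small self-contained demonstration on a throwaway file, not your real `<CLUSTER_NAME>/clusterctl.yaml`:

```shell
# Demonstrate the generation-counter bump on a throwaway fragment.
printf 'CONTROL_PLANE_MACHINE_GEN: genc01\nWORKER_MACHINE_GEN: genw03\n' > /tmp/gen-demo.yaml
sed -r 's/(^CONTROL_PLANE_MACHINE_GEN: genc)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i /tmp/gen-demo.yaml
sed -r 's/(^WORKER_MACHINE_GEN: genw)([0-9][0-9])/printf "\1%02d" $((\2+1))/ge' -i /tmp/gen-demo.yaml
cat /tmp/gen-demo.yaml   # genc01 -> genc02, genw03 -> genw04
# Caveat: counters 08 and 09 would be parsed as invalid octal in $((...)).
```

Changing the generation suffix renames the machine templates, which is what triggers Cluster API to roll the nodes.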
@@ -395,7 +399,7 @@ OCCM, CNI (calico/cilium), CSI

### New versions for optional components

nginx, metrics (nothing to do), cert-manager, flux
nginx, metrics server, cert-manager, flux

### etcd leader changes

4 changes: 2 additions & 2 deletions doc/configuration.md
@@ -21,8 +21,8 @@ Parameters controlling the Cluster-API management server (capi management server)
| `kind_flavor` | | SCS | `SCS-2V-4` | Flavor to be used for the k8s capi mgmt server |
| `image` | | SCS | `Ubuntu 22.04` | Image for the capi mgmt server. Use `Ubuntu 22.04` or `Debian 12`. Check also the `ssh_username` parameter |
| `ssh_username` | | SCS | `ubuntu` | Name of the default user for the `image` |
| `clusterapi_version` | | SCS | `1.5.3` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api --> | Version of the cluster-API incl. `clusterctl` |
| `capi_openstack_version` | | SCS | `0.8.0` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack --> | Version of the cluster-api-provider-openstack (needs to fit the CAPI version) |
| `clusterapi_version` | | SCS | `1.6.1` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api --> | Version of the cluster-API incl. `clusterctl` |
| `capi_openstack_version` | | SCS | `0.9.0` <!-- renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack --> | Version of the cluster-api-provider-openstack (needs to fit the CAPI version) |
| `cilium_binaries` | | SCS | `v0.15.7;v0.12.0` | Versions of the cilium and hubble CLI in the vA.B.C;vX.Y.Z format |
| `restrict_mgmt_server` | | SCS | `["0.0.0.0/0"]` | Allows restricting access to the management server by the given list of CIDRs. Empty value (default) means public. |
| `mgmt_cidr` | | SCS | `10.0.0.0/24` | IPv4 address range (CIDR notation) for management cluster |
6 changes: 3 additions & 3 deletions terraform/cleanup/cleanup.sh
@@ -209,7 +209,7 @@ while read LB FIP; do
if test -z "$DEBUG"; then $OPENSTACK floating ip delete $FID; fi
fi
done < <(echo "$LBS")
SRV=$(resourcelist server "$CAPIPRE-$CLUSTER\|$CLUSTER-control-plane")
SRV=$(resourcelist server "$CAPIPRE-$CLUSTER\|$CLUSTER-")
SRVVOL=$(server_vols $SRV)
if test -n "$DEBUG"; then echo "### Attached volumes to "${SRV}": $SRVVOL"; fi
#cleanup server $CAPIPRE-$CLUSTER
@@ -219,7 +219,7 @@ if test -n "$NOCASCADE"; then
else
cleanup_list loadbalancer 1 "--cascade" "$LBS"
fi
cleanup port "$CAPIPRE-$CLUSTER\|$CLUSTER-control-plane"
cleanup port "$CAPIPRE-$CLUSTER\|$CLUSTER-"
RTR=$(resourcelist router "$CAPIPRE2ALL")
SUBNETS=$(resourcelist subnet "$CAPIPRE2ALL")
if test -n "$RTR" -a -n "$SUBNETS"; then
@@ -245,7 +245,7 @@ cleanup_list volume "" "" "$SRVVOL"
#cleanup "image" ubuntu-capi-image
cleanup "server group" "$CAPIPRE-$CLUSTER"
# Normally, the volumes should be all gone, but if there's one left, take care of it
cleanup volume "$CAPIPRE-$CLUSTER\|$CLUSTER-control-plane"
cleanup volume "$CAPIPRE-$CLUSTER\|$CLUSTER-"
cleanup "application credential" "$CAPIPRE-$CLUSTER-appcred"
cleanup container "$CAPIPRE-$CLUSTER-harbor-registry"

2 changes: 1 addition & 1 deletion terraform/files/bin/install_k9s.sh
@@ -5,7 +5,7 @@
. /etc/profile.d/proxy.sh

# install k9s
K9S_VERSION=v0.30.6 # renovate: datasource=github-releases depName=derailed/k9s
K9S_VERSION=v0.31.7 # renovate: datasource=github-releases depName=derailed/k9s
echo "# install k9s $K9S_VERSION"
ARCH=$(uname -m | sed 's/x86_64/amd64/')
# TODO: Check signature
3 changes: 1 addition & 2 deletions terraform/files/bin/update-R4-to-R5.sh
@@ -46,8 +46,7 @@ cp -p cluster-template.yaml cluster-template.yaml.backup
cp -p clusterctl.yaml clusterctl.yaml.backup

# Update cluster-template.yaml
# TODO: Update `update-cluster-template.diff` file when R5 release will be stabilized
if grep -q "SERVICE_CIDR\|POD_CIDR\|# Defragment & backup & trim script for SCS k8s-cluster-api-provider etcd cluster.\|# Allow to configure registry hosts in containerd" cluster-template.yaml; then
if grep -q "SERVICE_CIDR\|POD_CIDR\|# Defragment & backup & trim script for SCS k8s-cluster-api-provider etcd cluster.\|# Allow to configure registry hosts in containerd\|tweak-kubeapi-memlimit" cluster-template.yaml; then
echo "cluster-template.yaml already updated"
else
# The default template file `cluster-defaults/cluster-template.yaml` of version R4 still references the old `k8s.gcr.io` container registry.
36 changes: 26 additions & 10 deletions terraform/files/update/R4_to_R5/update-cluster-template.diff
@@ -1,5 +1,5 @@
diff --git a/terraform/files/template/cluster-template.yaml b/terraform/files/template/cluster-template.yaml
index 43560bc..469b829 100644
index 43560bc..c0b6f82 100644
--- a/terraform/files/template/cluster-template.yaml
+++ b/terraform/files/template/cluster-template.yaml
@@ -7,10 +7,12 @@ metadata:
@@ -101,9 +101,9 @@ index 43560bc..469b829 100644
+ echo "Warning: forced defragmentation on non leader!"
+ else
+ echo "Exit on non leader (use --force-nonleader optional argument if you want to force defragmentation on non leader)"
exit 0
+ exit 0
+ fi
fi
+ fi
+
+ # Check health of all etcd members
+ while read MEMBER; do
@@ -130,9 +130,9 @@ index 43560bc..469b829 100644
+ echo "Warning: forced defragmentation on single member etcd!"
+ else
+ echo "Exit on single member etcd (use --force-single optional argument if you want to force defragmentation on single member etcd)"
+ exit 0
exit 0
+ fi
+ fi
fi
+
+ # Skip step-by-step defragmentation if the defragmentation on single member etcd is forced
+ if test -z "$FORCE_SINGLE"; then
@@ -171,15 +171,31 @@ index 43560bc..469b829 100644
chmod 0600 /root/etcd-backup
xz -f /root/etcd-backup
fstrim -v /var/lib/etcd
@@ -127,7 +227,6 @@ spec:
@@ -127,12 +227,22 @@ spec:

[Timer]
OnCalendar=*-*-* 02:30:00
- RandomizedDelaySec=15m

[Install]
WantedBy=timers.target
@@ -144,12 +243,25 @@ spec:
+ - path: /root/tweak-kubeapi-memlimit.sh
+ owner: root:root
+ permissions: "0755"
+ content: |
+ #!/bin/bash
+ grep '^ limits:' /etc/kubernetes/manifests/kube-apiserver.yaml >/dev/null 2>&1 && exit 0
+ MEM=$(free -m | grep '^Mem:' | awk '{print $2;}')
+ CPU=$(grep '^processor' /proc/cpuinfo | wc -l)
+ sed -i "/^ *requests:/i\ limits:\n memory: $((10+3*$MEM/4))M\n cpu: $((750*$CPU))m" /etc/kubernetes/manifests/kube-apiserver.yaml
+ sed -i "/^ *requests:/a\ memory: 512M" /etc/kubernetes/manifests/kube-apiserver.yaml
postKubeadmCommands:
- if test "${ETCD_UNSAFE_FS}" = "true"; then mount -o remount,barrier=0,commit=20 /; sed -i 's@errors=remount-ro@errors=remount-ro,barrier=0,commit=20@' /etc/fstab; fi
+ - /root/tweak-kubeapi-memlimit.sh
- sync; systemctl restart kubelet # We should no longer need this
- while test -z "$EPID"; do sleep 5; EPID=`pgrep etcd`; done; renice -10 $EPID; ionice -c2 -n0 -p $EPID
- systemctl enable etcd-defrag.service
@@ -144,12 +254,25 @@ spec:
- apt-get update -y
- TRIMMED_KUBERNETES_VERSION=$(echo ${KUBERNETES_VERSION} | sed 's/\./\./g' | sed 's/^v//')
- RESOLVED_KUBERNETES_VERSION=$(apt-cache policy kubelet | sed 's/\*\*\*//' | awk -v VERSION=$${TRIMMED_KUBERNETES_VERSION} '$1~ VERSION { print $1 }' | head -n1)
Expand All @@ -194,7 +210,7 @@ index 43560bc..469b829 100644
- systemctl daemon-reload
+ - systemctl restart containerd.service
+ # Install etcdctl
+ - ETCDCTL_VERSION=v3.5.7
+ - ETCDCTL_VERSION=v3.5.9
+ - curl -L https://github.com/coreos/etcd/releases/download/$${ETCDCTL_VERSION}/etcd-$${ETCDCTL_VERSION}-linux-amd64.tar.gz -o /tmp/etcd-$${ETCDCTL_VERSION}-linux-amd64.tar.gz
+ - tar xzvf /tmp/etcd-$${ETCDCTL_VERSION}-linux-amd64.tar.gz -C /tmp/
+ - sudo cp /tmp/etcd-$${ETCDCTL_VERSION}-linux-amd64/etcdctl /usr/local/bin/
@@ -207,7 +223,7 @@
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackMachineTemplate
metadata:
@@ -191,11 +303,11 @@ spec:
@@ -191,11 +314,11 @@ spec:
kind: KubeadmConfigTemplate
infrastructureRef:
name: "${PREFIX}-${CLUSTER_NAME}-md-0-${WORKER_MACHINE_GEN}"
Expand All @@ -221,7 +237,7 @@ index 43560bc..469b829 100644
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackMachineTemplate
metadata:
@@ -236,7 +348,14 @@ spec:
@@ -236,7 +359,14 @@ spec:
- TRIMMED_KUBERNETES_VERSION=$(echo ${KUBERNETES_VERSION} | sed 's/\./\./g' | sed 's/^v//')
- RESOLVED_KUBERNETES_VERSION=$(apt-cache policy kubelet | sed 's/\*\*\*//' | awk -v VERSION=$${TRIMMED_KUBERNETES_VERSION} '$1~ VERSION { print $1 }' | head -n1)
- apt-get install -y ca-certificates socat jq ebtables apt-transport-https cloud-utils prips containerd kubelet=$${RESOLVED_KUBERNETES_VERSION} kubeadm=$${RESOLVED_KUBERNETES_VERSION} kubectl=$${RESOLVED_KUBERNETES_VERSION}
Expand Down
4 changes: 2 additions & 2 deletions terraform/variables.tf
@@ -67,13 +67,13 @@ variable "calico_version" {
variable "clusterapi_version" {
description = "desired version of cluster-api"
type = string
default = "1.5.3" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api
default = "1.6.1" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api
}

variable "capi_openstack_version" {
description = "desired version of the OpenStack cluster-api provider"
type = string
default = "0.8.0" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack
default = "0.9.0" # renovate: datasource=github-releases depName=kubernetes-sigs/cluster-api-provider-openstack
}

variable "kubernetes_version" {
