Ensure security hardening of the KaaS monitoring solution - inner cluster communication #495

Open
9 tasks
matofeder opened this issue Nov 29, 2023 · 4 comments
Labels
Ops Issues or pull requests relevant for Team 3: Ops Tooling security Issues or pull requests that are security-relevant

Comments

@matofeder
Member

As a CSP, I require the KaaS cluster monitoring solution I offer to be secure.

Explore the k8s-observability repository and identify the TODOs that highlight instances where an insecure (HTTP) connection is used for the KaaS monitoring solution to collect metrics from Kubernetes control plane components (kube-controller-manager, etcd, kube-proxy, scheduler, etc.)

Definition of Ready:

  • User Story is small enough to be finished within one sprint
  • User Story is clear and understood by the whole team
  • Acceptance criteria are defined
  • Acceptance criteria are clear and understood by the whole team

Definition of Done:

  • All acceptance criteria are met
  • Changes have been reviewed
  • CI tests have run successfully
  • Documentation has been updated
  • Release Notes have been updated
@matofeder matofeder added the Ops Issues or pull requests relevant for Team 3: Ops Tooling label Nov 29, 2023
@matofeder
Member Author

matofeder commented Feb 1, 2024

@matofeder matofeder changed the title Ensure security hardening of the KaaS monitoring solution Ensure security hardening of the KaaS monitoring solution - inner cluster communication Feb 2, 2024
@matofeder
Member Author

See the following GitHub comment on how to secure connections between Kubernetes control plane components (kube-controller-manager, kube-scheduler, kube-proxy, etcd) and Prometheus Server: prometheus-community/helm-charts#204 (comment)

@matofeder
Member Author

matofeder commented Apr 25, 2024

kube-controller-manager, kube-scheduler

After a deeper investigation, it seems that a Prometheus server running on a worker node cannot access the kube-controller-manager and kube-scheduler metrics endpoints in a fully secure way without a workaround; read an explanation here.

  • Both kube-scheduler and kube-controller-manager have 127.0.0.1 as DN in their TLS certificates, so scraping from a non-master host is only possible with insecureSkipVerify: true.
  • By default, both components bind to 127.0.0.1. Binding with extraArgs: {bind-address: "0.0.0.0"} works, but exposes the components too broadly: cluster-wide and probably even outside the cluster, depending on bound interfaces.
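
The broad binding described in the second point can be sketched as a kubeadm configuration. This is a minimal sketch, assuming the kubeadm `v1beta3` config API (where `extraArgs` is a map); only the `bind-address: "0.0.0.0"` argument comes from the discussion above, and it is shown to illustrate the exposure, not as a recommendation.

```yaml
# Sketch only: assumes kubeadm.k8s.io/v1beta3, where extraArgs is a key/value map.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controllerManager:
  extraArgs:
    # Listens on all interfaces instead of the default 127.0.0.1.
    # The serving certificate still does not match the node address,
    # so scrapers must skip certificate verification.
    bind-address: "0.0.0.0"
scheduler:
  extraArgs:
    bind-address: "0.0.0.0"
```

With this binding the metrics ports are reachable from every interface of the control plane node, which is why it is considered too broad without additional network restrictions.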

Two approaches are discussed to work around the above issues:

etcd

etcd can expose metrics via HTTPS. The metrics endpoint should be configured as follows: --listen-metrics-urls=https://0.0.0.0:PORT.
The client (i.e. Prometheus) should then scrape the metrics endpoint with a valid client certificate, see prometheus-community/helm-charts#204 (comment)
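
A Prometheus scrape job for such an HTTPS etcd metrics endpoint could look roughly like the sketch below. The job name, the certificate paths under `/etc/prometheus/secrets/etcd-client-cert/`, and the `CONTROL_PLANE_IP` placeholder are assumptions for illustration; `PORT` is the same placeholder as in the `--listen-metrics-urls` flag above.

```yaml
# Sketch only: job name, file paths and target address are hypothetical.
- job_name: etcd
  metrics_path: /metrics
  scheme: https
  tls_config:
    # CA that signed the etcd serving certificate
    ca_file: /etc/prometheus/secrets/etcd-client-cert/ca.crt
    # Client certificate and key that etcd accepts
    cert_file: /etc/prometheus/secrets/etcd-client-cert/client.crt
    key_file: /etc/prometheus/secrets/etcd-client-cert/client.key
    insecure_skip_verify: false
  static_configs:
    - targets:
        - "CONTROL_PLANE_IP:PORT"
```

The certificate files would have to be made available to the Prometheus container, for example as a mounted Kubernetes Secret.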

kube-apiserver

kube-prometheus-stack creates a token and mounts the k8s/PKI dir into the Prometheus container. Prometheus uses both, so the connection from Prometheus to the Kubernetes API metrics endpoint is already secure.
See the Prometheus config using the default kube-prometheus-stack deployment:

- job_name: serviceMonitor/default/kube-prometheus-apiserver/0
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: kubernetes
    insecure_skip_verify: false

kube-proxy

It seems that kube-proxy does not support TLS, hence secure communication with its metrics endpoint is not directly possible, see kubernetes/kubernetes#106870

As a possible workaround, ztunnel is recommended.

kubelet

  • By default the kubelet serving certificate deployed by kubeadm is self-signed. This means a connection from external services like the metrics-server/kube-api or Prometheus to a kubelet cannot be secured with TLS, see the docs
  • Hence kube-prometheus-stack skips TLS verification, as do others (metrics-server with --kubelet-insecure-tls, kube-api)

The above could be resolved as follows:

  • kubelet could obtain properly signed serving certificates (signed by the Kubernetes CA) via the serverTLSBootstrap: true field. This enables the bootstrap of kubelet server certificates by requesting them from the certificates.k8s.io API
  • One known limitation is that the CSRs (Certificate Signing Requests) for these certificates cannot be automatically approved by the default signer in the kube-controller-manager (kubernetes.io/kubelet-serving). This requires action from the user or a third-party controller (e.g. kubelet-csr-approver)
  • Then kube-prometheus-stack should be adjusted so that it no longer skips TLS verification. Everything else needed for TLS communication is already covered by the kube-prometheus-stack ServiceMonitor settings when HTTPS is used (the default configuration is shown here):
- job_name: serviceMonitor/default/kube-prometheus-kubelet/0
  metrics_path: /metrics
  scheme: https
  authorization:
    type: Bearer
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true  # default: verification of the kubelet certificate is skipped
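
The adjustment to kube-prometheus-stack could be expressed as a Helm values override along the following lines. This is a sketch that assumes the chart exposes `kubelet.serviceMonitor.https` and `kubelet.serviceMonitor.insecureSkipVerify`; the exact keys may differ between chart versions and should be checked against the chart's `values.yaml`.

```yaml
# Sketch only: value keys are assumed and may vary by chart version.
kubelet:
  serviceMonitor:
    # Scrape the kubelet over HTTPS
    https: true
    # Verify the kubelet serving certificate against the cluster CA.
    # Only works once kubelets serve certificates signed by the Kubernetes CA.
    insecureSkipVerify: false
```

With verification enabled, scrapes fail for any kubelet whose serving-certificate CSR has not yet been approved.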

The above scenario has been successfully tested using kind:

  • Set serverTLSBootstrap: true in KubeletConfiguration, see e.g. this tutorial
  • Deploy kube-prometheus-stack
  • Adjust kubelet service monitor and set insecureSkipVerify: false
  • Prometheus scrapes kubelet metrics endpoints via TLS
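
The first step of the kind test could be reproduced with a cluster configuration like the sketch below. It assumes kind's `v1alpha4` config API and its `kubeadmConfigPatches` mechanism; only `serverTLSBootstrap: true` is taken from the steps above, and the node layout is illustrative.

```yaml
# Sketch only: node layout is illustrative.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
kubeadmConfigPatches:
  - |
    kind: KubeletConfiguration
    # Kubelets request serving certificates from the certificates.k8s.io API
    # instead of generating self-signed ones.
    serverTLSBootstrap: true
```

The resulting kubelet-serving CSRs remain pending until they are approved manually or by a controller such as kubelet-csr-approver.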

@bitkeks
Member

bitkeks commented May 7, 2024

Good research for TLS-by-default, thanks! 👍
