Add documentation for Clustermesh setup through Helm Chart #19057

Open
kaworu opened this issue Mar 7, 2022 · 49 comments
Assignees
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. area/helm Impacts helm charts and user deployment experience help-wanted Please volunteer for this by adding yourself as an assignee! kind/enhancement This would improve or streamline existing functionality. pinned These issues are not marked stale by our issue bot.

Comments

@kaworu
Member

kaworu commented Mar 7, 2022

Follow-up issue of #17851 which introduced support to connect multiple Clustermesh clusters using the Helm Chart.

@kaworu kaworu added kind/enhancement This would improve or streamline existing functionality. kind/documentation labels Mar 7, 2022
@github-actions

github-actions bot commented May 7, 2022

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 7, 2022
@kaworu kaworu removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label May 17, 2022
@aanm
Member

aanm commented May 24, 2022

@samueltorres do you have any update regarding this?

@aanm aanm added the sig/agent Cilium agent related. label May 24, 2022
@samueltorres
Contributor

samueltorres commented Jun 3, 2022

Hey @aanm, I'm currently working on it! 🚀 I have already gathered feedback from @kaworu.

@sayboras sayboras added help-wanted Please volunteer for this by adding yourself as an assignee! area/helm Impacts helm charts and user deployment experience area/documentation Impacts the documentation, including textual changes, sphinx, or other doc generation code. labels Jul 21, 2022
@datsabk

datsabk commented Jul 22, 2022

FWIW, I tried to set this up and ran into a bunch of difficulties. Here are the roadblocks:

  1. We need to manually enter details of every cluster we want to connect
  2. The API Server CA needs to be manually entered without which the configuration wouldn't go through
  3. The DNS records are not created by Helm - So we need an add-on like external-dns that can create records based on the provided annotation.

@xinity

xinity commented Sep 12, 2022

@datsabk any chance you could share details on how you achieved the clustermesh setup through the Helm chart?

@datsabk

datsabk commented Sep 12, 2022

@xinity I wasn't able to. The certificate configuration is too complex, and some values are simply not accepted. The approach I took was: use the cilium CLI to install and connect two clusters, gather the parameters from the live install, and try to use those in the Helm chart.

@dgiebert

dgiebert commented Oct 17, 2022

I am currently trying to achieve the cluster mesh with helm, and this needs to be a two-step process, in my opinion:

For this example the final setup will be cluster01 <-> cluster02

  1. Enable the API Server on both clusters that you want to join the mesh (Untested and constructed from HelmChartConfig below)
helm upgrade cilium cilium/cilium --version 1.12.3 \
   --namespace kube-system \
   --reuse-values \
   --set ipv4NativeRoutingCIDR=10.0.0.0/9 \
   --set kubeProxyReplacement=strict \
   --set l7Proxy=false \
   --set k8sServiceHost=x.x.x.x \
   --set k8sServicePort=6443 \
   --set cluster.name=cluster01 \
   --set cluster.id=1 \
   --set externalWorkloads.enabled=true \
   --set clustermesh.useAPIServer=true
  2. Extract the CA, cert, and key for every cluster from the secret clustermesh-apiserver-remote-cert in kube-system
  3. Create the link on both clusters with the needed values (Untested and constructed from HelmChartConfig below)
helm upgrade cilium cilium/cilium --version 1.12.3 \
   --namespace kube-system \
   --reuse-values \
   --set ipv4NativeRoutingCIDR=10.0.0.0/9 \
   --set kubeProxyReplacement=strict \
   --set l7Proxy=false \
   --set k8sServiceHost=x.x.x.x \
   --set k8sServicePort=6443 \
   --set cluster.name=cluster01 \
   --set cluster.id=1 \
   --set externalWorkloads.enabled=true \
   --set clustermesh.useAPIServer=true \
   --set clustermesh.config.enabled=true \
   --set clustermesh.config.clusters[0].name=cluster02 \
   --set clustermesh.config.clusters[0].ips[0]=x.x.x.x \
   --set clustermesh.config.clusters[0].port=32379 \
   --set clustermesh.config.clusters[0].tls.cert=.... \
   --set clustermesh.config.clusters[0].tls.key=.... \
   --set clustermesh.apiserver.tls.ca.cert=....

This was actually done using RKE2; here is the HelmChartConfig for it:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: strict
    k8sServiceHost: x.x.x.x
    k8sServicePort: 6443
    ipv4NativeRoutingCIDR: 10.0.0.0/9
    l7Proxy: false
    cluster:
      name: cluster01
      id: 1
    externalWorkloads: 
      enabled: true
    clustermesh:
      useAPIServer: true
      config:
        enabled: true
        clusters:
        - name: cilium04
          ips:
          - x.x.x.x
          port: 32379
          tls:
            cert: ...
            key: ...
      apiserver:
        tls:
          ca:
            cert: ....

This will for now only work with two clusters as the CA is not configurable per cluster.
@samueltorres what was the design decision for this?

Edit: The CA needs to be created in the first cluster and passed along to the others for this to fully function (docs).
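
For readers following step 2 above, here is a minimal sketch of reading the certificate material out of the clustermesh-apiserver-remote-cert secret with kubectl. It is not taken from the thread: the helper names are hypothetical, and the key names ca.crt, tls.crt and tls.key are an assumption based on the usual layout of Kubernetes TLS secrets, so it is worth confirming them with kubectl describe first.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: the secret uses the keys ca.crt, tls.crt and
# tls.key, and the kubectl context names passed in are hypothetical.
set -euo pipefail

SECRET_NAME="clustermesh-apiserver-remote-cert"
NAMESPACE="kube-system"

# Build the jsonpath expression for one key of the secret.
# Dots inside the key name must be escaped for kubectl's jsonpath syntax.
secret_jsonpath() {
  local key="$1"
  printf '{.data.%s}' "$(printf '%s' "$key" | sed 's/\./\\./g')"
}

# Fetch one field of the secret from the given kubectl context.
# The value is printed base64-encoded, exactly as stored in the secret.
get_remote_cert_field() {
  local context="$1" key="$2"
  kubectl --context "$context" -n "$NAMESPACE" get secret "$SECRET_NAME" \
    -o "jsonpath=$(secret_jsonpath "$key")"
}
```

Calling `get_remote_cert_field cluster02 tls.crt` (with cluster02 as a hypothetical context name) would print the encoded certificate for use on cluster01. The output is left base64-encoded because later comments in this thread put base64-encoded values into the Helm values.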

@datavisorzhizhu

@dgiebert where is clustermesh-apiserver-remote-cert being used?

@dgiebert

dgiebert commented Oct 25, 2022

It is the certificate used to connect to the apiserver of the remote cluster, so it is used in clustermesh.config.clusters[x].tls.cert with the corresponding key. (cluster1 will need the cert + key for cluster2 and vice versa.)

Maybe this gist will help you.

@datavisorzhizhu

datavisorzhizhu commented Oct 25, 2022

@dgiebert thanks for the explanation.
I have some doubts about this part:

clustermesh:
  useAPIServer: true
  config:
    enabled: true
    clusters:
    - name: cilium01
      ips:
      - x.x.x.x
      port: 32379
      tls:
        cert: "Check clustermesh-remote-server-cert on kube-system in cluster01"
        key: "Check clustermesh-remote-server-cert on kube-system in cluster01"

If the clustermesh consists of 3 Kubernetes clusters, how do you fill in the values here? I mean:

      tls:
        cert:
        key:

Where are the above certificates used in the Helm chart? I cannot find where they are used.

@dgiebert

On cilium01 (and on the other cluster you will need to adapt that accordingly):

        clusters:
        - name: cilium02
          ips:
          - x.x.x.x
          port: 32379
          tls:
            cert: "Check clustermesh-remote-server-cert on kube-system in cluster02"
            key: "Check clustermesh-remote-server-cert on kube-system in cluster02"
        - name: cilium03
          ips:
          - x.x.x.x
          port: 32379
          tls:
            cert: "Check clustermesh-remote-server-cert on kube-system in cluster03"
            key: "Check clustermesh-remote-server-cert on kube-system in cluster03"

Using helm they should be set in the array:

   --set clustermesh.config.clusters[0].name=cluster02 \
  [...]
   --set clustermesh.config.clusters[0].tls.key=.... \
   --set clustermesh.config.clusters[1].name=cluster03 \
  [...]
   --set clustermesh.config.clusters[1].tls.key=....

Keep in mind that I only tested this using HelmChartConfig!
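
As a rough illustration of how the array indices above grow with the number of remote clusters, the sketch below prints the name, IP and port flags for any list of clusters. The function name and the name:ip:port input format are hypothetical, the addresses are documentation placeholders, and the tls.cert / tls.key flags are deliberately left out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print one "--set" flag per line for every remote cluster.
# Each argument has the form name:ip:port (IPv4 only, since ':' is the separator).
clustermesh_set_flags() {
  local index=0 entry name ip port
  for entry in "$@"; do
    IFS=: read -r name ip port <<< "$entry"
    printf -- '--set clustermesh.config.clusters[%d].name=%s\n' "$index" "$name"
    printf -- '--set clustermesh.config.clusters[%d].ips[0]=%s\n' "$index" "$ip"
    printf -- '--set clustermesh.config.clusters[%d].port=%s\n' "$index" "$port"
    index=$((index + 1))
  done
}

# Example for cluster01, which needs entries for cluster02 and cluster03.
clustermesh_set_flags "cluster02:192.0.2.10:32379" "cluster03:192.0.2.20:32379"
```

On each cluster the list contains every other cluster in the mesh, so cluster02 would be called with the entries for cluster01 and cluster03 instead.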

@datavisorzhizhu

datavisorzhizhu commented Oct 25, 2022

@dgiebert thanks a lot.
Two more questions:

  1. Where is the secret clustermesh-apiserver-client-cert being used?
  2. Can I curl the clustermesh-apiserver?

@aanm aanm added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. area/clustermesh Relates to multi-cluster routing functionality in Cilium. and removed sig/agent Cilium agent related. labels Nov 9, 2022
@darkstarmv

darkstarmv commented Dec 8, 2022

I tried to follow cilium/clustermesh-apiserver/README.md#deploy-using-helm steps, but my clustermesh can't initialize:

==== detail from pod cilium-gnrrx , on node xxxxxxxxx
KVStore:                Ok   Disabled
Kubernetes:             Ok   1.23 () [linux/amd64]
Kubernetes APIs:        ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumEndpoint", "cilium/v2::CiliumNetworkPolicy", "cilium/v2::CiliumNode", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:   Disabled
Host firewall:          Disabled
CNI Chaining:           none
Cilium:                 Ok   1.12.4 (v1.12.4-6eaecaf)
NodeMonitor:            Listening for events on 32 CPUs with 64x4096 of shared memory
Cilium health daemon:   Ok
IPAM:                   IPv4: 17/254 allocated from xxx.xxx.xxx.xxx/24,
ClusterMesh:            0/1 clusters ready, 0 global-services
   Cluster1: not-ready, 0 nodes, 0 identities, 0 services, 0 failures (last: never)
   └  Waiting for initial connection to be established
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       97/97 healthy
Proxy Status:            OK, ip XXX.XXX.X.XXX, 0 redirects active on ports 10000-20000
Global Identity Range:   min 256, max 65535
Hubble:                  Ok   Current/Max Flows: 4095/4095 (100.00%), Flows/s: 82.22   Metrics: Ok
Encryption:              Disabled
Cluster health:          3/3 reachable   (2022-12-08T23:04:44Z)

usage: sleep seconds

@datavisorzhizhu

@darkstarmv you can follow this blog, which uses the cilium command: https://www.sobyte.net/post/2022-05/cilium-cluster-mesh/
Once you understand the steps for setting up a Cilium cluster mesh, you can convert them into Helm chart values based on the blog and the discussion above.

@github-actions

github-actions bot commented Feb 7, 2023

This issue has been automatically marked as stale because it has not
had recent activity. It will be closed if no further activity occurs.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 7, 2023
@m-yosefpor
Contributor

still relevant

@mayur2281

mayur2281 commented Apr 22, 2024

Hi, I am still facing certificate issues while trying to configure clustermesh between 2 clusters using Helm and ArgoCD. Can't we have an init script to keep the necessary certificate secrets in sync between both clusters?

@mayur2281

#32076 (comment)

Here is the difference between the CLI install and the Helm install.

@mayur2281

Hi @ALL, should I add the secrets manually, as explained in this blog?

https://docs.kubermatic.com/kubermatic/v2.25/tutorials-howtos/networking/cilium-cluster-mesh/

Is there a way to automate this using the Helm chart itself?

@dazmc

dazmc commented Apr 23, 2024

Note: I don't use clustermesh.apiserver.tls.ca or clustermesh.config.clusters[].tls; I set the CA in tls.ca instead.

@mayur2281 You can create the cert/key, base64 encode it, and then put it in the Helm values for both clusters. I normally just deploy the first cluster, export the certs, and keep them safe (keystore). Then, as I build other clusters, I put the base64-encoded values in tls.ca in the Helm values file.

@mayur2281

@dazmc When you deploy the first cluster, you export the secret called cilium-ca, right? And then put that in the tls.ca section? Other than that, there is no other configuration?

@mayur2281

@samueltorres Can we have another section for clustermesh helm install in these docs?: https://docs.cilium.io/en/stable/network/clustermesh/clustermesh/#gs-clustermesh

@mayur2281

@dazmc Please tell me which secrets from the first cluster are to be used in the second cluster, and in which configuration blocks in the Helm values. It is really confusing. Are you suggesting using only this block: https://github.com/cilium/cilium/blob/main/install/kubernetes/cilium/values.yaml#L2248

@mayur2281

Thanks @dazmc. I got the clustermesh working by putting the same CA as the first cluster into the tls.ca section of the Helm values for my second cluster.
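
Since the fix above hinges on both clusters carrying the same CA, a quick way to check this could be to compare the two base64-encoded CA certificates. This is a sketch under assumptions, not something from the thread: the function name is hypothetical, and the sample values are placeholder strings rather than real certificates.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Succeed when two base64-encoded CA certificates decode to the same bytes.
# Comparing digests of the decoded data ignores differences in line wrapping.
same_ca() {
  local first second
  first="$(printf '%s' "$1" | base64 -d | sha256sum | cut -d' ' -f1)"
  second="$(printf '%s' "$2" | base64 -d | sha256sum | cut -d' ' -f1)"
  [ "$first" = "$second" ]
}

# Placeholder values standing in for the tls.ca.cert of each cluster.
ca_cluster1="$(printf 'placeholder CA one' | base64)"
ca_cluster2="$(printf 'placeholder CA one' | base64)"

if same_ca "$ca_cluster1" "$ca_cluster2"; then
  echo "clusters share the same CA"
else
  echo "CA mismatch between clusters"
fi
```

On live clusters the two inputs would come from the cilium-ca secret mentioned earlier, for example via kubectl get secret with a jsonpath output; the exact key name inside that secret is not stated in this thread and would need to be checked.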

@dazmc

dazmc commented Apr 23, 2024

@mayur2281 yes, that's correct. If you notice here it says "These fields can (and should) be omitted in case the CA is shared across clusters" and yes it is confusing ;-)

@mayur2281

@dazmc Yes, I saw that and it was clear that I should not use that section. What confused me was the clustermesh.apiserver.tls section. That section is used purely for communication within the cluster for the clustermesh apiserver, right?

Anyway, I am writing a blog post about this and will share it here. Please review it!

@dazmc

dazmc commented Apr 23, 2024

@mayur2281 I then keep the key/cert in files (base64-encoded contents; don't do this in production) and use envsubst to fill in a template Helm values file, along with other variables.

# mesh_helm.tmpl
.....
ipv4NativeRoutingCIDR: ${CILIUM_CIDR}
cluster:
  name: CLUSTER1
  id: 1
tls:
  ca:
    cert: $CA_CERT
    key: $CA_KEY

# Shell commands: export the base64-encoded contents, then render the template
export CA_CERT=$(cat /path/to/CA_FILE)
export CA_KEY=$(cat /path/to/CA_KEY)
envsubst < mesh_helm.tmpl > mesh_helm.yaml

@dazmc

dazmc commented Apr 23, 2024

@mayur2281 that section is deprecated. If I look in the clustermesh.apiserver.tls section, I no longer see a ca section there, which ties in with it being removed in 1.15.

In the 1.13.9 Helm values that section is there; in 1.14.9 it says the following:

ca:
  # -- Deprecated in favor of tls.ca.cert. To be removed in 1.15.
  # Optional CA cert. If it is provided, it will be used by the 'cronJob' method to
  # generate all other certificates. Otherwise, an ephemeral CA is generated.
  cert: ""
  # -- Deprecated in favor of tls.ca.key. To be removed in 1.15.
  # Optional CA private key. If it is provided, it will be used by the 'cronJob' method to
  # generate all other certificates. Otherwise, an ephemeral CA is generated.

I'm using 1.14 now, so I have stuck with using just the tls.ca section.

@mayur2281

#19057 (comment)

I can't do this because I'm using ArgoCD to deploy the clustermesh, so I have to insert the cert and key manually in the Helm values and pass them to the second cluster in ArgoCD.

@dazmc

dazmc commented Apr 23, 2024

@mayur2281 there is nothing stopping you from using the same cert/key in the Helm values of cluster1 as well as cluster2. I deployed that first cluster only to get the cert/key format, then I blew it away. Now I have the cert/key, so I just put it in the Helm values when I deploy cluster 1, cluster 2, etc.

You can manually create the cert/key and use it on any cluster (1, 2, 3, etc.), so all my clusters have the same key. Sometimes I generate a new cert/key so that I can have cluster1,2,3 in one mesh and cluster4,5,6 in a different mesh. Just keep the certs/keys you need.

Note: I don't use ArgoCD; I prefer Flux ;-)
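
To make "manually create the cert/key" more concrete, here is one possible sketch using openssl to create a self-signed CA and print it as single-line base64 for the tls.ca.cert and tls.ca.key values. None of the parameters come from this thread: the function names, the key size, the validity period and the subject "Cilium CA" are all assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a self-signed CA certificate and an unencrypted private key
# (ca.crt and ca.key) in the given directory.
# Assumed parameters: RSA 4096, 3 years validity, subject "Cilium CA".
generate_mesh_ca() {
  local out_dir="$1"
  mkdir -p "$out_dir"
  openssl req -x509 -newkey rsa:4096 -nodes \
    -keyout "$out_dir/ca.key" -out "$out_dir/ca.crt" \
    -days 1095 -subj "/CN=Cilium CA" 2>/dev/null
}

# Print a file as one base64 line, suitable for pasting into a values file.
encode_for_values() {
  base64 < "$1" | tr -d '\n'
}
```

Running `generate_mesh_ca ./mesh-ca` followed by `encode_for_values ./mesh-ca/ca.crt` and `encode_for_values ./mesh-ca/ca.key` would produce the two strings to reuse on every cluster of one mesh. The key is written unencrypted, so the output directory needs the same care as any other secret.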

@mayur2281

@dazmc So then what would be the best practice for Production?

@dazmc

dazmc commented Apr 23, 2024

@mayur2281 Having the CA key/certs lying around in a file on your laptop isn't a good idea, IMHO. I would at least encrypt the files (sops), but better still, store the cert/key in something like HashiCorp Vault or external-secrets (AWS/Azure/GCP key vault).

@mayur2281

Yes will do that

@mayur2281

mayur2281 commented Apr 23, 2024

@dazmc Please provide your feedback on this: https://notes.mayurbn.site/Blogs/Cilium-Clustermesh-Deployment-on-Helm-or-ArgoCD-on-AKS

PS: If you get a 404 or the site isn't loading, it has most probably been migrated from mayurbn.site to mayurbn.top

@samueltorres
Contributor

Hey folks,

Just created an example repo on how to configure clustermesh through the helm chart:
https://github.com/samueltorres/cilium-clustermesh-helm/

As soon as I have some free time, I aim to document the clustermesh setup through Helm in the docs :)

@dazmc

dazmc commented Apr 24, 2024

@mayur2281 It looks like line 13 needs indenting in the first YAML, and the same goes for line 18 in the second YAML.

You also don't mention how or where you get the load balancer addresses used in the YAML files.

@mayur2281

@dazmc I have made the changes. Please take another look and share your feedback.
https://notes.mayurbn.site/Blogs/Cilium-Clustermesh-Deployment-on-Helm-or-ArgoCD-on-AKS

@dazmc

dazmc commented Apr 25, 2024

@mayur2281 the corporate firewall blocks your website, so I will have to check later.

@samueltorres nice, I tend to set clustermesh.apiserver.tls.auto.enabled to true

  apiserver:
    tls:
      auto:
        enabled: true

Then all the certs (admin, server, client, etc.) are autogenerated here for mTLS. Is there any reason you set it to false and provide the certs manually? I'm just interested in another viewpoint.

@mayur2281

It's set to false because he creates all his certs through the shell script.

@dazmc

dazmc commented Apr 25, 2024

@mayur2281 yes, I understand that, but I'm trying to understand why you wouldn't just leave it set to auto, so that the admin, server and client certs are autogenerated. Providing them manually seems like more work, as you have to create, store and manage them. You could take all those lines out of the script. I'm sure there is a reason that @samueltorres has done it that way.

@mayur2281

Oh yes, surely he has his reasons.

@samueltorres
Contributor

In my case I'd be leveraging Vault as a PKI store, so I tried to emulate the same behaviour here. In my company we install Cilium through Terraform so this is all easy to do from within our Terraform configuration.

@julianwiedmann julianwiedmann removed this from the 1.14 milestone Jun 10, 2024
@mayur2281

@dazmc I would like to store the certs in Azure Key Vault and use a Secrets Store CSI driver to create a secret from there. But there is no option in the values to provide a secret name; all I have is tls.ca.cert and tls.ca.key.

@mayur2281

@samueltorres Any suggestions on the above comment?
