Support TLS certificate auto-generation using certmanager #17238
Hello @dungdm93 and thank you for the PR!
I have some suggestions below; I needed those changes in order to test the PR. With them applied, here are the steps I ran on a kind cluster.
Install cert-manager
$ kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.5.3/cert-manager.yaml
Now here we have a "chicken and egg problem": because the Nodes are in NotReady state waiting on the CNI, the cert-manager Pods are in Pending state waiting on the Nodes:
$ kubectl describe -n cert-manager pod -l app.kubernetes.io/name=cert-manager
…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 24s (x13 over 12m) default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
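The deadlock described above can also be confirmed without reading the events by hand. A minimal sketch, assuming only the standard kubectl field selector on status.phase; the helper name pending_pods is made up for illustration:

```shell
# List Pods in a namespace that are still in the Pending phase.
# $1: namespace (for the case above: cert-manager)
pending_pods() {
  kubectl get pods -n "$1" --field-selector=status.phase=Pending -o name
}

# Usage:
#   pending_pods cert-manager
```

If this prints all cert-manager Pods while the Nodes are NotReady, the cluster is in the state shown in the events above.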
Issuer
Configure a CA issuer named hubble-ca (inspired by #15443 (comment)):
$ cat <<EOF > hubble-ca-issuer.yaml
---
apiVersion: v1
kind: Secret
metadata:
name: hubble-ca-keypair
namespace: kube-system
data:
tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMrVENDQWVHZ0F3SUJBZ0lKQUtQR3dLRGwvNUhuTUEwR0NTcUdTSWIzRFFFQkN3VUFNQk14RVRBUEJnTlYKQkFNTUNHcHZjMmgyWVc1c01CNFhEVEU1TURneU1qRTJNRFUxT0ZvWERUSTVNRGd4T1RFMk1EVTFPRm93RXpFUgpNQThHQTFVRUF3d0lhbTl6YUhaaGJtd3dnZ0VpTUEwR0NTcUdTSWIzRFFFQkFRVUFBNElCRHdBd2dnRUtBb0lCCkFRQ3doU0IvcVc2L2tMYjJ6cHUrRUp2RDl3SEZhcStRQS8wSkgvTGxseW83ekFGeCtISHErQ09BYmsrQzhCNHQKL0hVRXNuczVSTDA5Q1orWDRqNnBiSkZkS2R1UHhYdTVaVllua3hZcFVEVTd5ZzdPU0tTWnpUbklaNzIzc01zMApSNmpZbi9Ecmo0eFhNSkVmSFVEcVllU1dsWnIzcWkxRUZhMGM3ZlZEeEgrNHh0WnROTkZPakg3YzZEL3ZXa0lnCldRVXhpd3Vzc2U2S01PV2pEbnYvNFZyamVsMlFnVVlVYkhDeWVaSG1jdGkrSzBMV0Nmby9SZzZQdWx3cmJEa2gKam1PZ1l0MzBwZGhYME9aa0F1a2xmVURIZnA4YmpiQ29JMnRhWUFCQTZBS2pLc08zNUxBRVU3OUNMMW1MVkh1WgpBQ0k1VWppamEzVlBXVkhTd21KUEp5dXhBZ01CQUFHalVEQk9NQjBHQTFVZERnUVdCQlFtbDVkVEFaaXhGS2hqCjkzd3VjUldoYW8vdFFqQWZCZ05WSFNNRUdEQVdnQlFtbDVkVEFaaXhGS2hqOTN3dWNSV2hhby90UWpBTUJnTlYKSFJNRUJUQURBUUgvTUEwR0NTcUdTSWIzRFFFQkN3VUFBNElCQVFCK2tsa1JOSlVLQkxYOHlZa3l1VTJSSGNCdgpHaG1tRGpKSXNPSkhac29ZWGRMbEcxcFpORmpqUGFPTDh2aDQ0Vmw5OFJoRVpCSHNMVDFLTWJwMXN1NkNxajByClVHMWtwUkJlZitJT01UNE1VN3ZSSUNpN1VPbFJMcDFXcDBGOGxhM2hQT2NSYjJ5T2ZGcVhYeVpXWGY0dDBCNDUKdEhpK1pDTkhCOUZ4alNSeWNiR1lWaytUS3B2aEphU1lOTUdKM2R4REthUDcrRHgzWGNLNnNBbklBa2h5SThhagpOVSttdzgvdG1Sa1A0SW4va1hBUitSaTBxVW1Iai92d3ZuazRLbTdaVXkxRllIOERNZVM1TmtzbisvdUhsUnhSClY3RG5uMDM5VFJtZ0tiQXFONzJnS05MbzVjWit5L1lxREFZSFlybjk4U1FUOUpEZ3RJL0svQVRwVzhkWAotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFb3dJQkFBS0NBUUVBc0lVZ2Y2bHV2NUMyOXM2YnZoQ2J3L2NCeFdxdmtBUDlDUi95NVpjcU84d0JjZmh4CjZ2Z2pnRzVQZ3ZBZUxmeDFCTEo3T1VTOVBRbWZsK0krcVd5UlhTbmJqOFY3dVdWV0o1TVdLVkExTzhvT3praWsKbWMwNXlHZTl0N0RMTkVlbzJKL3c2NCtNVnpDUkh4MUE2bUhrbHBXYTk2b3RSQld0SE8zMVE4Ui91TWJXYlRUUgpUb3grM09nLzcxcENJRmtGTVlzTHJMSHVpakRsb3c1Ny8rRmE0M3Bka0lGR0ZHeHdzbm1SNW5MWXZpdEMxZ242ClAwWU9qN3BjSzJ3NUlZNWpvR0xkOUtYWVY5RG1aQUxwSlgxQXgzNmZHNDJ3cUNOcldtQUFRT2dDb3lyRHQrU3cKQkZPL1FpOVppMVI3bVFBaU9WSTRvMnQxVDFsUjBzSmlUeWNyc1FJREFRQUJBb0lCQUNFTkhET3JGdGg1a1RpUApJT3dxa2UvVVhSbUl5MHlNNHFFRndXWXBzcmUxa0FPMkFDWjl4YS96ZDZITnNlanNYMEM4NW9PbmtrTk9mUHBrClcxVS94Y3dLM1ZpRElwSnBIZ09VNzg1V2ZWRXZtU3dZdi9Fb1V3eHFHRVMvcnB5Z1drWU5WSC9XeGZGQlg3clMKc0dmeVltbXJvM09DQXEyLzNVVVFiUjcrT09md3kzSHdUdTBRdW5FSnBFbWU2RXdzdWIwZzhTTGp2cEpjSHZTbQpPQlNKSXJyL1RjcFRITjVPc1h1Vm5FTlVqV3BBUmRQT1NrRFZHbWtCbnkyaVZURElST3NGbmV1RUZ1NitXOWpqCmhlb1hNN2czbkE0NmlLenUzR0YwRWhLOFkzWjRmeE42NERkbWNBWnphaU1vMFJVaktWTFVqbVlQSEUxWWZVK3AKMkNYb3dNRUNnWUVBMTgyaU52UEkwVVlWaUh5blhKclNzd1YrcTlTRStvVi90U2ZSUUNGU2xsV0d3KzYyblRiVwpvNXpoL1RDQW9VTVNSbUFPZ0xKWU1LZUZ1SWdvTEoxN1pvWjN0U1czTlVtMmRpT0lPSHorcTQxQzM5MDRrUzM5CjkrYkFtVmtaSFA5VktLOEMraS9tek5mSkdHZEJadGIweWtTM2t3OUIxTHdnT3o3MDhFeXFSQ2tDZ1lFQTBXWlAKbzF2MThnV2tMK2FnUDFvOE13eDRPZlpTN3dKY3E0Z0xnUWhjYS9pSkttY0x0RFN4cUJHckJ4UVo0WTIyazlzdQpzTFVrNEJobGlVM29iUUJNaUdtMGtITHVBSEFRNmJvdWZBMUJwZjN2VFdHSkhSRjRMeFJsNzc2akw4UXI4VnpxClpURVBtY0R0T0hpYjdwb2I1Z2IzSDhiVGhYeUhmdGZxRW55alhFa0NnWUVBdk9DdDZZclZhTlQrWThjMmRFYk4Kd3dJOExBaUZtdjdkRjZFUjlCODJPWDRCeGR0WTJhRDFtNTNqN2NaVnpzNzFYOE1TN25FcDN1dkFqaElkbDI3KwpZbTJ1dUUyYVhIbDN5VTZ3RzBETFpUcnVIU0Z5TVI4ZithbHRTTXBDd0s1NXluSGpHVFp6dXpYaVBBbWpwRzdmCk1XbVRncE1IK3puc3UrNE9VNFBHUW9FQ2dZQWNqdUdKbS84YzlOd0JsR2lDZTJIK2JGTHhSTURteStHcm16QkcKZHNkMENqOWF3eGI3aXJ3MytjRGpoRUJMWExKcjA5YTRUdHdxbStrdElxenlRTG92V0l0QnNBcjVrRThlTVVBcAp0djBmRUZUVXJ0cXVWaldYNWlaSTNpMFBWS2ZSa1NSK2pJUmVLY3V3aWZKcVJpWkw1dU5KT0NxYzUvRHF3Yk93CnRjTHAwUUtCZ0VwdEw1SU10Sk5EQnBXbllmN0F5QVB
hc0RWRE9aTEhNUGRpL2dvNitjSmdpUmtMYWt3eUpjV3IKU25QSG1TbFE0aEluNGMrNW1lbHBDWFdJaklLRCtjcTlxT2xmQmRtaWtYb2RVQ2pqWUJjNnVGQ1QrNWRkMWM4RwpiUkJQOUNtWk9GL0hOcHN0MEgxenhNd1crUHk5Q2VnR3hhZ0ZCekxzVW84N0xWR2h0VFFZCi0tLS0tRU5EIFJTQSBQUklWQVRFIEtFWS0tLS0tCg==
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: hubble-ca
  namespace: kube-system
spec:
  ca:
    secretName: hubble-ca-keypair
EOF
$ kubectl apply -f hubble-ca-issuer.yaml
secret/hubble-ca-keypair created
Error from server (InternalError): error when creating "hubble-ca-issuer.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.31.2.53:443: connect: connection refused
Here the Secret is created, but the Issuer creation failed.
Install Cilium
Install using the certmanager method:
$ helm install cilium ./install/kubernetes/cilium \
--namespace kube-system \
--set image.tag=stable \
--set hubble.relay.image.tag=stable \
--set hubble.ui.backend.image.tag=stable \
--set hubble.ui.frontend.image.tag=stable \
--set operator.image.tag=stable \
--set preflight.image.tag=stable \
--set clustermesh.apiserver.image.tag=latest \
--set nodeinit.enabled=true \
--set kubeProxyReplacement=partial \
--set hostServices.enabled=false \
--set externalIPs.enabled=true \
--set nodePort.enabled=true \
--set hostPort.enabled=true \
--set bpf.masquerade=false \
--set image.pullPolicy=IfNotPresent \
--set ipam.mode=kubernetes \
--set hubble.relay.enabled=true \
--set hubble.tls.auto.method=certmanager \
--set clustermesh.apiserver.tls.auto.method=certmanager \
--set hubble.tls.auto.certManagerIssuerRef.name=hubble-ca \
--set clustermesh.apiserver.tls.auto.certManagerIssuerRef.name=hubble-ca
Error: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": dial tcp 10.31.2.53:443: connect: connection refused
At this point all Pods except the Relay Pod eventually reach the Running state, but the TLS Secrets could not be created:
$ kubectl logs -n kube-system ds/cilium | grep subsys=hubble
Found 3 pods, using pod/cilium-4p7xv
level=info msg="Configuring Hubble server" eventQueueSize=8192 maxFlows=4095 subsys=hubble
level=info msg="Starting local Hubble server" address="unix:///var/run/cilium/hubble.sock" subsys=hubble
level=info msg="Waiting for Hubble server TLS certificate and key files to be created" subsys=hubble
$ kubectl describe pod -n kube-system -l k8s-app=hubble-relay
…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 3m49s (x2 over 8m18s) kubelet Unable to attach or mount volumes: unmounted volumes=[tls], unattached volumes=[tls hubble-sock-dir config]: timed out waiting for the condition
Warning FailedMount 95s (x2 over 6m4s) kubelet Unable to attach or mount volumes: unmounted volumes=[tls], unattached volumes=[hubble-sock-dir config tls]: timed out waiting for the condition
Warning FailedMount 5s (x13 over 10m) kubelet MountVolume.SetUp failed for volume "tls" : secret "hubble-relay-client-certs" not found
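To see at a glance which TLS Secrets are still missing, something like the following could be used. It is only a sketch: missing_secrets is a hypothetical helper, hubble-relay-client-certs is the name from the events above, and hubble-server-certs in the usage line is an assumed name that may differ in your chart version.

```shell
# Print the names of the given Secrets that do not exist in a namespace.
# $1: namespace; remaining arguments: Secret names to check.
missing_secrets() {
  ns="$1"
  shift
  for name in "$@"; do
    # kubectl exits non-zero when the Secret is not found.
    kubectl get secret -n "$ns" "$name" >/dev/null 2>&1 || echo "$name"
  done
}

# Usage (the second Secret name is an assumption):
#   missing_secrets kube-system hubble-relay-client-certs hubble-server-certs
```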
Fixing the TLS stuff
Re-creating the Issuer now works:
$ kubectl apply -f hubble-ca-issuer.yaml
secret/hubble-ca-keypair unchanged
issuer.cert-manager.io/hubble-ca created
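Rather than re-running the command by hand until it succeeds, the apply could be gated on the webhook rollout. This is a sketch under assumptions: the Deployment name cert-manager-webhook in the cert-manager namespace matches the Service name seen in the error above, the default timeout is arbitrary, and it only helps once the CNI is up (before that the rollout can never complete).

```shell
# Wait for the cert-manager webhook rollout, then apply a manifest.
# $1: manifest path; $2: optional timeout (default 300s, arbitrary).
apply_issuer_when_webhook_ready() {
  if ! kubectl -n cert-manager rollout status deployment/cert-manager-webhook \
      --timeout="${2:-300s}"; then
    echo "cert-manager webhook is not ready" >&2
    return 1
  fi
  kubectl apply -f "$1"
}

# Usage:
#   apply_issuer_when_webhook_ready hubble-ca-issuer.yaml
```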
Forcing an upgrade retries the creation of the certificate requests:
$ helm upgrade cilium ./install/kubernetes/cilium \
--namespace kube-system \
--set image.tag=stable \
--set hubble.relay.image.tag=stable \
--set hubble.ui.backend.image.tag=stable \
--set hubble.ui.frontend.image.tag=stable \
--set operator.image.tag=stable \
--set preflight.image.tag=stable \
--set clustermesh.apiserver.image.tag=latest \
--set nodeinit.enabled=true \
--set kubeProxyReplacement=partial \
--set hostServices.enabled=false \
--set externalIPs.enabled=true \
--set nodePort.enabled=true \
--set hostPort.enabled=true \
--set bpf.masquerade=false \
--set image.pullPolicy=IfNotPresent \
--set ipam.mode=kubernetes \
--set hubble.relay.enabled=true \
--set hubble.tls.auto.method=certmanager \
--set clustermesh.apiserver.tls.auto.method=certmanager \
--set hubble.tls.auto.certManagerIssuerRef.name=hubble-ca \
--set clustermesh.apiserver.tls.auto.certManagerIssuerRef.name=hubble-ca
Release "cilium" has been upgraded. Happy Helming!
NAME: cilium
LAST DEPLOYED: Mon Aug 30 17:11:10 2021
NAMESPACE: kube-system
STATUS: deployed
REVISION: 2
TEST SUITE: None
NOTES:
You have successfully installed Cilium with Hubble Relay.
Your release version is 1.10.90.
For any further help, visit https://docs.cilium.io/en/v1.10/gettinghelp
Now eventually all TLS Secrets are created, and both Hubble and Relay are up.
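"Eventually" can be made explicit by waiting on the Certificate resources. A small sketch, assuming the Certificates created by the chart live in kube-system and expose the usual Ready condition; wait_for_certificates is a hypothetical helper and the timeout is arbitrary:

```shell
# Block until every cert-manager Certificate in a namespace reports Ready.
# $1: optional namespace (default kube-system); $2: optional timeout.
wait_for_certificates() {
  kubectl wait --for=condition=Ready certificate --all \
    -n "${1:-kube-system}" --timeout="${2:-120s}"
}

# Usage:
#   wait_for_certificates kube-system 300s
```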
tl;dr
I have no clue how to handle the fact that cert-manager needs the CNI to be initialized, while this PR creates cert-manager Certificate resources (which requires cert-manager to be ready) when we install Cilium.
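One possible workaround, not something this PR implements, is to retry the Helm operation until the webhook answers. A generic retry helper could look like the sketch below; the name retry and the attempt and delay values in the usage line are illustrative only.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (append the same --set flags as in the helm upgrade command above):
#   retry 10 30 helm upgrade cilium ./install/kubernetes/cilium --namespace kube-system
```

This only papers over the ordering problem; it does not remove the dependency of cert-manager on a working CNI.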
install/kubernetes/cilium/templates/clustermesh-apiserver/tls-certmanager/client-secret.yaml
install/kubernetes/cilium/templates/clustermesh-apiserver/tls-certmanager/remote-secret.yaml
install/kubernetes/cilium/templates/clustermesh-apiserver/tls-certmanager/server-secret.yaml
install/kubernetes/cilium/templates/hubble/tls-certmanager/relay-client-secret.yaml
install/kubernetes/cilium/templates/hubble/tls-certmanager/server-secret.yaml
install/kubernetes/cilium/templates/hubble/tls-certmanager/ui-client-certs.yaml
To be sure the Cilium containers can start before cert-manager deploys the certificates, it's required that volume mounts that target those secrets are mounted as |
@dungdm93 thanks for addressing the suggestions. However, I think there is a fundamental problem here, as we can't expect cert-manager to be functional without network, and thus the Certificate request will fail when we Since there are many ways to set up cert-manager, I think the issue is primarily a documentation issue (we have #13590 to track it). |
Hello @kaworu, it's not exactly a chicken-and-egg problem. cilium/install/kubernetes/cilium/templates/cilium-agent/daemonset.yaml Lines 575 to 587 in 1695d9c
So cilium-agent can start without TLS, and the startup order will be as follows:
The problem you describe above comes from the cert-manager webhook, not from Cilium or cert-manager itself.
In a CI/CD environment, I think option 2 is the most suitable. |
I don't think so, issuer is in
Yes. Where should I update the document? |
It is under the |
The user needs to create an issuer that can issue certificates with our cilium.io internal domain names along with the associated certificate requests. My clusters include the following:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: cilium-ca
  namespace: kube-system
spec:
  ca:
    secretName: cilium-ca-cert

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cilium-ca-cert
  namespace: kube-system
spec:
  secretName: cilium-ca-cert
  dnsNames:
    - '*.cilium.io'
    - cilium.io
  keyAlgorithm: ecdsa
  keySize: 384
  isCA: true
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer

These are very specific to Cilium and not part of the cert-manager installation. |
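Whichever issuer is used, it is worth checking that it is Ready before installing Cilium against it. A minimal sketch, assuming the Issuer exposes the usual Ready condition in its status; issuer_ready is a hypothetical helper name:

```shell
# Succeed only if the named Issuer reports the condition Ready=True.
# $1: Issuer name; $2: optional namespace (default kube-system).
issuer_ready() {
  status=$(kubectl get issuer "$1" -n "${2:-kube-system}" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)
  [ "$status" = "True" ]
}

# Usage:
#   issuer_ready cilium-ca && echo "issuer is ready"
```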
Hello @kaworu, I just updated the docs on |
@seanmwinn actually, you don't need to create a dedicated issuer for Cilium. For example, in my org we use a Vault issuer to manage all private certificates (not only for Cilium) across multiple environments and clusters. And as I mentioned above, setting up the issuer should not be in the scope of Cilium. |
Thanks for updating the documentation @dungdm93! I have some comments but overall it's great.
@kaworu updated as you suggested |
Thank you @dungdm93! One suggestion was missed, but other than that LGTM. Some warnings from the deploy/netlify build need to be fixed:
|
test-me-please
Job 'Cilium-PR-K8s-1.16-net-next' hit: #17437 (97.22% similarity) |
Sorry, I missed a couple of other minor formatting issues and I have one question about issuer.yaml below.
.. code-block:: bash

    # We assum cilium installed in kube-system namespace
    kubectl label namespace kube-system cert-manager.io/disable-validation=true

    helm install cert-manager ...
    kubectl apply -f issuer.yaml
    helm install cilium ...
For these ones, if you want to format them as a shell-session then you need to do:
- .. code-block:: bash
-     # We assum cilium installed in kube-system namespace
-     kubectl label namespace kube-system cert-manager.io/disable-validation=true
-     helm install cert-manager ...
-     kubectl apply -f issuer.yaml
-     helm install cilium ...
+ .. code-block:: shell-session
+     $ # We assume cilium installed in kube-system namespace
+     $ kubectl label namespace kube-system cert-manager.io/disable-validation=true
+     $ helm install cert-manager ...
+     $ kubectl apply -f issuer.yaml
+     $ helm install cilium ...
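The snippet under review depends on the cert-manager.io/disable-validation=true label being set on the namespace before anything else is installed. The sketch below only checks that the label is present; it makes no claim about which cert-manager webhook honors it. The helper name validation_disabled is hypothetical.

```shell
# Succeed only if the namespace carries cert-manager.io/disable-validation=true.
# $1: namespace name.
validation_disabled() {
  # Dots inside the label key are escaped for kubectl's jsonpath.
  value=$(kubectl get namespace "$1" \
    -o jsonpath='{.metadata.labels.cert-manager\.io/disable-validation}' 2>/dev/null)
  [ "$value" = "true" ]
}

# Usage:
#   validation_disabled kube-system || echo "label is not set"
```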
Also, looking at these snippets, I see issuer.yaml mentioned but I'm not sure where that comes from. How does the user generate issuer.yaml?
@joestringer I also added a note explaining where issuer.yaml comes from.
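For readers wondering what such an issuer.yaml might contain, here is one way to produce a CA Issuer manifest like the hubble-ca example earlier in this thread. It is a sketch, not the content of the documentation note: write_ca_issuer is a made-up helper, and the Secret it references is assumed to already hold a CA certificate and key.

```shell
# Print a cert-manager CA Issuer manifest to stdout.
# $1: Issuer name; $2: name of the Secret holding the CA key pair;
# $3: optional namespace (default kube-system).
write_ca_issuer() {
  cat <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: $1
  namespace: ${3:-kube-system}
spec:
  ca:
    secretName: $2
EOF
}

# Usage:
#   write_ca_issuer hubble-ca hubble-ca-keypair > issuer.yaml
```

Any other issuer type (for example the Vault issuer mentioned above) works as well; only the issuer reference passed to the chart has to match.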
LGTM, thanks!
small nits :)
test-me-please
Job 'Cilium-PR-K8s-1.19-kernel-5.4' failed and has not been observed before, so may be related to your PR.
Job 'Cilium-PR-K8s-1.16-net-next' failed and has not been observed before, so may be related to your PR. |
Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>
create CA issuer when certManagerIssuerRef is not specify Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>
/test |
test-1.16-net-next run hit #14598. CI 3.0 runs didn't trigger for some reason, retriggering manually. |
/ci-gke |
/ci-eks |
/ci-aks |
/ci-awscni |
test-1.16-netnext job hit what looks like #17401, should be fixed recently in the master tree. Checkpatch warnings don't need to be addressed. All reviews are in. Ready to merge. |
Thanks @joestringer |
It seems we forgot to update `helm-values.rst` in cilium#17238, yielding issues when building documentation locally with `make render-docs`. Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
This is the fourth part of #16792, which adds support for TLS certificate auto-generation using cert-manager.
cert-manager (under the CNCF umbrella) has become the de facto way to create and manage certificates.
Compared to the other methods that the Cilium chart currently supports, this has some advantages: