New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add the ability to communicate with Vault via mTLS #6614
Conversation
Hi @rodrigorfk. Thanks for your PR. I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
8c8464f
to
b120669
Compare
xref: #4330 |
/ok-to-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rodrigorfk
This looks like a great start.
This feature will need documenting and that documentation will help me test this PR, so please open a cert-manager/website PR branched from the release-next
branch,
describing how to configure Vault for mTLS and how to configure the Vault issuer with a client certificate.
- https://github.com/cert-manager/website/blob/release-next/content/docs/configuration/vault.md
Explain how to quickly create a suitable client certificate, using one of openssl, cfssl or step-cli.
Explain whether the client certificate is used for authentication , and if so which of the client certificate fields are used as the username for Vault.
Link to any relevant Vault documentation and tutorials on the subject.
You will need to rebase this PR, because I recently updated the vault E2E code to use vault-client-go rather than vault/api:
Find me in slack or come to one of our development meetings if you want to chat about any of this.
@@ -208,7 +232,7 @@ func (v *VaultInitializer) Init() error { | |||
return fmt.Errorf("error parsing proxy URL: %s", err.Error()) | |||
} | |||
var lastError error | |||
err = wait.PollUntilContextTimeout(context.TODO(), time.Second, 20*time.Second, true, func(ctx context.Context) (bool, error) { | |||
err = wait.PollUntilContextTimeout(context.TODO(), time.Second, 45*time.Second, true, func(ctx context.Context) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explain why you increased the timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to avoid a race condition while the Vault e2e addon is starting, here a new Vault addon with mTLS is synchronously started, however the creation of Vault client and the port-forward in the e2e framework was already dispatched asynchronously, the previous 20 seconds was a valid timeout when only one Vault addon was being created, since there is two now, that timeout has to be increased, otherwise the port-forward would fail because the second Vault addon was not yet ready.
// Reference to a Secret containing a PEM-encoded Client Certificate to use when the | ||
// Vault server requires mTLS. | ||
// +optional | ||
ClientCertSecretRef *cmmeta.SecretKeySelector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should say "PEM-encoded Client Certificate chain" because the client needs to present the client certificate and any intermediate certificates that complete the chain to the CA certificate that the Vault server is using to verify the client cert.
// Certificates contains one or more certificate chains to present to the
// other side of the connection. The first certificate compatible with the
// peer's requirements is selected automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder if if should say (and support) "chains" (plural).
Consider the case where cert-manager is connecting to a Vault service backed by a cluster of Vault servers.
If a new client CA certificate has been created and is being rolled out, the vault servers may be using a mixture of original and new client CA certificates, so the clients would each need client certificates chains for both the original and the new client CAs....at least until all the servers were using the new CA.
I don't know if that's a realistic scenario. Please explain briefly how you intend to rotate the client CA in your infrastructure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @wallrj , thanks for your comment, while what you said that a chain of multiple client certificates could be provided, the client certificate don't have to present the CA certificate once again, client certificates only contains the certificate of the client itself, the CA certificate is configured at server side under the tls_client_ca_file
configuration of the TCP listener as documented here, and the server is responsible for validating if any of the client certificates provided matches the CA configured at server side.
Circling back to your suggestion of adding the word "chain" to the description, I can definitely do that, but realistic, based on what I said above, a client certificate is composed by only one certificate.
Please explain briefly how you intend to rotate the client CA in your infrastructure.
- The initial work is done at Server side, you have to update the CA certificate bundle file used in the
tls_client_ca_file
configuration to include the new CA certificate chain as following:- you will keep the existing CA bundle in the file for existing client certificates to keep working
- you will add the new CA bundle to get the server start trusting the new client certificates issued by it.
- Once you have the server updated with the new client CA bundle, you can start updating the client related secrets and adding the new client certificate and private keys, but once again, there is no need for providing the client CA bundled in the client certificate, the server already have it and will be responsible for validating the client certificate.
Thanks for your comments @wallrj , I will make sure I go over each one of them and proper address it. |
@rodrigorfk does this PR overlap with #4330? |
Hi @inteon , thanks for your question, the implementation in both PRs could perhaps look similar, mostly because both cases are talking about client certificates, but they are essentially different cases, what is being implement in the #4330 so far is related to adding a new authentication method based on TLS certificates auth method, which is used to obtain the Vault Token from the As I've mentioned in #6613, there was other issues, already closed, like #4774, #2926 and #4256 which were requesting a case similar to mine, once again, |
Thank you for the extra information. |
b120669
to
e115adf
Compare
Signed-off-by: Rodrigo Fior Kuntzer <rodrigo@miro.com>
e115adf
to
5b11ec3
Compare
Hi @wallrj and @inteon , thanks again for your initial review and comments, is there anything else you would like me to change or clarify? I would be willing to help in anyway possible to help you reviewing this change, FYI, I have been running internally in my organization a custom build of cert-manger v1.13.3 containing those changes and it is working pretty well. |
@rodrigorfk Great to hear you are actively testing this using real use cases. I think @wallrj said that he would review this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rodrigorfk Sorry for delay.
I've finally got round to testing this and can confirm that it works for me with a Kind cluster and a Vault server manually installed and running in-cluster.
It seems that you have adjusted the E2E test framework to install two Vault servers:
the original vault server and a new one configured to only accept connections with client certificates.
And I think you have modified one set of conformance tests to target the mTLS server.
Have I understood correctly and is that what you intended?
I disagree with that change because it conflates the testing of two features.
And it is not clear from the test output that Vault + client certificates
is being tested at all.
I suggest creating one new clearly documented E2E test for Vault + Client Cert and placing it in a new file in:
el = append(el, field.Invalid(fldPath.Child("clientKeySecretRef"), "<snip>", "clientKeySecretRef must be provided when defining the clientCertSecretRef")) | ||
} else if iss.ClientCertSecretRef == nil && iss.ClientKeySecretRef != nil { | ||
el = append(el, field.Invalid(fldPath.Child("clientCertSecretRef"), "<snip>", "clientCertSecretRef must be provided when defining the clientKeySecretRef")) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed this error while testing 👍
$ kubectl apply -f vault-issuer.yaml
...
Resource: "cert-manager.io/v1, Resource=issuers", GroupVersionKind: "cert-manager.io/v1, Kind=Issuer"
Name: "vault-issuer", Namespace: "application-1"
for: "vault-issuer.yaml": error when patching "vault-issuer.yaml": admission webhook "webhook.cert-manager.io" denied the request: spec.vault.clientKeySecretRef: Invalid value: "": clientKeySecretRef must be provided when defining the clientCertSecretRef
secretPrivateKey, err := v.secretsLister.Secrets(v.namespace).Get(refPrivateKey.Name) | ||
if err != nil { | ||
return nil, fmt.Errorf("could not access Secret '%s/%s': %s", v.namespace, refPrivateKey.Name, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning an error here will cause the reconciler to re-reconcile this Issuer periodically, with backoff.
So that when the Secret is eventually created or when the necessary permissions are eventually granted to the cert-manager controller, cert-manager will eventually have another attempt at loading the client key and cert.
This error message appears in the Issuer / ClusterIssuer status and in the Events associated with that Issuer
and it gives enough information for the user to know how to fix the problem
$ kubectl describe issuer -n application-1
...
Status:
Conditions:
Last Transition Time: 2024-02-14T13:55:36Z
Message: Failed to initialize Vault client: failed to load vault client certificate: could not access Secret 'application-1/vault-client-tls': secrets "vault-client-tls" not found
Observed Generation: 3
Reason: VaultError
Status: False
Type: Ready
Warning ErrInitIssuer 16s (x2 over 16s) cert-manager-issuers Error initializing issuer: failed to load vault client certificate: could not access Secret 'application-1/vault-client-tls': secrets "vault-client-tls" not found
In theory, we could program the Issuer reconciler to be triggered by modifications to any Secrets that are referred to by an Issuer resource.
Then we could return nil here and avoid the unnecessary work created by the error backoff loop.
And the Issuer or ClusterIssuer would immediatly be reconciled when the Secret was created or updated,
rather than waiting for up to 5 minutes (depending on how much the backoff has increased).
But that's an improvement which can be implemented in another PR.
The caBundleSecretRef
was implemented in exactly the way you've used here.
keyPrivate = refPrivateKey.Key | ||
} else { | ||
keyPrivate = corev1.TLSPrivateKeyKey | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked this and observed that in the absence of a key field,
the defaults of tls.crt
and tls.key
were used.
And the documentation of the CRD alludes to this.
$ kubectl explain issuer.spec.vault.clientKeySecretRef
GROUP: cert-manager.io
KIND: Issuer
VERSION: v1
FIELD: clientKeySecretRef <Object>
DESCRIPTION:
Reference to a Secret containing a PEM-encoded Client Private Key to use
when the Vault server requires mTLS.
FIELDS:
key <string>
The key of the entry in the Secret resource's `data` field to be used. Some
instances of this field may be defaulted, in others it may be required.
return nil, fmt.Errorf("could not parse the TLS certificate from Secrets '%s/%s'(cert) and '%s/%s'(key): %s", v.namespace, refCert.Name, v.namespace, refPrivateKey.Name, err) | ||
} | ||
return &cert, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed this error by adding a random string to the key file in the Secret
Status:
Conditions:
Last Transition Time: 2024-02-14T13:55:36Z
Message: Failed to initialize Vault client: failed to load vault client certificate: could not parse the TLS certificate from Secrets 'application-1/vault-client-tls'(cert) and 'application-1/vault-client-tls'(key): tls: failed to find any PEM data in key input
Observed Generation: 4
Reason: VaultError
Status: False
Type: Ready
Warning ErrInitIssuer 17s (x2 over 17s) cert-manager-issuers Error initializing issuer: failed to load vault client certificate: could not parse the TLS certificate from Secrets 'application-1/vault-client-tls'(cert) and 'application-1/vault-client-tls'(key): tls: failed to find any PEM data in key input
{ | ||
Key: "server.extraEnvironmentVars.VAULT_CLIENT_KEY", | ||
Value: "/vault/tls/client.key", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came in handy while testing your documentation for this feature,
I set up my own Vault server in a Kind cluster and after configuring vault to require client certificates I noticed the readiness probes were failing.
Adding these env vars and mounting the client cert and key solved the problem.
This was my minimal Helm values file:
global:
tlsDisable: false
injector:
enabled: false
server:
dataStorage:
enabled: false
standalone:
enabled: true
config: |
listener "tcp" {
address = "[::]:8200"
cluster_address = "[::]:8201"
tls_disable = false
tls_client_ca_file = "/vault/tls/client-ca.crt"
tls_cert_file = "/vault/tls/server.crt"
tls_key_file = "/vault/tls/server.key"
tls_require_and_verify_client_cert = true
}
extraArgs: "-dev-tls -dev-listen-address=[::]:8202"
extraEnvironmentVars:
VAULT_TLSCERT: /vault/tls/server.crt
VAULT_TLSKEY: /vault/tls/server.key
VAULT_CLIENT_CERT: /vault/tls/client.crt
VAULT_CLIENT_KEY: /vault/tls/client.key
volumes:
- name: vault-tls
secret:
defaultMode: 420
secretName: vault-tls
volumeMounts:
- mountPath: /vault/tls
name: vault-tls
readOnly: true
@wallrj thanks for circling back to this and for the thoughtful review.
yes, this statement is completely correct.
sure thing, the suggestion completely make sense, I was actually not sure if changing one existing conformance test would make it clear or not, I will follow your advise here and provide the test in a new file. |
Signed-off-by: Rodrigo Fior Kuntzer <rodrigo@miro.com>
5b11ec3
to
0e51dc7
Compare
@wallrj thanks once again for your review, I have addressed your suggestion related to move the mTLS related e2e test and created the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rodrigorfk
I tested this again using the latest website documentation,
and it all seems to work well.
Let's merge it and I'll ask you to do some more testing of this feature when we release the first cert-manager 1.15 alpha pre-release.
And at that time, perhaps you'd help us to write a nice release note for this feature too.
We appreciate your contribution.
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great. I can see the output of this new E2E test in the logs:
[cert-manager] Vault Issuer [mtls] should fail to init with missing Vault Token
/home/prow/go/src/github.com/cert-manager/cert-manager/test/e2e/suite/issuers/vault/mtls.go:165
Timeline >>
STEP: Creating a kubernetes client @ 02/15/24 17:35:25.635
STEP: Creating an API extensions client @ 02/15/24 17:35:25.637
STEP: Creating a cert manager client @ 02/15/24 17:35:25.637
STEP: Creating a controller-runtime client @ 02/15/24 17:35:25.637
STEP: Creating a gateway-api client @ 02/15/24 17:35:25.637
STEP: Building a namespace api object @ 02/15/24 17:35:25.637
STEP: Using the namespace e2e-tests-create-vault-issuer-99xmf @ 02/15/24 17:35:25.646
STEP: Building a ResourceQuota api object @ 02/15/24 17:35:25.646
STEP: Configuring the Vault server @ 02/15/24 17:35:25.664
STEP: creating a service account for Vault authentication @ 02/15/24 17:35:29.948
STEP: creating a client certificate for Vault mTLS @ 02/15/24 17:35:29.989
STEP: Creating an Issuer @ 02/15/24 17:35:30.023
STEP: Waiting for Issuer to become Ready @ 02/15/24 17:35:30.055
Feb 15 17:35:30.055: INFO: Waiting for issuer test-vault-issuer condition v1.IssuerCondition{Type:"Ready", Status:"False", LastTransitionTime:<nil>, Reason:"", Message:"", ObservedGeneration:0}
Feb 15 17:35:30.563: INFO: Waiting for issuer test-vault-issuer condition v1.IssuerCondition{Type:"Ready", Status:"False", LastTransitionTime:<nil>, Reason:"", Message:"", ObservedGeneration:0} (took 0s)
STEP: Cleaning up AppRole @ 02/15/24 17:35:30.563
STEP: Cleaning up Kubernetes @ 02/15/24 17:35:30.613
STEP: Cleaning up Vault @ 02/15/24 17:35:30.633
STEP: Deleting test namespace @ 02/15/24 17:35:30.661
<< Timeline
------------------------------
• [31.208 seconds]
[cert-manager] Vault Issuer [mtls] should be ready with a valid serviceAccountRef
/home/prow/go/src/github.com/cert-manager/cert-manager/test/e2e/suite/issuers/vault/mtls.go:440
Timeline >>
STEP: Creating a kubernetes client @ 02/15/24 17:34:59.497
STEP: Creating an API extensions client @ 02/15/24 17:34:59.498
STEP: Creating a cert manager client @ 02/15/24 17:34:59.498
STEP: Creating a controller-runtime client @ 02/15/24 17:34:59.498
STEP: Creating a gateway-api client @ 02/15/24 17:34:59.498
STEP: Building a namespace api object @ 02/15/24 17:34:59.498
STEP: Using the namespace e2e-tests-create-vault-issuer-hkxj2 @ 02/15/24 17:34:59.504
STEP: Building a ResourceQuota api object @ 02/15/24 17:34:59.504
STEP: Configuring the Vault server @ 02/15/24 17:34:59.519
STEP: creating a service account for Vault authentication @ 02/15/24 17:35:29.842
STEP: creating a client certificate for Vault mTLS @ 02/15/24 17:35:29.902
STEP: Creating the Role and RoleBinding to let cert-manager use TokenRequest for the ServiceAccount @ 02/15/24 17:35:29.938
STEP: Creating an Issuer @ 02/15/24 17:35:30.004
STEP: Waiting for Issuer to become Ready @ 02/15/24 17:35:30.056
Feb 15 17:35:30.056: INFO: Waiting for issuer test-vault-issuer condition v1.IssuerCondition{Type:"Ready", Status:"True", LastTransitionTime:<nil>, Reason:"", Message:"", ObservedGeneration:0}
Feb 15 17:35:30.565: INFO: Waiting for issuer test-vault-issuer condition v1.IssuerCondition{Type:"Ready", Status:"True", LastTransitionTime:<nil>, Reason:"", Message:"", ObservedGeneration:0} (took 0s)
STEP: Cleaning up AppRole @ 02/15/24 17:35:30.612
STEP: Cleaning up Kubernetes @ 02/15/24 17:35:30.652
STEP: Cleaning up Vault @ 02/15/24 17:35:30.657
STEP: Deleting test namespace @ 02/15/24 17:35:30.691
<< Timeline
------------------------------
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wallrj The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Pull Request Motivation
Vault server can be configured to strictly enforce clients to present client certificates while connecting to the server in the HTTPs transport layer.
It is possible to configure Vault to use a client certificate to secure the transport layer (tcp listener documentation):
When Vault is configured in the way above, there is no possibility to properly configure Vault Issuer/ClusterIssuer by using existing CRDs.
Fixes #6613
Kind
feature
Release Note