New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
certificatemanager,daemon: Modularized the certificate manager #23132
certificatemanager,daemon: Modularized the certificate manager #23132
Conversation
e1e74d5
to
373c4e0
Compare
GetTLSContext(ctx context.Context, tlsCtx *api.TLSContext, ns string) (ca, public, private string, err error) | ||
} | ||
|
||
type SecretManager interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, looks like we have no tests for this. Perhaps worth writing them as these seem fairly straightforward.
Also I wonder how frequently we're calling GetSecrets
?t It's every time doing a GetSecrets() API call to the api-server... and furthermore I wonder how well we're handling temporary failures of said API call? Following the references of GetTLSContext leads to some scary places that seem to imply we get here often if we have L7 rules. Though only if CertDirectory is not set/readable. I might be missing the point here.
Ping @jrajahalme. Worth refactoring this to cache replies? We could also maintain an up-to-date store of secrets (k8s + file) on the side (Resource[api.Secret]
from CoreV1().Secrets("")
though maybe all of it is too much?) and never do actual api-server or file system operations on the hot path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good concerns. That also makes me wonder if we currently deal with changes to these secrets. You would think that if the secret updates, we should also update the rules that use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At what point do we just want a controller watching for changes on secrets we use (not sure when the overhead of that is worth the simple approach)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we do have to call out with a Get to apiserver, I wonder why we are not writing out the secret to the file (there's also the question of race conditions, but I'm not exactly sure where this file gets written to initially...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refactor this functionality to reuse support for secrets we added for CEC CRDs, i.e. converting k8s secrets to Envoy secrets and letting Envoy to fetch them from Cilium agent via Secret Discovery Service rather than sending them as inline data in the Network Policy for Envoy.
373c4e0
to
57635ae
Compare
/test |
57635ae
to
87daf3a
Compare
87daf3a
to
9988b2b
Compare
|
||
type SecretManager interface { | ||
GetSecrets(ctx context.Context, secret *api.Secret, ns string) (string, map[string][]byte, error) | ||
GetSecretString(ctx context.Context, secret *api.Secret, ns string) (string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really a review point, but GetSecretsString doesn't actually need any internal access to the *CertManager, it could just be a standalone helper func.
GetTLSContext(ctx context.Context, tlsCtx *api.TLSContext, ns string) (ca, public, private string, err error) | ||
} | ||
|
||
type SecretManager interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we do have to call out with a Get to apiserver, I wonder why we are not writing out the secret to the file (there's also the question of race conditions, but I'm not exactly sure where this file gets written to initially...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k8s looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! seems like the CI failures are related to the new buildx v0.10 and a rebase on master should let them proceed
9988b2b
to
64fc5be
Compare
/test |
64fc5be
to
a1fdbb9
Compare
/test Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
If it is a flake and a GitHub issue doesn't already exist to track it, comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good as far as I can tell. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refactor how k8s secrets are synced to Envoy, but that does not need to delay this PR.
GetTLSContext(ctx context.Context, tlsCtx *api.TLSContext, ns string) (ca, public, private string, err error) | ||
} | ||
|
||
type SecretManager interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should refactor this functionality to reuse support for secrets we added for CEC CRDs, i.e. converting k8s secrets to Envoy secrets and letting Envoy to fetch them from Cilium agent via Secret Discovery Service rather than sending them as inline data in the Network Policy for Envoy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
a1fdbb9
to
0da9000
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This commit added a Cell for the certificate manager. The cell exposes two new interfaces `CertificateManager` and `SecretManager` instead of the manager directly. The configuration/flags for the manager has been moved into the Cell. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
0da9000
to
5b01a67
Compare
/test |
@@ -137,25 +138,26 @@ type Repository struct { | |||
// PolicyCache tracks the selector policies created from this repo | |||
policyCache *PolicyCache | |||
|
|||
certManager CertificateManager | |||
certManager certificatemanager.CertificateManager | |||
secretManager certificatemanager.SecretManager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this struct field is not set in NewPolicyRepository
below, which leads to TLS policies being broken. The respective test is currently quarantined, thus the issue was not caught on the PR. I've opened #23895 with a fix.
This PR added a Cell for the certificate manager. The cell exposes two new interfaces
CertificateManager
andSecretManager
instead of the manager directly.The configuration/flags for the manager has been moved into the Cell.