certificatemanager,daemon: Modularized the certificate manager #23132

Merged

Conversation

dylandreimerink
Member

This PR added a Cell for the certificate manager. The cell exposes two new interfaces CertificateManager and SecretManager instead of the manager directly.

The configuration/flags for the manager have been moved into the Cell.

@dylandreimerink dylandreimerink added kind/enhancement This would improve or streamline existing functionality. release-note/misc This PR makes changes that have no direct user impact. labels Jan 17, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Jan 17, 2023
@dylandreimerink dylandreimerink removed the kind/community-contribution This was a contribution made by a community member. label Jan 17, 2023
@dylandreimerink dylandreimerink force-pushed the feature/modularize-cert-manager branch 2 times, most recently from e1e74d5 to 373c4e0 Compare January 17, 2023 15:23
pkg/crypto/certificatemanager/certificate_manager.go
GetTLSContext(ctx context.Context, tlsCtx *api.TLSContext, ns string) (ca, public, private string, err error)
}

type SecretManager interface {
Contributor

@joamaki joamaki Jan 17, 2023

Hmm, looks like we have no tests for this. Perhaps worth writing them as these seem fairly straightforward.

Also, I wonder how frequently we're calling GetSecrets? Every call makes a GetSecrets() API call to the api-server... and furthermore I wonder how well we handle temporary failures of said API call? Following the references of GetTLSContext leads to some scary places that seem to imply we get here often if we have L7 rules, though only if CertDirectory is not set/readable. I might be missing the point here.

Ping @jrajahalme. Worth refactoring this to cache replies? We could also maintain an up-to-date store of secrets (k8s + file) on the side (Resource[api.Secret] from CoreV1().Secrets("") though maybe all of it is too much?) and never do actual api-server or file system operations on the hot path.

Member Author

Yes, good concerns. That also makes me wonder if we currently deal with changes to these secrets. You would think that if the secret updates, we should also update the rules that use them.

Contributor

At what point do we just want a controller watching for changes on the secrets we use? (Not sure when the overhead of that becomes worth it over the simple approach.)

Contributor

Also, if we do have to call out with a Get to apiserver, I wonder why we are not writing out the secret to the file (there's also the question of race conditions, but I'm not exactly sure where this file gets written to initially...)

Member

We should refactor this functionality to reuse the support for secrets we added for CEC CRDs, i.e. converting k8s secrets to Envoy secrets and letting Envoy fetch them from the Cilium agent via Secret Discovery Service rather than sending them as inline data in the Network Policy for Envoy.

@dylandreimerink
Member Author

/test


type SecretManager interface {
GetSecrets(ctx context.Context, secret *api.Secret, ns string) (string, map[string][]byte, error)
GetSecretString(ctx context.Context, secret *api.Secret, ns string) (string, error)
Contributor

Not really a review point, but GetSecretString doesn't actually need any internal access to the *CertManager; it could just be a standalone helper func.

Contributor

@tommyp1ckles tommyp1ckles left a comment

k8s looks good

Contributor

@thorn3r thorn3r left a comment

LGTM! seems like the CI failures are related to the new buildx v0.10 and a rebase on master should let them proceed

@dylandreimerink
Member Author

/test

@dylandreimerink
Member Author

dylandreimerink commented Jan 30, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:


Test Name

K8sDatapathServicesTest Checks N/S loadbalancing Tests with XDP, vxlan tunnel, SNAT and Random

Failure Output

FAIL: Can not connect to service "tftp://[fd04::11]:31225/hello" from outside cluster (9/10)

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Member

@qmonnet qmonnet left a comment

Looks good as far as I can tell. Thanks!

Member

@jrajahalme jrajahalme left a comment

We should refactor how k8s secrets are synced to Envoy, but that does not need to delay this PR.

Contributor

@ldelossa ldelossa left a comment

🙏

Contributor

@chancez chancez left a comment

LGTM.

This commit added a Cell for the certificate manager. The cell exposes
two new interfaces `CertificateManager` and `SecretManager` instead
of the manager directly.

The configuration/flags for the manager have been moved into the Cell.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink
Member Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 17, 2023
@pchaigno pchaigno merged commit ab7003b into cilium:master Feb 17, 2023
@@ -137,25 +138,26 @@ type Repository struct {
// PolicyCache tracks the selector policies created from this repo
policyCache *PolicyCache

- certManager CertificateManager
+ certManager certificatemanager.CertificateManager
+ secretManager certificatemanager.SecretManager
Member

FWIW, this struct field is not set in NewPolicyRepository below, which leads to TLS policies being broken. The respective test is currently quarantined, thus the issue was not caught on the PR. I've opened #23895 with a fix.
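The failure mode described here is a newly added interface field that the constructor never assigns, so it stays nil. One way to make that kind of omission visible early is for the constructor to take both dependencies and reject nil ones. The sketch below is only an illustration of that idea: it is not the fix in #23895, and `Repository`, `NewRepository`, `stub`, and the one-method interfaces are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// One-method placeholders for the two manager interfaces.
type CertificateManager interface{ Name() string }
type SecretManager interface{ Name() string }

type stub string

func (s stub) Name() string { return string(s) }

// Repository is a hypothetical, trimmed-down version of the struct in the diff.
type Repository struct {
	certManager   CertificateManager
	secretManager SecretManager
}

// NewRepository assigns both fields and refuses nil dependencies, so a
// forgotten argument fails at construction rather than on first use.
func NewRepository(cm CertificateManager, sm SecretManager) (*Repository, error) {
	if cm == nil || sm == nil {
		return nil, errors.New("certificate and secret managers must be set")
	}
	return &Repository{certManager: cm, secretManager: sm}, nil
}

func main() {
	if _, err := NewRepository(stub("cert"), nil); err != nil {
		fmt.Println("construction rejected:", err)
	}
	r, err := NewRepository(stub("cert"), stub("secret"))
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(r.certManager.Name(), r.secretManager.Name())
}
```

Note that the nil check only catches a nil interface value, not a typed nil pointer wrapped in an interface.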
