Skip to content

✨ Add CertificateValidityPeriod and CACertificateValidityPeriod to KubeadmConfig #12335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Karthik-K-N
Copy link
Contributor

@Karthik-K-N Karthik-K-N commented Jun 9, 2025

What this PR does / why we need it:

Adds CertificateValidityPeriod and CACertificateValidityPeriod to KubeadmConfig

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #12289

/area provider/bootstrap-kubeadm

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 9, 2025
@k8s-ci-robot k8s-ci-robot added do-not-merge/needs-area PR is missing an area label size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 9, 2025
@Karthik-K-N
Copy link
Contributor Author

/hold

Added CertificateValidityPeriod to KubeadmConfig if the changes (especially conversions) are in right direction will do the similar changes for CACertificateValidityPeriod as well.

@sbueringer Please let me know your opinion. Thanks.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 9, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 10, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2025
@Karthik-K-N
Copy link
Contributor Author

Added both the fields, Need to make changes to honor those fields while generating CA.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2025
@sbueringer
Copy link
Member

@Karthik-K-N Sorry can you rebase again? I think now we got all PRs merged for now. I'll look into the PR afterwards

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2025
@Karthik-K-N
Copy link
Contributor Author

@Karthik-K-N Sorry can you rebase again? I think now we got all PRs merged for now. I'll look into the PR afterwards

No issues, Thanks, I just rebased the PR, I have added both the fields with their conversions, I will just try to work on how to use these fields while generating certificate

@Karthik-K-N
Copy link
Contributor Author

/hold cancel

Ready for initial review, UT is pending

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2025
@sbueringer sbueringer added area/provider/control-plane-kubeadm Issues or PRs related to KCP area/provider/bootstrap-kubeadm Issues or PRs related to CAPBK labels Jun 12, 2025
@k8s-ci-robot k8s-ci-robot removed do-not-merge/needs-area PR is missing an area label labels Jun 12, 2025
@sbueringer
Copy link
Member

/test pull-cluster-api-verify-main
/test pull-cluster-api-e2e-main

Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this

obj.CertificateValidityPeriod = nil

if obj.CertificateValidityPeriod != nil {
obj.CertificateValidityPeriod = ptr.To[metav1.Duration](metav1.Duration{Duration: time.Duration(c.Int31()) * time.Second})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be changed to time.Days when we change the fields on the v1beta2 api to *Days

Please check if max of c.Int31() * time.Days is already more than max of Duration. Then we don't have to do anything here.

We only had to add this for other cases because it was possible to have a Duration which was to high to be converted to Seconds int32

obj.CertificateValidityPeriod = nil

if obj.CertificateValidityPeriod != nil {
obj.CertificateValidityPeriod = ptr.To[metav1.Duration](metav1.Duration{Duration: time.Duration(c.Int31()%24) * time.Hour * 24})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no built in methods in time package to use time.Days(), I was facing round trip errors while converting hours to Days back and forth due to round off of minute field.
So added this way to always get whole hours rather than in minutes.

Please suggest if there are any better ways to handle this.

@JoelSpeed
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 30, 2025
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: e5fdec0cee28873da6d595d3c1e08f0db9640aba

@sbueringer
Copy link
Member

Will try to take another look soon (I currently assume worst case next week)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 1, 2025
@sbueringer
Copy link
Member

@Karthik-K-N Can you rebase the PR please? :) I'll review directly afterwards

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 8, 2025
@k8s-ci-robot k8s-ci-robot requested a review from JoelSpeed July 8, 2025 14:34
@k8s-ci-robot
Copy link
Contributor

New changes are detected. LGTM label has been removed.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 8, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from joelspeed. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Karthik-K-N
Copy link
Contributor Author

@Karthik-K-N Can you rebase the PR please? :) I'll review directly afterwards

Done. Thanks

@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-main

@sbueringer sbueringer added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jul 9, 2025
Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much!

@@ -180,6 +180,20 @@ type ClusterConfiguration struct {
// featureGates enabled by the user.
// +optional
FeatureGates map[string]bool `json:"featureGates,omitempty"`

// certificateValidityPeriodDays specifies the validity period for a non-CA certificate generated by kubeadm.
// If not specified, kubeadm will use a default of 3650 days (10 years).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If not specified, kubeadm will use a default of 3650 days (10 years).
// If not specified, kubeadm will use a default of 365 days (1 year).

// If not specified, kubeadm will use a default of 3650 days (10 years).
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=10950
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +kubebuilder:validation:Maximum=10950
// +kubebuilder:validation:Maximum=1095

APIserver won't survive more than 3 years + 1 day => See kubernetes/kubernetes#130047 (comment)

// +kubebuilder:validation:Maximum=10950
CertificateValidityPeriodDays int32 `json:"certificateValidityPeriodDays,omitempty"`

// caCertificateValidityPeriodDays specifies the validity period for a CA certificate generated by kubeadm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// caCertificateValidityPeriodDays specifies the validity period for a CA certificate generated by kubeadm.
// caCertificateValidityPeriodDays specifies the validity period for CA certificates generated by kubeadm.

@@ -180,6 +180,20 @@ type ClusterConfiguration struct {
// featureGates enabled by the user.
// +optional
FeatureGates map[string]bool `json:"featureGates,omitempty"`

// certificateValidityPeriodDays specifies the validity period for a non-CA certificate generated by kubeadm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// certificateValidityPeriodDays specifies the validity period for a non-CA certificate generated by kubeadm.
// certificateValidityPeriodDays specifies the validity period for non-CA certificates generated by kubeadm.

// If not specified, kubeadm will use a default of 3650 days (10 years).
// +optional
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=10950
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +kubebuilder:validation:Maximum=10950
// +kubebuilder:validation:Maximum=36500

Let's use 100 years instead of 30 years. Rotating CA certificates is very painful today.


// certificateValidityPeriodDays specifies the validity period for a non-CA certificate generated by kubeadm.
// If not specified, kubeadm will use a default of 3650 days (10 years).
// +optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// +optional
// This field is only supported with Kubernetes v1.31 or above.
// +optional

(This only applies to this field, as the other field is implemented by CAPI controllers)

Comment on lines 218 to 219
obj.CertificateValidityPeriodDays %= 24
obj.CACertificateValidityPeriodDays %= 24
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this avoids an overflow in metav1.Duration. Can we use this instead?

	obj.CertificateValidityPeriodDays = c.Int31n(3 * 365 + 1)
	obj.CACertificateValidityPeriodDays = c.Int31n(100 * 365 + 1)

This limits the days to our supported min/max range

obj.CACertificateValidityPeriod = nil
obj.CertificateValidityPeriod = nil

obj.CertificateValidityPeriod = ptr.To[metav1.Duration](metav1.Duration{Duration: time.Duration(c.Int31()%24+1) * time.Hour * 24})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar here. Should we use this?

	obj.CertificateValidityPeriod = ptr.To[metav1.Duration](metav1.Duration{Duration: time.Duration(c.Int31n(3*365)+1) * time.Hour * 24})
	obj.CACertificateValidityPeriod = ptr.To[metav1.Duration](metav1.Duration{Duration: time.Duration(c.Int31n(100*365)+1) * time.Hour * 24})

Currently we only use 1-24 days

@@ -740,6 +740,14 @@ func TestKubeadmControlPlaneValidateUpdate(t *testing.T) {
unsetTimeouts.Spec.KubeadmConfigSpec.InitConfiguration.Timeouts = nil
unsetTimeouts.Spec.KubeadmConfigSpec.JoinConfiguration.Timeouts = nil

certificateValidityPeriod := before.DeepCopy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also set this and CACertificateValidityPeriodDays in before please

CertificateValidityPeriodDays int32 `json:"certificateValidityPeriodDays,omitempty"`

// caCertificateValidityPeriodDays specifies the validity period for a CA certificate generated by kubeadm.
// If not specified, kubeadm will use a default of 3650 days (10 years).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If not specified, kubeadm will use a default of 3650 days (10 years).
// If not specified, Cluster API will use a default of 3650 days (10 years).
// This field cannot be modified.

(It is us in this case, not kubeadm)

@@ -179,6 +179,7 @@ func (webhook *KubeadmControlPlane) ValidateUpdate(_ context.Context, oldObj, ne
{spec, kubeadmConfigSpec, clusterConfiguration, controllerManager, "*"},
{spec, kubeadmConfigSpec, clusterConfiguration, scheduler},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need additional validation (+ corresponding test coverage)

validateRolloutBefore:

  • If rolloutBefore.CertificatesExpiryDays is set we have to ensure certificateValidityPeriodDays > rolloutBefore.CertificatesExpiryDays.
  • Otherwise control plane Machines are continuously rolled out because the cert expires too soon
  • Example: CertificatesExpiryDays is 14 and certificateValidityPeriodDays is 10 => certs will only be valid for 10 days, rolloutBefore triggers because certs are valid <= 14 days

validateKubeadmControlPlaneSpec

  • CertificateValidityPeriodDays <= CACertificateValidityPeriodDays
    • if CACertificateValidityPeriodDays is not set (==0) CertificateValidityPeriodDays <= the default value for CACertificateValidityPeriodDays

@@ -180,6 +180,20 @@ type ClusterConfiguration struct {
// featureGates enabled by the user.
// +optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[had to put the comment somewhere]

I tested changes to certificateValidityPeriodDays, unfortunately it doesn't work as expected

A change to certificateValidityPeriodDays triggers a rollout but new Machines are not using the new value of certificateValidityPeriodDays.

I think we have to update the kubeadm-config ConfigMap in the wl cluster so kubeadm picks up the change.

We should add a new mutator somewhere around here:

workloadCluster.UpdateImageRepositoryInKubeadmConfigMap(imageRepository),

This can be implemented similar to UpdateImageRepositoryInKubeadmConfigMap (please also add test coverage). I think we can simply set whatever is set on controlPlane.KCP.Spec.KubeadmConfigSpec.ClusterConfiguration.CertificateValidityPeriodDays.

Let me know if you have any questions :)

@k8s-ci-robot
Copy link
Contributor

@Karthik-K-N: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-main 9582a44 link true /test pull-cluster-api-e2e-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sbueringer
Copy link
Member

sbueringer commented Jul 9, 2025

Re: https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api/12335/pull-cluster-api-e2e-main/1942878337315639296

e2e tests are fine, the one failure is currently expected

@Karthik-K-N
Copy link
Contributor Author

Addressed all the review comments, Please take a look. Please let me know if I need to add any more test cases. thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/provider/bootstrap-kubeadm Issues or PRs related to CAPBK area/provider/control-plane-kubeadm Issues or PRs related to KCP cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow users to set CertificateValidityPeriod
5 participants