Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop causing increase in certificaterequestpolicies version #341

Closed
hrbasic opened this issue Jan 12, 2024 · 3 comments · Fixed by #353
Closed

Infinite loop causing increase in certificaterequestpolicies version #341

hrbasic opened this issue Jan 12, 2024 · 3 comments · Fixed by #353
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@hrbasic
Copy link

hrbasic commented Jan 12, 2024

Description: I have encountered an issue where an infinite loop is causing an increase in the certificate request policy version. I found that the root cause of the infinite loop is related to the setCertificateRequestPolicyCondition() function in the certificaterequestpolicies.go file.

First problem is here: https://github.com/cert-manager/approver-policy/blob/main/pkg/internal/controllers/certificaterequestpolicies.go#L208, empty list is created which will cause setCertificateRequestPolicyCondition() to always append condition in the list and try to patch object in Reconcile(). Furthermore, I believe if object is patched even when it is already in a ready state, that this behavior leads to an infinite loop, as the object is modified and triggers a new event for the patched object (shouldn't we patch object only if it's not ready?):

// If this update doesn't contain a state transition, we don't update
// the conditions LastTransitionTime to Now()
if existingCondition.Status == condition.Status {
	condition.LastTransitionTime = existingCondition.LastTransitionTime
}

(*conditions)[idx] = condition
return

I've tested fix locally and something like this should be solution:

func (c *certificaterequestpolicies) setCertificateRequestPolicyCondition(
	conditions *[]policyapi.CertificateRequestPolicyCondition,
	generation int64,
	condition policyapi.CertificateRequestPolicyCondition,
) bool {
	condition.LastTransitionTime = &metav1.Time{Time: c.clock.Now()}
	condition.ObservedGeneration = generation

	for idx, existingCondition := range *conditions {
		// Skip unrelated conditions
		if existingCondition.Type != condition.Type {
			continue
		}

		// If this update doesn't contain a state transition, we don't update object
		if existingCondition.Status == condition.Status {
			return false
		}

		(*conditions)[idx] = condition
		return true
	}

	// If we've not found an existing condition of this type, we simply insert
	// the new condition into the slice.
	*conditions = append(*conditions, condition)
	return true
}

Then reconcile should patch object only if it's not in desired state:

func (c *certificaterequestpolicies) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	result, patch, policyStatus, resultErr := c.reconcileStatusPatch(ctx, req)
	// fmt.Println("Patching object", req.Name)
	if patch {
		// fmt.Println("Patching object", req.Name)
		crp, patch, err := ssa_client.GenerateCertificateRequestPolicyStatusPatch(req.Name, req.Namespace, policyStatus)
		if err != nil {
			err = fmt.Errorf("failed to generate CertificateRequestPolicy.Status patch: %w", err)
			return ctrl.Result{}, utilerrors.NewAggregate([]error{resultErr, err})
		}

		if err := c.client.Status().Patch(ctx, crp, patch, &client.SubResourcePatchOptions{
			PatchOptions: client.PatchOptions{
				FieldManager: "approver-policy",
				Force:        pointer.Bool(true),
			},
		}); err != nil {
			err = fmt.Errorf("failed to apply CertificateRequestPolicy.Status patch: %w", err)
			return ctrl.Result{}, utilerrors.NewAggregate([]error{resultErr, err})
		}
	}

	return result, resultErr
}

How to reproduce:

  1. Start the cert-manager-approver-policy application and monitor the resource version at regular intervals (e.g., every x.y seconds).
  2. Enable higher log verbosity and observe that the object is consistently being patched.

If you confirm that this is indeed a bug, I am willing to assist in fixing it and creating a pull request (PR) if you need assistance.
/kind bug

@jetstack-bot jetstack-bot added the kind/bug Categorizes issue or PR as related to a bug. label Jan 12, 2024
@erikgb
Copy link
Contributor

erikgb commented Jan 12, 2024

@hrbasic Thanks for registering this issue. I haven't observed this behavior, but we had a related bug in trust-manager. This was fixed by cert-manager/trust-manager#260. Please prepare a PR to fix this! Maybe you can be inspired by the tests added in the referenced PR to ensure this bug doesn't come back?

@hrbasic
Copy link
Author

hrbasic commented Jan 12, 2024

Sure, thanks for hint. I'll prepare PR.

@hrbasic
Copy link
Author

hrbasic commented Jan 21, 2024

Hey, I've prepared PR: #353
I would like to share some observations I made during testing.

The issue arises when multiple policies are created. In such cases, policies enter an infinite loop during the patching process. In an attempt to understand the root cause, I created a single policy and introduced a brief pause (a one-second sleep) immediately after the patch operation:

Without Sleep

  • Create event is generated, the policy is assigned a resource version (e.g., 200).
  • The policy is successfully patched, and the resource version increases to 201.
  • Update event is generated, the policy is patched again, but the resource version remains the same at 201.

With Sleep

  • Create event is generated, the policy is assigned a resource version (e.g., 200).
  • The policy is successfully patched, and the resource version increases to 201.
  • Update event is generated, the policy is successfully patched again, and this time the resource version increases to 202.
  • A new update is triggered, leading to an infinite loop.

It appears that when the controller is not busy, it processes events rapidly, preventing the resource version from increasing and creating an infinite loop. However, if the controller is busy, constant updates to the resource version result in an infinite loop.

Anyway, this will fix bug, more info regarding this issue can be found here: kubernetes-sigs/kubebuilder#618

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants