Conflict errors on certificaterequest updates with kubernetes 1.25 #5982
I have the same issue in EKS v1.23
Does it actually fail to issue certificates? Resource conflicts happen during normal cert-manager operations, as multiple control loops modify the same resources; see e.g. #5060 (comment). You can enable ServerSideApply if you'd like to avoid those issues. We do not directly modify managed fields.
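For anyone who wants to try the ServerSideApply route mentioned above: cert-manager exposes it as a feature gate on the controller. A minimal sketch, assuming the Helm chart's `featureGates` value — verify the exact key and gate name against your chart and cert-manager version:

```yaml
# Hypothetical Helm values excerpt. The equivalent controller flag is
# --feature-gates=ServerSideApply=true; check your cert-manager version's
# docs before relying on this.
featureGates: "ServerSideApply=true"
```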
On my side it was a configuration error: aws-privateca-issuer couldn't access the secret manager properly. This resulted in this error, which wasn't clear at first sight.
Sorry for the delay, we thought we fixed the issue with a workaround but it happened again multiple times recently.
Let me provide some more facts:
The duplicate requests, as far as I understand, are created because the certificaterequest resource is modified in some way by the control loops. Does this behaviour point to any identifiable or known issues (such as with the operator framework used by the sample custom issuer)? Can you please explain this sentence? @irbekrm
We will look into ServerSideApply, but it doesn't seem a trivial change.
Issues go stale after 90d of inactivity.
Stale issues rot after 30d of inactivity.
We rewrote the retryOnConflict function and since then we have not observed the bug again.

```go
err := retry.RetryOnConflict(retry.DefaultRetry, func() error {
	// Need to refetch the resource on every try: if we got a conflict on
	// the last update attempt, we need the current version before updating
	// the certificaterequest.
	var newCertReq cmapi.CertificateRequest
	log.Info("Starting RetryOnConflict, requestID is " + strconv.Itoa(int(reqStatus.RequestID)))
	if err := r.Get(ctx, req.NamespacedName, &newCertReq); err != nil {
		log.Error(err, "Couldn't fetch certificateRequest resource")
		return err
	}
	// Copy the spec, the annotations and the status from the resource that
	// went through the reconcile loop. Copy the annotations in any case,
	// the spec and status only if there was no error. Copying the
	// annotations is necessary to avoid losing the RequestID.
	newCertReq.SetAnnotations(certificateRequest.Annotations)
	if receivederr != webra.ErrCallingWebRA {
		newCertReq.Spec = certificateRequest.Spec
		newCertReq.Status = certificateRequest.Status
	}
	// Try to update the status using an UPDATE API call.
	err := r.Status().Update(ctx, newCertReq.DeepCopy())
	if err != nil {
		log.Error(err, "Couldn't update certificateRequest status", "CertificateRequest", certificateRequest)
		return err
	}
	// Convert certificateRequest.Spec to a JSON string.
	specJSON, err := json.Marshal(newCertReq.Spec)
	if err != nil {
		log.Error(err, "Failed to marshal certificateRequest.Spec to JSON")
		return err
	}
	// Convert certificateRequest.Annotations to a JSON string.
	annotationsJSON, err := json.Marshal(newCertReq.Annotations)
	if err != nil {
		log.Error(err, "Failed to marshal certificateRequest.Annotations to JSON")
		return err
	}
	// Build the JSON patch array including both spec and annotations.
	resourcePatch := []byte(`[
		{"op": "replace", "path": "/spec", "value": ` + string(specJSON) + `},
		{"op": "replace", "path": "/metadata/annotations", "value": ` + string(annotationsJSON) + `}
	]`)
	// Try to update the resource using a PATCH API call.
	if err := r.Patch(ctx, newCertReq.DeepCopy(), client.RawPatch(types.JSONPatchType, resourcePatch)); err != nil {
		log.Error(err, "Couldn't patch certificateRequest resource", "CertificateRequest", certificateRequest)
	}
	time.Sleep(10 * time.Second)
	return err
})
```
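A side note on that workaround: building the JSON Patch by string concatenation produces invalid JSON if any marshalled value happens to need escaping in its surrounding context, and it's easy to get the brackets wrong. A safer sketch marshals the whole patch in one go; `patchOp` and `buildPatch` are hypothetical names for illustration, not cert-manager API:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// buildPatch marshals the whole patch array at once instead of splicing
// pre-marshalled fragments into a string, so the result is always valid JSON.
func buildPatch(spec, annotations interface{}) ([]byte, error) {
	return json.Marshal([]patchOp{
		{Op: "replace", Path: "/spec", Value: spec},
		{Op: "replace", Path: "/metadata/annotations", Value: annotations},
	})
}

func main() {
	patch, err := buildPatch(
		map[string]string{"request": "base64-encoded CSR"},
		map[string]string{"example.com/request-id": "42"}, // hypothetical annotation key
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

The resulting bytes can be handed to `client.RawPatch(types.JSONPatchType, patch)` exactly as in the snippet above.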
We have developed an external issuer for cert-manager for managing our custom certificates, following the example provided.
Since upgrading our clusters to kubernetes version 1.25 we noticed that the RetryOnConflict function (in the defer block of the custom certificaterequest_controller, see code below) is triggered and, after 5 (default value) retries, it fails.
In kubernetes 1.24 this behaviour was not present.
We noticed that between one retry and the next the `managedFields.time` field changes, along with `metadata.resourceVersion`, which prevents RetryOnConflict from working and produces the error below. We also noticed a bug fix in kubernetes 1.25 that may be the cause of our issue: "ManagedFields time is correctly updated when the value of a managed field is modified."
We would like to understand if the cert-manager is responsible for the frequent modifications on the certificaterequest resource we are trying to manage.
If so, we would like to find a way to prevent the behaviour explained above.
Can you please investigate if this is a bug within the cert-manager?
Thank you.
Steps to reproduce:
Environment details:
/kind bug