New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Venafi issuer 10s timeout on certificate issuance. #5108
Comments
From the cert-manager bi-weekly community meeting, @SgtCoDFish discussed his proposal around timeouts, which can be read here or in #5214 The summary / scope that I read, is that there is some impact / crossover around a certificate issuing timeout, but the Venafi timeout in particular will not be part of the scope of that change, due to back porting complexity. @SgtCoDFish is that summary correct? If so I could make a separate PR / change around Venafi specifically. In more detail, the general theme is to up any existing timeout to a suitably large value based on settings from other k8s controller research. This currently is preferably to adding new configuration options for timeout settings. |
That's a fair summary @hawksight yeah - I'd appreciate + review a PR doing this for the Venafi issuer. I think we should be able to backport that change as well. I'd rather keep both ACME and Venafi in separate PRs so that if there's a problem with backporting one it won't affect the other. |
For anyone else coming across this, the tl;dr is: Fixed by upping from 10s to 60s timeout on retrieving certificates. |
Is your feature request related to a problem? Please describe.
The problem is sporadic Timeout Errors with a Venafi TPP issuer. There is a hard coded timeout of 10s in the Venafi TPP isser when retrieving a certificate from TPP.
https://github.com/cert-manager/cert-manager/blob/master/pkg/issuer/venafi/client/request.go#L55-L62
Often certifcates can be ready only shortly after the 10s timeout.
A similar issue has been seen there with the Issuer setup:
Describe the solution you'd like
Allow a configurable timeout value for certificate issuance, particularly for the Venafi Issuer. As TPP it not a CA itself, there is an additional call to the CA to account for.
There can also be multiple custom workflows or adapatable scripts which esentially allow you to do anything in powershell.
These for example could include calls out to other services before or during certificate generation. A common example might be to approve or deny a certificate based on some metadata.
The more complex the adapatable workflow or slower the CA is to respond, the more likely it is that a given
CertificateRequest
will hit a timeout. The result it waiting until the next retry. The problem is worse when scaled as there will be many logs and redundant certificate requests.Often teams using cert-manager have little or no control over the TPP service or the CAs used, so improving response times is not something they can achieve. This solution would give them a way to tweak cert-manager to their environment.
Describe alternatives you've considered
Perhaps up the default hard coded timeout from 10s to 20 or 30s as a simple solution.
Could be some consequences to doing this though?
Additional context
Example of Timeout errors seen:
There is also potential that the actual Timeout error often seen could be a misrepresentation and obstructing a useful error message. Referencing this issue on the
vcert
docs:I've seen similar timeout errors on a variety of API endpoints as listed:
But I can't find any hard coded timeouts specifically that would cause these, other than maybe the defaults for an api request to TPP via
vcert
:https://github.com/Venafi/vcert/blob/master/pkg/venafi/tpp/tpp.go#L441-L455
So for this issue, mainly just to tackle the know 10s timeout.
/kind feature
The text was updated successfully, but these errors were encountered: