WMAgent: New renew-proxy method left with 0 days window for proxy validity #11985
Comments
While working on this solution, I also noticed the cronjobs are even missing within the current containers:
And I think it has happened when moved to dynamically setting the user at runtime and we stopped exporting the
How this went unnoticed is a different story. I am fixing it within the very same PR rather than creating yet another bug issue just for this. FYI @amaltaro
Ok, for the last comment I was partially correct, and partially wrong. The reason for the missing cronjobs is indeed the not exported
Which means, if
(which is exactly what I was doing) then the cronjobs are not populated. The immediate workaround is to restart the container and let it populate them properly. The result is the correct set of cronjobs at the end:
@todor-ivanov I don't think this is the actual expected behavior. We cannot let proxies get so close to their expiration, otherwise any pilot/job that runs beyond the standard time can have issues (in addition to potential issues renewing this proxy with the voms server). The current setup https://github.com/dmwm/WMCore/blob/master/deploy/renew_proxy.sh renews the proxy for 7 days, and it runs every 12h. Please keep this behavior.
@amaltaro We do not change the length of the proxy. It stays 168 hours. What we change here is the time window within which we deem the proxy close to the end of its lifetime and must start renewing it. What currently happens is:
Last time you made exactly the same comment and I did not object to it, even though I was already foreseeing such behavior but was not yet sure. Now we already see it: the proxy is constantly renewed. BTW, in the PR there are some more fixes
I don't think we need to check for proxy length, as this is already monitored by the AgentStatusWatcher component.
hi @amaltaro , but is What I did with my latest commit: dmwm/CMSKubernetes@01aade3 is:
I am also fixing yet another issue, which we found together while discussing the current one.
Unfortunately we later blindly wipe them out with the execution of the following command:
So I had to preserve the already existing cronjobs before adding the new ones.
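A minimal sketch of the "preserve before adding" idea, assuming the crontab is handled as plain text. The function name `merge_cron_entries` and the example paths are hypothetical and not taken from the actual commit.

```shell
# Hypothetical helper, not the code from the PR.
# Prints the existing crontab lines followed by any new entries that are
# not already present, so previously installed cronjobs are not wiped out.
merge_cron_entries() {
    local existing=$1 new=$2
    # awk drops empty lines (NF) and keeps only the first copy of each line.
    printf '%s\n%s\n' "$existing" "$new" | awk 'NF && !seen[$0]++'
}

# Example with made-up entries: job-a is already installed, renew.sh is new.
merge_cron_entries \
    "0 3 * * * /hypothetical/job-a.sh" \
    "0 3 * * * /hypothetical/job-a.sh
0 */12 * * * /hypothetical/renew.sh"
```

On a real system the merged text would typically be piped back with something like `merge_cron_entries "$(crontab -l 2>/dev/null)" "$new_entries" | crontab -`, which replaces the crontab with the union instead of only the new entries.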
Yes, AgentStatusWatcher is only monitoring and firing alerts whenever needed. Our current setup of renewing the proxy every 12h has been in place for almost 10 years and we never had any issue with that. I keep my position, saying that:
Impact of the bug
WMAgent
Describe the bug
While working on the validation of the new deployment methods (#11945), people started reporting a quite high frequency of Proxy-expiration errors from central services for their agents. People also report some minor issues with the manage renew-proxy command, like Permission denied when renewing an expired proxy, etc.:
This was a long suspected issue. While creating the function we set a myproxyMinLifetime of 7 days, calculated as 7*24*60*60: https://github.com/dmwm/CMSKubernetes/blob/51d74e64b1a2c8731a6d5411751437776de98356/docker/pypi/wmagent/bin/manage-common.sh#L316
which is exactly equal to the myproxyLifetime as issued a few lines above: https://github.com/dmwm/CMSKubernetes/blob/51d74e64b1a2c8731a6d5411751437776de98356/docker/pypi/wmagent/bin/manage-common.sh#L284
This leads to a 0 days window for deeming the proxy valid before renewing it. It means we renew those proxy certificates at the rate at which the cronjob for this is executed inside the container.
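The arithmetic behind the 0 days window can be sketched as follows. This is only an illustration: the helper `renewal_free_window_hours` is hypothetical, both values are assumed to be in seconds, and the 48-hour figure is the one from the "Expected behavior" section.

```shell
# Hypothetical helper: how many hours a freshly issued proxy can live
# before the renewal threshold is reached. Both arguments are in seconds.
renewal_free_window_hours() {
    local lifetime=$1 min_lifetime=$2
    echo $(( (lifetime - min_lifetime) / 3600 ))
}

lifetime=$(( 7 * 24 * 60 * 60 ))      # 168 hours, the issued proxy lifetime
current_min=$(( 7 * 24 * 60 * 60 ))   # threshold equal to the lifetime
expected_min=$(( 48 * 60 * 60 ))      # 48-hour threshold

echo "current:  $(renewal_free_window_hours "$lifetime" "$current_min") hours"
echo "expected: $(renewal_free_window_hours "$lifetime" "$expected_min") hours"
```

With the threshold equal to the lifetime the window is 0 hours, so every cron run finds the proxy "too close to expiry". With a 48-hour threshold the proxy would be left alone for the first 120 hours.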
And the Permission denied issue is due to the overly restrictive file permissions we set for this proxy (i.e. 400 instead of 600) here:
https://github.com/dmwm/CMSKubernetes/blob/51d74e64b1a2c8731a6d5411751437776de98356/docker/pypi/wmagent/bin/manage-common.sh#L320
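To illustrate why mode 400 gets in the way of a renewal: the file is read-only even for its owner, so a non-root user cannot overwrite it in place. The sketch below uses a hypothetical helper `write_proxy_file` and a temporary file; it is not the code from manage-common.sh.

```shell
# Hypothetical helper, not taken from manage-common.sh.
# Overwrites an existing proxy file while keeping it private to its owner.
write_proxy_file() {
    local proxy_file=$1 new_content=$2
    # Relax a read-only (400) file to owner read/write (600) before writing.
    if [ -e "$proxy_file" ]; then
        chmod 600 "$proxy_file"
    fi
    printf '%s\n' "$new_content" > "$proxy_file" && chmod 600 "$proxy_file"
}

# Demonstration on a throwaway file that starts out with mode 400.
demo_file=$(mktemp)
chmod 400 "$demo_file"
write_proxy_file "$demo_file" "renewed-proxy-placeholder"
cat "$demo_file"
rm -f "$demo_file"
```

Mode 600 still keeps the proxy unreadable for other users, while leaving the owner able to replace it on the next renewal.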
How to reproduce it
Try the following set of commands in an agent (be it Docker or a virtual environment):
You will see that the proxy renewal is retried every time.
Expected behavior
The proxy should be renewed only once it is within 48 hours of its expiration.
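A minimal sketch of such a check, assuming the remaining lifetime is already known in seconds (how it is obtained is left out here). The function name `needs_renewal` is hypothetical.

```shell
# Hypothetical helper: exit status 0 means "renew now".
# time_left and threshold are both in seconds.
needs_renewal() {
    local time_left=$1 threshold=$2
    [ "$time_left" -lt "$threshold" ]
}

threshold=$(( 48 * 60 * 60 ))   # renew only in the last 48 hours

# Walk through a 168-hour proxy at a few made-up ages (in hours).
for age in 12 96 132; do
    time_left=$(( (168 - age) * 3600 ))
    if needs_renewal "$time_left" "$threshold"; then
        echo "age ${age}h: renew"
    else
        echo "age ${age}h: keep"
    fi
done
```

With a threshold of 168 hours instead, the same check would ask for a renewal at every one of these ages, which matches the behavior described in this issue.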
Additional context and error message
The issue: #11945 depends on the resolution of the current one, which makes it a chained dependency for the WMAgent new deployment model meta-issue: #11314