fix: Do recreate runner pod earlier on registration token update #1087
One thing I noticed here, @mumoshu: is the updated registration token going to trigger a restart while a pod is in the middle of running a job? If so, we just have another potential active runner deletion.
@jbkc85 Yes, you're absolutely right! To be clear, ARC has no way to atomically delete the runner if and only if the runner isn't busy. This fixes the issue that the runner takes too much time to restart even after the token has already expired.
Got it! That makes perfect sense.
@mumoshu - I am not sure this is related, but... With Ephemeral containers, there is a race condition possibility here - in which a
@jbkc85 Good catch! Yeah, we'd definitely better take
@jbkc85 Ah, sorry, I'm pretty confused. From my observation, the "runner" container for an ephemeral runner does exit with code 0 (without restarting by itself) when it finishes running a job. We should definitely recreate the pod in that case. Otherwise, the pod gets stuck forever and you'll end up with zero ephemeral runner pods actually running the runner containers, right? Assuming that's right, the assumption for the fix I made in #1085 turns out to be false. We do need to recreate the ephemeral runner pod in the said condition, so the change made in #1085 is useful. Instead, we'd introduce a grace period between the runner un-registration and pod deletion, as we discussed, and that will fix it. Could you confirm?
To be clear, this turned out to be my mistake.
Apparently, we've been failing to take an updated registration token into account when generating the pod template hash, which is used to detect whether the runner pod needs to be recreated.
This shouldn't have been the end of the world, since the runner pod is recreated on the next reconciliation loop anyway. But this change makes the pod recreation happen one reconciliation loop earlier, so you're less likely to get runner pods with outdated registration tokens.
Ref #1085 (comment)
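
To illustrate the idea described above, here is a minimal Go sketch of a template hash that also covers the registration token. It is not the actual ARC code: `podTemplate`, `templateHash`, and `needsRecreate` are hypothetical names, the pod spec is reduced to two fields, and SHA-256 over JSON is just one assumed way to derive the hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// podTemplate is a hypothetical, heavily simplified stand-in for the
// runner pod spec. The real spec has many more fields.
type podTemplate struct {
	Image  string            `json:"image"`
	Labels map[string]string `json:"labels"`
}

// templateHash hashes the template together with the registration token,
// so that a token update alone is enough to change the hash.
// (Assumption: SHA-256 over the JSON encoding; json.Marshal sorts map
// keys, which keeps the encoding deterministic.)
func templateHash(tpl podTemplate, registrationToken string) (string, error) {
	payload := struct {
		Template podTemplate `json:"template"`
		Token    string      `json:"token"`
	}{Template: tpl, Token: registrationToken}

	b, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// needsRecreate reports whether the hash recorded on the running pod
// differs from the hash of the desired template plus the current token.
func needsRecreate(currentHash string, tpl podTemplate, token string) (bool, error) {
	desired, err := templateHash(tpl, token)
	if err != nil {
		return false, err
	}
	return desired != currentHash, nil
}

func main() {
	tpl := podTemplate{
		Image:  "example/runner:latest", // placeholder image name
		Labels: map[string]string{"app": "runner"},
	}

	// Hash recorded when the pod was created with the old token.
	recorded, err := templateHash(tpl, "old-token")
	if err != nil {
		panic(err)
	}

	unchanged, _ := needsRecreate(recorded, tpl, "old-token")
	rotated, _ := needsRecreate(recorded, tpl, "new-token")
	fmt.Println("recreate with same token:", unchanged)
	fmt.Println("recreate after token update:", rotated)
}
```

With the token left out of the hashed payload, the second check would report no difference, and the recreation would have to be triggered by some other path on a later reconciliation.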