auto-idler: cli task run check does not take post-rollout tasks in consideration #769

Closed
Schnitzel opened this issue Nov 15, 2018 · 2 comments · Fixed by #1039

Comments

@Schnitzel
Contributor

Our auto-idler checks whether any commands are running inside the cli pod. It does that by counting only commands that have a tty attached:
https://github.com/amazeeio/lagoon/blob/3e551c0b8d34bf49abec6ba9ca23b5243ec988dc/services/auto-idler/idle-clis.sh#L78
The idea was to exclude the tasks started from the entrypoints (the sleep, the cronjob commands, etc.) from the check: they run without a tty, they run all the time, and they would otherwise prevent the cli from ever being idled.
This all works, but unfortunately the post-rollout tasks during a Lagoon deployment also run without a tty, so they are not taken into consideration. It is therefore possible for the cli auto-idler to delete the pod while a post-rollout task is still running.
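
To illustrate why a tty filter misses post-rollout tasks, here is a minimal sketch. It assumes a process list in `PID TTY COMMAND` layout (what `ps --no-headers -eo pid,tty,comm` prints); the function name and the sample processes are made up for illustration and are not taken from `idle-clis.sh`.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID TTY COMMAND" lines on stdin and prints how
# many of them have a terminal attached (ps shows "?" when there is none).
count_tty_processes() {
  awk '$2 != "?" { n++ } END { print n + 0 }'
}

# Made-up process list: an entrypoint, a sleep, a post-rollout task without a
# tty, and one interactive shell on pts/0.
printf '1 ? tini\n12 ? sleep\n40 ? drush\n55 pts/0 sh\n' | count_tty_processes
# prints 1: only the interactive shell is counted, the post-rollout task is not
```

With a count like this, a pod that is only running a post-rollout task looks the same as a pod that is only running its entrypoint processes.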

Two ideas:

  1. Try to force the post-rollout tasks to also run in a tty (already tried, without success so far)
  2. Teach the auto-idler not to idle while a deployment is running (probably easier)
@dasrecht
Contributor

This issue happens regularly and causes builds to fail with error 137

@shreddedbacon
Member

I've been looking at this recently, and it is possible to check whether any processes are running outside of the entrypoints by using `pgrep -P 0 | tail -n +3 | wc -l | tr -d ' '` instead of `ps --no-headers a | wc -l | tr -d ' '`.

`pgrep -P 0` lists processes whose parent PID is 0, that is, processes that aren't running off PID 1.
As `oc rsh` is used to access the pod as part of this check, that session is listed as a process too, along with PID 1 (tini), so `tail -n +3` is used to skip those two entries.
The result is the count of any other processes.
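
The counting step described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and the PIDs are sample data standing in for real `pgrep -P 0` output from inside the pod.

```shell
#!/bin/sh
# Hypothetical helper: reads one PID per line on stdin (as printed by
# `pgrep -P 0`) and prints how many entries remain after skipping the first
# two, which account for PID 1 and the `oc rsh` session used by the check.
count_extra_processes() {
  tail -n +3 | wc -l | tr -d ' '
}

# Sample input: three PIDs, so one process besides PID 1 and the rsh session.
printf '1\n120\n245\n' | count_extra_processes
# prints 1
```

A result of `0` would mean that nothing besides PID 1 and the check's own session was found.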

Through testing though, I noticed that some post-rollout tasks are faster than others, and if the auto-idler hits the cli pod while it is between post-rollout tasks, it will still idle the pod and cause the build to fail.

I also tested checking whether there are any builds in the running state for a given environment using `oc -n "$NAMESPACE" get --no-headers=true builds | grep "Running"`, but if someone is running a task in the cli pod outside of a build, this won't be picked up and the pod will be idled.

Would it be good to check both, and only idle the pod when there are no running builds and no running processes?
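
A rough sketch of that combined decision, assuming the two counts have already been collected. The function names and the sample build listing are hypothetical, not the auto-idler's actual code.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get --no-headers=true builds` style output on
# stdin and prints the number of lines containing "Running".
count_running_builds() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence `|| true`.
  grep -c "Running" || true
}

# Hypothetical helper: succeeds only when there are no running builds AND no
# extra processes in the cli pod.
should_idle() {
  running_builds=$1
  extra_processes=$2
  [ "$running_builds" -eq 0 ] && [ "$extra_processes" -eq 0 ]
}

# Made-up build listing with one running build, and zero extra processes.
builds=$(printf 'lagoon-build-1 Source Git Complete\nlagoon-build-2 Source Git Running\n' | count_running_builds)
if should_idle "$builds" 0; then
  echo "idle"
else
  echo "keep running"
fi
# prints "keep running"
```

In the auto-idler, the first count would presumably come from the `oc -n "$NAMESPACE" get --no-headers=true builds` output and the second from the `pgrep` pipeline run inside the pod; how the two are wired together there is an assumption of this sketch.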

I've done some testing with both checks in the auto-idler and can submit a PR if this is worth pursuing.
