feat(helm): add api-server rollout restart cronjob#67569
Conversation
4e4c5a6 to
7e10d46
Compare
jscheffl
left a comment
There was a problem hiding this comment.
We had similar PR recently. I did not like it.
If a long running process has a problem then there should be other means. A rolling restart is for me only a workaround and should not be/get a permanent feature in Helm chart therefore.
I have marked this PR as a draft for now while I look into how we can better support/document the Gunicorn rolling restarts within the Helm chart instead of forcing a full pod rollout I will update the PR if there's a better native Helm integration we can add. |
+1. Handling of worker restarts should be implemented in the API server itself if it is needed. |
Agree a rolling restart is a workaround and should not be a permanent chart feature. This PR adds an opt-in CronJob disabled by default so teams can use it as a short term mitigation. |
I would be really against adding workaround features to the chart. We have, e.g., PostgreSQL within the chart currently, which was meant only for development purposes, and there are teams which are using it for production. I believe that we should not encourage users to use this particular workaround by implementing it and making it easy to use. If there is a team which will need to do it, they could just create the CronJob definition and apply it to the Kubernetes cluster. |
Then - if really somebody needs it - would propose to add this being a Kustomize Layer example we can add to the repo with some docs how to apply but not adding this to main chart. |
This PR adds Helm chart support for periodic API server rollout restarts on Kubernetes.
Problem:
Long-running
uvicornprocesses in the API server can accumulate stale resources. Kubernetes provideskubectl rollout restartas a native mechanism, but the Airflow Helm chart has no built in support for scheduling this.Fix:
Adds a new
apiServer.rolloutRestartconfiguration section with aCronJob,ServiceAccount,Role, andRoleBindingfollowing the exact same pattern as the existingdatabaseCleanupCronJob.closes: #61432
Was generative AI tooling used to co-author this PR?