Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helm chart tries to patch immutable Job resources on helm upgrade #27561

Closed
2 tasks done
calebwoofenden opened this issue Nov 8, 2022 · 9 comments
Closed
2 tasks done
Labels
area:core kind:bug This is a clearly a bug

Comments

@calebwoofenden
Copy link

calebwoofenden commented Nov 8, 2022

Apache Airflow version

2.4.2

What happened

Running helm upgrade with helm hooks disabled in a namespace that already has the chart installed will fail because it's trying to patch a Job resource, which is immutable:

cannot patch "<release_name>-run-airflow-migrations" with kind Job: Job.batch "<release_name>-run-airflow-migrations" is invalid: spec.template: Invalid value: core.PodTemplateSpec {<...entire json spec of Job resource...>}: field is immutable

This happens when helm hooks are disabled on the db migrations job:

  migrateDatabaseJob:
    useHelmHooks: false

For more detail on why a user would want to do this, see #11979

Copying this from my comment on that issue:

Kube does offer a parameter called .spec.ttlSecondsAfterFinished that, when specified, will delete the Job after the specified number of seconds after completion (see https://kubernetes.io/docs/concepts/workloads/controllers/ttlafterfinished/). This is a pretty good solution, though installs done in a period of time shorter than this value might still run into the same problem.

The solution I propose here is to set a helm pre-install hook on the database deployment as well as the db migrations, and set the hook weight on the database lower than that of the migrations, so the database will be created first, then the migrations will run, then the webserver and etc. pods will come up. Then I think we could safely remove the wait-for-db-migrations initContainer on the deployments.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@sunghospark-calm
Copy link

Running into the same problem here.
Had to disable the hooks because migration was not running properly during upgrade, without hooks the followup deployment fails because of the job spec conflict.

@mconigliaro
Copy link

I think #27148 might be the fix for this. It adds createUserJob.applyCustomEnv and migrateDatabaseJob.applyCustomEnv options which we should set to false. We're just waiting on a new release of the helm chart.

Related: #21943

@potiuk
Copy link
Member

potiuk commented Dec 19, 2022

Marking as fixed then.

@potiuk potiuk closed this as completed Dec 19, 2022
@alexandermalyga
Copy link
Contributor

alexandermalyga commented Feb 1, 2023

Just tried applyCustomEnv: false with 1.8.0rc1 and I'm still getting cannot patch "run-airflow-migrations" with kind Job ... field is immutable, so #27148 does not fix this issue.
EDIT:
After manually deleting the old job (deployed without applyCustomEnv: false) and redeploying, it not longer raises the error, so the fix does work.

@alexandermalyga
Copy link
Contributor

Just to clarify, #27148 can indeed fix this issue, but it comes with the effect of removing env vars secrets from the migration job, which the job may use to connect to the database. #29314 should be a better fix for this issue.

@eladkal
Copy link
Contributor

eladkal commented Feb 22, 2023

@alexandermalyga with PR merged this issue is now resolved right?

@alexandermalyga
Copy link
Contributor

@alexandermalyga with PR merged this issue is now resolved right?

Yes, it is fixed, but we discussed a follow-up improvement: #29314 (review)
We agreed with @potiuk that it would be done in a separate PR, but I don't have much time to do it right now.

@eladkal
Copy link
Contributor

eladkal commented Feb 22, 2023

OK so I'm closing this issue as resolved. Please open new issue dedicated to the followup task

@Throne3d
Copy link

Now ttlSecondsAfterFinished has been added, is the recommended approach still as documented? Reading the thread, it sounds like instead of (as in the documentation):

createUserJob:
  useHelmHooks: false
  applyCustomEnv: false
migrateDatabaseJob:
  useHelmHooks: false
  applyCustomEnv: false

the recommendation is instead again:

createUserJob:
  useHelmHooks: false
migrateDatabaseJob:
  useHelmHooks: false

making use of the implicit ttlSecondsAfterFinished to drop the jobs between deploys before the next patch comes along and tries to mutate immutable fields.

Should I create a follow-up to update the documentation with this recommendation, and a caveat that deploys < ttlSecondsAfterFinished apart may still trigger the underlying Job mutation issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core kind:bug This is a clearly a bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants