Airflow DAGs not refreshed with pullPolicy set to Always with the same container tag #23895

Closed
1 of 2 tasks
jerome-aosis opened this issue May 24, 2022 · 5 comments
Labels
area:helm-chart Airflow Helm Chart invalid kind:bug This is a clearly a bug

Comments

@jerome-aosis

Official Helm Chart version

1.6.0 (latest released)

Apache Airflow version

2.3.0 (latest released)

Kubernetes Version

1.20.11

Helm Chart configuration

"helm" upgrade --install "airflow" "path/to/Values.yaml" -n "my-namespace" \
  --set "images.airflow.repository=myregistry/airflow" \
  --set "images.airflow.pullPolicy=Always" \
  --set "images.airflow.tag=v1.8.0" 

Docker Image customisations

My DAG image contains just a hello-world DAG:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def print_hello(): return 'Hello world from first Airflow DAG!'


dag = DAG('hello_world', description='Hello World DAG',
          schedule_interval='0 * * * *',
          start_date=datetime(2017, 3, 20), catchup=False)

hello_py_operator = PythonOperator(task_id='hello_task', python_callable=print_hello, dag=dag)

hello_py_operator

The Dockerfile just copies the Python file into the default /opt/airflow/dags directory of the worker container.

What happened

  • The first time I run the installation, the DAG is correctly deployed in the worker container and the Airflow dashboard shows the DAG ==> OK

  • I then update the DAG by changing its name or description and push the rebuilt image to the same tag (v1.8.0 in this case):

dag = DAG('hello_world2', description='Hello World2 DAG',
          schedule_interval='0 * * * *',
          start_date=datetime(2017, 3, 20), catchup=False)
  • I rerun the helm command and the old DAG is still there, without the updated name/description. ==> KO

In the Kubernetes events, I can see that the image is pulled as intended:

Successfully pulled image "myregistry/airflow:v1.8.0" in 1.882970769s

From here, the only way to refresh my updated DAG is to

  • build a new container with a new tag (or use the SHA)
  • OR kill the worker and scheduler pods to force the DAG to be refreshed.

What you think should happen instead

The DAG should be refreshed by helm upgrade --install alone, without the need to kill the worker/scheduler pods.

How to reproduce

  • Use the official Airflow Helm chart (image apache/airflow:2.3.0)
  • Build a container with the DAGs inside, tagged v1.0.0
  • Run the helm install with that container and pullPolicy set to Always
  • Change the DAG a little
  • Build the new container with the same tag
  • Run the helm upgrade command with pullPolicy set to Always

Anything else

This issue occurs every time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jerome-aosis jerome-aosis added area:helm-chart Airflow Helm Chart kind:bug This is a clearly a bug labels May 24, 2022
@boring-cyborg

boring-cyborg bot commented May 24, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@potiuk
Member

potiuk commented May 31, 2022

This is not how Kubernetes or the Helm chart works. From Kubernetes' point of view, the definition of the pods has not changed, so it will not update them, and hence they will not be restarted. Your expectation of "Always" goes far beyond its meaning. "Always" for a container/pod means "always when the container is started", not "always when helm is updated".
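To make the distinction concrete, here is a toy model in Python. It is not Kubernetes code: `pod_template` and `triggers_rollout` are hypothetical helpers that only mimic the idea that pods are replaced when the rendered pod template changes, while `imagePullPolicy` is consulted only when a container actually starts.

```python
# Toy model (NOT Kubernetes code): pods are replaced only when the
# rendered pod template changes. imagePullPolicy matters only at the
# moment a container is started.

def pod_template(repository: str, tag: str, pull_policy: str) -> dict:
    """Return a simplified stand-in for the pod template Helm renders."""
    return {"image": f"{repository}:{tag}", "imagePullPolicy": pull_policy}


def triggers_rollout(current: dict, desired: dict) -> bool:
    """In this model, a rollout happens only if the template differs."""
    return current != desired


deployed = pod_template("myregistry/airflow", "v1.8.0", "Always")

# Same tag pushed again: the template is identical, so nothing restarts,
# even though the registry now holds different image content.
same_tag = pod_template("myregistry/airflow", "v1.8.0", "Always")
print(triggers_rollout(deployed, same_tag))  # False

# New tag: the template differs, so pods are recreated and pull the image.
new_tag = pod_template("myregistry/airflow", "v1.8.1", "Always")
print(triggers_rollout(deployed, new_tag))  # True
```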

If you want to keep the same image tag for your DAG updates, you will have to include a manual restart whenever you update your DAGs. However, this is terrible for traceability, because (as in this case) you can never be sure whether the image in use contains the newer or the older version of your DAGs. There are also more reasons why the image might not be updated (for example, if your registry is not reachable, K8S will not pull the newer image). It's not good for a production setup. Also, if you update the image and one of your pods gets restarted, it will pull the newer image on its own, without you touching Helm, and you will end up with different DAGs in different pods. That's pretty terrible to debug and manage.
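For a throwaway or learning setup that nonetheless keeps a constant tag, the manual restart could be scripted roughly as follows. This is a sketch under assumptions: the resource names `deployment/airflow-scheduler` and `statefulset/airflow-worker` are guesses for a release named `airflow` and depend on your executor and chart settings, so check them with `kubectl get deployments,statefulsets -n <namespace>` first. `kubectl rollout restart` itself is a standard kubectl subcommand.

```python
import subprocess
from typing import List, Sequence

# ASSUMPTION: resource names for a Helm release called "airflow".
# Verify the real names in your cluster before using this.
RESOURCES = ("deployment/airflow-scheduler", "statefulset/airflow-worker")


def restart_commands(namespace: str,
                     resources: Sequence[str] = RESOURCES) -> List[List[str]]:
    """Build one `kubectl rollout restart` command per resource."""
    return [
        ["kubectl", "rollout", "restart", resource, "-n", namespace]
        for resource in resources
    ]


def restart(namespace: str) -> None:
    """Run the restarts; raises CalledProcessError if kubectl fails."""
    for command in restart_commands(namespace):
        subprocess.run(command, check=True)


# Usage (requires kubectl and cluster access), e.g. after helm upgrade:
#   restart("my-namespace")
```

The drawbacks described above still apply: nothing here guarantees that every pod ends up running the same image content.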

The proper solution is to build tagging/versioning of your images into your image build pipeline. Rather than reusing the same tag, update the tag every time you update the DAGs in the image (for example by adding a date/time or increasing a version number) and update the tag used in Helm (via values file, env var, or flag, depending on how you deploy). That completely solves the problem: it makes sure that all your components use the same image and that all components get restarted when you deploy a new version.
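As an illustration of that approach, a build pipeline step might derive a fresh tag and pass it to Helm along these lines. This is a minimal sketch, not an official recipe: `unique_tag` and `helm_upgrade_command` are hypothetical helpers, the timestamp suffix is just one possible versioning scheme, and the chart reference `apache-airflow/airflow` and the `images.airflow.*` values are taken from the commands quoted in this thread.

```python
from datetime import datetime, timezone
from typing import List


def unique_tag(version: str, now: datetime) -> str:
    """Append a timestamp so every build gets its own image tag."""
    return f"{version}-{now.strftime('%Y%m%d%H%M%S')}"


def helm_upgrade_command(repository: str, tag: str) -> List[str]:
    """Build the helm invocation that points the chart at the new tag."""
    return [
        "helm", "upgrade", "--install", "airflow", "apache-airflow/airflow",
        "--set", f"images.airflow.repository={repository}",
        "--set", f"images.airflow.tag={tag}",
    ]


# Fixed timestamp so the example is reproducible; a real pipeline
# would pass datetime.now(timezone.utc) instead.
tag = unique_tag("v1.8.0", datetime(2022, 5, 24, 12, 0, 0, tzinfo=timezone.utc))
print(tag)  # v1.8.0-20220524120000
print(" ".join(helm_upgrade_command("myregistry/airflow", tag)))
```

Because the tag differs on every build, the rendered pod definitions change on every deploy, so all components are restarted with the same image and pullPolicy no longer needs to be Always.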

@potiuk potiuk closed this as completed May 31, 2022
@potiuk potiuk added the invalid label May 31, 2022
@jerome-aosis
Author

Thanks for your answer.
I use the same tag for learning purposes; this is not a production environment.

I agree with you, pullPolicy does not work like that in Kubernetes. But in that case, it seems to be a documentation issue. The documentation says:

If you are deploying an image with a constant tag, you need to make sure that the image is pulled every time.
helm upgrade --install airflow apache-airflow/airflow \
  --set images.airflow.repository=my-company/airflow \
  --set images.airflow.tag=8a0da78 \
  --set images.airflow.pullPolicy=Always

The documentation seems to say that adding --set images.airflow.pullPolicy=Always will refresh the DAGs of an image with a constant tag. This is not the case :-)

@potiuk
Member

potiuk commented Jun 4, 2022

Feel free to make a PR fixing the docs then! Airflow is free software with more than 2000 contributors like you, and most of the docs were submitted by people like you who wanted to improve others' experience. There is no "someone" who will do it: if you want to improve the docs, just do it. At the bottom right of the documentation page you will find a "suggest improvement on that page" button. Click it and you will open a PR where you can improve the docs using the GitHub UI, as easily as writing this issue. Can I count on your help there @jerome-aosis?

The community counts on people like you to help with that. Will you help?

@jerome-aosis
Author

jerome-aosis commented Jun 21, 2022

Obviously, I will. I was not counting on "someone". 😄
Thanks for the process to improve the documentation.

PR: #24576
