
feat: add airflow.kubernetesPodTemplate.shareProcessNamespace value #408

Merged: 1 commit into airflow-helm:main on Apr 1, 2022

Conversation

@mthoretton (Contributor) commented on Aug 25, 2021:

What issues does your PR fix?

I was looking for a nice way to deal with secrets in Airflow. As we already use HashiCorp Vault everywhere and are very happy with it, I was looking for a good way to make Airflow work with it.

There is a Vault secrets backend in Airflow, but it's very basic. The feature I mostly want to use is dynamic credentials for accessing databases, with credentials that only live for the lifetime of the pod (DAG). For the rest, airflow vars and connections are fine.

I tried to use Vault CSI, but it's still at an early stage and also does not fulfill what I want to do. In the end, I just did what I do in other projects: use pod annotations to let Vault inject a sidecar into the pod, which injects credentials and revokes them when the pod dies.

Unfortunately, this Vault sidecar is meant to run indefinitely (alongside a web server or similar) and does not work with k8s jobs, which are expected to end. To make it work, I have to add yet another sidecar that detects when the DAG is done and then sends a signal to kill the Vault container. Otherwise, the pod running the DAG stays in an "unready" state forever because the Vault container never dies (see the relevant issue in the Vault project).

I know this is quite a niche and hacky PR, so you'll decide whether it's worth merging. Maybe this PR could still help some people trying to do the same. I guess a proper fix would be for the airflow controller to kill the pod when it detects the DAG is done 🤷

Example using the new airflow.kubernetesPodTemplate.shareProcessNamespace value:

airflow:
  [...]
  kubernetesPodTemplate:
    [...]
    # new value from this PR
    shareProcessNamespace: true

    # sidecar that watches for the task to finish (via the shared process namespace)
    # and then kills the injected Vault agent container
    extraContainers:
      - name: vault-killer
        image: ubuntu
        command:
          - /bin/bash
          - '-c'
          - 'sleep 5 && while [[ $(pidof dumb-init) ]]; do sleep 5; done; kill -SIGTERM $(pidof vault) || echo "Vault is already gone."'
        resources:
          requests:
            cpu: 30m
            memory: 50Mi
          limits:
            cpu: 100m
            memory: 100Mi

What does your PR do?

  • Adds an airflow.kubernetesPodTemplate.shareProcessNamespace value to allow enabling the shareProcessNamespace feature on the KubernetesExecutor pod_template (a sketch of how this might be templated is shown below).
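
For context, a rough sketch of how such a value might be wired into the chart's pod template file (charts/airflow/files/pod_template.kubernetes-helm-yaml); this is illustrative only, and the actual template in the chart may differ:

# illustrative sketch, not the chart's actual template
apiVersion: v1
kind: Pod
spec:
  {{- if .Values.airflow.kubernetesPodTemplate.shareProcessNamespace }}
  # share a single process namespace across all containers in the pod,
  # so sidecars can see (and signal) the airflow task process
  shareProcessNamespace: true
  {{- end }}
  containers:
    - name: base
      # ... rest of the worker container spec ...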

Checklist

For all Pull Requests

@thesuperzapper (Member) left a comment:

@mthoretton thanks for the PR, but I think this should be scoped down to just adding the airflow.kubernetesPodTemplate.shareProcessNamespace value, as PR #456 already adds the airflow.kubernetesPodTemplate.extraContainers value.

Resolved review threads (now outdated): charts/airflow/CHANGELOG.md, charts/airflow/files/pod_template.kubernetes-helm-yaml, charts/airflow/values.yaml
@thesuperzapper changed the title from "Add config to kubernetesPodTemplate - Hashicorp Vault" to "feat: add airflow.kubernetesPodTemplate.shareProcessNamespace value" on Dec 15, 2021
@thesuperzapper moved this from "PR | Waiting For Review" to "PR | Waiting for Changes" in Issue Triage and PR Tracking on Dec 15, 2021
The stale bot added the lifecycle/stale label on Feb 13, 2022
@tarekabouzeid commented:

Hi,

Are there any plans to merge this PR soon?

Best Regards,

The stale bot removed the lifecycle/stale label on Feb 18, 2022
@thesuperzapper (Member) commented:

@tarekabouzeid for this PR to be merged:

  1. @mthoretton needs to update it with the requested changes (see the review comments)
  2. I need to understand why the ability to set shareProcessNamespace is important, and be confident that this is a use-case we want to enable.

@tarekabouzeid commented:

@thesuperzapper We have a use case where we run a sidecar container in the same pods as the webserver and scheduler; this container mounts the dags folder from an S3 bucket. We use lifecycle hooks so the sidecar terminates when the airflow process terminates, and the entire pod is then terminated as well. We have worked around it for now and it's working fine, but it would be much better if we could see the airflow process from the sidecar.

@thesuperzapper (Member) commented:

> @thesuperzapper We have a use case where we run a sidecar container in the same pods as the webserver and scheduler; this container mounts the dags folder from an S3 bucket. We use lifecycle hooks so the sidecar terminates when the airflow process terminates, and the entire pod is then terminated as well. We have worked around it for now and it's working fine, but it would be much better if we could see the airflow process from the sidecar.

@tarekabouzeid I don't understand in what situation you would NOT want the containers to be restarted if they exit before they are told to by Kubernetes. When would Kubernetes itself not be the thing terminating the Pod (and thus killing all the containers in it)?

Also, an s3-sync sidecar is just like our git-sync sidecar, and that runs just fine without knowing anything about the state of the airflow container.

PS: I actually want to implement a native s3-sync sidecar in the chart, see the issue #249 if you want to help out.
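
For anyone curious, a minimal illustrative sketch of what such an s3-sync sidecar container could look like; the name, image, bucket, paths, and sync interval below are assumptions rather than anything provided by the chart, and where the extraContainers list lives in your values depends on which pod you are targeting:

# hypothetical sidecar container spec, for illustration only
- name: s3-sync                 # not a chart-provided sidecar
  image: amazon/aws-cli         # assumes AWS credentials are available in the pod
  command:
    - /bin/sh
    - '-c'
    # periodically copy DAG definitions from the bucket into a shared dags volume
    - 'while true; do aws s3 sync s3://my-dags-bucket/dags /opt/airflow/dags; sleep 60; done'
  volumeMounts:
    - name: dags-data           # assumes a shared dags volume exists in the pod
      mountPath: /opt/airflow/dags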

@mthoretton (Contributor, Author) commented:

Sorry for the delay, I think I addressed all your comments 👌

Signed-off-by: Mathieu Thoretton <mthoretton@gmail.com>
@thesuperzapper moved this from "PR | Waiting for Changes" to "PR | Ready to Merge" in Issue Triage and PR Tracking on Mar 21, 2022
@thesuperzapper (Member) left a comment:

@mthoretton I rebased and did some small cleanup, but it should be ready to merge now

@thesuperzapper added the status/ready-to-merge label on Mar 22, 2022
@airflow-helm deleted a comment from the stale bot on Apr 1, 2022
@thesuperzapper merged commit 7172e40 into airflow-helm:main on Apr 1, 2022
Issue Triage and PR Tracking automation moved this from "PR | Ready to Merge" to "Done" on Apr 1, 2022
@tarekabouzeid commented:

Hi @thesuperzapper, thanks for merging this. We used rclone mount as a sidecar and mounted it inside the webserver and scheduler dags directory. We noticed that in case of a webserver or scheduler failure, the rclone container keeps running and the pod won't be terminated. So we added a liveness probe to the rclone container that keeps checking whether the airflow process in the webserver or scheduler is still running; that is why we wanted shareProcessNamespace.
Regarding mounting dags from S3 or GCS, the rclone solution works fine for now. I also checked your comment about using FUSE and will look into that.

Best Regards,
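
For reference, a minimal sketch of the liveness-probe approach described above; it assumes shareProcessNamespace is enabled on the pod, that the sidecar image provides /bin/sh and pgrep, and that the airflow process appears as "airflow scheduler" in the process list (names, pattern, and timings are all illustrative):

# hypothetical rclone sidecar with a liveness probe, for illustration only
- name: rclone-dags-sync
  image: rclone/rclone
  # ... rclone mount command omitted ...
  livenessProbe:
    exec:
      command:
        - /bin/sh
        - '-c'
        # the [a] bracket trick stops the pattern from matching this probe's own shell;
        # with shareProcessNamespace enabled, the probe fails once the airflow process is gone
        - 'pgrep -f "[a]irflow scheduler" > /dev/null'
    initialDelaySeconds: 60
    periodSeconds: 30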

@thesuperzapper (Member) commented:

@tarekabouzeid I really want to implement built-in support for syncing DAG definitions from S3 and GCS buckets (just like we have for git-sync). If you want to help with this, here are the relevant issues:

Labels
status/ready-to-merge status - this will be merged into next release

3 participants