🔗 Return to Table of Contents
for more FAQ topics 🔗
Note, this page was written for the
User-Community Airflow Helm Chart
To use your airflow cluster, you will need to make your DAG definitions (python files) available to airflow.
While there are many ways you can achieve this, we natively support the following methods.
You may store DAG definitions in a git repo and configure the chart to automatically sync a local copy this repo into each airflow Pod at a regular interval.
🟦 Tip 🟦
The content of the git repo will be stored at
{dags.path}/repo/
, by default this will be/opt/airflow/dags/repo/
.
SSH Authentication
The git-sync sidecars can access the git repo using SSH authentication.
For example to sync git@github.com:USERNAME/REPOSITORY.git
using the RSA keys stored in Secret/airflow-ssh-git-secret
:
airflow:
config:
## NOTE: set by `dags.gitSync.syncWait`, unless you override it
#AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 60
dags:
## NOTE: this is the default value
path: /opt/airflow/dags
gitSync:
enabled: true
## NOTE: some git providers will need an `ssh://` prefix
repo: "git@github.com:USERNAME/REPOSITORY.git"
branch: "master"
revision: "HEAD"
## the sub-path within your repo where dags are located
## NOTE: airflow will only see dags under this path, but the whole repo will still be synced
#repoSubPath: "path/to/dags"
## number of seconds to wait between syncs
## NOTE: also sets `AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL` unless overwritten in `airflow.config`
syncWait: 60
## the max number of seconds allowed for a complete sync
## NOTE: if your repo takes a very long time to sync, you may need to increase this value
#syncTimeout: 120
## the number of consecutive failures allowed before aborting
## NOTE: if your repo regularly has intermittent failures, you may wish to set a non-0 value
#maxFailures: 0
sshSecret: "airflow-ssh-git-secret"
sshSecretKey: "id_rsa"
## NOTE: "known_hosts" verification can be disabled by setting to ""
sshKnownHosts: |-
github.com ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCj7ndNxQowgcQnjshcLrqPEiiphnt+VTTvDP6mHBL9j1aNUkY4Ue1gvwnGLVlOhGeYrnZaMgRK6+PKCUXaDbC7qtbW8gIkhL7aGCsOr/C56SJMy/BCZfxd1nWzAOxSDPgVsmerOBYfNqltV9/hWCqBywINIR+5dIg6JTJ72pcEpEjcYgXkE2YEFXV1JHnsKgbLWNlhScqb2UmyRkQyytRLtL+38TGxkxCflmO+5Z8CSSNY7GidjMIZ7Q4zMjA2n1nGrlTDkzwDCsw+wqFPGQA179cnfGWOWRVruj16z6XyvxvjJwbz0wQZ75XK5tKSb7FNyeIEs4TT4jk+S4dhPeAUC5y+bDYirYgM4GC7uEnztnZyaVWQ7B381AK4Qdrwt51ZqExKbQpTUNn+EjqoTwvqNj4kqx5QUCI0ThS/YkOxJCXmPUWZbhjpCg56i+2aB6CmK2JGhn57K5mj0MNdBXA4/WnwH6XoPWJzK5Nyu2zB3nAZp+S5hpQs+p1vN1/wsjk=
🟦 Tip 🟦
You may create
Secret/airflow-ssh-git-secret
using this command, if you have the SSH private key stored under$HOME/.ssh/id_rsa
:kubectl create secret generic \ airflow-ssh-git-secret \ --from-file=id_rsa=$HOME/.ssh/id_rsa \ --namespace my-airflow-namespace
HTTP Authentication
The git-sync sidecars can access the git repo using HTTP authentication.
For example, to sync https://github.com/USERNAME/REPOSITORY.git
using the HTTP credentials stored in Secret/airflow-http-git-secret
:
airflow:
config:
## NOTE: set by `dags.gitSync.syncWait`, unless you override it
#AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 60
dags:
## NOTE: this is the default value
path: /opt/airflow/dags
gitSync:
enabled: true
repo: "https://github.com/USERNAME/REPOSITORY.git"
branch: "master"
revision: "HEAD"
## the sub-path within your repo where dags are located
## NOTE: airflow will only see dags under this path, but the whole repo will still be synced
#repoSubPath: "path/to/dags"
## number of seconds to wait between syncs
## NOTE: also sets `AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL` unless overwritten in `airflow.config`
syncWait: 60
## the max number of seconds allowed for a complete sync
## NOTE: if your repo takes a very long time to sync, you may need to increase this value
#syncTimeout: 120
## the number of consecutive failures allowed before aborting
## NOTE: if your repo regularly has intermittent failures, you may wish to set a non-0 value
#maxFailures: 0
httpSecret: "airflow-http-git-secret"
httpSecretUsernameKey: username
httpSecretPasswordKey: password
🟦 Tip 🟦
You may create
Secret/airflow-http-git-secret
using this command, replacingMY_GIT_USERNAME
andMY_GIT_TOKEN
with your HTTP credentials:kubectl create secret generic \ airflow-http-git-secret \ --from-literal=username='MY_GIT_USERNAME' \ --from-literal=password='MY_GIT_TOKEN' \ --namespace my-airflow-namespace
You may use a PersistentVolumeClaim
to share your DAG definitions across the airflow Pods.
🟨 Note 🟨
The
PersistentVolumeClaim
will be empty by default, you must either manually fill it with your DAG files, or configure an external system to automate this process. For example, you may create a CI/CD pipeline on your DAGs repo to update the volume as commits are made.
Chart Managed Volume
The chart can manage the initial creation of a PersistentVolumeClaim for your DAG files.
🟦 Tip 🟦
The name of the
PersistentVolumeClaim
will be your helm release-name with"-dags"
appended.
For example, if you usehelm install my-airflow ...
, the PVC will be calledmy-airflow-dags
.
For example, to have the chart create a PersistentVolumeClaim with the storageClass
called default
and a size
of 1Gi
:
dags:
## NOTE: this is the default value
path: /opt/airflow/dags
persistence:
enabled: true
## NOTE: set `storageClass` to "" for the cluster-default
storageClass: "default"
## NOTE: some types of StorageClass will ignore this request (for example, EFS)
size: 1Gi
## NOTE: as multiple Pods read the DAGs concurrently this MUST be ReadOnlyMany or ReadWriteMany
accessMode: ReadOnlyMany
User Managed Volume
If you wish to take more control of the PersistentVolumeClaim used for your DAG files, you may create a
PersistentVolumeClaim
resource inside your helm install namespace and then tell the chart to use it.
For example, to have the chart use an existing PersistentVolumeClaim called my-dags-pvc
:
dags:
## NOTE: this is the default value
path: /opt/airflow/dags
persistence:
enabled: true
## NOTE: this is name of your existing volume
existingClaim: my-dags-pvc
## NOTE: as multiple Pods read the DAGs concurrently this MUST be ReadOnlyMany or ReadWriteMany
accessMode: ReadOnlyMany
You may embed your DAG files directly into the container image.
🟥 Warning 🟥
This method requires all airflow containers to restart after each update to your DAG files.
All airflow Pods will run the same image, and have the latest DAG definitions.
Example
This chart uses the official apache/airflow
Docker images.
Here is a Dockerfile that extends apache/airflow:2.8.4-python3.9
by placing DAG files into /opt/airflow/dags
:
FROM apache/airflow:2.8.4-python3.9
## copy the content of local folder `./my_dag_folder` into container folder `/opt/airflow/dags`
COPY ./my_dag_folder /opt/airflow/dags
You might then build and tag this Dockerfile as MY_REPO:MY_TAG
.
The following values tell the chart to use the MY_REPO:MY_TAG
container image:
airflow:
image:
repository: MY_REPO
tag: MY_TAG
## WARNING: even if set to "Always" DO NOT reuse tag names,
## containers only pull the latest image when restarting
pullPolicy: IfNotPresent
🟥 Warning 🟥
Ensure that you NEVER REUSE an image tag name.
This ensures that whenever you updateairflow.image.tag
, all airflow pods will restart and have the same DAGs.
For example, you may append a version or git hash corresponding to your DAGs:
MY_REPO:MY_TAG-v1
,MY_REPO:MY_TAG-v2
,MY_REPO:MY_TAG-v3
MY_REPO:MY_TAG-0.1.0
,MY_REPO:MY_TAG-0.1.1
,MY_REPO:MY_TAG-0.1.3
MY_REPO:MY_TAG-a1a1a1a
,MY_REPO:MY_TAG-a2a2a3a
,MY_REPO:MY_TAG-a3a3a3a