Compare to Apache Airflow #849
I'm relatively new to this tool myself, but some initial observations based on my experience:
Also, to unpack the "heterogeneous runtimes" piece a bit further: Airflow has a huge list of "Operators" with support for other runtimes like Bash, Spark, Hive, etc., but the business logic for the Operators themselves is all written in Python.
And many of them have environment dependencies that you'll need to configure outside of Airflow's setup to get them working.
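To make the "business logic lives in Python" point concrete, here is a minimal sketch of the Operator pattern. The `BaseOperator` stand-in, the `EchoBashOperator` class, and all names here are illustrative (real operators subclass `airflow.models.BaseOperator`); the point is that even an operator targeting another runtime like Bash is just Python code whose `execute()` method shells out, so the surrounding environment must already have the external dependencies installed.

```python
import subprocess


class BaseOperator:
    """Stand-in for airflow.models.BaseOperator, so the sketch is self-contained."""

    def __init__(self, task_id):
        self.task_id = task_id


class EchoBashOperator(BaseOperator):
    """Toy analogue of Airflow's BashOperator: Python business logic
    wrapping a Bash runtime."""

    def __init__(self, task_id, bash_command):
        super().__init__(task_id)
        self.bash_command = bash_command

    def execute(self, context=None):
        # The operator's Python code runs the command in a subshell;
        # whatever that command needs (binaries, client libs, env vars)
        # must be configured on the host outside of Airflow itself.
        result = subprocess.run(
            self.bash_command, shell=True, capture_output=True, text=True
        )
        return result.stdout.strip()


task = EchoBashOperator(task_id="say_hi", bash_command="echo hello")
print(task.execute())
```

The same shape applies to the Spark, Hive, etc. operators: the runtime is external, but the orchestration logic is Python running wherever the Airflow worker runs.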
They do seem to have a DockerOperator, which probably provides many of the same facilities as Argo for scheduling Docker executions on a single host, but I'm not certain it comes with all the facilities that Kubernetes offers for managing and scheduling containerized workloads (e.g. pod abstractions, config maps, secrets management, centralized logging, host node selectors, affinity and anti-affinity, etc.).
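For reference, a DockerOperator is typically configured along these lines. This is a hedged sketch: the import path and accepted parameters vary across Airflow versions (shown here for the docker providers package), the `task_id`, image, and command are illustrative, and the `try`/`except` only exists to keep the sketch self-contained when Airflow isn't installed.

```python
# Sketch of Airflow's DockerOperator configuration. The operator still
# runs from a single Airflow worker host talking to that host's Docker
# daemon -- there is no cluster-level scheduling as with Kubernetes.
try:
    from airflow.providers.docker.operators.docker import DockerOperator

    task = DockerOperator(
        task_id="run_in_container",       # illustrative task name
        image="python:3.9-slim",          # illustrative image
        command="python -c 'print(42)'",
    )
except Exception:  # Airflow / docker provider not available in this environment
    task = None
```

Note what is absent compared to a Kubernetes pod spec: no node selectors, affinity rules, config map or secret mounts, or resource scheduling across hosts.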
Thanks @divideby0, I think you described the differences better than I could have, given my limited knowledge of Airflow. Some additional points:
I would highlight this as a similarity. Argo only works in the context of Kubernetes, where each step is a Kubernetes pod. Thus it integrates very deeply with a Kubernetes environment, utilizing nearly all of the features of a k8s pod spec (e.g. secret/configmap mounts, volumes, resource limits, pod affinity, etc.). Scheduling of pods is deferred to Kubernetes, and each step will run on whatever host k8s decides to schedule its pod on (obeying any affinity rules set in the step).
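To illustrate how directly a step maps onto a pod spec, here is a hedged sketch of a minimal Argo Workflow (the image, secret name, and node selector label are placeholders): resource limits, secret mounts, and node selection are expressed as ordinary Kubernetes pod-spec fields on the step's template.

```yaml
# Illustrative Argo Workflow: one step, scheduled by Kubernetes as a pod.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-
spec:
  entrypoint: say-hello
  templates:
  - name: say-hello
    nodeSelector:
      disktype: ssd              # host node selector, as in any pod spec
    container:
      image: docker/whalesay
      command: [cowsay, "hello"]
      resources:
        limits:                  # k8s resource limits apply per step
          memory: 128Mi
          cpu: 250m
      volumeMounts:
      - name: creds
        mountPath: /etc/creds
  volumes:
  - name: creds
    secret:
      secretName: my-secret      # k8s secrets mounted directly into a step
```

Because the step is just a pod, anything else the pod spec supports (affinity/anti-affinity, configmap mounts, tolerations, etc.) is available the same way.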
I should point out that we created an example of an Argo workflow which actually utilizes Airflow operators, since we understand the desire to leverage the huge library of Airflow operators that has been built up over time, but done in a more k8s-centric way: