Compare to Apache Airflow #849

Open
elgalu opened this Issue May 2, 2018 · 5 comments

elgalu commented May 2, 2018

Forgive my ignorance, but could you summarise how this project compares with Apache Airflow?

divideby0 commented May 4, 2018

I'm relatively new to this tool myself, but some initial observations based on my experience:

Similarities

  • Both tools are designed for batch workflows involving a Directed Acyclic Graph (DAG) of steps
  • Both tools provide flow control for error handling and conditional logic based on the output of upstream steps
  • Both tools are open source under the Apache License and actively maintained by a community of contributors
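
To make the shared DAG idea concrete, here is a minimal sketch in plain Python (3.9+, standard library only). It is not how either Airflow or Argo is implemented; the step names and the `run_workflow` helper are hypothetical. It runs steps in dependency order and skips any step whose upstream step did not succeed, which is the simplest form of the flow control described above.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each name maps to the set of upstream steps it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def run_workflow(dag, actions):
    """Run steps in dependency order; skip a step if any upstream step did not succeed."""
    status = {}
    # static_order() yields every step after all of its upstream steps.
    for step in TopologicalSorter(dag).static_order():
        if any(status[upstream] != "succeeded" for upstream in dag[step]):
            status[step] = "skipped"
            continue
        try:
            actions[step]()
            status[step] = "succeeded"
        except Exception:
            status[step] = "failed"
    return status


def failing_step():
    raise RuntimeError("validation failed")


# Hypothetical actions: "validate" fails, so "load" should be skipped.
actions = {
    "extract": lambda: None,
    "transform": lambda: None,
    "validate": failing_step,
    "load": lambda: None,
}

result = run_workflow(dag, actions)
print(result)
```

Both tools layer much more on top of this (retries, parallelism, scheduling), but the dependency-ordered core is the same.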

Differences

  • Airflow seems tightly coupled to the Python ecosystem, while Argo provides flexibility to schedule steps in heterogeneous runtimes (anything that can run in a container)
  • Argo natively schedules steps to run in a Kubernetes cluster, potentially across several hosts. While Airflow has a contributed feature to schedule jobs on Mesos clusters, there's possibly less integration there (I haven't tried this)
  • Airflow DAGs are expressed in a Python-based DSL, while Argo DAGs are expressed in a YAML syntax that is understood natively by and deployable directly to Kubernetes

Edit: fixed a typo
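
To illustrate the last difference, here is a rough sketch of what an Argo DAG looks like as data. It is built as a Python dict and printed as JSON only so the example runs with the standard library; in practice the same structure is usually written by hand in YAML. The workflow name, step names, and images are made up for illustration, and the `container_template` helper is hypothetical.

```python
import json


def container_template(name, image, command):
    """Hypothetical helper: one step that runs an arbitrary container image."""
    return {"name": name, "container": {"image": image, "command": command}}


# A two-step DAG: "transform" runs only after "extract" finishes.
workflow = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "etl-sketch-"},
    "spec": {
        "entrypoint": "main",
        "templates": [
            {
                "name": "main",
                "dag": {
                    "tasks": [
                        {"name": "extract", "template": "extract"},
                        {
                            "name": "transform",
                            "template": "transform",
                            "dependencies": ["extract"],
                        },
                    ]
                },
            },
            # Each step can use a different runtime, since it is just a container.
            container_template("extract", "alpine:3.7", ["sh", "-c", "echo extracting"]),
            container_template(
                "transform", "python:3.6-alpine", ["python", "-c", "print('transforming')"]
            ),
        ],
    },
}

manifest = json.dumps(workflow, indent=2)
print(manifest)
```

Note that the two steps use unrelated images; nothing in the workflow definition itself depends on Python.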

divideby0 commented May 4, 2018

Also, to unpack the "heterogeneous runtimes" piece a bit further: Airflow has a huge list of "Operators" with support for other runtimes like Bash, Spark, Hive, etc. But the business logic for the Operators themselves is all written in Python:

https://airflow.apache.org/code.html?highlight=operators

And many of them may have some environmental dependencies that you'll need to configure outside Airflow's setup to get working.

They do seem to have a DockerOperator, which probably provides many of the same facilities as Argo for scheduling Docker executions on a single host, but I'm not certain it offers everything Kubernetes does for managing and scheduling containerized workloads (e.g. pod abstractions, config maps, secrets management, centralized logging, host node selectors, affinity and anti-affinity, etc.).

jessesuen (Contributor) commented May 7, 2018

Thanks @divideby0, I think you described the differences better than I could have, given my limited knowledge of Airflow. Some additional points:

> Airflow natively schedules steps to run in a Kubernetes cluster, potentially across several hosts

I would highlight this as a similarity. Argo only works in the context of Kubernetes, where each step is a Kubernetes pod. Thus it integrates very deeply into a Kubernetes environment, utilizing nearly all of the features in a k8s pod spec (e.g. secrets/configmap mounts, volumes, resource limits, pod affinity, etc.). Scheduling of pods is deferred to Kubernetes, and each pod runs on whatever host k8s decides to schedule it on (obeying any affinity rules set in the step).
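
As a rough illustration of the pod-spec features mentioned above, the sketch below describes a single step with a node selector, resource limits, a secret exposed as an environment variable, and a configmap mounted as a volume. It is written as Python dicts so it runs with the standard library alone. All names (`train`, `db-credentials`, `app-config`, the `disktype` label) are invented for the example, and `undeclared_mounts` is a hypothetical sanity check, not part of Argo.

```python
import json

# Hypothetical step; the "container" section uses Kubernetes container spec fields.
step_template = {
    "name": "train",
    "nodeSelector": {"disktype": "ssd"},
    "container": {
        "image": "python:3.6-alpine",
        "command": ["python", "-c", "print('training')"],
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
        "env": [
            {
                "name": "DB_PASSWORD",
                "valueFrom": {"secretKeyRef": {"name": "db-credentials", "key": "password"}},
            }
        ],
        "volumeMounts": [{"name": "app-config", "mountPath": "/etc/app"}],
    },
}

# Volumes are declared once for the workflow and mounted by name in each step.
workflow_spec = {
    "entrypoint": "train",
    "volumes": [{"name": "app-config", "configMap": {"name": "app-config"}}],
    "templates": [step_template],
}


def undeclared_mounts(spec):
    """Return (template, volume) pairs whose mounted volume is not declared in the spec."""
    declared = {volume["name"] for volume in spec.get("volumes", [])}
    return [
        (template["name"], mount["name"])
        for template in spec["templates"]
        for mount in template.get("container", {}).get("volumeMounts", [])
        if mount["name"] not in declared
    ]


print(json.dumps(workflow_spec, indent=2))
print(undeclared_mounts(workflow_spec))
```

Where the pod actually runs is still decided by the Kubernetes scheduler; the step only states its constraints.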

I should point out that we created an example of an Argo workflow that uses Airflow operators, but in a more k8s-centric way, since we understand the desire to leverage the huge library of Airflow operators that has been built up over time:

https://github.com/argoproj/data-pipeline

divideby0 commented May 9, 2018

Thanks Jesse! To clarify, I meant to say Argo natively schedules steps to run on a Kubernetes cluster. I don't believe standalone Airflow has native Kubernetes support yet. That was a typo on my part.

vimox-shah commented Jan 28, 2019

What is the learning curve for Argo compared to Airflow? And as beginners, what challenges might we face?
