UX Design: Deploying Workflows #236
My current plan is to have a CLI command that creates an Airflow DAG: it will generate a new code file (note that the abstraction is mediated through code) that loads the artifact's graph and calls the Executor on it. Alternatively, we can just churn out the raw code and wrap that in a function. The latter is directly transparent to the user, but the former allows us to do execution optimizations and makes it easier to set up the environments automatically. The other thing that's weird right now is that Linea also stores the results, whereas with vanilla Airflow, users have to save the values themselves; we should probably think through the UX there. I'm going to add a config mode that supports both and see how it goes. This line of thinking also starts to sketch out an interface for Linea as a set of execution and value APIs. |
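To make the second option above concrete, here is a rough sketch of "churn out the raw code and wrap that in a function". It is only an illustration, not what the CLI does: `render_dag_file`, `DAG_TEMPLATE`, and the `run_slice` task name are made up for this example, and the generated file assumes Airflow 2.x (`PythonOperator`, `schedule_interval`). The sketch only builds the DAG file as text and syntax-checks it, so it runs without Airflow installed.

```python
import textwrap
from typing import Optional

# Hypothetical template for the generated DAG file (assumes Airflow 2.x).
DAG_TEMPLATE = '''\
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def {task_name}():
{body}


with DAG(
    dag_id="{dag_id}",
    schedule_interval=None,  # manual trigger only in this sketch
    start_date=datetime(2021, 1, 1),
) as dag:
    PythonOperator(task_id="{task_name}", python_callable={task_name})
'''


def render_dag_file(dag_id: str, sliced_code: str, task_name: str = "run_slice") -> str:
    """Wrap sliced code in a function and return the source of a DAG file."""
    body = textwrap.indent(textwrap.dedent(sliced_code).strip(), "    ")
    # An empty slice would leave the function without a body, so fall back to `pass`.
    return DAG_TEMPLATE.format(dag_id=dag_id, task_name=task_name, body=body or "    pass")


# The sliced program is treated as opaque text; foo/bar are never called here.
sliced = """
a = foo()
b = bar(a)
c = b.func()
"""
dag_source = render_dag_file("my_artifact_dag", sliced)

# Syntax check only: compiling does not import Airflow or run the slice.
compile(dag_source, "my_artifact_dag.py", "exec")
```

The generated file is plain Python that a user can read and edit, which is the transparency argument for this option. The Executor-based option would replace the function body with a call into Linea instead.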
Some open questions (an ongoing list):
UX
Implementation
|
Here is another reference that Daniel L found, which talks more about notebooks on Airflow: https://stackoverflow.com/questions/51573768/how-to-run-jupyter-notebook-in-airflow. Another good reference here: https://hex.tech/blog/hex-two-point-oh |
@marov here is a discussion on the pipeline automation work that we would love to get your help on! (The previous comment was in the wrong issue, sorry!) |
Took some notes from what @marov has shared
Our proposed initial plan of attack:
|
Per our discussions this morning, we would produce the sliced code, and possibly the sliced code translated into an Airflow DAG, as the output program, not one where we call the lineapy executor to run the lineapy DAG. In this mode, optimizations involving reusing cached results would result in altered output programs, e.g., for a simple pipeline

    a = foo()
    b = bar(a)
    c = b.func()

reusing the cached result for `b` would produce

    b = load_pickle("/path/to/where/we/saved/the/pickle/file")
    c = b.func()

Note that instead of a custom lineapy.load function, we're calling a standard data loading function in Python. |
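A runnable sketch of the same idea, using only the standard library's `pickle`. The `bar` function, the temporary cache directory, and `sum` standing in for `b.func()` are placeholders for illustration, not lineapy behavior:

```python
import pickle
import tempfile
from pathlib import Path


def bar(a):
    # Stand-in for an expensive step in the pipeline.
    return [x * 2 for x in a]


# Hypothetical cache location; a real deployment would pick its own path.
cache_path = Path(tempfile.mkdtemp()) / "b.pickle"

# Original run: compute b and save the result.
a = [1, 2, 3]
b = bar(a)
with open(cache_path, "wb") as f:
    pickle.dump(b, f)

# Output program after the optimization: the lines computing `a` and `b`
# are gone, and `b` is loaded with a standard library call instead.
with open(cache_path, "rb") as f:
    b_cached = pickle.load(f)
c = sum(b_cached)  # stand-in for c = b.func()
```

The altered program depends only on the pickle file being present at that path, so it can run in an environment where lineapy is not installed.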
We should write a more explicit user story here |
Some new questions surfaced from PR #320. They all fall under the need for a more robust templating mechanism (and, by implication, some abstraction for how we think about deployment). Right now we are hard-coding quite a few important things.
|
As I was creating my own Airflow DAGs, I realized that there are a few heuristics that we can use:
Will note down more things to consider as they surface. |
I use |
I ran across another talk https://youtube.com/watch?v=ja2siGyklq0&list=PLGudixcDaxY1noceCfAKU-kfpYIOgbC84&index=9 that uses "Dataclasses as Pipeline Definitions in Airflow". It's pretty cool and might be a more elegant design than the DAG factory one. |
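To give a feel for what a dataclass-based pipeline definition could look like, here is a minimal sketch. It is not the design from the talk: `TaskSpec`, `PipelineSpec`, and `execution_order` are names invented for this example, and it uses only the standard library (`graphlib` needs Python 3.9+), with no Airflow involved.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class TaskSpec:
    """One step of the pipeline and the names of the steps it depends on."""
    name: str
    run: Callable[[], None]
    upstream: List[str] = field(default_factory=list)


@dataclass
class PipelineSpec:
    """A declarative pipeline definition that a generator could turn into a DAG."""
    dag_id: str
    tasks: List[TaskSpec]

    def execution_order(self) -> List[str]:
        # TopologicalSorter takes a mapping of node -> predecessors.
        graph: Dict[str, List[str]] = {t.name: t.upstream for t in self.tasks}
        return list(TopologicalSorter(graph).static_order())


pipeline = PipelineSpec(
    dag_id="example",
    tasks=[
        TaskSpec("load", run=lambda: None),
        TaskSpec("transform", run=lambda: None, upstream=["load"]),
        TaskSpec("report", run=lambda: None, upstream=["transform"]),
    ],
)
order = pipeline.execution_order()
```

Since the definition is plain data, the same `PipelineSpec` could in principle feed different backends (an Airflow DAG file, a local runner) without changing the pipeline description itself.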
High-level goal: for any code that's executed on Linea, we want to ensure that it can also be run on an Airflow-connected EC2 instance.
This means that we need to address a few reproducibility and networking challenges, all of which may have existing solutions via Airflow:
Here are some notes on Airflow, and maybe @dorx can comment:
Here is the architecture: https://airflow.apache.org/docs/apache-airflow/2.0.0/start.html#basic-airflow-architecture
![image](https://user-images.githubusercontent.com/504219/135689004-0355f0c2-a057-4223-8e6c-ebe58496ec39.png)
This makes me think that it makes the most sense for us to synthesize Airflow code, though it would be even better if Airflow had some server-level APIs. It seems pretty extensible (example here).
Need to explore more!
DAG Factories
Working with notebooks
https://medium.com/ai%C2%B3-theory-practice-business/how-to-build-machine-learning-pipelines-with-airflow-papermill-6baef3832bc6