-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement --export-slice-airflow
#320
Changes from all commits
8081eee
5623ced
de9a525
76c036d
c4e119e
ff8fb3f
de89e7a
06bf582
6448004
fdbfc0f
186af91
acae984
1dc2cdd
0caa77f
c0b2745
20e0899
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -84,6 +84,19 @@ jupyter nbconvert --to notebook --execute tests/test_notebook.ipynb --inplace -- | |
Or you can open it in a notebook UI (JupyterLab, JupyterNotebook, VS Code, etc.) | ||
and re-run it manually | ||
|
||
### Airflow | ||
|
||
Sliced code can be exported to an Airflow DAG using the following command: | ||
|
||
``` | ||
lineapy tests/housing.py --slice "p value" --airflow sliced_housing_dag | ||
``` | ||
This creates a `sliced_housing_dag.py` file in the current dir. It can be executed with: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we move this into a separate folder (when generating)? Cleaner organization. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, how would you call it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Or something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With this we are saving to the same dir where the original file is, which is |
||
|
||
``` | ||
airflow db init | ||
airflow dags test sliced_housing_dag_dag $(date '+%Y-%m-%d') -S . | ||
``` | ||
## Visual Graphs | ||
|
||
Sometimes it's helpful to see a visual representation of the graph | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
from black import FileMode, format_str | ||
|
||
from lineapy.graph_reader.program_slice import ( | ||
get_program_slice, | ||
split_code_blocks, | ||
) | ||
from lineapy.instrumentation.tracer import Tracer | ||
|
||
AIRFLOW_IMPORTS_TEMPLATE = """ | ||
from airflow import DAG | ||
from airflow.utils.dates import days_ago | ||
from airflow.operators.python_operator import PythonOperator | ||
""" | ||
|
||
AIRFLOW_MAIN_TEMPLATE = """ | ||
default_dag_args = {"owner": "airflow", "retries": 2, "start_date": days_ago(1)} | ||
|
||
dag = DAG( | ||
dag_id="DAG_NAME_dag", | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens if we have two artifacts? Will this logic break? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently we can only slice one artifact through There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I'm not following, do the |
||
schedule_interval="*/15 * * * *", # Every 15 minutes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to figure out how to parametrize this for API design. I'm not sure about the workflow where the user goes in and manually modifies things. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but we need to think about the entire UX. We can't just keep adding options to CLI, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes let's talk about it this afternoon. |
||
max_active_runs=1, | ||
catchup=False, | ||
default_args=default_dag_args, | ||
) | ||
|
||
DAG_NAME = PythonOperator( | ||
dag=dag, task_id=f"DAG_NAME_task", python_callable=DAG_NAME, | ||
) | ||
""" | ||
|
||
|
||
def sliced_aiflow_dag(tracer: Tracer, slice_name: str, func_name: str) -> str: | ||
""" | ||
Returns a an Airflow DAG of the sliced code. | ||
|
||
:param tracer: the tracer object. | ||
:param slice_name: name of the artifacts to get the code slice for. | ||
:return: string containing the code of the Airflow DAG running this slice | ||
""" | ||
artifact = tracer.db.get_artifact_by_name(slice_name) | ||
artifact_var = tracer.slice_var_name(artifact) | ||
if not artifact_var: | ||
return "Unable to extract the slice" | ||
slice_code = get_program_slice(tracer.graph, [artifact.id]) | ||
# We split the code in import and code blocks and join them to full code test | ||
import_block, code_block, main_block = split_code_blocks( | ||
slice_code, func_name | ||
) | ||
full_code = ( | ||
import_block | ||
+ "\n" | ||
+ AIRFLOW_IMPORTS_TEMPLATE | ||
+ "\n\n" | ||
+ code_block | ||
+ f"\n\tprint({artifact_var})" # TODO What to do with artifact_var in a DAG? | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we need to discuss and design this in more detail before we can commit this PR. @marov you are better equipped than us to help design what to do with this artifact in a deployed airflow process. Let's look at some reference cases together on our next meeting (Friday). |
||
+ "\n\n" | ||
+ AIRFLOW_MAIN_TEMPLATE.replace("DAG_NAME", func_name) | ||
) | ||
# Black lint | ||
black_mode = FileMode() | ||
black_mode.line_length = 79 | ||
full_code = format_str(full_code, mode=black_mode) | ||
return full_code |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,7 +20,7 @@ relative_files = true | |
|
||
# https://docs.sqlalchemy.org/en/14/orm/extensions/mypy.html | ||
# https://pydantic-docs.helpmanual.io/mypy_plugin/#enabling-the-plugin | ||
plugins = ["sqlalchemy.ext.mypy.plugin", "pydantic.mypy"] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we try just removing the sqlalchemy plugin but keeping the mypy one? And commenting that its disabled until we can add sqlalchemy >= 1.4, which adds mypy support, but is blocked on flask-appbuilder, which is a dep of airflow? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could do that. However, as I've mentioned to @yifanwu today, Airflow is but one of the data tools that linea can have "plugins" to.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I am +1 on monorepos when the code is tightly coupled! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, @yifanwu @saulshanabrook how shall we proceed? |
||
# plugins = ["sqlalchemy.ext.mypy.plugin", "pydantic.mypy"] | ||
|
||
|
||
# Enable function body type checking, even if function types are not annotated | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
import subprocess | ||
|
||
|
||
def test_export_slice_housing_dag(): | ||
""" | ||
Verifies that the "--airflow" CLI command produces a working Airflow DAG | ||
""" | ||
subprocess.check_call( | ||
[ | ||
"lineapy", | ||
"tests/housing.py", | ||
"--slice", | ||
"p value", | ||
"--airflow", | ||
"sliced_housing_dag", | ||
] | ||
) | ||
subprocess.check_call( | ||
[ | ||
"airflow", | ||
"db", | ||
"init", | ||
] | ||
) | ||
subprocess.check_call( | ||
[ | ||
"airflow", | ||
"dags", | ||
"test", | ||
"sliced_housing_dag_dag", | ||
"2020-10-19", | ||
"-S", | ||
".", | ||
] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should really be in the README.md, but since it's not ready for "release", maybe we can create another doc called
EXPERIMENTAL.md
or something and move this there.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under impression that
CONTRIBUTING.md
serves this function.