-
Notifications
You must be signed in to change notification settings - Fork 263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update the logic of openlineage.airflow.macros.lineage_parent_id
#2488
Comments
Thanks for creating your first OpenLineage issue! Your feedback is valuable and improves the project. If you haven't already, please be sure to follow the issue template! |
…arent_id`. - Returned format: `<namespace>/<name>/<run_id>`. - `name` should be `<dag_id>.<task_id>`, not a UUID. - `run_id` should be a UUID, not `<run_timestamp>.<try_number>`.
…arent_id`. - Returned format: `<namespace>/<name>/<run_id>`. - `name` should be `<dag_id>.<task_id>`, not a UUID. - `run_id` should be a UUID, not `<run_timestamp>.<try_number>`. Signed-off-by: Fabio Manganiello <fabio.manganiello@booking.com>
May I propose is to just remove (or ar least deprecate) the SparkSubmitOperator(
...,
config={
"spark.openlineage.parentJobNamespace": "{{macros.OpenLineageProviderPlugin.namespace()}}",
"spark.openlineage.parentJobName": "{{dag.dag_id}}.{{task.task_id}}",
"spark.openlineage.parentRunId": "{{macros.OpenLineageProviderPlugin.lineage_run_id(task_instance)}}",
) |
There is no need to deprecate the macro, we can simply add two new macros - then users can choose to either pass all the information and parse it in spark or pass everything separately. |
…arent_id`. - Returned format: `<namespace>/<name>/<run_id>`. - `name` should be `<dag_id>.<task_id>`, not a UUID. - `run_id` should be a UUID, not `<run_timestamp>.<try_number>`. Signed-off-by: Fabio Manganiello <fabio.manganiello@booking.com> Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
…arent_id`. - Returned format: `<namespace>/<name>/<run_id>`. - `name` should be `<dag_id>.<task_id>`, not a UUID. - `run_id` should be a UUID, not `<run_timestamp>.<try_number>`. - Both `lineage_run_id` and `lineage_parent_id` should expose the same interface - only a `TaskInstance` object is now required as argument. - Import `_DAG_NAMESPACE` instead of inferring it again. Airflow-Reference: apache/airflow#37877 Signed-off-by: Fabio Manganiello <fabio.manganiello@booking.com>
…arent_id`. - Returned format: `<namespace>/<name>/<run_id>`. - `name` should be `<dag_id>.<task_id>`, not a UUID. - `run_id` should be a UUID, not `<run_timestamp>.<try_number>`. - Both `lineage_run_id` and `lineage_parent_id` should expose the same interface - only a `TaskInstance` object is now required as argument. - Import `_DAG_NAMESPACE` instead of inferring it again. Airflow reference: apache/airflow#37877 Signed-off-by: Fabio Manganiello <fabio.manganiello@booking.com>
…arent_id`. - Returned format: `<namespace>/<name>/<run_id>`. - `name` should be `<dag_id>.<task_id>`, not a UUID. - `run_id` should be a UUID, not `<run_timestamp>.<try_number>`. - Both `lineage_run_id` and `lineage_parent_id` should expose the same interface - only a `TaskInstance` object is now required as argument. - Import `_DAG_NAMESPACE` instead of inferring it again. Airflow reference: apache/airflow#37877 Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
#2578) * [#2488] Fixed format returned by `airflow.macros.lineage_parent_id`. - Returned format: `<namespace>/<name>/<run_id>`. - `name` should be `<dag_id>.<task_id>`, not a UUID. - `run_id` should be a UUID, not `<run_timestamp>.<try_number>`. - Both `lineage_run_id` and `lineage_parent_id` should expose the same interface - only a `TaskInstance` object is now required as argument. - Import `_DAG_NAMESPACE` instead of inferring it again. Airflow reference: apache/airflow#37877 Signed-off-by: Fabio Manganiello <fabio@manganiello.tech> * Fixed redefinition of `get_unknown_source_attribute_run_facet` introduced upon merge. Signed-off-by: Fabio Manganiello <fabio@manganiello.tech> * Addressed #2578 (comment) Signed-off-by: Fabio Manganiello <fabio.manganiello@booking.com> * Fixed failing macro test Signed-off-by: Fabio Manganiello <fabio.manganiello@booking.com> --------- Signed-off-by: Fabio Manganiello <fabio@manganiello.tech> Signed-off-by: Fabio Manganiello <fabio.manganiello@booking.com>
Done: #2582 |
…eage events are emitted. This is a follow-up [/missed corner case] of OpenLineage#2560 Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
The macro
openlineage.airflow.macros.lineage_parent_id
currently generates the parent_id asnamespace/name/run_id
, wherename
however is a UUID generated from the task metadata, andrun_id
is the concatenation of the task instance timestamp and try number.This isn't compliant with the OpenLineage conventions. I'd expect
name=<dag_id>.<task_id>
, as it's a convention in other parts of the codebase, and run_id to be a UUID.The text was updated successfully, but these errors were encountered: