Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OPENLINEAGE] Introduce AirflowJobFacet and AirflowStateRunFacet #39520

Merged
merged 1 commit into from
Jun 5, 2024

Conversation

kacpermuda
Copy link
Contributor

closes: #39467


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@kacpermuda kacpermuda force-pushed the ol-new-dag-facet branch 3 times, most recently from 35778b2 to ea6a570 Compare May 15, 2024 11:31
@kacpermuda kacpermuda marked this pull request as ready for review May 15, 2024 11:59
@kacpermuda kacpermuda changed the title [OPENLINEAGE] Add new DAG job structure facet [OPENLINEAGE] Introduce AirflowJobFacet and AirflowStateRunFacet May 16, 2024
@kacpermuda kacpermuda marked this pull request as draft May 17, 2024 12:23
@kacpermuda kacpermuda force-pushed the ol-new-dag-facet branch 2 times, most recently from 3d6af0d to 7c0e499 Compare May 21, 2024 14:51
@kacpermuda kacpermuda force-pushed the ol-new-dag-facet branch 3 times, most recently from 77020dd to ec4bc16 Compare May 22, 2024 17:50
@kacpermuda kacpermuda marked this pull request as ready for review May 22, 2024 17:50
@kacpermuda
Copy link
Contributor Author

I had to add some words to spelling wordlist, due to the docs error in CI here. Let me know if there is a better solution to that problem, but part of these words have already been present, just with different capitalization.

@kacpermuda
Copy link
Contributor Author

I also added json specs for the facets, in a new directory. I was not sure if i should add the "$id" key, as in the OL repo, as these facets will not be uploaded to openlineage.io, so i skipped it.

@kacpermuda kacpermuda marked this pull request as draft May 30, 2024 12:15
@kacpermuda kacpermuda marked this pull request as ready for review June 3, 2024 16:36
@kacpermuda
Copy link
Contributor Author

@potiuk Are the 2.7 compatibility tests run somehow different on main? I keep getting the common.io error in this PR (as it's not available in 2.7) and am pretty sure that's not on me 😉 but the same test looks green on main

@potiuk
Copy link
Member

potiuk commented Jun 4, 2024

Yes. When you run tests in PR "selective" checks are run, but when you run them in main "all" tests are run.

In this case, there is an inconsistenty between what is excluded and what is included. On one hand selective checks want to run common.io tests - because openlineage has some common.io import or the other way (whenever openlineage provider gets modified, we run tests for all providers that are cross-dependend on openlineage). But also we want to ignore common.io

Look at the command executed:

Starting the tests with those pytest arguments: 
tests/providers/common/io tests/providers/common/sql tests/providers/dbt/cloud tests/providers/ftp tests/providers/mysql tests/providers/openlineage tests/providers/postgres tests/providers/sftp tests/providers/snowflake tests/providers/trino
 --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --junitxml=/files/test_result-providers_common_io_common_sql-sqlite.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 --disable-warnings -rfEX --ignore=tests/system --ignore=tests/integration --warning-output-path=/files/warnings-providers_common_io_common_sql-sqlite.txt 
--ignore=helm_tests --with-db-init --no-cov --ignore=tests/providers/common/io --ignore=tests/providers/fab

From selective checks:

affected-providers-list-as-string = amazon common.io common.sql dbt.cloud ftp google mysql openlineage postgres sftp snowflake trino

From compatibility check we have --ignore=tests/providers/common/io.

Apparently pytest overrides the ignore when we explicitly have tests/providers/common/io on the list of folders to disover tests in. The fix (#40037) is to remove such provider from pytest args when it is also specified as --ignore.

@potiuk
Copy link
Member

potiuk commented Jun 4, 2024

This is the command that will be used after #40037 is merged and you rebase your PR (tests/providers/common/io is gone as explicit pytest parameter):

tests/providers/common/sql tests/providers/dbt/cloud tests/providers/ftp tests/providers/mysql tests/providers/openlineage tests/providers/postgres tests/providers/sftp tests/providers/snowflake tests/providers/trino --verbosity=0 --strict-markers --durations=100 --maxfail=50 --color=yes --junitxml=/files/test_result-providers_common_io_common_sql-sqlite.xml --timeouts-order moi --setup-timeout=60 --execution-timeout=60 --teardown-timeout=60 --disable-warnings -rfEX --ignore=tests/system --ignore=tests/integration --warning-output-path=/files/warnings-providers_common_io_common_sql-sqlite.txt --ignore=helm_tests --with-db-init --ignore tests/providers/apache/beam --ignore tests/system/providers/apache/beam --ignore tests/integration/providers/apache/beam --ignore tests/providers/papermill --ignore tests/system/providers/papermill --ignore tests/integration/providers/papermill --no-cov
 --ignore=tests/providers/common/io --ignore=tests/providers/fab

@potiuk
Copy link
Member

potiuk commented Jun 4, 2024

Rebased after merging #40037 to see if it helps.

@potiuk
Copy link
Member

potiuk commented Jun 4, 2024

OK. The original problem is fixed. Now the remaining problem is to add compatibility so that the openlineage tests can be run on Airflow 2.7 as well (following the different cases of how to handle compatibility described in the unit_tests.rst).

@kacpermuda kacpermuda force-pushed the ol-new-dag-facet branch 2 times, most recently from 0546e39 to 63d051a Compare June 5, 2024 08:41
@kacpermuda kacpermuda marked this pull request as draft June 5, 2024 09:31
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
@kacpermuda kacpermuda marked this pull request as ready for review June 5, 2024 11:33
@kacpermuda
Copy link
Contributor Author

I fixed all the compatibility tests and adjusted the code itself so it works on all Airflow 2.7+ versions now without any issues. Imho it's ready to be merged

@mobuchowski mobuchowski merged commit c202c07 into apache:main Jun 5, 2024
49 checks passed
@kacpermuda kacpermuda deleted the ol-new-dag-facet branch June 5, 2024 11:52
fdemiane pushed a commit to fdemiane/airflow that referenced this pull request Jun 6, 2024
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[OpenLineage] Add new DAG job facet
4 participants