
Unify DAG creation/database cleaning fixtures for testing #3361

Merged
merged 7 commits into main from 2622-unify-dag-creation on Dec 14, 2023

Conversation

ngken0995
Collaborator

@ngken0995 ngken0995 commented Nov 15, 2023

Fixes

Fixes #2622 by @AetherUnbound

Description

There is a single clean_db() fixture in the conftest.py file so that tests from multiple test modules in the directory can access it. The Pool and TaskInstance models from airflow.models are wrapped in if conditions because not every test needs to filter and delete them. Fixtures can be overridden at the test module level (link). conftest.py provides default values for get_test_dag_id, get_test_pool, isPool, and isTaskInstance. Each test that relies on clean_db() should override the get_test_dag_id() fixture and can opt in to the Pool and TaskInstance cleanup by overriding the isPool and isTaskInstance fixtures to return True.
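
For illustration, a module-level override looks roughly like this (a sketch; the module path and return values here are hypothetical):

# catalog/tests/dags/common/test_example.py (hypothetical module)
import pytest


# Overrides the same-named default fixture from catalog/tests/conftest.py,
# so clean_db picks up this module's DAG ID instead of the empty default.
@pytest.fixture()
def get_test_dag_id():
    return "my_test_dag"


# Opt in to Pool cleanup for the tests in this module.
@pytest.fixture()
def isPool():
    return True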

Testing Instructions

  • run just down -v
  • run just catalog/up
  • run just catalog/init
  • run just catalog/test

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@ngken0995 ngken0995 requested a review from a team as a code owner November 15, 2023 18:57
@github-actions github-actions bot added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Nov 15, 2023
@openverse-bot openverse-bot added 🟩 priority: low Low priority and doesn't need to be rushed ✨ goal: improvement Improvement to an existing user-facing feature 🤖 aspect: dx Concerns developers' experience with the codebase labels Nov 15, 2023
Contributor

@AetherUnbound AetherUnbound left a comment

Thanks for taking this on Kenneth! I think there are a few changes I'd like to see with this. A pytest fixture that lives in the highest-level conftest.py file and also has autouse=True means that the fixture will be used in every test that's run. Most of our tests don't even interact with the Airflow database, so this becomes quite excessive and unnecessary. Additionally, we don't need the extra fixtures for conditionally cleaning task instances and pools if we opt to 1) automatically define a task instance/pool name for each test and 2) always clean them up by selecting records that match those names (whether they exist or not doesn't matter).

I think one way to do this is to create some helper fixtures that generate a unique DAG ID and pool based on the test module. (I say module here instead of test name because we have the test multi-processing binned by module; see here and docs.) Here's an example of what that might look like:

# in conftest.py
import pytest

from airflow.models import DagRun, Pool, TaskInstance
from airflow.utils.session import create_session

def _normalize_test_module_name(request) -> str:
    # Extract the test name
    name = request.module.__name__
    # Replace periods with two underscores
    return name.replace(".", "__")


@pytest.fixture
def get_test_dag_id(request):
    return f"{_normalize_test_module_name(request)}_dag"


@pytest.fixture
def get_test_pool(request):
    return f"{_normalize_test_module_name(request)}_pool"


@pytest.fixture
def clean_db(get_test_dag_id, get_test_pool):
    with create_session() as session:
        # synchronize_session='fetch' required here to refresh models
        # https://stackoverflow.com/a/51222378 CC BY-SA 4.0
        session.query(DagRun).filter(DagRun.dag_id == get_test_dag_id).delete()
        session.query(TaskInstance).filter(
            TaskInstance.dag_id.startswith(get_test_dag_id)
        ).delete(synchronize_session="fetch")
        session.query(Pool).filter(Pool.pool == get_test_pool).delete()

Tests that require a clean database, test DAG, test pool, etc., could then request them as fixtures in their function signatures. For example, the test in test_ingestion_server.py could be changed to:

@pytest.fixture()
def index_readiness_dag(get_test_dag_id, clean_db):
    # Create a DAG that just has an index_readiness_check task
    with DAG(dag_id=get_test_dag_id, schedule=None, start_date=TEST_START_DATE) as dag:
        ingestion_server.index_readiness_check(
            media_type="image", index_suffix="my_test_suffix", timeout=timedelta(days=1)
        )

    return dag

(note the usage of get_test_dag_id and clean_db when creating the fixture)

This might require a few more changes to the affected tests, but I think it's much better than adding the clean_db step to every test!

Full diff I had locally for this test if you're interested
diff --git a/catalog/tests/conftest.py b/catalog/tests/conftest.py
index b5b0f0600..4cb33508b 100644
--- a/catalog/tests/conftest.py
+++ b/catalog/tests/conftest.py
@@ -28,35 +28,30 @@ def pytest_addoption(parser):
 mark_extended = pytest.mark.skipif("not config.getoption('extended')")
 
 
-@pytest.fixture()
-def get_test_dag_id():
-    return ""
+def _normalize_test_module_name(request) -> str:
+    # Extract the test name
+    name = request.module.__name__
+    # Replace periods with two underscores
+    return name.replace(".", "__")
 
 
-@pytest.fixture()
-def get_test_pool():
-    return ""
+@pytest.fixture
+def get_test_dag_id(request):
+    return f"{_normalize_test_module_name(request)}_dag"
 
 
-@pytest.fixture()
-def isTaskInstance():
-    return False
+@pytest.fixture
+def get_test_pool(request):
+    return f"{_normalize_test_module_name(request)}_pool"
 
 
-@pytest.fixture()
-def isPool():
-    return False
-
-
-@pytest.fixture(autouse=True)
-def clean_db(get_test_dag_id, get_test_pool, isTaskInstance, isPool):
+@pytest.fixture
+def clean_db(get_test_dag_id, get_test_pool):
     with create_session() as session:
         # synchronize_session='fetch' required here to refresh models
         # https://stackoverflow.com/a/51222378 CC BY-SA 4.0
         session.query(DagRun).filter(DagRun.dag_id == get_test_dag_id).delete()
-        if isTaskInstance:
-            session.query(TaskInstance).filter(
-                TaskInstance.dag_id == get_test_dag_id
-            ).delete(synchronize_session="fetch")
-        if isPool:
-            session.query(Pool).filter(id == get_test_pool).delete()
+        session.query(TaskInstance).filter(
+            TaskInstance.dag_id == get_test_dag_id
+        ).delete(synchronize_session="fetch")
+        session.query(Pool).filter(Pool.pool == get_test_pool).delete()
diff --git a/catalog/tests/dags/common/test_ingestion_server.py b/catalog/tests/dags/common/test_ingestion_server.py
index 88c2e3b0d..f2ee0a0f0 100644
--- a/catalog/tests/dags/common/test_ingestion_server.py
+++ b/catalog/tests/dags/common/test_ingestion_server.py
@@ -13,23 +13,12 @@ from common import ingestion_server
 
 
 TEST_START_DATE = datetime(2022, 2, 1, 0, 0, 0)
-TEST_DAG_ID = "api_healthcheck_test_dag"
 
 
 @pytest.fixture()
-def get_test_dag_id():
-    return TEST_DAG_ID
-
-
-@pytest.fixture()
-def isTaskInstance():
-    return True
-
-
-@pytest.fixture()
-def index_readiness_dag():
+def index_readiness_dag(get_test_dag_id, clean_db):
     # Create a DAG that just has an index_readiness_check task
-    with DAG(dag_id=TEST_DAG_ID, schedule=None, start_date=TEST_START_DATE) as dag:
+    with DAG(dag_id=get_test_dag_id, schedule=None, start_date=TEST_START_DATE) as dag:
         ingestion_server.index_readiness_check(
             media_type="image", index_suffix="my_test_suffix", timeout=timedelta(days=1)
         )

@ngken0995
Collaborator Author

@AetherUnbound unittest.TestCase tests can only use autouse fixtures (link). In test_single_run_external_dags_sensor.py, TEST_POOL and DAG_PREFIX are hardcoded to the same values that the get_test_dag_id and get_test_pool fixtures would produce. In clean_db, the DagRun, TaskInstance, and Pool queries now filter with startswith to match those name prefixes and delete the records.
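
Roughly, the reworked clean_db looks something like this (a sketch based on the description above, not the exact diff):

# catalog/tests/conftest.py (sketch)
import pytest

from airflow.models import DagRun, Pool, TaskInstance
from airflow.utils.session import create_session


@pytest.fixture
def clean_db(get_test_dag_id, get_test_pool):
    with create_session() as session:
        # startswith() catches any DAG/pool sharing the test module's prefix,
        # including the hardcoded names in test_single_run_external_dags_sensor.py.
        session.query(DagRun).filter(
            DagRun.dag_id.startswith(get_test_dag_id)
        ).delete(synchronize_session="fetch")
        session.query(TaskInstance).filter(
            TaskInstance.dag_id.startswith(get_test_dag_id)
        ).delete(synchronize_session="fetch")
        session.query(Pool).filter(
            Pool.pool.startswith(get_test_pool)
        ).delete(synchronize_session="fetch")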

Contributor

@AetherUnbound AetherUnbound left a comment

Thanks so much for reworking this, and sorry for the delay on review!

That's frustrating about the unittest.TestCase scenario 😭 I'll make an issue for converting all our catalog tests from unittest; I don't think there's a specific reason they need to be in that format (but that's not the purview of this particular PR!). Do you mind adding a "TODO" comment right above the test pool/test prefix constants in test_single_run_external_dags_sensor.py noting that they can be removed and the fixtures can be used instead once this test is converted to pytest?

Also, I apologize for suggesting one thing and then requesting another, but I'm realizing the get_* prefix doesn't really fit our standard for pytest fixtures. Would you mind renaming get_test_dag_id to sample_dag_id_fixture and get_test_pool to sample_pool_fixture?
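
In other words, something like this in conftest.py (a sketch; only the fixture names change from the earlier example):

# catalog/tests/conftest.py -- sketch of the rename
import pytest


def _normalize_test_module_name(request) -> str:
    # Replace periods with two underscores, e.g. "dags.common.test_foo" -> "dags__common__test_foo"
    return request.module.__name__.replace(".", "__")


@pytest.fixture
def sample_dag_id_fixture(request):
    return f"{_normalize_test_module_name(request)}_dag"


@pytest.fixture
def sample_pool_fixture(request):
    return f"{_normalize_test_module_name(request)}_pool"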

@openverse-bot
Collaborator

Based on the low urgency of this PR, the following reviewers are being gently reminded to review this PR:

@obulat
@stacimc
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend¹ days, this PR was ready for review 11 day(s) ago. PRs labelled with low urgency are expected to be reviewed within 5 weekday(s)².

@ngken0995, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@AetherUnbound AetherUnbound self-requested a review December 1, 2023 01:10
@AetherUnbound AetherUnbound marked this pull request as draft December 1, 2023 01:11
@AetherUnbound
Contributor

(Drafting so our auto-ping ignores this for now, please feel free to undraft again once changes are made Kenneth!)

Contributor

@AetherUnbound AetherUnbound left a comment

Fantastic, everything looks great! Thanks Kenneth!

Contributor

@stacimc stacimc left a comment

This looks great, thanks @ngken0995!

@AetherUnbound AetherUnbound merged commit 80a75e6 into main Dec 14, 2023
42 checks passed
@AetherUnbound AetherUnbound deleted the 2622-unify-dag-creation branch December 14, 2023 23:27