Skip to content

Add system tests for Dataproc trigger on_kill cancel behavior#65982

Merged
potiuk merged 7 commits into
apache:mainfrom
srchilukoori:test/dataproc-trigger-on-kill-system-tests
May 10, 2026
Merged

Add system tests for Dataproc trigger on_kill cancel behavior#65982
potiuk merged 7 commits into
apache:mainfrom
srchilukoori:test/dataproc-trigger-on-kill-system-tests

Conversation

@srchilukoori
Copy link
Copy Markdown
Contributor

@srchilukoori srchilukoori commented Apr 27, 2026

Follow-up to #65742 as requested by @shahar1. Adds a system test DAG that exercises the cancel_on_kill behavior introduced when DataprocSubmitTrigger and DataprocSubmitJobDirectTrigger migrated to BaseTrigger.on_kill().

Changes:

New file: providers/google/tests/system/google/cloud/dataproc/example_dataproc_cancel_on_kill.py

The DAG contains two tests sharing a single Dataproc cluster:

Test What it does What it proves
A (happy path) Deferrable submit with cancel_on_kill=True runs SparkPi to completion Trigger plumbing with cancel_on_kill=True does not break normal deferrable execution
B (cancel path) Submits a long-running SparkPi (1M slices) asynchronously, cancels it via DataprocHook.cancel_job(), asserts CANCELLED state The cancel_job() call that on_kill() delegates to actually cancels running Dataproc jobs

Testing:

System tests require GCP credentials and a project with Dataproc API enabled. I do not have access to a GCP environment to run this test end-to-end. The test was verified via:

  • python3 -m py_compile — syntax OK
  • ruff check / ruff format — all checks passed
  • All pre-commit hooks passed
  • Pattern consistency verified against existing Dataproc system tests (example_dataproc_spark_deferrable.py, example_dataproc_spark_async.py, example_dataproc_start_from_trigger.py)

To run manually:

SYSTEM_TESTS_ENV_ID=<id> SYSTEM_TESTS_GCP_PROJECT=<project> \
  pytest providers/google/tests/system/google/cloud/dataproc/example_dataproc_cancel_on_kill.py -s

related: #65742


Was generative AI tooling used to co-author this PR?
  • Yes — Claude

@srchilukoori srchilukoori requested a review from shahar1 as a code owner April 27, 2026 22:39
@boring-cyborg boring-cyborg Bot added area:providers provider:google Google (including GCP) related issues labels Apr 27, 2026
@srchilukoori srchilukoori force-pushed the test/dataproc-trigger-on-kill-system-tests branch from a99a3c6 to f13986d Compare April 28, 2026 00:51
Copy link
Copy Markdown
Contributor

@shahar1 shahar1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you!

@VladaZakharova @MaksYermak - is it compatible for running on your environment? I'll be happy for your feedback.

@VladaZakharova
Copy link
Copy Markdown
Contributor

hi
thank you for submitting PR.
In google provider we don't actually create a system test for every parameter we have, because then google provider will blow because of it's own size :)
I can see that you created this PR to test changes from another PR. I think in this case only good unit tests will be enough.

@shahar1
Copy link
Copy Markdown
Contributor

shahar1 commented Apr 28, 2026

hi
thank you for submitting PR.
In google provider we don't actually create a system test for every parameter we have, because then google provider will blow because of it's own size :)
I can see that you created this PR to test changes from another PR. I think in this case only good unit tests will be enough.

Is it possible to retain it, but exclude from the Google automated tests?

srchilukoori pushed a commit to srchilukoori/airflow that referenced this pull request Apr 29, 2026
Per reviewer feedback (apache#65982), exclude the cancel_on_kill system test
from automated Google CI runs. Test now skips at collection time unless
RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1 is set, allowing manual
end-to-end runs without burdening automated infrastructure.
@srchilukoori
Copy link
Copy Markdown
Contributor Author

Thanks @shahar1, made the change.

Added an env-var gate at module level in example_dataproc_cancel_on_kill.py:

if not os.environ.get("RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST"):
    pytest.skip("Manual-only system test: set RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1 to run.", allow_module_level=True)

How this works: by default the test is skipped at collection time, so automated runs (which don't set the env var) won't pick it up. To run manually, set RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1. Same pytest.skip(..., allow_module_level=True) pattern already used for "service not available" gates in example_bigquery_to_mysql.py etc.

Verified locally with pytest --collect-only --system:

  • without the env var: 0 tests collected (SKIPPED with the explanation message)
  • with RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1: 1 test collected

@shahar1
Copy link
Copy Markdown
Contributor

shahar1 commented Apr 29, 2026

Thanks @shahar1, made the change.

Added an env-var gate at module level in example_dataproc_cancel_on_kill.py:

if not os.environ.get("RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST"):
    pytest.skip("Manual-only system test: set RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1 to run.", allow_module_level=True)

How this works: by default the test is skipped at collection time, so automated runs (which don't set the env var) won't pick it up. To run manually, set RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1. Same pytest.skip(..., allow_module_level=True) pattern already used for "service not available" gates in example_bigquery_to_mysql.py etc.

Verified locally with pytest --collect-only --system:

  • without the env var: 0 tests collected (SKIPPED with the explanation message)
  • with RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1: 1 test collected

Nice idea, but currently seems incompatible with other tests, as they fail - try to figure out how to work around that.

@srchilukoori
Copy link
Copy Markdown
Contributor Author

srchilukoori commented Apr 29, 2026

Fixed in 678492e. Replaced pytest.skip(allow_module_level=True) with pytestmark = pytest.mark.skipif(...) — file imports cleanly now, test still skipped unless RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1 is set.

cc: @VladaZakharova

Sadha Chilukoori added 4 commits May 2, 2026 11:56
Follow-up to apache#65742 (requested by @shahar1). Adds a system test DAG
that exercises the cancel_on_kill plumbing introduced when
DataprocSubmitTrigger and DataprocSubmitJobDirectTrigger migrated
to BaseTrigger.on_kill().

Test A: deferrable submit with cancel_on_kill=True completes normally.
Test B: submits a long-running job, cancels it via DataprocHook.cancel_job()
(the same call on_kill delegates to), and asserts CANCELLED state.
The bare `from airflow.decorators import task` import fails MyPy in
the providers check because `airflow.decorators` does not export `task`
in Airflow 3.x. Switch to the standard `AIRFLOW_V_3_0_PLUS` conditional
import pattern used by all other system tests.
Per reviewer feedback (apache#65982), exclude the cancel_on_kill system test
from automated Google CI runs. Test now skips at collection time unless
RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1 is set, allowing manual
end-to-end runs without burdening automated infrastructure.
pytest.skip(allow_module_level=True) raises a Skipped exception at
import time that DagBag records as an import error, breaking the
test_should_be_importable parametrized test in DB-core CI jobs.

pytestmark = pytest.mark.skipif(...) lets the file import cleanly
(DagBag happy, importability test passes) while still skipping
test_run when RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST is unset.
@srchilukoori srchilukoori force-pushed the test/dataproc-trigger-on-kill-system-tests branch from 678492e to 649d397 Compare May 2, 2026 18:56
@potiuk
Copy link
Copy Markdown
Member

potiuk commented May 5, 2026

@srchilukoori — Your unresolved review thread(s) from @shahar1 appear to have been addressed (post-review commits and/or in-thread replies on every thread, with the latest commit pushed after the most recent thread). I've added the ready for maintainer review label so the PR re-enters the maintainer review queue.

@shahar1 — could you take another look when you have a chance? If you agree the feedback was addressed, please mark the threads as resolved so the queue signal stays accurate. If a thread still needs work, please reply in-line — @srchilukoori will follow up.


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@potiuk potiuk added the ready for maintainer review Set after triaging when all criteria pass. label May 5, 2026
@shahar1
Copy link
Copy Markdown
Contributor

shahar1 commented May 5, 2026

Merged from main, good to merge after CI is green

@srchilukoori
Copy link
Copy Markdown
Contributor Author

Merged from main, good to merge after CI is green

@shahar1 all checks are green. Please merge when you get a minute.

@potiuk potiuk merged commit f38982c into apache:main May 10, 2026
94 checks passed
jason810496 pushed a commit to jason810496/airflow that referenced this pull request May 11, 2026
…#65982)

* Add system tests for Dataproc trigger on_kill cancel behavior

Follow-up to apache#65742 (requested by @shahar1). Adds a system test DAG
that exercises the cancel_on_kill plumbing introduced when
DataprocSubmitTrigger and DataprocSubmitJobDirectTrigger migrated
to BaseTrigger.on_kill().

Test A: deferrable submit with cancel_on_kill=True completes normally.
Test B: submits a long-running job, cancels it via DataprocHook.cancel_job()
(the same call on_kill delegates to), and asserts CANCELLED state.

* Fix MyPy failure: use version-compat import pattern for task decorator

The bare `from airflow.decorators import task` import fails MyPy in
the providers check because `airflow.decorators` does not export `task`
in Airflow 3.x. Switch to the standard `AIRFLOW_V_3_0_PLUS` conditional
import pattern used by all other system tests.

* Gate cancel_on_kill system test behind opt-in env var

Per reviewer feedback (apache#65982), exclude the cancel_on_kill system test
from automated Google CI runs. Test now skips at collection time unless
RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST=1 is set, allowing manual
end-to-end runs without burdening automated infrastructure.

* Replace module-level pytest.skip with pytestmark skipif

pytest.skip(allow_module_level=True) raises a Skipped exception at
import time that DagBag records as an import error, breaking the
test_should_be_importable parametrized test in DB-core CI jobs.

pytestmark = pytest.mark.skipif(...) lets the file import cleanly
(DagBag happy, importability test passes) while still skipping
test_run when RUN_MANUAL_DATAPROC_CANCEL_ON_KILL_TEST is unset.

* Rename env flag to RUN_MANUAL_GOOGLE_SYSTEM_TESTS for generalization

---------

Co-authored-by: Sadha Chilukoori <ssreddy.8555@gmail.com>
Co-authored-by: Shahar Epstein <60007259+shahar1@users.noreply.github.com>
@srchilukoori srchilukoori deleted the test/dataproc-trigger-on-kill-system-tests branch May 11, 2026 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:providers provider:google Google (including GCP) related issues ready for maintainer review Set after triaging when all criteria pass.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants