Add DatabricksDeleteJobsOperator to Databricks provider #64766
ha2hi wants to merge 3 commits into apache:main from
Conversation
@ha2hi This PR has a few issues that need to be addressed before it can be reviewed — please see our Pull Request quality criteria. Issues found:
What to do next:
There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack.
92267b9 to 9d8fc5e
@potiuk
Pull request overview
This PR adds “delete job” support to the Databricks provider by introducing a new hook method for the Databricks Jobs Delete API and an accompanying operator, along with unit tests to validate the operator behavior.
Changes:
- Added `DatabricksHook.delete_job()` calling the `api/2.2/jobs/delete` endpoint.
- Introduced `DatabricksDeleteJobsOperator` that uses the hook to delete a job by `job_id`.
- Added unit tests covering operator execution, hook initialization arguments, and template fields.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| providers/databricks/src/airflow/providers/databricks/hooks/databricks.py | Adds a new delete_job hook method and endpoint constant for Jobs deletion. |
| providers/databricks/src/airflow/providers/databricks/operators/databricks.py | Introduces DatabricksDeleteJobsOperator and wires it to the new hook method. |
| providers/databricks/tests/unit/databricks/operators/test_databricks.py | Adds unit tests validating the new operator’s execution path and templating behavior. |
```python
def __init__(
    self,
    *,
    job_id: int,
    databricks_conn_id: str = "databricks_default",
```
```python
"""
Deletes a Databricks job.

:param job_id: The ID of the Databricks job to delete. (templated)
:param databricks_conn_id: Reference to the Databricks connection.
```
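As a rough illustration of what "(templated)" means for a field like `job_id`: before `execute()` runs, Airflow renders string values of the operator's `template_fields` against the runtime context. The sketch below is a toy stand-in using plain `str.format`, not Airflow's real Jinja2 engine, and `render_templates` here is a hypothetical helper, not provider code:

```python
# Toy illustration of templated operator fields. Airflow's real rendering
# uses Jinja2 and the task-instance context; this stand-in uses str.format.
template_fields = ("job_id",)

def render_templates(op_kwargs: dict, context: dict) -> dict:
    """Render string values of template_fields against a context dict."""
    rendered = dict(op_kwargs)
    for field in template_fields:
        value = rendered.get(field)
        if isinstance(value, str):
            # Only string values are rendered; plain ints pass through as-is.
            rendered[field] = value.format(**context)
    return rendered

# A templated string placeholder is resolved at runtime:
params = render_templates({"job_id": "{upstream_job_id}"}, {"upstream_job_id": 123})
```

This is why the parameter can be declared as `int` yet still accept a template string in a DAG file: the rendered value arrives before the hook call is made.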
```python
def delete_job(self, job_id: int) -> None:
    """
    Call the ``api/2.2/jobs/delete`` endpoint.

    :param job_id: The unique identifier of the job to be deleted.
    """
    self._do_api_call(DELETE_ENDPOINT, {"job_id": job_id})
```
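For readers unfamiliar with the hook's internals, an endpoint tuple such as `DELETE_ENDPOINT` pairs an HTTP method with an API path. The sketch below shows one way such a tuple could be expanded into a request; `build_request` is purely illustrative and not part of the provider's `DatabricksHook` API:

```python
# Illustrative only: how a (method, path) endpoint tuple might become a request.
# The real DatabricksHook._do_api_call also handles auth, retries, and errors.
DELETE_ENDPOINT = ("POST", "2.2/jobs/delete")

def build_request(host: str, endpoint: tuple, payload: dict):
    """Return (method, url, body) for a Databricks REST call."""
    method, path = endpoint
    return method, f"https://{host}/api/{path}", payload

method, url, body = build_request(
    "example.cloud.databricks.com", DELETE_ENDPOINT, {"job_id": 42}
)
```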
```python
# Verify that the DatabricksHook was initialized with the correct connection ID and default retry arguments
mock_hook_class.assert_called_once_with(
    "databricks_default",
    retry_limit=3,
    retry_delay=1,
    retry_args=None,
```
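The assertion pattern above can be exercised end to end without Airflow installed. The sketch below uses a hypothetical stand-in class (`FakeDeleteJobsOperator` is not the real operator) to show the shape of the test: inject a mocked hook class, run `execute()`, and assert on the constructor and `delete_job` calls:

```python
from unittest import mock

# Hypothetical stand-in mirroring the operator's execute() path; the real
# DatabricksDeleteJobsOperator lives in the Databricks provider package.
class FakeDeleteJobsOperator:
    def __init__(self, *, job_id, databricks_conn_id="databricks_default", hook_class=None):
        self.job_id = job_id
        self.databricks_conn_id = databricks_conn_id
        self._hook_class = hook_class

    def execute(self, context):
        # Build the hook with the default retry arguments, then delete the job.
        hook = self._hook_class(
            self.databricks_conn_id, retry_limit=3, retry_delay=1, retry_args=None
        )
        hook.delete_job(self.job_id)

mock_hook_class = mock.MagicMock()
op = FakeDeleteJobsOperator(job_id=123, hook_class=mock_hook_class)
op.execute(context={})
```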
```python
self.databricks_conn_id = databricks_conn_id
self.polling_period_seconds = polling_period_seconds
self.databricks_retry_limit = databricks_retry_limit
self.databricks_retry_delay = databricks_retry_delay
self.databricks_retry_args = databricks_retry_args
```
```python
REPAIR_RUN_ENDPOINT = ("POST", "2.2/jobs/runs/repair")
OUTPUT_RUNS_JOB_ENDPOINT = ("GET", "2.2/jobs/runs/get-output")
CANCEL_ALL_RUNS_ENDPOINT = ("POST", "2.2/jobs/runs/cancel-all")
DELETE_ENDPOINT = ("POST", "2.2/jobs/delete")
```
I intentionally named it DELETE_ENDPOINT to maintain consistency with the existing job-related endpoints in this file, such as CREATE_ENDPOINT, RESET_ENDPOINT, and UPDATE_ENDPOINT, which do not include the JOB infix.

To resolve this, I think we have three options:

- Change only this new endpoint: rename it to DELETE_JOB_ENDPOINT, as Copilot suggested.
- Rename all job endpoints: add the JOB infix to all existing legacy endpoints (e.g., CREATE_JOB_ENDPOINT, RESET_JOB_ENDPOINT) for overall clarity.
- Maintain the current convention: keep it as DELETE_ENDPOINT to match the existing names in this module.

I lean towards Option 3 to avoid scope creep or potential breaking changes if anyone is importing those constants directly, but I'm fully open to whichever direction the maintainers prefer. Please let me know what you think!
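If maintainers wanted a middle ground between options 1 and 3, a simple alias (a hypothetical sketch, not code from this PR) would expose the more explicit name without breaking any existing imports of the module-consistent one:

```python
# Hypothetical compromise: keep the module-consistent constant name and add
# an alias so both spellings resolve to the same endpoint tuple.
DELETE_ENDPOINT = ("POST", "2.2/jobs/delete")
DELETE_JOB_ENDPOINT = DELETE_ENDPOINT  # explicit alias; safe to import either name
```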
Description
Currently, the Databricks provider lacks an operator to delete an existing Databricks job, although the Databricks REST API supports this functionality. This PR introduces the `DatabricksDeleteJobsOperator` to fill this gap.

Changes made:

- Added a `delete_job(self, job_id: int)` hook method to interact with the `api/2.2/jobs/delete` endpoint.
- Added unit tests in `test_databricks.py` to verify the operator's execution, correct hook initialization with retry arguments, and template fields.