Databricks SQL operators #21363

Merged
merged 11 commits into from
Feb 27, 2022


alexott
Contributor

@alexott alexott commented Feb 6, 2022

This PR adds new operators to the Databricks provider:

  • DatabricksSqlOperator, which executes SQL commands against Databricks SQL Endpoints and Databricks clusters.
  • DatabricksCopyIntoOperator (built on top of DatabricksSqlOperator), which imports data into Databricks tables.
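
For illustration only, here is a minimal sketch of the kind of statement a COPY INTO import boils down to. The helper `build_copy_into` and its parameter names are hypothetical and are not the operator's actual API; the sketch only assembles the `COPY INTO ... FROM ... FILEFORMAT = ...` SQL text and does no escaping or validation.

```python
from typing import Dict, Optional


# Hypothetical helper, not part of the provider: it only assembles SQL text.
def build_copy_into(
    table_name: str,
    file_location: str,
    file_format: str,
    format_options: Optional[Dict[str, str]] = None,
) -> str:
    """Return a Databricks COPY INTO statement as a string."""
    lines = [
        f"COPY INTO {table_name}",
        f"FROM '{file_location}'",
        f"FILEFORMAT = {file_format.upper()}",
    ]
    if format_options:
        # Render options as 'key' = 'value' pairs; values are not escaped here.
        opts = ", ".join(f"'{k}' = '{v}'" for k, v in format_options.items())
        lines.append(f"FORMAT_OPTIONS ({opts})")
    return "\n".join(lines)


statement = build_copy_into(
    "default.my_table", "/tmp/data/csv", "csv", {"header": "true"}
)
print(statement)
```

The table name and path above are placeholders. Real code would need to quote or validate these inputs before sending the statement to an endpoint.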

The new operators use the same connection as the other Databricks operators, although this is open to discussion: a dedicated connection could make sense, since it could be customized further with specific input fields, etc.

Another possible improvement: make the databricks-sql-connector dependency optional, but I'm not sure how to do that correctly in Airflow.
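
As a rough sketch of one generic Python way to defer a dependency until it is needed (this is not Airflow's own mechanism for optional provider dependencies, and the helper names `load_optional` and `get_sql_module` are made up for this example), the import can be done lazily with an actionable error message. It assumes the connector is importable as `databricks.sql`.

```python
import importlib
from types import ModuleType


def load_optional(module_name: str, package_hint: str) -> ModuleType:
    """Import a module on first use, with an actionable error if it is absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{module_name}' is required for this feature; "
            f"install it with: pip install {package_hint}"
        ) from exc


def get_sql_module() -> ModuleType:
    # Assumption: the connector package exposes the module `databricks.sql`.
    return load_optional("databricks.sql", "databricks-sql-connector")
```

With this shape, importing the hook module itself never fails; the error only appears when the SQL functionality is actually used.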

closes: #21030
closes: #21376

@alexott alexott marked this pull request as draft February 6, 2022 14:58
@alexott alexott changed the title [WIP-do-not-merge] Databricks SQL operator Databricks SQL operator Feb 6, 2022
airflow/providers/databricks/hooks/databricks_base.py (2 review threads, outdated, resolved)
airflow/providers/databricks/hooks/databricks.py (2 review threads, outdated, resolved)
airflow/providers/databricks/hooks/databricks_sql.py (4 review threads, outdated, resolved)
@alexott
Contributor Author

alexott commented Feb 6, 2022

Thank you for the review, @pateash, but this is really far from a reviewable state - more refactoring is coming.

setup.py (1 review thread, resolved)
@alexott alexott marked this pull request as ready for review February 13, 2022 19:05
@alexott alexott force-pushed the databricks-sql-operator branch 2 times, most recently from e74f505 to 74d2e87 Compare February 20, 2022 10:57
@alexott alexott requested a review from mik-laj February 20, 2022 11:23
@alexott alexott changed the title Databricks SQL operator Databricks SQL operators Feb 20, 2022
@alexott
Contributor Author

alexott commented Feb 21, 2022

@potiuk Jarek - would it be possible to review the changes?

@potiuk
Member

potiuk commented Feb 26, 2022

You need to rebase @alexott

alexott and others added 10 commits February 27, 2022 11:01
No documentation & tests yet
Still need to fix existing tests & add tests for Databricks SQL hook &
operator
This includes:
* identifying SQL Endpoint by name
* allow outputting results into a CSV/JSON/JSONL file
* fix tests for DatabricksHook
* address most of the comments
…rator

Co-authored-by: Lennart Kats (databricks) <lennart.kats@databricks.com>
Split documentation for operators into separate pages & add more content
and examples.
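
To make the "output results into a CSV/JSON/JSONL file" item in the commits above more concrete, here is a small standard-library sketch of the three formats. The function `dump_rows` and the row shape (one dict per row) are assumptions for illustration, not the operator's implementation.

```python
import csv
import io
import json
from typing import Any, Dict, List


def dump_rows(rows: List[Dict[str, Any]], output_format: str) -> str:
    """Serialize rows (one dict per row) as CSV, JSON, or JSONL text."""
    if output_format == "csv":
        buffer = io.StringIO()
        # Column order follows the keys of the first row.
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    if output_format == "json":
        # A single JSON array holding every row.
        return json.dumps(rows)
    if output_format == "jsonl":
        # One JSON object per line.
        return "".join(json.dumps(row) + "\n" for row in rows)
    raise ValueError(f"Unsupported output format: {output_format}")


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(dump_rows(rows, "jsonl"))
```

JSON keeps the whole result as one array, while JSONL writes one object per line, which is easier to stream or append to.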
@alexott
Contributor Author

alexott commented Feb 27, 2022

@potiuk done. Thank you for the review.

@potiuk
Member

potiuk commented Feb 27, 2022

Tests are failing though :(

@alexott
Contributor Author

alexott commented Feb 27, 2022

🤦 I forgot that the tests refer to the requests that was moved into another file...
The tests are green now, @potiuk

@potiuk potiuk merged commit 27d19e7 into apache:main Feb 27, 2022
@jedcunningham jedcunningham added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 28, 2022
Labels
area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)
6 participants