Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing the behaviours of SQL Hooks and Operators finally #27912

Merged
merged 1 commit into from
Nov 26, 2022

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Nov 25, 2022

This is the more bold fix to common.sql and related providers. It implements more comprehensive tests and describes the intended behaviours in much more explicit way, also it simplifies the notion of "scalar" behaviour by introducing a new method to check on when scalar return is expected.

Comprehensive suite of tests have been added (the original tests were converted to Pytest test and parameterized was used to test all the combinations of parameters and returned values.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2022

cc: @alexott

@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2022

cc: @alexott @kazanzhy - I have two important doubts about processing of ; (and sql stripping in general).

  1. (Less important if it works) Stripping ";"

I followed original implementations I think and I left I think intact original behviour of stripping in both implementationations.

Look at the parameterized tests - and the "DBApiHook" behaves now differently than Databricks Hook in this regard - the Databricks hook always splits the ;

https://github.com/apache/airflow/pull/27912/files#diff-db2c1495ba766a1bad71474f36caeb3e8b4228522571e3229c1c64cede967027R99

Where the standard DBApiHook does not:

https://github.com/apache/airflow/pull/27912/files#diff-718d368660d07715b383dd4d94344bc41b835189da39bb95fcecaa3a423e6e47R110

  1. The behaviour of cursor and connection opening/closing is different in both cases:
  1. The comiit strategy is different between the two (and does not work in Databricks IMHO) after last statement in Databricks Hook. The DBApi hook performs commit after all statements when autocommit is off https://github.com/apache/airflow/pull/27912/files#diff-6e1b2f961cb951d05d66d2d814ef5f6d8f8bf8f43c40fb5d40e27a031fed8dd7R307 but the DAtabricks one has very weird behaviour - when autocommit is off, I think each statement result will be rolled back automatically - so it basically makes no sense to set autocommit to off, because there is no way to manually commit (COMMIT is a separate statement and previous statement is rolled back when it is executed).

https://peps.python.org/pep-0249/#Connection.close

Note that closing a connection without committing the changes first will cause an implicit rollback to be performed.

I am no sure what was the reasoning behind those differences - but from what I see those differences now between those two "run" methods. Question is - whether we have good reasons to behave differently for those cases?

airflow/providers/common/sql/hooks/sql.py Outdated Show resolved Hide resolved
airflow/providers/common/sql/operators/sql.py Outdated Show resolved Hide resolved
airflow/providers/databricks/hooks/databricks_sql.py Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the fix-sql-behaviours branch 3 times, most recently from 25e215b to 6f39bb6 Compare November 25, 2022 16:00
@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2022

OK. There was some permission issue - but I think this one shoudl be ready to go. I added some descriptions and explanations about the behaviours and 🤞 it will pass all the docs and tests

@alexott
Copy link
Contributor

alexott commented Nov 25, 2022

Let me do a checkout in my environment and test.

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So much work on that PR! Nothing to say on my side, very solid! Thanks for adding all these tests (on top of the other changes), it adds a lot of robustness!

@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2022

So much work on that PR! Nothing to say on my side, very solid! Thanks for adding all these tests (on top of the other changes), it adds a lot of robustness!

Indeed. I added already few small changes/clarifications after the initial push and adding all tests and it felt soooooooo reassuring to see green checkmark for all tests after my changes.

@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2022

BTW. I REALLY ❤️ Pytest style of tests vs. the "classic" unit.TestCase.

Partucularrly fixtures - when you get a grasp of them, parameterization and the integration of those running tests within IDEs are just great.

@potiuk
Copy link
Member Author

potiuk commented Nov 25, 2022

All looks good :)

This is the more bold fix to common.sql and related providers.
It implements more comprehensive tests and describes the intended
behaviours in much more explicit way, also it simplifies the notion
of "scalar" behaviour by introducing a new method to check on when
scalar return is expected.

Comprehensive suite of tests have been added (the original tests
were converted to Pytest test and parameterized was used to
test all the combinations of parameters and returned values.
@dstandish
Copy link
Contributor

It's holiday in us and I'm traveling, won't be able to review until next week

@potiuk potiuk merged commit db5375b into apache:main Nov 26, 2022
@potiuk potiuk deleted the fix-sql-behaviours branch November 26, 2022 09:26
Copy link
Contributor

@alexott alexott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you again!

@@ -32,7 +32,9 @@ This release fixes a few errors that were introduced in common.sql operator whil
* ``_process_output`` method in ``SQLExecuteQueryOperator`` has now consistent semantics and typing, it
can also modify the returned (and stored in XCom) values in the operators that derive from the
``SQLExecuteQueryOperator``).
* last description of the cursor whether to return scalar values are now stored in DBApiHook
* descriptions of all returned results are stored as descriptions property in the DBApiHook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(could be done later) maybe it would be a bit easier to read if descriptions and last_description are formatted as code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. We can update it later :)

@@ -190,11 +189,17 @@ class SQLExecuteQueryOperator(BaseSQLOperator):
Executes SQL code in a specific database
:param sql: the SQL code or string pointing to a template file to be executed (templated).
File must have a '.sql' extensions.

When implementing a specific Operator, you can also implement `_process_output` method in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use double backquotes in docstrings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely :). I think it will work anyway in docstring (we'll see in the API docs) but yeah - we can fix it as follow-up

@potiuk
Copy link
Member Author

potiuk commented Nov 26, 2022

BTW. Release of rc3 is coming .

@BillCM
Copy link

BillCM commented Nov 28, 2022

Interesting note: the 3.3.0 DatabricksSQLOperator had a parameter for control writing to xcom. It appears to be removed in 4.0.0. This is causing the dump of a lot of PII into XCOM for my use case.

@alexott
Copy link
Contributor

alexott commented Nov 28, 2022

@BillCM do_xcom_push should be still supported as it's argument to the BaseOperator - can you check it?

@potiuk potiuk linked an issue Nov 28, 2022 that may be closed by this pull request
2 tasks
potiuk added a commit to potiuk/airflow that referenced this pull request Nov 29, 2022
The change apache#27912 fixed and unified behaviour of DBApiHooks
across the board, but it missed two places where sql was mis-used
and overridden in exasol and snowflake hooks.

The check for "sql" type did not use the original sql parameter
value but the one that was overridden later in the run method
implementation.

The fix is the same as applied in Databricks Hook and DBAPI
generic run methods - using consistent typing and separate
variable to convert the sql string into sql list.

Related: astronomer/astro-sdk#1324
potiuk added a commit that referenced this pull request Nov 30, 2022
…27997)

The change #27912 fixed and unified behaviour of DBApiHooks
across the board, but it missed two places where sql was mis-used
and overridden in exasol and snowflake hooks.

The check for "sql" type did not use the original sql parameter
value but the one that was overridden later in the run method
implementation.

The fix is the same as applied in Databricks Hook and DBAPI
generic run methods - using consistent typing and separate
variable to convert the sql string into sql list.

Related: astronomer/astro-sdk#1324
jrggggg pushed a commit to jrggggg/airflow that referenced this pull request Dec 1, 2022
…pache#27997)

The change apache#27912 fixed and unified behaviour of DBApiHooks
across the board, but it missed two places where sql was mis-used
and overridden in exasol and snowflake hooks.

The check for "sql" type did not use the original sql parameter
value but the one that was overridden later in the run method
implementation.

The fix is the same as applied in Databricks Hook and DBAPI
generic run methods - using consistent typing and separate
variable to convert the sql string into sql list.

Related: astronomer/astro-sdk#1324
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DatabricksSQLOperator ValueError “too many values to unpack”
8 participants