Fix Snowflake QueryModifier issue #1962
result = self.connection.execute(sql, parameters)

for sql_query in query_modifier.pre_queries:
    _ = self.run_single_sql_query(sql_query, parameters)
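The loop in this hunk runs each pre-query as its own call rather than joining it with the main statement. A minimal sketch of that pattern follows, using `sqlite3` as a stand-in for the Snowflake connection. `QueryModifierSketch` and `run_with_modifier` are hypothetical names for illustration, not the SDK's actual API.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class QueryModifierSketch:
    """Hypothetical stand-in for the SDK's QueryModifier (not the real class)."""

    pre_queries: list[str] = field(default_factory=list)
    post_queries: list[str] = field(default_factory=list)


def run_with_modifier(conn, sql, modifier):
    """Run pre-queries, the main statement, and post-queries as separate calls."""
    executed = []
    for statement in modifier.pre_queries:
        conn.execute(statement)  # one statement per execute() call
        executed.append(statement)
    rows = conn.execute(sql).fetchall()
    executed.append(sql)
    for statement in modifier.post_queries:
        conn.execute(statement)
        executed.append(statement)
    return rows, executed


conn = sqlite3.connect(":memory:")
modifier = QueryModifierSketch(
    pre_queries=[
        "CREATE TABLE pets (type TEXT)",
        "INSERT INTO pets VALUES ('Cat'), ('Guinea Pig')",
    ],
    post_queries=["DROP TABLE pets"],
)
rows, executed = run_with_modifier(
    conn, "SELECT type FROM pets WHERE type NOT LIKE 'Guinea Pig'", modifier
)
```

Because every statement goes through its own `execute()` call, the driver never sees a multi-statement string.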
@tatiana AFAIK query tags can also be used to track the usage of a particular query. Executing them separately will make the tracking difficult, right?
@utkarsharma2 That's a very good question. Our query tag support was designed with Snowflake in mind, and the expectation is that users would give `ALTER SESSION` statements, which would be executed before the main `transform` / `run_raw_query`.
If we only had one `ALTER SESSION` statement and then ran the query of interest, this approach would be safe, since the tag would apply only to that query. However, since at the moment the user can give multiple queries to be run before and after the main statement, all of them would be labelled the same way, which can be error-prone.
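To make the labelling concern concrete, here is a toy model of a session-level tag. `FakeSession` is a hypothetical class written for this illustration. It assumes a tag set by `ALTER SESSION` stays in effect for every later statement in the session, and does not reproduce Snowflake's real behaviour in detail.

```python
import re


class FakeSession:
    """Toy model of a session-level query tag (an assumption, not Snowflake itself)."""

    def __init__(self):
        self.query_tag = None
        self.history = []  # (tag in effect, statement) pairs

    def execute(self, statement):
        match = re.match(
            r"ALTER SESSION SET query_tag='([^']*)'", statement, re.IGNORECASE
        )
        if match:
            self.query_tag = match.group(1)
        self.history.append((self.query_tag, statement))


session = FakeSession()
for statement in [
    "ALTER SESSION SET query_tag='not_guinea_pig'",
    "CREATE TABLE staging AS SELECT 1",  # another pre-query
    "SELECT * FROM staging",  # the main statement
    "DROP TABLE staging",  # a post-query
]:
    session.execute(statement)

tags = [tag for tag, _ in session.history]
```

In this model every entry in `history` carries the same tag, so usage metrics keyed on the tag could not tell the main statement apart from its pre- and post-queries.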
We can also look into how other databases support query tags - I wonder if it already works for Postgres and MySQL since they rely only on comments. I'll try it out.
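For the comment-based approach, a rough sketch could look like the following. `with_comment_tag` is a hypothetical helper, and `sqlite3` is used only to show that a leading comment does not change the query result. Whether Postgres or MySQL surface such comments in their query logs is not verified here.

```python
import sqlite3


def with_comment_tag(sql, tag):
    """Prefix a statement with a SQL comment that carries the tag."""
    safe_tag = tag.replace("*/", "")  # keep the tag from closing the comment early
    return f"/* query_tag={safe_tag} */ {sql}"


tagged = with_comment_tag("SELECT 1 + 1", "not_guinea_pig")
value = sqlite3.connect(":memory:").execute(tagged).fetchone()[0]
```

Since the tag travels inside the statement itself, it applies to that one statement only and leaves no session state behind.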
For now, I suggest we focus on the customer issue of running the feature as it was originally designed, and review the overall approach in a separate PR/ticket. How do you feel about this?
I was wondering whether it is worth mentioning in the docstring how we run the queries and, if you have multiple queries, how metrics will be impacted. WDYT?
@tatiana I understand the naming is misleading, since `QueryModifier` generally means something else. Would it make sense to rename it or isolate it to Snowflake?
I won't consider this a blocker since it's a pressing issue.
LGTM
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1962      +/-   ##
==========================================
- Coverage   90.84%   89.74%   -1.11%
==========================================
  Files          72       72
  Lines        4250     4261      +11
  Branches      511      514       +3
==========================================
- Hits         3861     3824      -37
- Misses        302      341      +39
- Partials       87       96       +9
de7028f to bcf14fa
Add a DAG to illustrate how users can set `query_tags` in Snowflake using the Python SDK.

Before this PR, it would fail with:
```
E sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 000008 (0A000): 01acd6d5-0607-a92d-0000-68213eeeebda: Actual statement count 2 did not match the desired statement count 1.
E [SQL: ALTER SESSION SET query_tag='not_guinea_pig';;CREATE TABLE IF NOT EXISTS SANDBOX.ASTROFLOW_CI._tmp_ksimu2ab9s9kbatbtrexodw6q9alcskhfcumxpxuvi2ir60ew81a3st3k AS SELECT *
E FROM IDENTIFIER(%(input_table)s) WHERE type NOT LIKE 'Guinea Pig'
E ]
E [parameters: ***'input_table': 'SANDBOX.ASTROFLOW_CI._tmp_tddzk0vrcf3lbt1nxucnv3ejyx1ds3su84fvdqpoa4p69yguysppti2mo'***]
E (Background on this error at: https://sqlalche.me/e/14/f405)
```

Note: the failing Redshift tests are unrelated to the change introduced in this PR. They are being investigated in PR #1959. So far, they don't happen when we use a newer version of Redshift, but the costs are higher. Therefore, I suggest we make an exception and merge this PR disregarding them.
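The "Actual statement count 2" part of the error above comes from the pre-query and the main statement being joined into one string. The sketch below counts the statements in such a string. `split_statements` is a hypothetical, deliberately naive helper (it only understands single-quoted strings), and the table names `tmp` and `src` are placeholders. It is neither the SDK's nor the Snowflake connector's logic.

```python
def split_statements(sql):
    """Split on semicolons outside single-quoted strings and drop empty pieces."""
    statements, current, in_string = [], [], False
    for char in sql:
        if char == "'":
            in_string = not in_string
        if char == ";" and not in_string:
            piece = "".join(current).strip()
            if piece:
                statements.append(piece)
            current = []
        else:
            current.append(char)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


combined = (
    "ALTER SESSION SET query_tag='not_guinea_pig';;"
    "CREATE TABLE IF NOT EXISTS tmp AS SELECT * FROM src WHERE type NOT LIKE 'Guinea Pig'"
)
statements = split_statements(combined)
```

Two statements come back from the single string, which matches the count reported in the error. Running each one through its own call avoids the mismatch.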
@aql.transform(assume_schema_exists=True)
@aql.transform(
    assume_schema_exists=True,
    query_modifier=QueryModifier(pre_queries=["ALTER SESSION SET query_tag='not_guinea_pig';"]),
@tatiana We ran this DAG on Astro but are still getting this error:
Actual statement count 2 did not match the desired statement count 1.
@pankajastro, as we discussed on the call, the issue seems to be that the Astro CI is pointing to an old version of the Python SDK. The stack trace had:
File "/home/astro/.local/lib/python3.10/site-packages/astro/databases/base.py", line 141, in run_sql
result = self.connection.execute(
Which does not match this version (or the main branch):
https://github.com/astronomer/astro-sdk/blob/main/python-sdk/src/astro/databases/base.py#L106-L128
Please let me know if that's not the issue and we can investigate further.
Yes, @tatiana, it looks like the problem is in the way we ran the test, but that still needs to be confirmed. I ran the DAG on my local setup and it ran fine.