Fix Snowflake QueryModifier issue #1962

tatiana · 2023-06-08T22:39:00Z

Add a DAG to illustrate how users can set query_tags in Snowflake using the Python SDK.

Before this PR, it would fail with:

E       sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 000008 (0A000): 01acd6d5-0607-a92d-0000-68213eeeebda: Actual statement count 2 did not match the desired statement count 1.
E       [SQL: ALTER SESSION SET query_tag='not_guinea_pig';;CREATE TABLE IF NOT EXISTS SANDBOX.ASTROFLOW_CI._tmp_ksimu2ab9s9kbatbtrexodw6q9alcskhfcumxpxuvi2ir60ew81a3st3k AS SELECT *
E           FROM IDENTIFIER(%(input_table)s) WHERE type NOT LIKE 'Guinea Pig'
E           ]
E       [parameters: ***'input_table': 'SANDBOX.ASTROFLOW_CI._tmp_tddzk0vrcf3lbt1nxucnv3ejyx1ds3su84fvdqpoa4p69yguysppti2mo'***]
E       (Background on this error at: https://sqlalche.me/e/14/f405)

Note: the failing Redshift tests are unrelated to the change introduced in this PR. They are being investigated in PR #1959. So far, they don't happen when we use a newer version of Redshift, but the costs are higher. Therefore, I suggest we make an exception and merge this PR despite them.

utkarsharma2 · 2023-06-09T16:11:50Z

python-sdk/src/astro/databases/base.py

-            result = self.connection.execute(sql, parameters)
+
+        for sql_query in query_modifier.pre_queries:
+            _ = self.run_single_sql_query(sql_query, parameters)


@tatiana AFAIK query tags can also be used to track the usage of a particular query. Executing them separately will make the tracking difficult, right?

ref - https://docs.vividcortex.com/general-reference/query-tags/

@utkarsharma2 That's a very good question. Our query tag support was designed with a focus on Snowflake, and the expectation is that users would give ALTER SESSION statements, which would be executed before the main transform / run_raw_query.

If we only had one ALTER SESSION statement and then ran the query of interest, this approach would be safe, since the tag would apply only to that query. However, since ATM the user can give multiple queries to be run before and after the main statement, all of them would be labelled the same way, which can be error-prone.
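To make the trade-off concrete, here is a minimal sketch of the execution order described above. It is not the SDK's code: `SketchQueryModifier`, `RecordingConnection`, `run_with_modifier` and the table names are hypothetical stand-ins, and the fake connection only records what it is asked to run. The point is that each statement goes out in its own `execute()` call on the same session, so a tag set by a pre-query also covers every statement that follows until it is changed or unset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchQueryModifier:
    """Hypothetical stand-in for the SDK's QueryModifier (not the real class)."""

    pre_queries: list[str] = field(default_factory=list)
    post_queries: list[str] = field(default_factory=list)


class RecordingConnection:
    """Fake connection: records each execute() call instead of hitting a database."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def execute(self, sql: str) -> None:
        self.calls.append(sql)


def run_with_modifier(conn: RecordingConnection, main_sql: str, modifier: SketchQueryModifier) -> None:
    # One execute() call per statement, so the driver never sees
    # "ALTER SESSION ...; CREATE TABLE ..." as a single multi-statement string.
    for sql in modifier.pre_queries:
        conn.execute(sql)
    # The main statement runs in the same session, after the pre-queries.
    conn.execute(main_sql)
    for sql in modifier.post_queries:
        conn.execute(sql)


conn = RecordingConnection()
modifier = SketchQueryModifier(
    pre_queries=["ALTER SESSION SET query_tag='not_guinea_pig'"],
    post_queries=["ALTER SESSION UNSET query_tag"],
)
# Hypothetical table names, for illustration only.
main_sql = "CREATE TABLE filtered AS SELECT * FROM pets WHERE type NOT LIKE 'Guinea Pig'"
run_with_modifier(conn, main_sql, modifier)

for statement in conn.calls:
    print(statement)
```

With a single pre-query the tag maps cleanly onto the main statement. With several pre- and post-queries, everything executed between the SET and the UNSET shares the same tag, which is the tracking concern raised above.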

We can also look into how other databases support query tags - I wonder if it already works for Postgres and MySQL since they rely only on comments. I'll try it out.

For now, I suggest we focus on the customer issue of running the feature as it was originally designed, and review the overall approach in a separate PR/ticket. How do you feel about this?

I was wondering if it is worth mentioning in the docstring how we run the query and, if you have multiple queries, how metrics will be impacted. WDYT?

@tatiana I understand. The naming is misleading too, since QueryModifier generally means something else. Would it make sense to change it or isolate it to Snowflake?

I won't consider this as a blocker since it's a pressing issue.

pankajastro

LGTM

codecov · 2023-06-14T09:01:44Z

Codecov Report

Patch coverage: 82.35% and project coverage change: -1.11% ⚠️

Comparison is base (bfc8daa) 90.84% compared to head (bcf14fa) 89.74%.
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1962      +/-   ##
==========================================
- Coverage   90.84%   89.74%   -1.11%     
==========================================
  Files          72       72              
  Lines        4250     4261      +11     
  Branches      511      514       +3     
==========================================
- Hits         3861     3824      -37     
- Misses        302      341      +39     
- Partials       87       96       +9

Flag	Coverage Δ
PythonSDK	`89.74% <82.35%> (-1.11%)`	⬇️

Flags with carried forward coverage won't be shown.

Files Changed	Coverage Δ
python-sdk/src/astro/sql/operators/cleanup.py	`90.08% <50.00%> (-1.52%)`	⬇️
python-sdk/src/astro/databases/base.py	`87.79% <85.71%> (-4.45%)`	⬇️
python-sdk/src/astro/__init__.py	`100.00% <100.00%> (ø)`

... and 5 files with indirect coverage changes



pankajastro · 2023-08-09T09:53:34Z

python-sdk/example_dags/example_amazon_s3_snowflake_transform.py

-@aql.transform(assume_schema_exists=True)
+@aql.transform(
+    assume_schema_exists=True,
+    query_modifier=QueryModifier(pre_queries=["ALTER SESSION SET query_tag='not_guinea_pig';"]),


@tatiana We ran this DAG on Astro but are still getting this error:

Actual statement count 2 did not match the desired statement count 1.

@pankajastro, as we discussed on the call, the issue seems to be that the Astro CI is pointing to an old version of the Python SDK. The stack trace had:

File "/home/astro/.local/lib/python3.10/site-packages/astro/databases/base.py", line 141, in run_sql
    result = self.connection.execute(

Which does not match this version (or the main branch):
https://github.com/astronomer/astro-sdk/blob/main/python-sdk/src/astro/databases/base.py#L106-L128

Please let me know if that's not the issue and we can investigate further.
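One quick way to confirm which SDK build an environment is actually using is to print the installed distribution version and the location the package is imported from. The sketch below uses only the standard library; it assumes the distribution is published as `astro-sdk-python` and imported as `astro`, and the helper names are made up for illustration.

```python
from __future__ import annotations

import importlib.util
from importlib import metadata


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def module_location(module_name: str) -> str | None:
    """Return the file a top-level module would be imported from, if it can be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec else None


# Assumed names: distribution "astro-sdk-python", import package "astro".
print("astro-sdk-python version:", installed_version("astro-sdk-python"))
print("astro imported from:", module_location("astro"))
```

If the printed path points at a site-packages copy older than the branch under test, the stack trace will keep showing the old `run_sql` code regardless of what was merged.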

Yes, @tatiana, it looks like the problem is in the way we ran the test, but that still needs to be confirmed. I ran the DAG on my local setup and it ran fine.

tatiana changed the title ~~Change an example DAG to illustrate the usage of QueryModifier~~ Fix QueryModifier issue Jun 9, 2023

tatiana marked this pull request as ready for review June 9, 2023 15:24

tatiana requested review from dimberman, utkarsharma2, sunank200 and pankajastro as code owners June 9, 2023 15:24

jlaneve approved these changes Jun 9, 2023


utkarsharma2 reviewed Jun 9, 2023


tatiana changed the title ~~Fix QueryModifier issue~~ Fix Snowflake QueryModifier issue Jun 9, 2023

pankajastro approved these changes Jun 14, 2023


tatiana added 6 commits August 7, 2023 21:59

Change an example DAG to illustrate the usage of QueryModifier

dfadd6c

Fix query_modifier behaviour

5ee8364

Fix tests

51fc2e9

Fix broken

d6dc83d

Fix transform

3d611a2

Release 1.7.0a3

bcf14fa

pankajastro force-pushed the query-tag-integration-test branch from de7028f to bcf14fa Compare August 7, 2023 16:29

pankajastro approved these changes Aug 7, 2023


pankajastro merged commit 894ac85 into main Aug 7, 2023
31 of 32 checks passed

pankajastro deleted the query-tag-integration-test branch August 7, 2023 17:19

pankajastro mentioned this pull request Aug 9, 2023

Comment out query_modifier param #1996

Closed

pankajastro reviewed Aug 9, 2023


tatiana mentioned this pull request Sep 20, 2024

WIP: Fix setting Snowflake query tags #2187

Draft
