
Fix Snowflake QueryModifier issue #1962

Merged (6 commits) on Aug 7, 2023


@tatiana (Collaborator) commented Jun 8, 2023

Add a DAG to illustrate how users can set `query_tags` in Snowflake using the Python SDK.

Before this PR, it would fail with:

```
E       sqlalchemy.exc.ProgrammingError: (snowflake.connector.errors.ProgrammingError) 000008 (0A000): 01acd6d5-0607-a92d-0000-68213eeeebda: Actual statement count 2 did not match the desired statement count 1.
E       [SQL: ALTER SESSION SET query_tag='not_guinea_pig';;CREATE TABLE IF NOT EXISTS SANDBOX.ASTROFLOW_CI._tmp_ksimu2ab9s9kbatbtrexodw6q9alcskhfcumxpxuvi2ir60ew81a3st3k AS SELECT *
E           FROM IDENTIFIER(%(input_table)s) WHERE type NOT LIKE 'Guinea Pig'
E           ]
E       [parameters: ***'input_table': 'SANDBOX.ASTROFLOW_CI._tmp_tddzk0vrcf3lbt1nxucnv3ejyx1ds3su84fvdqpoa4p69yguysppti2mo'***]
E       (Background on this error at: https://sqlalche.me/e/14/f405)
```
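A minimal sketch of why the concatenated statement fails: the error shows the pre-query and the main SQL joined into one string (note the `;;`), while the Snowflake connector by default expects exactly one statement per call (as far as we know, this is what the `MULTI_STATEMENT_COUNT` session parameter controls). `SingleStatementConnection` and `StatementCountError` are hypothetical fakes that mimic that check with a naive `;` split; they are not the connector's real parser or exception.

```python
from dataclasses import dataclass, field
from typing import List


class StatementCountError(Exception):
    """Stand-in for the connector's error when one call carries several statements."""


@dataclass
class SingleStatementConnection:
    """Hypothetical fake connection that accepts exactly one statement per execute()."""

    executed: List[str] = field(default_factory=list)

    def execute(self, sql: str) -> None:
        # Naive split on ';' that ignores quoting; empty pieces (e.g. from ';;') are dropped.
        statements = [part.strip() for part in sql.split(";") if part.strip()]
        if len(statements) != 1:
            raise StatementCountError(
                f"Actual statement count {len(statements)} did not match "
                "the desired statement count 1."
            )
        self.executed.append(statements[0])


pre_queries = ["ALTER SESSION SET query_tag='not_guinea_pig';"]
main_sql = "CREATE TABLE tmp AS SELECT * FROM src WHERE type NOT LIKE 'Guinea Pig'"

conn = SingleStatementConnection()
try:
    # Joining with ';' reproduces the "...;;CREATE TABLE ..." shape from the error.
    conn.execute(";".join(pre_queries + [main_sql]))
except StatementCountError as err:
    print(err)  # Actual statement count 2 did not match the desired statement count 1.

# Sending each statement in its own call stays within the one-statement limit.
for statement in pre_queries + [main_sql]:
    conn.execute(statement)
print(conn.executed)
```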

Note: the failing Redshift tests are unrelated to the change introduced in this PR. They are being investigated in PR #1959. So far, they don't happen when we use a newer version of Redshift, but the costs are higher. Therefore, I suggest we open an exception and merge this PR despite them.

@tatiana changed the title from "Change an example DAG to illustrate the usage of QueryModifier" to "Fix QueryModifier issue" Jun 9, 2023
@tatiana marked this pull request as ready for review June 9, 2023 15:24
```python
result = self.connection.execute(sql, parameters)

for sql_query in query_modifier.pre_queries:
    _ = self.run_single_sql_query(sql_query, parameters)
```
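A sketch of the overall order of execution, assuming a hypothetical `QueryModifierSketch` with `pre_queries` and `post_queries` lists (not the astro-sdk `QueryModifier` class) and using the standard-library `sqlite3` module as a stand-in database. Each pre-query runs in its own call with its result discarded, the main statement's rows are kept, and post-queries run afterwards. Unlike the snippet above, this sketch does not forward `parameters` to the pre- and post-queries.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryModifierSketch:
    """Hypothetical holder for statements to run around the main query."""

    pre_queries: List[str] = field(default_factory=list)
    post_queries: List[str] = field(default_factory=list)


def run_sql_sketch(
    conn: sqlite3.Connection, sql: str, modifier: QueryModifierSketch, parameters=()
):
    # Each pre-query is its own execute() call; its result is discarded.
    for pre_query in modifier.pre_queries:
        conn.execute(pre_query)
    # Fetch the main statement's rows before anything else runs on the connection.
    rows = conn.execute(sql, parameters).fetchall()
    # Post-queries also run one at a time, after the main statement.
    for post_query in modifier.post_queries:
        conn.execute(post_query)
    return rows


conn = sqlite3.connect(":memory:")
modifier = QueryModifierSketch(
    pre_queries=["CREATE TABLE log (step TEXT)", "INSERT INTO log VALUES ('pre')"],
    post_queries=["INSERT INTO log VALUES ('post')"],
)
print(run_sql_sketch(conn, "SELECT step FROM log", modifier))  # [('pre',)]
```

The main query only sees the row written by the pre-queries, which shows the ordering: post-queries have not run yet when its result is fetched.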
@utkarsharma2 (Collaborator) commented:

@tatiana AFAIK, query tags can also be used to track the usage of a particular query. Executing them separately will make the tracking difficult, right?


@tatiana (Collaborator, Author) commented:

@utkarsharma2 That's a very good question. Our query tag support was designed with Snowflake in mind, and the expectation was that users would provide ALTER SESSION statements to be executed before the main `transform` / `run_raw_query`.

If there were only one ALTER SESSION statement followed by the query of interest, this approach would be safe, since the tag would apply only to that query. However, since users can currently give multiple queries to run before and after the main statement, all of them would be labelled the same way, which is error-prone.
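To illustrate the concern: Snowflake's `QUERY_TAG` is a session parameter, so once `ALTER SESSION SET query_tag=...` runs, every later statement in that session carries the tag until it is changed or unset. The toy `FakeSession` below is hypothetical (no Snowflake client involved) and records each statement with the tag in effect at that moment. Unsetting the tag after the main query (`ALTER SESSION UNSET query_tag`) is one possible way to keep post-queries untagged; it is not something this PR implements.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SET_TAG = re.compile(
    r"^\s*ALTER\s+SESSION\s+SET\s+query_tag\s*=\s*'([^']*)'\s*;?\s*$", re.IGNORECASE
)
UNSET_TAG = re.compile(r"^\s*ALTER\s+SESSION\s+UNSET\s+query_tag\s*;?\s*$", re.IGNORECASE)


@dataclass
class FakeSession:
    """Toy model of a session-scoped query tag; not a Snowflake client."""

    current_tag: Optional[str] = None
    history: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def execute(self, sql: str) -> None:
        match = SET_TAG.match(sql)
        if match:
            self.current_tag = match.group(1)
        elif UNSET_TAG.match(sql):
            self.current_tag = None
        else:
            # Ordinary statements are labelled with whatever tag the session holds now.
            self.history.append((sql, self.current_tag))


session = FakeSession()
for statement in [
    "ALTER SESSION SET query_tag='not_guinea_pig';",
    "CREATE TABLE result AS SELECT 1",  # main statement
    "DROP TABLE staging",  # a post-query: still carries the tag
]:
    session.execute(statement)
print(session.history)
```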

We can also look into how other databases support query tags - I wonder if it already works for Postgres and MySQL since they rely only on comments. I'll try it out.
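Comment-based tagging, the approach mentioned for Postgres and MySQL, can be sketched as prefixing the statement with a block comment. `tag_with_comment` is a hypothetical helper, not an astro-sdk function; whether the SDK already does this for those databases is the open question above. `sqlite3` is used only to show that a comment-prefixed statement still counts as a single statement. The helper rejects `/*` and `*/` in tags because a stray `*/` would close the comment early, and PostgreSQL also nests block comments.

```python
import sqlite3
from typing import Dict


def tag_with_comment(sql: str, tags: Dict[str, str]) -> str:
    """Prefix a SQL statement with a /* key='value' */ block comment carrying the tags."""
    for text in list(tags) + list(tags.values()):
        # A stray '*/' would end the comment early; PostgreSQL also nests '/*' comments.
        if "*/" in text or "/*" in text:
            raise ValueError(f"tag text must not contain comment delimiters: {text!r}")
    body = ",".join(f"{key}='{value}'" for key, value in sorted(tags.items()))
    return f"/* {body} */ {sql}"


tagged = tag_with_comment("SELECT 1 + 1", {"query_tag": "not_guinea_pig"})
print(tagged)  # /* query_tag='not_guinea_pig' */ SELECT 1 + 1

# The comment does not add a statement, so a single-statement API still accepts it.
print(sqlite3.connect(":memory:").execute(tagged).fetchall())  # [(2,)]
```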

For now, I suggest we focus on the customer issue by making the feature run as originally designed, and review the overall approach in a separate PR/ticket. How do you feel about this?

Contributor commented:

I was wondering whether it is worth mentioning in the docstring how we run the queries and, if there are multiple queries, how the metrics will be impacted. WDYT?

Collaborator commented:

@tatiana I understand the naming is misleading, since QueryModifier generally means something else. Would it make sense to rename it or restrict it to Snowflake?

I wouldn't consider this a blocker, since this is a pressing issue.

@tatiana changed the title from "Fix QueryModifier issue" to "Fix Snowflake QueryModifier issue" Jun 9, 2023

@pankajastro (Contributor) left a comment:

LGTM

@codecov bot commented Jun 14, 2023

Codecov Report

Patch coverage: 82.35% and project coverage change: -1.11% ⚠️

Comparison is base (bfc8daa) 90.84% compared to head (bcf14fa) 89.74%.
Report is 3 commits behind head on main.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1962      +/-   ##
==========================================
- Coverage   90.84%   89.74%   -1.11%     
==========================================
  Files          72       72              
  Lines        4250     4261      +11     
  Branches      511      514       +3     
==========================================
- Hits         3861     3824      -37     
- Misses        302      341      +39     
- Partials       87       96       +9     
```
| Flag | Coverage Δ |
|---|---|
| PythonSDK | 89.74% <82.35%> (-1.11%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
|---|---|
| python-sdk/src/astro/sql/operators/cleanup.py | 90.08% <50.00%> (-1.52%) ⬇️ |
| python-sdk/src/astro/databases/base.py | 87.79% <85.71%> (-4.45%) ⬇️ |
| python-sdk/src/astro/__init__.py | 100.00% <100.00%> (ø) |

... and 5 files with indirect coverage changes
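The report's headline numbers can be reproduced by hand. This is a sketch under two assumptions: coverage is hits divided by all tracked lines (hits + misses + partials), and Codecov's default of two decimals rounded down applies. Under those assumptions the base gives 90.84%, the head 89.74%, and the change -1.11%, which comes from rounding down the unrounded difference (about -1.103) rather than subtracting the rounded figures (which would give -1.10).

```python
import math
from fractions import Fraction


def coverage_pct(hits: int, misses: int, partials: int) -> Fraction:
    """Exact percentage of tracked lines that are fully hit (partials count as not hit)."""
    return Fraction(100 * hits, hits + misses + partials)


def round_down(value: Fraction, precision: int = 2) -> float:
    """Round toward negative infinity at `precision` decimals."""
    scale = 10 ** precision
    return float(Fraction(math.floor(value * scale), scale))


base = coverage_pct(hits=3861, misses=302, partials=87)  # 4250 lines
head = coverage_pct(hits=3824, misses=341, partials=96)  # 4261 lines
print(round_down(base), round_down(head), round_down(head - base))  # 90.84 89.74 -1.11
```

`Fraction` keeps the intermediate values exact, so rounding down never depends on binary floating-point error.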


@pankajastro merged commit 894ac85 into main Aug 7, 2023
31 of 32 checks passed
@pankajastro deleted the query-tag-integration-test branch August 7, 2023 17:19
utkarsharma2 pushed a commit that referenced this pull request Aug 8, 2023
```diff
-@aql.transform(assume_schema_exists=True)
+@aql.transform(
+    assume_schema_exists=True,
+    query_modifier=QueryModifier(pre_queries=["ALTER SESSION SET query_tag='not_guinea_pig';"]),
```
@pankajastro (Contributor) commented:

@tatiana We ran this DAG on Astro but are still getting this error:

```
Actual statement count 2 did not match the desired statement count 1.
```

@tatiana (Collaborator, Author) commented:

@pankajastro, as we discussed on the call, the issue seems to be that the Astro CI is pointing to an old version of the Python SDK. The stack trace had:

```
  File "/home/astro/.local/lib/python3.10/site-packages/astro/databases/base.py", line 141, in run_sql
    result = self.connection.execute(
```

That does not match this version (or the main branch):
https://github.com/astronomer/astro-sdk/blob/main/python-sdk/src/astro/databases/base.py#L106-L128

Please let me know if that's not the issue, and we can investigate further.
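A quick way to confirm which SDK an environment actually imports, as a sketch using only the standard-library `importlib`: it reports the installed distribution version and the file the module would be loaded from, which can be compared with the path in the stack trace above. The distribution name `astro-sdk-python` and the import name `astro` are assumptions about how the SDK is packaged.

```python
from importlib import metadata, util
from typing import Dict, Optional


def describe_installed(dist_name: str, module_name: str) -> Dict[str, Optional[str]]:
    """Report the installed version of a distribution and where its module is imported from."""
    try:
        version: Optional[str] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # For a top-level module, find_spec locates it without importing; None means not importable.
    spec = util.find_spec(module_name)
    return {"version": version, "origin": spec.origin if spec is not None else None}


if __name__ == "__main__":
    # Assumed packaging: PyPI distribution "astro-sdk-python", import name "astro".
    print(describe_installed("astro-sdk-python", "astro"))
```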

Contributor commented:

Yes, @tatiana, it looks like the problem is in the way we ran the test, but this still needs to be confirmed. I ran the DAG on my local setup and it ran fine.


4 participants