Added ability for Snowflake to attribute usage to Airflow by adding an application parameter #16420
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
The PR is likely OK to be merged with just a subset of tests for the default Python and Database versions, without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full test matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Static checks are failing. Can you please install pre-commit and fix those, @sfc-gh-madkins? (pre-commit will fix those problems automatically.) I am about to make the June providers' release, so if you could fix it quickly it could make it into the release.
Should this parameter be user-controlled? In the case of Google, we have a fixed value set.
airflow/airflow/providers/google/cloud/hooks/bigquery.py Lines 157 to 162 in 8505d2f
Thanks to the version, we can also distinguish vanilla Airflow from Composer Airflow.
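For reference, the Google-provider pattern mentioned above can be sketched like this (a minimal illustration, not the actual bigquery.py code; the function name is made up):

```python
def client_label(airflow_version: str) -> str:
    # Fixed, non-user-controlled attribution label that embeds the Airflow
    # version, so the warehouse can tell vanilla Airflow apart from a managed
    # distribution that ships a different version string.
    return f"airflow/{airflow_version}"

print(client_label("2.1.0"))  # airflow/2.1.0
```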
Good discussion on whether this should be a user-defined parameter that could be overwritten. The idea was that managed Airflow providers (Astronomer, AWS) could override this parameter for their installations of this repo. I guess they could override a non-user-defined parameter as well. Any thoughts? I can talk to Ry over at Astronomer on this.
…On Mon, Jun 14, 2021 at 9:32 AM Kaxil Naik ***@***.***> wrote:
In airflow/providers/snowflake/hooks/snowflake.py:
@@ -143,6 +143,7 @@ def __init__(self, *args, **kwargs) -> None:
    self.schema = kwargs.pop("schema", None)
    self.authenticator = kwargs.pop("authenticator", None)
    self.session_parameters = kwargs.pop("session_parameters", None)
+   self.application = kwargs.pop("application")
⬇️ Suggested change
- self.application = kwargs.pop("application")
+ self.application = kwargs.pop("application", None)
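The difference the suggested default makes can be shown in isolation (toy functions, not the hook itself):

```python
def pop_without_default(**kwargs):
    # Mirrors kwargs.pop("application"): raises KeyError if absent.
    return kwargs.pop("application")

def pop_with_default(**kwargs):
    # Mirrors kwargs.pop("application", None): falls back to None.
    return kwargs.pop("application", None)

try:
    pop_without_default(schema="PUBLIC")
except KeyError:
    print("KeyError: callers that never pass 'application' would break")

print(pop_with_default(schema="PUBLIC"))  # None
```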
I think it would be great to be able to override it without changing either DAGs or provider code, but adding the version of Airflow might be a good idea indeed. @sfc-gh-madkins is this label somehow standardized on the Snowflake side, or is it OK to add + version.version as is done in the case of the Google providers? Ultimately, maybe this will do?:
os.environ.get('_SNOWFLAKE_AIRFLOW_LABEL', "AIRFLOW" + version.version)
We will want to keep the Airflow tag standardized, so I would not want to include the version.
Ultimately, this is what we need the code to look like when we make the connection:
con = snowflake.connector.connect(
    user='XXXX',
    password='XXXX',
    account='XXXX',
    application='Airflow', ...
)
If we want to remove the user-defined part of it, it should just be one line of code.
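As a sketch of that "one line" (a hypothetical condensed helper, not the provider's actual code), the fixed tag would simply be merged into the connection parameters:

```python
def get_conn_params(user: str, password: str, account: str) -> dict:
    # Hypothetical condensed version of the hook's conn_config: everything
    # is user-supplied except "application", which is a fixed attribution tag.
    return {
        "user": user,
        "password": password,
        "account": account,
        "application": "Airflow",  # the single fixed line discussed above
    }

params = get_conn_params("XXXX", "XXXX", "XXXX")
print(params["application"])  # Airflow
```

These params would then be splatted into snowflake.connector.connect(**params).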
…On Mon, Jun 14, 2021 at 10:30 AM Jarek Potiuk ***@***.***> wrote:
Ultimately maybe this will do?:
os.environ.get('_SNOWFLAKE_AIRFLOW_LABEL', "AIRFLOW" + version.version)
SGTM
Fine for me. Happy to merge it as is before I start releasing providers today :)
Are we sure we want it to be an operator parameter? I think it should be a connection configuration parameter or a fixed constant. Right now we force the user to make changes in Python code.
I think it should be a connection parameter, an environment parameter, or a fixed constant (the latter preferred by me). Right now we require changes to the DAG file, so very few people will be using this feature.
Good point, I like the idea of a Connection better -- that way we can leverage environment variables too.
It's added to the Hook (and bubbled up to the Operator). I think indeed it would be better to get it as an extra in the connection with an 'AIRFLOW' default. WDYT @sfc-gh-madkins?
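A sketch of the connection-extra idea (assuming the extra is stored as JSON, with a helper name made up for illustration):

```python
import json

def application_from_extra(extra_json: str) -> str:
    # Read an optional "application" key from the connection's extra JSON;
    # fall back to the fixed 'AIRFLOW' default when it is absent.
    extra = json.loads(extra_json) if extra_json else {}
    return extra.get("application", "AIRFLOW")

print(application_from_extra('{"application": "Astronomer"}'))  # Astronomer
print(application_from_extra(""))                               # AIRFLOW
```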
Yes -- the Hook is where it should happen, so that the param is unchangeable by the consumer. Managed providers can then override it when they push out their version of Airflow.
Can you change it then please, @sfc-gh-madkins? Because the default is now in the Operator.
Code updated to just pass the application parameter statically in the conn_config object.
@@ -179,6 +179,7 @@ def _get_conn_params(self) -> Dict[str, Optional[str]]:
    "role": self.role or role,
    "authenticator": self.authenticator or authenticator,
    "session_parameters": self.session_parameters or session_parameters,
+   "application": "AIRFLOW",
I was hoping we could use: AIRFLOW_SNOWFLAKE_PARTNER
or a similar environment variable so even the partners don't need to change the provider, what do you think?
Example:
⬇️ Suggested change
- "application": "AIRFLOW",
+ "application": os.environ.get("AIRFLOW_SNOWFLAKE_PARTNER", "AIRFLOW"),
So it defaults to AIRFLOW, and your partners just need to add that environment variable in their image, for example:
AIRFLOW_SNOWFLAKE_PARTNER=AWS
or
AIRFLOW_SNOWFLAKE_PARTNER=Astronomer
or
AIRFLOW_SNOWFLAKE_PARTNER=GCP
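The override path suggested above can be demonstrated end to end (the variable name AIRFLOW_SNOWFLAKE_PARTNER is taken from the suggestion in this thread; treat the helper as an illustrative sketch, not the provider's actual code):

```python
import os

def resolve_application() -> str:
    # Partners set AIRFLOW_SNOWFLAKE_PARTNER in their image; everyone
    # else falls through to the fixed "AIRFLOW" default.
    return os.environ.get("AIRFLOW_SNOWFLAKE_PARTNER", "AIRFLOW")

os.environ.pop("AIRFLOW_SNOWFLAKE_PARTNER", None)
print(resolve_application())  # AIRFLOW

os.environ["AIRFLOW_SNOWFLAKE_PARTNER"] = "Astronomer"
print(resolve_application())  # Astronomer
```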
That does make sense. I like that.
Awesome work, congrats on your first merged pull request!
Fixes problem introduced in apache#16420
Fixes problem introduced in #16420
Support was added in the Snowflake provider (apache/airflow#16420) to attribute usage to partners. This PR/commit adds this env var for the Astronomer <-> Snowflake partnership.
Added the ability for Snowflake to track usage coming from the Snowflake Operator by adding an "application" parameter that is passed through snowflake-connector-python. This application parameter can be overridden by managed Airflow providers, allowing them to get credit for directing usage to Snowflake. This is a critical element for tiering up in Snowflake's Partner Program.