Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added ability for Snowflake to attribute usage to Airflow by adding an application parameter #16420

Merged
merged 8 commits into from Jun 15, 2021
Merged

Conversation

sfc-gh-madkins
Copy link
Contributor

@sfc-gh-madkins sfc-gh-madkins commented Jun 13, 2021

Added the ability for Snowflake to track usage coming for the Snowflake Operator by adding an "application" parameter that is passed through the snowflake-python-connector. This application parameter is able to be over-written by managed airflow providers allowing them to get credit for directing usage to Snowflake. This is a critical element for tiering up in Snowflake's Partner Program.

@boring-cyborg boring-cyborg bot added area:providers provider:snowflake Issues related to Snowflake provider labels Jun 13, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 13, 2021

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Jun 13, 2021
@github-actions
Copy link

The PR is likely OK to be merged with just subset of tests for default Python and Database versions without running the full matrix of tests, because it does not modify the core of Airflow. If the committers decide that the full tests matrix is needed, they will add the label 'full tests needed'. Then you should rebase to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@potiuk
Copy link
Member

potiuk commented Jun 13, 2021

Static checks are failing. Can you please install pre-commit and fix those @sfc-gh-madkins (pre-commit will fix those problems automatically). I am about to make a June provider's release, so if you could fix it quickly it could make it into the release.

@mik-laj
Copy link
Member

mik-laj commented Jun 13, 2021

Should this parameter be user controlled? In the case of Google, we have a fixed value set.

client_info = ClientInfo(client_library_version='airflow_v' + version.version)

return Client(
client_info=self.client_info,
project=project_id,
location=location,
credentials=self._get_credentials(),
)

Thanks to the version, we can also distinguish Vanila Airflow from Composer Airflow.

version = '1.10.1-composer'

@sfc-gh-madkins
Copy link
Contributor Author

sfc-gh-madkins commented Jun 14, 2021 via email

@potiuk
Copy link
Member

potiuk commented Jun 14, 2021

Go discussion on whether this should be a user-defined parameter that could be over-written. The idea was for managed airflow providers (Astronomer, AWS), they could override this parameter for their installations of this repo. I guess they could override an non user-defined parameter as well. Any thoughts? I can talk to Ry over at Astronomer on this.

I think it would be great to be able to override it without changing neither DAGs, nor provider code. but adding version of Airflow might be a good idea indeed. @sfc-gh-madkins is this label somehow standardized on Snowflake side, or is it ok to add + version.version as it is in case of Google Providers?

Ultimately maybe this will do?:

os.enviroment.get('_SNOWFLAKE_AIRFLOW_LABEL', "AIRFLOW" + version.version) 

@sfc-gh-madkins
Copy link
Contributor Author

sfc-gh-madkins commented Jun 14, 2021 via email

@mik-laj
Copy link
Member

mik-laj commented Jun 14, 2021

If we want to remove the user-defined part of it. It should just be one
line of code.

SGTM

@potiuk
Copy link
Member

potiuk commented Jun 14, 2021

Fine for me. Happy to merge it as is before I start releasing providers today :)

@mik-laj
Copy link
Member

mik-laj commented Jun 14, 2021

Are we sure we want it to be an operator parameter? I think it should be a connection configuration parameter or fixed constant. Now we force user to make changes in Python code.

Copy link
Member

@mik-laj mik-laj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, it should be connection parameter, environment parameter or fixed constant (preferred by me). Now we require changes to the DAG file, so very few people will be using this feature.

@kaxil
Copy link
Member

kaxil commented Jun 14, 2021

Are we sure we want it to be an operator parameter? I think it should be a connection configuration parameter or fixed constant. Now we force user to make changes in Python code.

Good point, I like the idea of Connection better -- that way we can leverage environment Variables too

@potiuk
Copy link
Member

potiuk commented Jun 14, 2021

I think, it should be connection parameter, environment parameter or fixed constant (preferred by me). Now we require changes to the DAG file, so very few people will be using this feature.

It's added to Hook (and bubbled up to Operator). I think indeed it would be better to get it as extra in connection with 'AIRFLOW' default. WDYT @sfc-gh-madkins ?

@sfc-gh-madkins
Copy link
Contributor Author

sfc-gh-madkins commented Jun 14, 2021 via email

@potiuk
Copy link
Member

potiuk commented Jun 14, 2021

Can you change it please then @sfc-gh-madkins ? because the default is now in the Operator.

@sfc-gh-madkins
Copy link
Contributor Author

sfc-gh-madkins commented Jun 14, 2021 via email

@mik-laj mik-laj requested review from potiuk and kaxil June 14, 2021 22:14
@@ -179,6 +179,7 @@ def _get_conn_params(self) -> Dict[str, Optional[str]]:
"role": self.role or role,
"authenticator": self.authenticator or authenticator,
"session_parameters": self.session_parameters or session_parameters,
"application": "AIRFLOW",
Copy link
Member

@kaxil kaxil Jun 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping we could use: AIRFLOW_SNOWFLAKE_PARTNER or a similar environment variable so even the partners don't need to change the provider, what do you think?

Example:

Suggested change
"application": "AIRFLOW",
"application": os.environ.get("AIRFLOW_SNOWFLAKE_PARTNER", "AIRFLOW"),

So it default to AIRFLOW and your partners just need to add that Environment variable in their image, example:

AIRFLOW_SNOWFLAKE_PARTNER=AWS

or

AIRFLOW_SNOWFLAKE_PARTNER=Astronomer

or

AIRFLOW_SNOWFLAKE_PARTNER=GCP

@sfc-gh-madkins
Copy link
Contributor Author

sfc-gh-madkins commented Jun 15, 2021 via email

@potiuk potiuk merged commit 643e46c into apache:main Jun 15, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Jun 15, 2021

Awesome work, congrats on your first merged pull request!

potiuk added a commit to potiuk/airflow that referenced this pull request Jun 15, 2021
potiuk added a commit that referenced this pull request Jun 15, 2021
kaxil added a commit to astronomer/ap-airflow that referenced this pull request Jun 24, 2021
Support was added in Snowflake provider: apache/airflow#16420 to attribute usage to Partners.

This PR/commit will adds this Env var for Astronomer <-> Snowflake partnership
kaxil added a commit to astronomer/ap-airflow that referenced this pull request Jun 25, 2021
Support was added in Snowflake provider: apache/airflow#16420 to attribute usage to Partners.

This PR/commit will adds this Env var for Astronomer <-> Snowflake partnership
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers okay to merge It's ok to merge this PR as it does not require more tests provider:snowflake Issues related to Snowflake provider
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants