Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds secret backend and logging information to provider yaml #17625

Merged
merged 1 commit into from
Aug 17, 2021

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Aug 15, 2021

This is preparatory work to automatically generate summary of
secret backends and logging handlers for all providers.

It adds logging/secret-backends section in the provider.yaml
and refactors the code a little to extract common initialization
routines to a decorator.

The provider manager is also simplified by removing the
unnecessary _add methods.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk potiuk requested a review from turbaszek as a code owner August 15, 2021 17:44
@boring-cyborg boring-cyborg bot added area:CLI area:providers provider:microsoft-azure Azure-related issues provider:google Google (including GCP) related issues labels Aug 15, 2021
@potiuk
Copy link
Member Author

potiuk commented Aug 15, 2021

@dimberman - that one might also help to simplify your Provider Manager's change from #15330

@potiuk potiuk force-pushed the add-secrets-logging-to-providers branch from d3c2cbd to 7d4fbbe Compare August 15, 2021 17:51
@potiuk potiuk requested a review from kaxil August 15, 2021 17:57
airflow/cli/cli_parser.py Outdated Show resolved Hide resolved
airflow/cli/cli_parser.py Outdated Show resolved Hide resolved
airflow/providers_manager.py Outdated Show resolved Hide resolved
airflow/providers_manager.py Outdated Show resolved Hide resolved
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk potiuk force-pushed the add-secrets-logging-to-providers branch from 7d4fbbe to 8300247 Compare August 15, 2021 19:20
@potiuk potiuk requested a review from ashb as a code owner August 15, 2021 19:20
@potiuk
Copy link
Member Author

potiuk commented Aug 15, 2021

Can (Should ?) we add api-auth backends too ?

https://airflow.apache.org/docs/apache-airflow-providers-google/stable/api-auth-backend/google-openid.html

Added, the secrets -> secrets name changed as well. (BTW - there are a number of other places where Secret Backend is also used rather than Secrets Backend so it should likely be fixed there)

@potiuk potiuk force-pushed the add-secrets-logging-to-providers branch from 8300247 to e09ce9a Compare August 15, 2021 19:55
@mik-laj
Copy link
Member

mik-laj commented Aug 15, 2021

I am not sure if I understand what we want to achieve in this change and if we really have adopted the correct data structure for what we want to achieve. How are the changes to Airflow CLI and core to improve our documentation?

Previously, adding new information to provider_info/ProviderManager was always related to adding new features or as required by making existing features work. I considered the ability to view information using the CLI as features that only facilitated troubleshooting, but was never the main goal.

Currently, we have the following keys in providers_info.schema.json on the main branch.
description -> Makes it easier to identify the package.
extra-links -> Required by extra links to prevent loading unknown classes. Only registered classes can create extra links.
hook-class-names -> Required by Connection UI.
name -> Makes it easier to identify the package.
package-name -> Makes it easier to identify the package.

Could you please give the main purpose why we want to add this information to provider_info? Should we also add information about operators, sensors, hooks to provider_info? How is Airflow going to behave when the class is not added to provider_info but is used by Airflow? If your main goal is to prepare a list of auth_backend/secrets-backend/task-handlers in our documentation. I think it is enough for us to just add these keys to provider.yaml like operators, hooks, sensors, integrations, without adding to provider_info.

@potiuk
Copy link
Member Author

potiuk commented Aug 16, 2021

This is not documentation change (yet). It's preparatory work and the main purpose is to prepare a common interface and communicatoin to both "community" providers developers as well as (and this is more important goal) to "custom providers writers" who get the tool to know about what and how they can customise in their own providers.

Eventually "get_provider_info" provides summarized, queryeable, discoverable information about all things provider "provides" - not only those that require some "deep" integration,

The main purpose is to get one place with information on capabilities of providers. This is purely informational one telling "this provider provides those features". It helps with discovery (again) if someone uses "providers" command (or in the future web ui). This is a good way to show what custom providers can deliver by exposing features of providers not only in the documentation, but also in the actual airlfow CLI, "customise providers" documentation, and webui. One of the best way to teach the users how they can add their own providers and expose such functionalities in similar way.

Additional (more important) feature is that during our tests we are importing the classes exposed in provider.yaml and checking if they are properly configured (which also happens in runtime). This makes it easy to reason about any kind of typos you might have and you will see them early (as I did and fixed it while adding it).

We are just about to add @taskgroups in the same way.

This is about it.

@potiuk
Copy link
Member Author

potiuk commented Aug 16, 2021

Documentation for that one (including the summary pages in documentation) and update to "How to write your Custom Provider" to include logging, secret backends and API authentication backends and much deeper interlinking between Airflow "features" and "Communtiy Providers" summary providing those featuers, will follow as the next PR.

@mik-laj
Copy link
Member

mik-laj commented Aug 16, 2021

You previously added provider info schema to separate information needed only for building documentation from that required for Airflow operation. #13488
Has anything changed in this regard? Do we want to eliminate this division? What if someone does not define a class in provider_info and wants to use it? Should we then block it?

Additional (more important) feature is that during our tests we are importing the classes exposed in provider.yaml and checking if they are properly configured (which also happens in runtime). This makes it easy to reason about any kind of typos you might have and you will see them early (as I did and fixed it while adding it).

We have pre-commit checks for testing provider.yaml files, so we can achieve similar featture without any changes to Airflow. Additionally, these tests check that all integration have been added to the provider.yaml files.
https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/pre_commit_check_provider_yaml_files.py

@potiuk
Copy link
Member Author

potiuk commented Aug 16, 2021

You previously added provider info schema to separate information needed only for building documentation from that required for Airflow operation. #13488

Glad that you reminded me, I should have updated this as well. I will fix it in a moment.

Has anything changed in this regard? Do we want to eliminate this division? What if someone does not define a class in provider_info and wants to use it?

Nothing changed, I've forgotten to update it, simply. There is no blocking of any sorts. All the schema options in those schemas are optional.

We have pre-commit checks for testing provider.yaml files, so we can achieve similar featture without any changes to Airflow. Additionally, these tests check that all integration have been added to the provider.yaml files.

And I will also add it while adding the documentation, no doubt about it. The pre-commit will not work for custom providers, the check is really not for ".yaml" but for "get_provider_info" output (those very same checks are run in our CI when we test providers).

@potiuk
Copy link
Member Author

potiuk commented Aug 16, 2021

Glad that you reminded me, I should have updated this as well. I will fix it in a moment.

Ah no. I did not forget - I've already added it this weekend :)

https://github.com/apache/airflow/pull/17625/files#diff-2d9d90ece6c5a0bc080e0e20bf6a00534c7b642c2ef9a5b83f66894052913c4bR30

@potiuk
Copy link
Member Author

potiuk commented Aug 16, 2021

What do you think - others? I think this is really, really useful to be able to see what kind of secret backends we have extra, what logging handlers, what authentication APIS we have. While I agree listing all operators/hooks/sensor for provider makes no sense, listing the "core feature extensions" of Airflow IMHO is worth to be queryable at runtime (and verifiable). For me this is again - disciverability problem (same as with docs) rather that "let the users figure it out and search on their own".

There is big difference IMHO when you want to just integrate operators/sensors/hooks of a given service, comparing to when you want to extend a "built-in" airflow capability with a given integration. Those are even useful for different kind of users - logging/auth/seceret backends are for Devops/Operations people, where Operators/Hooks/Sensors are for Dag writers. Those are different audiences - CLI is there for the Devops, they will use to to query for different kinds of things and I find it would be super-useful to see it next to "extra links", "connections" - which are all managed not by DAG writers but by DevOps.

I believe we should not look at it from the "airflow developer" point of view where we treat Operators/Hooks/Sensors vs. Auth/Secrets/Logging as similar "beings" inside "provider" package (python constructs) and literally "the same kind of thing".

But I think we should look at it from the side of the user - IMHO even if "Operators/Hooks/Sensors vs Auth/Secrets/Logging" are packaged together in a provider, they have conceptually completely different scope, they are used differently and they have different audience (and we should treat them differently),

@mik-laj
Copy link
Member

mik-laj commented Aug 16, 2021

It makes sense to me. We can add these commands, but I am not sure if it will be informative enough. The class name is not enough to configure the integrative ones. The user will still need to search for more information in our documentation.

@potiuk
Copy link
Member Author

potiuk commented Aug 16, 2021

It makes sense to me. We can add these commands, but I am not sure if it will be informative enough. The class name is not enough to configure the integrative ones. The user will still need to search for more information in our documentation.

This is fine. The whole point is to make the user aware that there is "something" that they should look for. That's the 'discoverability" I mentioned before.

That's why I think this is the very same problem as lack of information in "logging" docs" that there are some "community loggers" .

This is preparatory work to automatically generate summary of
secret backends, logging handlers, API auth backends
for all providers.

It adds logging/secrets-backends/auth-backends sections in the
provider.yaml and refactors the code a little to extract common
initialization routines to a decorator as well as common sanity
check that imports classes/modules and checks if community
providers are using correct package prefix.

The provider manager is also simplified by removing the
unnecessary `_add` methods.
@potiuk potiuk force-pushed the add-secrets-logging-to-providers branch from e09ce9a to 21ab6fd Compare August 16, 2021 19:24
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 16, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CLI area:providers full tests needed We need to run full set of tests for this PR to merge provider:google Google (including GCP) related issues provider:microsoft-azure Azure-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants