Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import Hooks lazily individually in providers manager #17682

Merged
merged 4 commits into from
Aug 19, 2021

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Aug 18, 2021

This change implements lazy loading of individual hooks for providers
manager. First the hooks list is discovered by the manager when
hooks are accessed, but the hooks are not immediately
imported - the hooks initially keep just a callable that will
be used to retrieve the hook when first accessed.

Besides listing details of all hooks, all Hooks are only imported when we
want to retrieve the list of available field behaviours and widgets
(which only happens in webserver and should happen anyway whenever one
of those are needed because they are all collectively used in the
connection view).

In the case when hooks are accessed in tasks
(connection.get_hook()) only the individual Hooks are
imported when accessed.

The change deprecates 'hook-class-names' json-schema and replaces it
with 'connection-types' because we need to know connection-type
for each HookClass name declaratively so that we can utilse
it in connection.get_hook() mehtod (otherwise we do not know
which Hooks conrrespond to which connection type without importing
them).

The change is backwards compatible. It adds deprecation
warnings in case providers use the 'hook-class-names' property
only and log warnings in case it provides both hook-class-names
and connection-types but there are inconsistencies between
those.

Part of this change is also to fix some inconsistencies found
when all hooks were added to connection-types arrays, which
make potetntially backwards-incompatible changes to Google Provider
where some hooks were useing google_cloud_default name for
default_connection_type, but they were in fact using different,
specialized Hook.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added area:CLI area:dev-tools provider:cncf-kubernetes Kubernetes provider related issues area:providers provider:Apache provider:amazon-aws AWS/Amazon - related issues provider:google Google (including GCP) related issues labels Aug 18, 2021
@potiuk potiuk requested review from ashb and dimberman August 18, 2021 12:12
@potiuk potiuk force-pushed the lazy-loading-individual-hooks branch from 4eb5e34 to 6de95d7 Compare August 18, 2021 12:14
BREEZE.rst Outdated Show resolved Hide resolved
@potiuk potiuk force-pushed the lazy-loading-individual-hooks branch 5 times, most recently from 5ebe8b8 to 3d02cd1 Compare August 18, 2021 14:17
@potiuk potiuk force-pushed the lazy-loading-individual-hooks branch from 3d02cd1 to 484776b Compare August 18, 2021 16:45
This change implements lazy loading of individual hooks for providers
manager. First the hooks list is discovered by the manager when
hooks are accessed, but the hooks are not immediately
imported - the hooks initially keep just a callable that will
be used to retrieve the hook when first accessed.

Besides listing details of all hooks, all Hooks are only imported when we
want to retrieve the list of available field behaviours and widgets
(which only happens in webserver and should happen anyway whenever one
of those are needed because they are all collectively used in the
connection view).

In the case when hooks are accessed in tasks
(connection.get_hook()) only the individual Hooks are
imported when accessed.

The chand deprecates 'hook-class-names' json-schema and replaces it
with 'connection-types' because we need to know connection-type
for each HookClass name declaratively so that we can utilse
it in connection.get_hook() mehtod (otherwise we do not know
which Hooks conrrespond to which connection type without importing
them.

The change is backwards compatible. It adds deprecation
warnings in case providers use the 'hook-class-names' property
only and log warnings in case it provides both `hook-class-names`
and `connection-types` but there are inconsistencies between
those.

Part of this change is also to fix some inconsistencies found
when all hooks were added to connection-types arrays, which
make potetntially backwards-incompatible changes to Google Provider
where some hooks were useing `google_cloud_default` name for
default_connection_type, but they were in fact using different,
specialized Hook.
@potiuk potiuk force-pushed the lazy-loading-individual-hooks branch from 484776b to 73ad7fe Compare August 18, 2021 18:31
@potiuk
Copy link
Member Author

potiuk commented Aug 18, 2021

Seems good to go for review :)

}
},
"connection-types": {
"type": "array",
"description": "Map of connection types mapped to hook class names",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"description": "Map of connection types mapped to hook class names",
"description": "List of connection types mapped to hook class names",

or

Suggested change
"description": "Map of connection types mapped to hook class names",
"description": "Array of connection types mapped to hook class names",

Copy link
Member

@ashb ashb Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk Was there a reason you have this as a list of dicts rather than just a map as originally documented?

i.e. instead of this currently

connection-types:
  - hook-class-name: airflow.providers.airbyte.hooks.airbyte.AirbyteHook
    connection-type: airbyte

why didn't we have

connection-types:
  airbyte: hook-class-name: airflow.providers.airbyte.hooks.airbyte.AirbyteHook

I'm only now looking at this PR (sorry!) now to take over Daniel's @task.docker decorator work. and I'd like to follow the similar same pattern here.

I guess this is farily set in stone anyway as the providers with this pattern have been released haven't they?

Copy link
Member Author

@potiuk potiuk Sep 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Map was really a copy&paste victim not intention.

It could also work with Map - because we have no possibility to duplicate connection_type. But map does not give us much here, the uniqueness has to be checked anyway for every connection type, because it is cross-provider and we just drop any connection_type that we already have registered.

But one reason why it is "better" is how jsonschema works nicer with hints and validation when you edit it.

Compare:

    "connection-types": {
      "type": "array",
      "description": "Array of connection types mapped to hook class names",
      "items": {
          "type": "object",
          "properties": {
              "connection-type": {
                  "description": "Type of connection defined by the provider",
                  "type": "string"
              },
              "hook-class-name": {
                  "description": "Hook class name that implements the connection type",
                  "type": "string"
              }
          }
      },

with (that's how it would look like if we use map:

    "additional-extras": {
      "type": "object",
      "description": "Additional extras that the provider should have"
    },

I have my provider.yaml mapped to the schema in IntelliJ and it does not only display the hints but also it validates if the fields are of the "string" type. It's a bit trivial, but It makes me want to be a bit more verbose to give a better hint/validation to users (and autocomplete works too)

image

image

Also I think it reads a bit better when you have array of (effectitvely) named tuples. It's more read-optimised than write-optimised.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I think the "additional extras" was mistake BTW. It would be much better if I added it similar way as connection types.

potiuk and others added 3 commits August 19, 2021 02:30
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@potiuk
Copy link
Member Author

potiuk commented Aug 19, 2021

Yeah. I want to do separate doc change where i will extract all those into separate pages in the documentation of providers a s do interlinking with the main docs.

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 19, 2021
@potiuk potiuk merged commit 76ed2a4 into apache:main Aug 19, 2021
@potiuk potiuk deleted the lazy-loading-individual-hooks branch August 19, 2021 14:18
@eladkal eladkal mentioned this pull request Aug 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CLI area:dev-tools area:providers full tests needed We need to run full set of tests for this PR to merge provider:amazon-aws AWS/Amazon - related issues provider:cncf-kubernetes Kubernetes provider related issues provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants