-
Notifications
You must be signed in to change notification settings - Fork 13.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import Hooks lazily individually in providers manager #17682
Conversation
4eb5e34
to
6de95d7
Compare
5ebe8b8
to
3d02cd1
Compare
3d02cd1
to
484776b
Compare
This change implements lazy loading of individual hooks for providers manager. First the hooks list is discovered by the manager when hooks are accessed, but the hooks are not immediately imported - the hooks initially keep just a callable that will be used to retrieve the hook when first accessed. Besides listing details of all hooks, all Hooks are only imported when we want to retrieve the list of available field behaviours and widgets (which only happens in webserver and should happen anyway whenever one of those are needed because they are all collectively used in the connection view). In the case when hooks are accessed in tasks (connection.get_hook()) only the individual Hooks are imported when accessed. The chand deprecates 'hook-class-names' json-schema and replaces it with 'connection-types' because we need to know connection-type for each HookClass name declaratively so that we can utilse it in connection.get_hook() mehtod (otherwise we do not know which Hooks conrrespond to which connection type without importing them. The change is backwards compatible. It adds deprecation warnings in case providers use the 'hook-class-names' property only and log warnings in case it provides both `hook-class-names` and `connection-types` but there are inconsistencies between those. Part of this change is also to fix some inconsistencies found when all hooks were added to connection-types arrays, which make potetntially backwards-incompatible changes to Google Provider where some hooks were useing `google_cloud_default` name for default_connection_type, but they were in fact using different, specialized Hook.
484776b
to
73ad7fe
Compare
Seems good to go for review :) |
airflow/provider.yaml.schema.json
Outdated
} | ||
}, | ||
"connection-types": { | ||
"type": "array", | ||
"description": "Map of connection types mapped to hook class names", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"description": "Map of connection types mapped to hook class names", | |
"description": "List of connection types mapped to hook class names", |
or
"description": "Map of connection types mapped to hook class names", | |
"description": "Array of connection types mapped to hook class names", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@potiuk Was there a reason you have this as a list of dicts rather than just a map as originally documented?
i.e. instead of this currently
connection-types:
- hook-class-name: airflow.providers.airbyte.hooks.airbyte.AirbyteHook
connection-type: airbyte
why didn't we have
connection-types:
airbyte: hook-class-name: airflow.providers.airbyte.hooks.airbyte.AirbyteHook
I'm only now looking at this PR (sorry!) now to take over Daniel's @task.docker
decorator work. and I'd like to follow the similar same pattern here.
I guess this is farily set in stone anyway as the providers with this pattern have been released haven't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Map was really a copy&paste victim not intention.
It could also work with Map - because we have no possibility to duplicate connection_type. But map does not give us much here, the uniqueness has to be checked anyway for every connection type, because it is cross-provider and we just drop any connection_type that we already have registered.
But one reason why it is "better" is how jsonschema works nicer with hints and validation when you edit it.
Compare:
"connection-types": {
"type": "array",
"description": "Array of connection types mapped to hook class names",
"items": {
"type": "object",
"properties": {
"connection-type": {
"description": "Type of connection defined by the provider",
"type": "string"
},
"hook-class-name": {
"description": "Hook class name that implements the connection type",
"type": "string"
}
}
},
with (that's how it would look like if we use map:
"additional-extras": {
"type": "object",
"description": "Additional extras that the provider should have"
},
I have my provider.yaml mapped to the schema in IntelliJ and it does not only display the hints but also it validates if the fields are of the "string" type. It's a bit trivial, but It makes me want to be a bit more verbose to give a better hint/validation to users (and autocomplete works too)
Also I think it reads a bit better when you have array of (effectitvely) named tuples. It's more read-optimised than write-optimised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think the "additional extras" was mistake BTW. It would be much better if I added it similar way as connection types.
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
Co-authored-by: Kaxil Naik <kaxilnaik@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should update https://github.com/apache/airflow/blob/main/docs/apache-airflow-providers/howto/create-update-providers.rst too -- but otherwise LGTM
Yeah. I want to do separate doc change where i will extract all those into separate pages in the documentation of providers a s do interlinking with the main docs. |
The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease. |
This change implements lazy loading of individual hooks for providers
manager. First the hooks list is discovered by the manager when
hooks are accessed, but the hooks are not immediately
imported - the hooks initially keep just a callable that will
be used to retrieve the hook when first accessed.
Besides listing details of all hooks, all Hooks are only imported when we
want to retrieve the list of available field behaviours and widgets
(which only happens in webserver and should happen anyway whenever one
of those are needed because they are all collectively used in the
connection view).
In the case when hooks are accessed in tasks
(connection.get_hook()) only the individual Hooks are
imported when accessed.
The change deprecates 'hook-class-names' json-schema and replaces it
with 'connection-types' because we need to know connection-type
for each HookClass name declaratively so that we can utilse
it in connection.get_hook() mehtod (otherwise we do not know
which Hooks conrrespond to which connection type without importing
them).
The change is backwards compatible. It adds deprecation
warnings in case providers use the 'hook-class-names' property
only and log warnings in case it provides both
hook-class-names
and
connection-types
but there are inconsistencies betweenthose.
Part of this change is also to fix some inconsistencies found
when all hooks were added to connection-types arrays, which
make potetntially backwards-incompatible changes to Google Provider
where some hooks were useing
google_cloud_default
name fordefault_connection_type, but they were in fact using different,
specialized Hook.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.