Allow loading plugins on Airflow start-up #11596

Merged
kaxil merged 3 commits into apache:master from astronomer:lazy-load-plugins
Oct 16, 2020

@kaxil
Member

@kaxil kaxil commented Oct 16, 2020

Commit 0be7654 introduced an optimization where plugins are lazy-loaded. However, there are use cases where you would still want plugins to be loaded on Airflow start-up.

This PR does not change the current behavior; it just provides a way to disable lazy-loading.
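
As a rough illustration of how such a switch could be set for a single process: Airflow reads configuration options from environment variables named `AIRFLOW__{SECTION}__{KEY}`. The sketch below assumes the option is called `lazy_load_plugins` and lives in the `[core]` section; that name is not stated in this thread, so check the configuration reference for your version. `airflow_env_var` is a hypothetical helper, not part of Airflow.

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name Airflow reads for a config option."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Assumed option name and section; verify against your Airflow version.
name = airflow_env_var("core", "lazy_load_plugins")

# Copy of the current environment with lazy-loading disabled, suitable for
# passing to a subprocess that starts an Airflow component.
env = dict(os.environ, **{name: "False"})
```

Passing `env` to something like `subprocess.run([...], env=env)` would apply the setting to that process only and leave the parent environment unchanged.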



@kaxil kaxil requested a review from ashb October 16, 2020 18:52
@turbaszek turbaszek requested a review from mik-laj October 16, 2020 18:58
@kaxil kaxil added this to the Airflow 2.0.0-alpha2 milestone Oct 16, 2020
@mik-laj
Member

mik-laj commented Oct 16, 2020

What are the use cases for this option? Why would the user want to set these options to true?

@mik-laj
Member

mik-laj commented Oct 16, 2020

Can you add some docs to docs/plugins.rst? https://airflow.readthedocs.io/en/latest/plugins.html

@mjpieters
Contributor

Looks like I'll need this switch?

We have dags that use plugin operators & sensors, and the scheduler fails to load our dags as the plugin is not loaded when processing each file.

So airflow dags list works, airflow dags show <dag_id> works, and the webserver is able to show all the dags, but the import_error table shows that the scheduler (or rather, the DagFileProcessorProcess job) can't load our dags due to the missing plugins.

@mjpieters
Contributor

TBH, the lack of plugin loading in the scheduler is annoying. I can also add

from airflow import plugins_manager
plugins_manager.ensure_plugins_loaded()

to any DAG that needs to have access to plugin-provided operators, sensors or hooks, I suppose.

@kaxil
Member Author

kaxil commented Oct 16, 2020

TBH, the lack of plugin loading in the scheduler is annoying. I can also add

from airflow import plugins_manager
plugins_manager.ensure_plugins_loaded()

to any DAG that needs to have access to plugin-provided operators, sensors or hooks, I suppose.

Yeah, the flag in this PR should help you; having to add those import lines to every DAG is annoying.

@kaxil
Member Author

kaxil commented Oct 16, 2020

Can you add some docs to docs/plugins.rst? https://airflow.readthedocs.io/en/latest/plugins.html

Added docs

@mjpieters
Contributor

Another workaround is to just import these items directly, not from the airflow.(hooks|operators|sensors).<pluginname> namespace.

@kaxil
Member Author

kaxil commented Oct 16, 2020

Another workaround is to just import these items directly, not from the airflow.(hooks|operators|sensors).<pluginname> namespace.

For hooks, operators and sensors I would recommend not using plugins; instead, treat them as Python modules and import them directly.
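
A minimal sketch of the "plain Python module" approach, using only the standard library so it runs without Airflow installed. The module name `my_operators` and the class `HelloOperator` are hypothetical; in a real deployment the class would subclass Airflow's operator base class and the module would live in a folder that is already on `sys.path`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module that stands in for a shared operators module.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_operators.py").write_text(
    "class HelloOperator:\n"
    "    task_type = 'hello'\n"
)

# Make the folder importable, then import the module like any other.
sys.path.insert(0, str(module_dir))
my_operators = importlib.import_module("my_operators")
HelloOperator = my_operators.HelloOperator
```

Because this is an ordinary import, it does not depend on whether or when plugins are loaded.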

@kaxil
Member Author

kaxil commented Oct 16, 2020

What are the use cases for this option? Why would the user want to set these options to true?

For a good number of the examples listed here (link1 and link2), it is ideal to load plugins at startup. Another use case is to allow creating more tables when running airflow db upgrade via plugins for which it is needed that the plugin is loaded when Scheduler starts.

@mik-laj
Member

mik-laj commented Oct 16, 2020

allow creating more tables when running airflow db upgrade via plugins for which it is needed that the plugin is loaded when Scheduler starts.

Can we not load plugins in this one case? It makes sense to me that plugins can have separate migrations, but then I think it's worth documenting too.

@kaxil
Member Author

kaxil commented Oct 16, 2020

allow creating more tables when running airflow db upgrade via plugins for which it is needed that the plugin is loaded when Scheduler starts.

Can we not load plugins in this one case? It makes sense to me that plugins can have separate migrations, but then I think it's worth documenting too.

This is an optional feature; it does not change the current behavior. I have documented when users can load plugins at the startup of each Airflow process in 5d9f709. Do you think we need more docs? What specifically?

@mik-laj
Member

mik-laj commented Oct 16, 2020

to any DAG that needs to have access to plugin-provided operators, sensors or hooks, I suppose.

We plan to end support for operators and Hooks plugins in the near future.
See:
#9506
#9500

@kaxil
Member Author

kaxil commented Oct 16, 2020

to any DAG that needs to have access to plugin-provided operators, sensors or hooks, I suppose.

We plan to end support for operators and Hooks plugins in the near future.
See:
#9506
#9500

#11596 (comment) -- Yup

@mik-laj
Member

mik-laj commented Oct 16, 2020

allow creating more tables when running airflow db upgrade via plugins for which it is needed that the plugin is loaded when Scheduler starts.

I still don't understand the use cases for this option. Why do we want to always load plugins instead of loading them only in specific cases? If this is due to database migrations, then we can load plugins when the database is initialized.

@ashb
Member

ashb commented Oct 16, 2020

allow creating more tables when running airflow db upgrade via plugins for which it is needed that the plugin is loaded when Scheduler starts.

I still don't understand the use cases for this option. Why do we want to always load plugins instead of loading them only in specific cases? If this is due to database migrations, then we can load plugins when the database is initialized.

There are more cases than just db migrations: plugins wanting to do start-up initialization, or preloading modules used in all DAGs, as #11596 (comment) was hinting at (he mentioned it more explicitly in Slack).

This is a workaround until we add a richer plugin system to Airflow, but it is not onerous to support.

@kaxil kaxil merged commit be72817 into apache:master Oct 16, 2020
@kaxil kaxil deleted the lazy-load-plugins branch October 16, 2020 21:56