Skip to content

Comments

Fix circular imports when airflow starts#29494

Merged
potiuk merged 3 commits intoapache:mainfrom
potiuk:remove-circular-imports-from-settings
Feb 13, 2023
Merged

Fix circular imports when airflow starts#29494
potiuk merged 3 commits intoapache:mainfrom
potiuk:remove-circular-imports-from-settings

Conversation

@potiuk
Copy link
Member

@potiuk potiuk commented Feb 12, 2023

When airflow CLI is started without generated config file the CLI fails with circular imports. Some of the util classes imported during such start were importing "constants" (in fact variables) from the setttings and this happened before settings were fully initialized.

This PR moves all relevant top-level imports to be local imports and moves initialization of the State class with settings-defined colors to initialization() method in order to avoid those circular imports.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Feb 12, 2023

I have not found yet which change triggered it (for sure that was not the change by @o-nikolas #29257 as I reverted it and it did not help) but it seems our main cli airflow fails with circular imports. Maybe others can find where it is from but my changes fix it (@o-nikolas - this is another manifestation of the circular "dependencies" we have - between config and settings):

Example stack trace I got before that one in various stages of the fix:

Traceback (most recent call last):
  File "/Users/jarek/.pyenv/versions/airflow-3.9/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/Users/jarek/IdeaProjects/airflow/airflow/__main__.py", line 27, in <module>
    from airflow.cli import cli_parser
  File "/Users/jarek/IdeaProjects/airflow/airflow/cli/cli_parser.py", line 33, in <module>
    from airflow import settings
  File "/Users/jarek/IdeaProjects/airflow/airflow/settings.py", line 39, in <module>
    from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 1794, in <module>
    conf.validate()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 343, in validate
    self._validate_config_dependencies()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 439, in _validate_config_dependencies
    executor, _ = ExecutorLoader.import_default_executor_cls()
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 151, in import_default_executor_cls
    return cls.import_executor_cls(executor_name)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 126, in import_executor_cls
    return import_string(cls.executors[executor_name]), ConnectorSource.CORE
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/module_loading.py", line 36, in import_string
    module = import_module(module_path)
  File "/Users/jarek/.pyenv/versions/3.9.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/sequential_executor.py", line 30, in <module>
    from airflow.executors.base_executor import BaseExecutor, CommandType
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/base_executor.py", line 34, in <module>
    from airflow.models.taskinstance import TaskInstance, TaskInstanceKey
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/taskinstance.py", line 72, in <module>
    from airflow.datasets.manager import dataset_manager
  File "/Users/jarek/IdeaProjects/airflow/airflow/datasets/manager.py", line 27, in <module>
    from airflow.models.dataset import DatasetDagRunQueue, DatasetEvent, DatasetModel
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/dataset.py", line 40, in <module>
    from airflow.utils import timezone
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/timezone.py", line 27, in <module>
    from airflow.settings import TIMEZONE
ImportError: cannot import name 'TIMEZONE' from partially initialized module 'airflow.settings' (most likely due to a circular import) (/Users/jarek/IdeaProjects/airflow/airflow/settings.py)

Then:

Traceback (most recent call last):
  File "/Users/jarek/.pyenv/versions/airflow-3.9/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/Users/jarek/IdeaProjects/airflow/airflow/__main__.py", line 27, in <module>
    from airflow.cli import cli_parser
  File "/Users/jarek/IdeaProjects/airflow/airflow/cli/cli_parser.py", line 33, in <module>
    from airflow import settings
  File "/Users/jarek/IdeaProjects/airflow/airflow/settings.py", line 39, in <module>
    from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 1794, in <module>
    conf.validate()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 343, in validate
    self._validate_config_dependencies()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 439, in _validate_config_dependencies
    executor, _ = ExecutorLoader.import_default_executor_cls()
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 151, in import_default_executor_cls
    return cls.import_executor_cls(executor_name)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 126, in import_executor_cls
    return import_string(cls.executors[executor_name]), ConnectorSource.CORE
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/module_loading.py", line 36, in import_string
    module = import_module(module_path)
  File "/Users/jarek/.pyenv/versions/3.9.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/sequential_executor.py", line 30, in <module>
    from airflow.executors.base_executor import BaseExecutor, CommandType
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/base_executor.py", line 34, in <module>
    from airflow.models.taskinstance import TaskInstance, TaskInstanceKey
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/taskinstance.py", line 72, in <module>
    from airflow.datasets.manager import dataset_manager
  File "/Users/jarek/IdeaProjects/airflow/airflow/datasets/manager.py", line 29, in <module>
    from airflow.utils.log.logging_mixin import LoggingMixin
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/log/logging_mixin.py", line 29, in <module>
    from airflow.settings import IS_K8S_EXECUTOR_POD
ImportError: cannot import name 'IS_K8S_EXECUTOR_POD' from partially initialized module 'airflow.settings' (most likely due to a circular import) (/Users/jarek/IdeaProjects/airflow/airflow/settings.py)

Then:

Traceback (most recent call last):
  File "/Users/jarek/.pyenv/versions/airflow-3.9/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/Users/jarek/IdeaProjects/airflow/airflow/__main__.py", line 27, in <module>
    from airflow.cli import cli_parser
  File "/Users/jarek/IdeaProjects/airflow/airflow/cli/cli_parser.py", line 33, in <module>
    from airflow import settings
  File "/Users/jarek/IdeaProjects/airflow/airflow/settings.py", line 39, in <module>
    from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 1791, in <module>
    conf.validate()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 342, in validate
    self._validate_config_dependencies()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 438, in _validate_config_dependencies
    executor, _ = ExecutorLoader.import_default_executor_cls()
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 151, in import_default_executor_cls
    return cls.import_executor_cls(executor_name)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 126, in import_executor_cls
    return import_string(cls.executors[executor_name]), ConnectorSource.CORE
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/module_loading.py", line 36, in import_string
    module = import_module(module_path)
  File "/Users/jarek/.pyenv/versions/3.9.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/sequential_executor.py", line 30, in <module>
    from airflow.executors.base_executor import BaseExecutor, CommandType
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/base_executor.py", line 34, in <module>
    from airflow.models.taskinstance import TaskInstance, TaskInstanceKey
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/taskinstance.py", line 91, in <module>
    from airflow.models.mappedoperator import MappedOperator
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/mappedoperator.py", line 35, in <module>
    from airflow.models.abstractoperator import (
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/abstractoperator.py", line 33, in <module>
    from airflow.utils.state import State, TaskInstanceState
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/state.py", line 73, in <module>
    class State:
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/state.py", line 123, in State
    from airflow.settings import STATE_COLORS
ImportError: cannot import name 'STATE_COLORS' from partially initialized module 'airflow.settings' (most likely due to a circular import) (/Users/jarek/IdeaProjects/airflow/airflow/settings.py)

@potiuk potiuk force-pushed the remove-circular-imports-from-settings branch from 3f82d72 to 9b9b930 Compare February 12, 2023 20:01
@potiuk potiuk changed the title Fix circular imports when airflow starts from scratch Fix circular imports when airflow starts Feb 12, 2023
@potiuk potiuk force-pushed the remove-circular-imports-from-settings branch from 9b9b930 to 5fe3d7b Compare February 12, 2023 20:55
@potiuk potiuk requested a review from XD-DENG as a code owner February 12, 2023 20:55
potiuk and others added 3 commits February 13, 2023 20:10
When airflow CLI is started without generated config file the CLI
fails with circular imports. Some of the util classes imported
during such start were importing "constants" (in fact variables)
from the setttings and this happened before settings were fully
initialized.

This PR moves all relevant top-level imports to be local imports
and moves initialization of the State class with settings-defined
colors to initialization() method in order to avoid those circular
imports.
Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
@potiuk potiuk force-pushed the remove-circular-imports-from-settings branch from 9de9c76 to ffebd00 Compare February 13, 2023 19:12
@potiuk potiuk merged commit 47b67f1 into apache:main Feb 13, 2023
@potiuk potiuk deleted the remove-circular-imports-from-settings branch February 13, 2023 23:51
@o-nikolas
Copy link
Contributor

The first traceback you

I have not found yet which change triggered it (for sure that was not the change by @o-nikolas #29257 as I reverted it and it did not help) but it seems our main cli airflow fails with circular imports. Maybe others can find where it is from but my changes fix it (@o-nikolas - this is another manifestation of the circular "dependencies" we have - between config and settings):

Example stack trace I got before that one in various stages of the fix:

Traceback (most recent call last):
  File "/Users/jarek/.pyenv/versions/airflow-3.9/bin/airflow", line 5, in <module>
    from airflow.__main__ import main
  File "/Users/jarek/IdeaProjects/airflow/airflow/__main__.py", line 27, in <module>
    from airflow.cli import cli_parser
  File "/Users/jarek/IdeaProjects/airflow/airflow/cli/cli_parser.py", line 33, in <module>
    from airflow import settings
  File "/Users/jarek/IdeaProjects/airflow/airflow/settings.py", line 39, in <module>
    from airflow.configuration import AIRFLOW_HOME, WEBSERVER_CONFIG, conf  # NOQA F401
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 1794, in <module>
    conf.validate()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 343, in validate
    self._validate_config_dependencies()
  File "/Users/jarek/IdeaProjects/airflow/airflow/configuration.py", line 439, in _validate_config_dependencies
    executor, _ = ExecutorLoader.import_default_executor_cls()
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 151, in import_default_executor_cls
    return cls.import_executor_cls(executor_name)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/executor_loader.py", line 126, in import_executor_cls
    return import_string(cls.executors[executor_name]), ConnectorSource.CORE
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/module_loading.py", line 36, in import_string
    module = import_module(module_path)
  File "/Users/jarek/.pyenv/versions/3.9.9/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/sequential_executor.py", line 30, in <module>
    from airflow.executors.base_executor import BaseExecutor, CommandType
  File "/Users/jarek/IdeaProjects/airflow/airflow/executors/base_executor.py", line 34, in <module>
    from airflow.models.taskinstance import TaskInstance, TaskInstanceKey
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/taskinstance.py", line 72, in <module>
    from airflow.datasets.manager import dataset_manager
  File "/Users/jarek/IdeaProjects/airflow/airflow/datasets/manager.py", line 27, in <module>
    from airflow.models.dataset import DatasetDagRunQueue, DatasetEvent, DatasetModel
  File "/Users/jarek/IdeaProjects/airflow/airflow/models/dataset.py", line 40, in <module>
    from airflow.utils import timezone
  File "/Users/jarek/IdeaProjects/airflow/airflow/utils/timezone.py", line 27, in <module>
    from airflow.settings import TIMEZONE
ImportError: cannot import name 'TIMEZONE' from partially initialized module 'airflow.settings' (most likely due to a circular import) (/Users/jarek/IdeaProjects/airflow/airflow/settings.py)

@potiuk I believe that a similar fix I applied in airflow.__int__ (in #29257) would also have fixed this case actually. Basically in __main__.py if airflow.configuration.conf is imported before airflow.cli.cli_parser (which imports airflow.settings) then the cycle would be removed.
We get cycles like this when airflow.settings is the first module to import/load airflow.configuration because the very first import of airflow.configuration triggers the conf object to be created and validated, but much of the conf validation code depends on airflow.settings downstream so we can't have it happen inside airflow.settings.

So importing airflow.configuration early (thus initting/validating conf outside the context of airflow.settings) would have fixed the issue in __main__.py I think:

 from __future__ import annotations
 
 import os
 
 import argcomplete
 
-from airflow.cli import cli_parser
 from airflow.configuration import conf
+from airflow.cli import cli_parser
 
 
 def main():
     """Main executable function."""
     if conf.get("core", "security") == "kerberos":
         os.environ["KRB5CCNAME"] = conf.get("kerberos", "ccache")

@potiuk
Copy link
Member Author

potiuk commented Feb 14, 2023

Well. It was failing after the fix from you. I checked it was there. I even reverted it to see if it changes anything. It did not. Maybe you can try to revert my fix and see if you can reproduce it.

But now when I am thinking if it - maybe this was somehow connected to installed entrypoint and 'pip install -e .' caused some weird interaction?

Because this is how I got it failing:

  • Activate venv
  • Run pip install -e
  • Remove all files generated in AIRFLOW_HOME
  • run ``airflow`

Maybe that was somehow my environmental problem ? Maybe you can double check it by reverting :) ?

@potiuk
Copy link
Member Author

potiuk commented Feb 14, 2023

Ah yeah i see you proposed a new one - now (sorry on mobile).

Yeah worth trying. I just know from the past that i had some nice ideas for fixing those problems and they often broke down when I tested them :) after some big changes.

I would love to get those problems not reappear.

@Taragolis
Copy link
Contributor

Some time ago I tried to write specific tests for known and repeatable problem with circular imports.

There is couple issues (I guess all could be solved):

  1. Python 3.7 only show as regular ImportError without mentioning that is circular import
  2. Need to run all this kind of tests outside of regular pytest environment because everything already imported in the right way. I found a stupid solution - generate python file and execute it in subprocess
  3. If run in subprocess need to mock specific stuff very carefully.

@o-nikolas
Copy link
Contributor

o-nikolas commented Feb 14, 2023

I created a new PR which fixes __main__.py in the same way that I fixed __init__.py

We need this fix at each possible "entrypoint" into airflow runtime. It should only be these two places (unless others know of any more?)

@o-nikolas
Copy link
Contributor

Some time ago I tried to write specific tests for known and repeatable problem with circular imports.

This would be so awesome to have. If you see the PR I linked (and the other one like it for __init__.py) the fix ultimately relies on configuration.py being imported early and intentionally. If we could find some way to test/assert at least that specifically that could help.

@pierrejeambrun pierrejeambrun added this to the Airflow 2.5.2 milestone Feb 27, 2023
@pierrejeambrun pierrejeambrun added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Feb 27, 2023
pierrejeambrun pushed a commit that referenced this pull request Mar 7, 2023
* Fix circular imports when airflow starts from scratch

When airflow CLI is started without generated config file the CLI
fails with circular imports. Some of the util classes imported
during such start were importing "constants" (in fact variables)
from the setttings and this happened before settings were fully
initialized.

This PR moves all relevant top-level imports to be local imports
and moves initialization of the State class with settings-defined
colors to initialization() method in order to avoid those circular
imports.

* Update airflow/models/xcom.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit 47b67f1)
pierrejeambrun pushed a commit that referenced this pull request Mar 8, 2023
* Fix circular imports when airflow starts from scratch

When airflow CLI is started without generated config file the CLI
fails with circular imports. Some of the util classes imported
during such start were importing "constants" (in fact variables)
from the setttings and this happened before settings were fully
initialized.

This PR moves all relevant top-level imports to be local imports
and moves initialization of the State class with settings-defined
colors to initialization() method in order to avoid those circular
imports.

* Update airflow/models/xcom.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

---------

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
(cherry picked from commit 47b67f1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:logging changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants