Fixes problem where conf variable was used before initialization #16088

Merged


@potiuk potiuk commented May 26, 2021

There was a problem that when we initialized configuration, we ran
validate() which - among other things - checked if the connection is sqlite,
but when the SQLAlchemy connection was not configured via a variable but
via a secret manager, it fell back to secret_backend, which should
be configured via conf and initialized.
The problem is that the "conf" object is not yet created, because
the "validate()" method has not finished yet and
"initialize_configuration" has not yet returned.
This led to a snake eating its own tail.

This PR defers the validate() method to after secret backends have
been initialized. The effect of it is that secret backends might
be initialized with configuration that is not valid, but there are
no real negative consequences of this.

Fixes: #16079
Fixes: #15685
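
To make the ordering problem concrete, here is a minimal sketch of the cycle and of the deferred-validation fix. It is a toy model, not Airflow's code: `ToyConfig`, `ToySecretsBackend`, `ensure_secrets_loaded`, `load`, `SECRET_STORE` and both `initialize_*` functions are hypothetical names invented for illustration.

```python
# Toy model only: none of these names are Airflow's real API.
SECRET_STORE = {"sql_alchemy_conn": "postgresql://toy-host/airflow"}

conf = None             # module-level global, assigned only after initialization returns
secrets_backend = None  # created lazily, and it needs `conf` to exist


class ToySecretsBackend:
    def __init__(self, secrets):
        self._secrets = secrets

    def get(self, key):
        return self._secrets[key]


def ensure_secrets_loaded():
    """Create the secrets backend on first use; it depends on the global `conf`."""
    global secrets_backend
    if secrets_backend is None:
        if conf is None:
            raise RuntimeError("conf used before initialization")
        secrets_backend = ToySecretsBackend(SECRET_STORE)
    return secrets_backend


class ToyConfig:
    def __init__(self, values):
        self._values = dict(values)

    def get(self, key):
        if key in self._values:
            return self._values[key]
        # Not set directly, so fall back to the secrets backend.
        return ensure_secrets_loaded().get(key)

    def validate(self):
        """Return True if the connection is sqlite (the check named in the PR)."""
        return self.get("sql_alchemy_conn").startswith("sqlite")


def initialize_early_validation(values):
    cfg = ToyConfig(values)
    cfg.validate()  # runs while the global `conf` is still None
    return cfg


def initialize_deferred_validation(values):
    return ToyConfig(values)  # validation is left to the caller


def load(initialize, values, validate_after):
    """Reset the globals, initialize, load secrets, then optionally validate."""
    global conf, secrets_backend
    conf, secrets_backend = None, None
    conf = initialize(values)
    ensure_secrets_loaded()
    return conf.validate() if validate_after else None
```

With an empty `values` dict, `load(initialize_early_validation, {}, validate_after=False)` raises `RuntimeError`, because validation reaches the secrets fallback before `conf` is assigned. `load(initialize_deferred_validation, {}, validate_after=True)` completes, because validation only runs once both globals exist.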

starting



@potiuk potiuk requested review from ashb and kaxil May 26, 2021

@potiuk potiuk commented May 26, 2021

cc: @uranusjr - I believe this one should fix the cyclic initialization problem.

@potiuk potiuk force-pushed the run_conf_validation_after_initialization branch from 11b8102 to 19742c1 May 26, 2021

@uranusjr uranusjr commented May 27, 2021

Looks good to me. I wonder if it would be possible to add a test for this; could be difficult since it depends on the global interpreter state?


@potiuk potiuk commented May 27, 2021

> Looks good to me. I wonder if it would be possible to add a test for this; could be difficult since it depends on the global interpreter state?

Yeah. Testing this is difficult. And actually we kind of test it in the updated test: the sequence of initialize followed by validate is run there (that's why I added the "validate()" call in the test, even though it was not strictly necessary there). But you are right. By the time we get there in tests, "initialize()" has already been run multiple times in other tests and the global state has already changed.

I think it would require some refactoring of the way we use conf. Currently it is indeed global state, but it should really be an independent object that you can initialize and test in isolation. This is what I mentioned in the other related comment in #15685 (comment). The way configuration is used is the main reason why we cannot remove the last few files from pylint_todo.txt: pylint (rightfully) detects cyclic dependencies (not imports!) because configuration is doing two things at once - it is both used by other airflow components and uses them.

For now this problem seems to be fixed by this change (confirmed by @dowthron in #16079 - seems @downthron came to the same fix as I did in the meantime).

But I am happy to discuss how we can approach configuration differently as a next step. The slight problem is that it might only be really fixable in Airflow 3.0, because it might introduce backward incompatibilities: we might need to change how the airflow.configuration.conf object is used, and I believe people use it as a "public API" of Airflow, so changing it might break their plugins and DAGs. But maybe we can do it similarly to what we did for imports: lazy initialization of conf, rather than initialization at import time, via a module-level `__getattr__` (PEP 562) + STATICA_HACK, the way `def __getattr__(name):` is used to handle `from airflow import DAG` and `from airflow import AirflowException`.
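
For readers unfamiliar with PEP 562, here is a minimal sketch of a module-level `__getattr__` that builds an attribute on first access. The module `lazy_pkg`, the helper `_build_conf` and the `build_calls` list are hypothetical names used only for this illustration; this is not how Airflow's `airflow/__init__.py` is written, and it does not cover the STATICA_HACK part.

```python
import types


def _build_conf():
    # Stand-in for an expensive or order-sensitive initialization step.
    return {"sql_alchemy_conn": "sqlite:///toy.db"}


# A throwaway module object, so the sketch runs without creating files.
lazy_pkg = types.ModuleType("lazy_pkg")
build_calls = []  # records how many times the lazy attribute was built


def _module_getattr(name):
    """PEP 562 hook: called only when normal module attribute lookup fails."""
    if name == "conf":
        build_calls.append(name)
        value = _build_conf()
        setattr(lazy_pkg, name, value)  # cache, so the hook is not hit again
        return value
    raise AttributeError(f"module 'lazy_pkg' has no attribute {name!r}")


# In a real module file this would simply be `def __getattr__(name): ...`
# at the top level of the module.
lazy_pkg.__getattr__ = _module_getattr
```

Nothing is built at import time; the first `lazy_pkg.conf` access triggers `_build_conf()`, and later accesses read the cached attribute directly.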

It would be great to have a separate conf object whose initialization we could test independently, and to untangle the cyclic dependencies.

@potiuk potiuk force-pushed the run_conf_validation_after_initialization branch from 19742c1 to 9fb9101 May 27, 2021
@potiuk potiuk closed this May 27, 2021
@potiuk potiuk reopened this May 27, 2021

@uranusjr uranusjr commented May 27, 2021

If we’re going to keep having global variables without actually having the state global (likely the practical approach given the giant effort required to refactor away airflow.conf usages entirely), we could probably learn a trick or two from Flask. They are experts at hiding local state behind global variables (flask.request, flask.g, etc.)
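
A rough sketch of that idea follows: a global name that forwards every attribute access to whichever object is currently active. Flask itself builds on werkzeug's `LocalProxy`; the version below is a simplified stand-in using the standard library's `contextvars`, and `ConfProxy`, `ToyConf` and `use_conf` are hypothetical names, not Flask or Airflow APIs.

```python
import contextlib
import contextvars

# Holds the configuration object that is active in the current context.
_current_conf = contextvars.ContextVar("current_conf")


class ToyConf:
    def __init__(self, values):
        self._values = dict(values)

    def get(self, key):
        return self._values[key]


class ConfProxy:
    """Global-looking object that forwards to the currently active ToyConf."""

    def __getattr__(self, name):
        try:
            target = _current_conf.get()
        except LookupError:
            raise RuntimeError("no configuration is active") from None
        return getattr(target, name)


conf = ConfProxy()  # importable everywhere, but holds no state itself


@contextlib.contextmanager
def use_conf(values):
    """Activate a fresh ToyConf for the duration of the with-block."""
    token = _current_conf.set(ToyConf(values))
    try:
        yield
    finally:
        _current_conf.reset(token)
```

Under this assumption, a test could wrap its body in `with use_conf({...}):` to get an isolated configuration, while existing call sites keep using the same global `conf` name.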


@uranusjr uranusjr left a comment

Anyway, 👍 for this one

ashb approved these changes May 27, 2021

@github-actions github-actions bot commented May 27, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@ashb ashb added this to the Airflow 2.1.1 milestone May 27, 2021
kaxil approved these changes May 27, 2021
@potiuk potiuk closed this May 27, 2021
@potiuk potiuk reopened this May 27, 2021

@potiuk potiuk commented May 27, 2021

rebuilding.

@potiuk potiuk merged commit 65519ab into apache:master May 27, 2021
99 of 108 checks passed
@potiuk potiuk deleted the run_conf_validation_after_initialization branch May 27, 2021
jhtimmins added a commit to astronomer/airflow that referenced this issue Jun 3, 2021
(cherry picked from commit 65519ab)
andormarkus pushed a commit to andormarkus/airflow that referenced this issue Jun 5, 2021
ashb added a commit that referenced this issue Jun 22, 2021
(cherry picked from commit 65519ab)