Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check and run migration in commands if necessary #18439

Merged
merged 5 commits into from Dec 8, 2021

Conversation

ephraimbuddy
Copy link
Contributor

This PR proposes to add db initialization and upgrade in webserver startup.

When the webserver is started, we check the migrations and decide whether
to initialize the database or upgrade it.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added area:CLI area:webserver Webserver related Issues labels Sep 22, 2021
@ephraimbuddy
Copy link
Contributor Author

Hi @potiuk, instead of failing the command(#18323) I'm proposing to check and run the migration in webserver but I'm not sure if it's the right approach.

I know that the helm chart runs a separate container for migration and would update it if this approach is accepted.

@uranusjr
Copy link
Member

If we do this, a nob to override the behaviour would be even more necessary. Also, the logic needs to be able to handle cases where the current db state is in the future, i.e. say I migrate to latest, git checkout 2.1.0, and run ./breeze start-airflow.

@potiuk
Copy link
Member

potiuk commented Sep 22, 2021

I like this, however, I'd rather do it in other components as well.

I think webserver is not the onlu/exclusive one that should get it. While webserver won't likely break anything even if it is run with some old db version (but scheduler can ) . But since we have locks for upgrades, we could run the upgrade for ALL components that need DB and get rid of the "wait for migration" special init container.

@potiuk
Copy link
Member

potiuk commented Sep 22, 2021

If we do this, a nob to override the behaviour would be even more necessary. Also, the logic needs to be able to handle cases where the current db state is in the future, i.e. say I migrate to latest, git checkout 2.1.0, and run ./breeze start-airflow.

I thought a bit about it and I believe the dev environment is not a problem at all. Moreover, it might be pretty handy for development case. For breeze you have stop command that wipes out the DB volumes., and i use it rather often when I am switching branches. Alternatively you can also run ./breeze start-airflow with --db-reset flag (which I often use as well) which will run airflow db reset as part of the entrypoint_ci before attempting to run airflow..

And to be honest most people even if they develop stuff don't actually switch to old versions - it;s mainly those few committers who either are involved in release management or analysing some strange bugs.

But still even in the case you described - just failing if the current db migration is unknown to your version of airflow - would be really handy to notify the developer that the db needs some action. Otherwise the errors you can see might be rather strange. It did happen to me several times that I was scratching my head what's going to only find out that I had an old (or new) db structure.

So I see such migration check run at start as pretty useful thing also during development as a consistency check (especially that in Breeze we can even get a bit different error messages to instruct the new developers that they should run ./breeze stop. or --db-reset

@potiuk
Copy link
Member

potiuk commented Sep 22, 2021

The more I think about it, the more I like the idea of always running the migration at start of everything. I think for quite a while we neglected the fact, the db structure can get desynchronized from code, but I cannot imagine ANY situation including the dev where DB should be in a different version than latest airflow migration in the code. There is not a single case I can think of that this might be better than allowing for desynchronisation. This is "db as a code" to the fullest.

The only edge case is of course when you are working on a new migration (which is extremely rare actually) - but even there eve if you try to implement and run/rerun and fix the migration you would rather just test the migrations not the airflow and if you wan to iterate on it you have to play with alembic command-line rather than with airflow worker/scheduler/webserver. And even there, it's extremely easy to add --skip-db-migrations to all the components of airflow.

There is one potential problem - the migration lock is not implemented for MsSQL.

But assuming that is done, is there any reason we do NOT want to run migration for all components when they start ? Is there any particular reason we would not want to run the migrations? Any risks involved in doing so ? I thought hard about it and I cannot think a single reason why we would not want to run the migrations as first thing before starting up of scheduler/webserver/worker/triggerer.

Copy link
Member

@kaxil kaxil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wait on this change. This is not needed for 2.2.0 and is a significant change on how migrations are handled. Let's think it through.

Or if we want to merge it now, for now let's do the "check" part only and have a follow-up PR to run migrations. The main thing to check is what happens when all these components start together, I know a PR was merged that locks the migration but that needs to checked.

I would also to love to hear more thoughts and opinions

@kaxil
Copy link
Member

kaxil commented Sep 23, 2021

There is one potential problem - the migration lock is not implemented for MsSQL.

It has not been addressed in this PR -- so makes MSSQL support wonky.

@potiuk
Copy link
Member

potiuk commented Sep 23, 2021

Please wait on this change. This is not needed for 2.2.0 and is a significant change on how migrations are handled. Let's think it through.

Absolutely!

I would also to love to hear more thoughts and opinions

Me too :)

@ashb
Copy link
Member

ashb commented Sep 23, 2021

I feel uneasy about automatically running migrations for non-automated approaches.

What I propse we do is:

  • When connected to a TTY: prompt the user if they want to run migrations if there are any outstanding

    There are outstanding db migrations. Do you wish to run them now [N/y]?

  • When not connected to a TTY (i.e. a daemon or non-attached container for instance) just error.

That feels much safer than automatically running a potential long migration when the user isn't expecting -- doubly so in the case of MySQL where DDL etc is non transactional so the process can't even be safely interrupted!

@potiuk
Copy link
Member

potiuk commented Sep 23, 2021

I was also a bit torn between "erroring out" and "migrating". I see benefits of both. But I think indeed erroring out is less "magical" (explicit vs. implicit) and still keeps the same properties as "migrating" in terms of consistency: db vs. airflow code which I think is the root of the problem.

I like very much the terminal idea. It's the "least surprise" and we can even make it abit more "aggressive". If you do not respond in 5 seconds - exit with error. Or if you have "unknown" migration - error out immediately and print "You should likely reset the db" with instructions on how to do it.

This will cover the case that we had for quite some time in Breeze where you thought you run something but then you got the question about rebuild. The 4s. timeout with "erroring out" has a much nicer workflow for development.

@potiuk
Copy link
Member

potiuk commented Sep 25, 2021

This is about a 100th time (yeah I am exaggerating a bit) of similar problem I see over and over: #18513

We MUST add it - and fast.

@ephraimbuddy
Copy link
Contributor Author

This is about a 100th time (yeah I am exaggerating a bit) of similar problem I see over and over: #18513

We MUST add it - and fast.

Yeah. I’m working on making it error out and also prompt to run migration with timeout😊

airflow/utils/db.py Outdated Show resolved Hide resolved
@uranusjr
Copy link
Member

One question: Would there be race conditions if multiple airflow commands are launched concurrently? i.e. ./breeze start-airflow launches a tmux session and five commands together, and both the scheduler and webserver would want to do migration at the same time.

@potiuk
Copy link
Member

potiuk commented Sep 26, 2021

One question: Would there be race conditions if multiple airflow commands are launched concurrently? i.e. ./breeze start-airflow launches a tmux session and five commands together, and both the scheduler and webserver would want to do migration at the same time.

We have now exclusive locks for migration - in most backends (MsSQL needs to be re-added as it has been removed - but it should be possible to bring it back after I discovered where the lock contention came from - I will do it shortly after this one is merged) .

But I think it's a good point - the lock should be put around the whole check with this change rather than inside the upgradedb command.

    with create_global_lock(session=session, pg_lock_id=2, lock_name="upgrade"):
        log.info("Creating tables")
        command.upgrade(config, 'heads')
    add_default_pool_if_not_exists()

@potiuk
Copy link
Member

potiuk commented Sep 26, 2021

This would work fine even now - but rather than detecting that no migration is neded - it would attempt to do it for all of the components (only the first would actually run) - so there is no NEEED to move the lock outside, but it would be a little better also from logging point of view.

airflow/utils/db.py Outdated Show resolved Hide resolved
airflow/utils/db.py Outdated Show resolved Hide resolved
@potiuk
Copy link
Member

potiuk commented Dec 6, 2021

Hey @kaxil - have your worries been addressed?

@kaxil
Copy link
Member

kaxil commented Dec 6, 2021

Hey @kaxil - have your worries been addressed?

Hey, yeah I & Ephraim are working closely on this one -- some of the PRs (#20018 & #20069) lately were to unblock this PR :)

We will rebase on top of main once both those PRs are merged

@ephraimbuddy ephraimbuddy force-pushed the check-and-run-migration branch 2 times, most recently from 2eb9d72 to 7683e16 Compare December 7, 2021 06:44
@ephraimbuddy
Copy link
Contributor Author

ephraimbuddy commented Dec 7, 2021

Because we use init containers to run the migration, this change doesn't get hit when using the airflow chart. Only in a chart without a migration container will it get hit.

If we don't raise TimeoutError in the init containers, then this change is hit. However, it just stops for a few minutes to show the error message and the component will continue running.

I think the migration container should not exit on error, we should have a way to restart it when it fails.

@potiuk
Copy link
Member

potiuk commented Dec 7, 2021

Cool

@@ -37,7 +37,7 @@
WORKER_PROCESS_NAME = "worker"


@cli_utils.action_logging
@cli_utils.action_cli()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we've got the empty parens on all these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I personally like to do this because I hate the first argument is the function unless you need a flag when the decorator becomes a function complexity, and perfer to always just implement the callable-returning-decorator case.

airflow/utils/cli.py Outdated Show resolved Hide resolved
airflow/utils/cli.py Show resolved Hide resolved
@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Dec 8, 2021
@github-actions
Copy link

github-actions bot commented Dec 8, 2021

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

ephraimbuddy and others added 5 commits December 8, 2021 18:28
This PR proposes to add db initialization and upgrade in webserver startup.

When the webserver is started, we check the migrations and decide whether
to initialize the database or upgrade it.

Run migration if tty

remove args

Add prompt and timeout

update message

fix initialization

apply suggestions from code review.

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>

fixup! apply suggestions from code review.

fixup! fixup! apply suggestions from code review.

Remove comment

check db migration in all components

expand action_logging decorator to check db migration if needed

Update airflow/utils/db.py

disable db check for task commands

use action_cli instead of action_logging

Update airflow/utils/db.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

add return type

Update airflow/utils/db.py

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>

fixup! Update airflow/utils/db.py

add doc & remove unintended import

add action_cli to roles_import

Apply use of environment context

fixup! Apply use of environment context

Check and run migration in webserver startup if necessary

This PR proposes to add db initialization and upgrade in webserver startup.

When the webserver is started, we check the migrations and decide whether
to initialize the database or upgrade it.

Run migration if tty

remove args

Add prompt and timeout

update message

fix initialization

apply suggestions from code review.

Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>

fixup! apply suggestions from code review.

fixup! fixup! apply suggestions from code review.

Remove comment

check db migration in all components

expand action_logging decorator to check db migration if needed

Update airflow/utils/db.py

disable db check for task commands

use action_cli instead of action_logging

Update airflow/utils/db.py

Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>

add return type

add doc & remove unintended import

Don't check db when creating user
Co-authored-by: Ash Berlin-Taylor <ash_github@firemirror.com>
@kaxil kaxil merged commit 737c0ef into apache:main Dec 8, 2021
@kaxil kaxil deleted the check-and-run-migration branch December 8, 2021 20:55
@potiuk
Copy link
Member

potiuk commented Dec 8, 2021

woooohooo!

ashb added a commit to astronomer/airflow that referenced this pull request Feb 23, 2022
We recently merged a change (apache#18439) where we check if DB migrations
are pending before running the main command, but this had the
side-effect of showing these two log lines from alembic:

```
[2022-02-22 18:06:01,995] {{migration.py:201}} INFO - Context impl PostgresqlImpl.
[2022-02-22 18:06:01,995] {{migration.py:204}} INFO - Will assume transactional DDL.
```

Which is a) not useful information to a user, and b) "pollutes" the
output if a command was producing JSON or some other structured format
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Apr 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CLI area:webserver Webserver related Issues full tests needed We need to run full set of tests for this PR to merge type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants