Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove redundant session.commit() in migration #18453

Merged
merged 1 commit into from
Sep 23, 2021

Conversation

kaxil
Copy link
Member

@kaxil kaxil commented Sep 23, 2021

session.commit() is redundant as we won't have anything to commit.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Sep 23, 2021
@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@uranusjr uranusjr closed this Sep 23, 2021
@uranusjr uranusjr reopened this Sep 23, 2021
@potiuk
Copy link
Member

potiuk commented Sep 23, 2021

Just one comment on that one to ask about some common approach here (and maybe an opportunity to get some useful knowledge sharing).

I actually put that commit() deliberately there. I know it is not needed, simply for many years when dealing with the databases, I am used to set some explicit boundaries of database transactions (+ the provide_session kind of decorators which are there to pass an open session/transaction down the stack without closing it). So even if the operation does "nothing" between session.open() an exiting the scope of that session I usually mark it es explicit commit(). Especially when there are different branches you can take - and "return" like in this case (it could also be rollback() in this case), to clearly mark where the session is "closed".

The benefit of that approach is that it is more future-proof. It's very easy to imagine (in general case) that someone in the future will add some modification of the database between session and exiting the function and might not notice that there is branch that might lead to "no-commit". Maybe that's not likely in this particular case, but for me it's more of "natural-habit". Whenever I see such situation that there is a branching and "commit()" in one branch and "nothing" in the other, it feels wrong - and I do not have to exercise extra mental power to understand whether this particular case is justified exception or not. For me it is pretty "natural" thing to do: open sessions() -> branch -> make sure the session is treated the same in all branches..

I'd love to hear your thoughts about it :)

@kaxil
Copy link
Member Author

kaxil commented Sep 23, 2021

Just one comment on that one to ask about some common approach here (and maybe an opportunity to get some useful knowledge sharing).

I actually put that commit() deliberately there. I know it is not needed, simply for many years when dealing with the databases, I am used to set some explicit boundaries of database transactions (+ the provide_session kind of decorators which are there to pass an open session/transaction down the stack without closing it). So even if the operation does "nothing" between session.open() an exiting the scope of that session I usually mark it es explicit commit(). Especially when there are different branches you can take - and "return" like in this case (it could also be rollback() in this case), to clearly mark where the session is "closed".

The benefit of that approach is that it is more future-proof. It's very easy to imagine (in general case) that someone in the future will add some modification of the database between session and exiting the function and might not notice that there is branch that might lead to "no-commit". Maybe that's not likely in this particular case, but for me it's more of "natural-habit". Whenever I see such situation that there is a branching and "commit()" in one branch and "nothing" in the other, it feels wrong - and I do not have to exercise extra mental power to understand whether this particular case is justified exception or not. For me it is pretty "natural" thing to do: open sessions() -> branch -> make sure the session is treated the same in all branches..

I'd love to hear your thoughts about it :)

Valid point, I am discussing this right now with @ashb to get his take on it :)

From my point of view, I have only used a commit with select if it is SELECT ... FOR UPDATE etc.

@ashb
Copy link
Member

ashb commented Sep 23, 2021

Given the next statement is return, and thus that active Session will be closed, the commit here is entirely redundant -- the session will be cleaned up and any the transactions will be rolledback.

So what this will actually do is:

COMMIT (from the commit())
BEGIN (because session.autocommit is False so there is always an active transaction as long as there is a session in scope)
ROLLBACK (because the session goes out of scope, gets gc'd and closed.)

@potiuk
Copy link
Member

potiuk commented Sep 23, 2021

Given the next statement is return, and thus that active Session will be closed, the commit here is entirely redundant -- the session will be cleaned up and any the transactions will be rolledback.

So what this will actually do is:

COMMIT (from the commit())
BEGIN (because session.autocommit is False so there is always an active transaction as long as there is a session in scope)
ROLLBACK (because the session goes out of scope, gets gc'd and closed.)

Yep. I agree it's not needed, and It certainly works like that, so from performance point of view it is 'slightly` better to remove, and I am not at all against removing it, i just want to see if it bothers others from "reasoning" point of view. I even initially implemented it without the commit, but when I looked at the PR after I pushed it - my internal policeman told me "let's make it consistent" and I pushed a changed commit.

But if it's just me who is bothered by that - I am perfectly OK to remove it.

@ashb
Copy link
Member

ashb commented Sep 23, 2021

My head always goes to "this is python, we have de-facto/automatic RAII so we don't need it" :D

@potiuk
Copy link
Member

potiuk commented Sep 23, 2021

My head always goes to "this is python, we have de-facto/automatic RAII so we don't need it" :D

Yah, I might have to teach my internal policeman a few new tricks.

`session.commit()` is redundant as we won't have anything to commit.
@kaxil kaxil removed the full tests needed We need to run full set of tests for this PR to merge label Sep 23, 2021
@kaxil kaxil merged commit e81c7df into apache:main Sep 23, 2021
@kaxil kaxil deleted the improve-sql-check branch September 23, 2021 20:50
@kaxil
Copy link
Member Author

kaxil commented Sep 23, 2021

Failures are unrelated and already on main: https://github.com/apache/airflow/runs/3691456615#step:6:15250

Need to take a look at those

@yuqian90
Copy link
Contributor

yuqian90 commented Nov 24, 2021

Hi @kaxil , in some of the tests we run, we populate an empty sqlite meta data db by doing airflow db init. However, since this revision, we are encountering the following error when running airflow db init:

    lib/python3.8/site-packages/alembic/runtime/migration.py", line 542, in run_migrations
        raise util.CommandError(
    alembic.util.exc.CommandError: Migration "upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance" has left an uncommitted transaction opened; transactional_ddl is False so Alembic is not committing transactions

This is happening because the test meta data db is empty. So the if condition here is True and the code just returns without committing the transaction.

        if not bool(session.query(TaskInstance).first()):
            return

Does anyone have any suggestions how to get around this error?

@kaxil
Copy link
Member Author

kaxil commented Nov 24, 2021

@yuqian90 How can I reproduce it? I tried with an empty SQLite DB and didn't get an error.

What versions of SQLite do you use?

root@15dd880f441c:/opt/airflow# sqlite3 --version
3.27.2 2019-02-25 16:06:06 bd49a8271d650fa89e446b42e513b595a717b9212c91dd384aab871fc1d0alt1

@yuqian90
Copy link
Contributor

@yuqian90 How can I reproduce it? I tried with an empty SQLite DB and didn't get an error.

What versions of SQLite do you use?

root@15dd880f441c:/opt/airflow# sqlite3 --version
3.27.2 2019-02-25 16:06:06 bd49a8271d650fa89e446b42e513b595a717b9212c91dd384aab871fc1d0alt1

Hi @kaxil, I have the exact same sqlite3 as you. Maybe it's not a problem with sqlite3. The error has left an uncommitted transaction opened came from alembic. So potentially alembic or sqlalchemy is the problem here. These are the versions i have. Does any of these ring a bell?

apache-airflow==2.2.2
apache-airflow-providers-sqlite==2.0.1
alembic==1.4.2
Flask-SQLAlchemy==2.5.1
marshmallow-sqlalchemy==0.26.1
SQLAlchemy==1.3.19
SQLAlchemy-JSONField==1.0.0
SQLAlchemy-Utils==0.37.9

Here's how to reproduce:

  • Create an empty directory to be used as AIRFLOW_HOME
  • Run airflow db init
$ mkdir /tmp/test_airflow_home
$ AIRFLOW_HOME=/tmp/test_airflow_home airflow db init
DB: sqlite:////tmp/test_airflow_home/airflow.db
[2021-11-24 08:51:16,278] {db.py:910} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
WARNI [airflow.providers_manager] The package apache-airflow-providers-celery is not compatible with this version of Airflow. The package has version 1.0.1 but the minimum supported version of the package is 2.1.0
WARNI [airflow.providers_manager] The package apache-airflow-providers-celery is not compatible with this version of Airflow. The package has version 1.0.1 but the minimum supported version of the package is 2.1.0
INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/alembic/ddl/sqlite.py:40 UserWarning: Skipping unsupported ALTER for creation of implicit constraintPlease refer to the batch mode feature which allows for SQLite migrations using a copy-and-move strategy.
INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
Traceback (most recent call last):
  File "/tmp/venv_airflow_2.2.2/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/__main__.py", line 48, in main
    args.func(args)
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/cli/commands/db_command.py", line 31, in initdb
    db.initdb()
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/utils/session.py", line 70, in wrapper
    return func(*args, session=session, **kwargs)
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/utils/db.py", line 592, in initdb
    upgradedb(session=session)
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/utils/session.py", line 67, in wrapper
    return func(*args, **kwargs)
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/utils/db.py", line 911, in upgradedb
    command.upgrade(config, 'heads')
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/alembic/command.py", line 298, in upgrade
    script.run_env()
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/alembic/script/base.py", line 489, in run_env
    util.load_python_file(self.dir, "env.py")
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/alembic/util/pyfiles.py", line 98, in load_python_file
    module = load_module_py(module_id, path)
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/alembic/util/compat.py", line 184, in load_module_py
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/migrations/env.py", line 107, in <module>
    run_migrations_online()
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/airflow/migrations/env.py", line 101, in run_migrations_online
    context.run_migrations()
  File "<string>", line 8, in run_migrations
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/alembic/runtime/environment.py", line 846, in run_migrations
    self.get_context().run_migrations(**kw)
  File "/tmp/venv_airflow_2.2.2/lib/python3.8/site-packages/alembic/runtime/migration.py", line 542, in run_migrations
    raise util.CommandError(
alembic.util.exc.CommandError: Migration "upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance" has left an uncommitted transaction opened; transactional_ddl is False so Alembic is not committing transactions

@kaxil
Copy link
Member Author

kaxil commented Dec 8, 2021

@yuqian90 Some of your requirements are different than the one in constraints file that might be a reason.

Yours:

alembic==1.4.2
SQLAlchemy==1.3.19

In Constraints:

alembic==1.7.4
SQLAlchemy==1.3.24

@kaxil
Copy link
Member Author

kaxil commented Dec 8, 2021

OK I can confirm that alembic==1.4.2 is the issue you have, upgrade the version to at least 1.7.4 (specified by constraints). Even 1.5.1 should work

kaxil added a commit to astronomer/airflow that referenced this pull request Dec 8, 2021
Related to apache#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main
kaxil added a commit that referenced this pull request Dec 9, 2021
Related to #18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main
jedcunningham pushed a commit that referenced this pull request Dec 10, 2021
Related to #18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

(cherry picked from commit f99d0e7)
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Mar 10, 2022
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

(cherry picked from commit f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3)

GitOrigin-RevId: 3a39702e6fc98a3b43b6d9506c8e1c4a3f0913d1
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2022
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

(cherry picked from commit f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3)

GitOrigin-RevId: 3a39702e6fc98a3b43b6d9506c8e1c4a3f0913d1
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 10, 2022
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

GitOrigin-RevId: f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 27, 2022
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

GitOrigin-RevId: f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 4, 2022
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

GitOrigin-RevId: f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

GitOrigin-RevId: f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

GitOrigin-RevId: f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023
Related to apache/airflow#18453 (comment)

`1.5.0` was yanked so `>=1.5.1` is safe and we already have `1.7.5` in constraints-main

GitOrigin-RevId: f99d0e7066d2c00ff6581fd566a4d0a9826a2fc3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants