Skip to content

Comments

Fix docker compose command in "Running Airflow in Docker"#36611

Closed
jtommi wants to merge 1 commit intoapache:mainfrom
jtommi:patch-1
Closed

Fix docker compose command in "Running Airflow in Docker"#36611
jtommi wants to merge 1 commit intoapache:mainfrom
jtommi:patch-1

Conversation

@jtommi
Copy link

@jtommi jtommi commented Jan 5, 2024

On the page Running Airflow in Docker
Running docker compose up airflow-init with the provided docker-compose.yaml does not initialize the database.
It will execute the command provided in the docker-compose.yaml followed by /entrypoint, none of which do initialize the database.

What worked for me was to docker compose run airflow-init ("run" instead of "up")

Also (not included in this PR), the message after initialization for me looked like this (Docker-Desktop on Windows 11):

[2024-01-05T09:51:47.658+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
[2024-01-05T09:51:47.934+0000] {override.py:1458} INFO - Added user airflow
User "airflow" created with role "Admin"
2.8.0

not like the doc mentions:

airflow-init_1       | Upgrades done
airflow-init_1       | Admin user airflow created
airflow-init_1       | 2.8.0
start_airflow-init_1 exited with code 0

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Running docker compose up airflow-init with the provided docker-compose.yml does not initialize the database since it will execute the command provided in the docker-compose followed by /entrypoint, none of which do initialize the database.
What worked for me was to docker compose run airflow-init instead ("run" instead of "up")
@jtommi jtommi requested a review from potiuk as a code owner January 5, 2024 09:53
@jtommi jtommi changed the title Fix docker compose command Fix docker compose command in "Running Airflow in Docker" Jan 5, 2024
Copy link
Contributor

@Taragolis Taragolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also (not included in this PR), the message after initialization for me looked like this (Docker-Desktop on Windows 11):

I guess output was changed couple versions ago and documentation not reflect this. However the internals should be remaining the same.

.. code-block:: bash

docker compose up airflow-init
docker compose run airflow-init
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker compose run has a different purpose (run a command) if compare to the docker compose up (run a service)

There is no any problem to the start container with up command at least in the CI and docker-compose sample tests as well as in Linux/MacOS, in addition this service would also use later for services dependencies check.

Copy link
Author

@jtommi jtommi Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm running this on Windows 11 with Docker Desktop and docker compose up airflow-init does not initialize the DB, but docker compose run airflow-init does.

in addition this service would also use later for services dependencies check.

That will be run anyway by the depends_on of the webserver and scheduler, so my change in the doc doesn't influence that.
The doc specifically mentions to initialize the DB "Before starting Airflow for the first time", so this is a one-time manual process.

Copy link
Member

@potiuk potiuk Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not against changing it to run.

It quite matches the purpose of run vs. up as described in https://docs.docker.com/compose/faq/#whats-the-difference-between-up-run-and-start

But I think you'd ne need to investigate why up does not work for you. It should. Can you explain HOW it does not work for you? What happens?

Also run has some properties that are different than up - for example it will leave the container behind when you use run, so generally you should also use --rm flag if you want to use run becaause (unlike up) it will create a new container every time you run it , where up will use the same project as the main project you use when you start other services in it.

Also any change here should be acompanied by updating our tests.

We are running docker compose quick start tests in our CI

https://github.com/apache/airflow/blob/main/.github/workflows/ci.yml#L1929C34-L1929C34

and thanks to that both me and @Taragolis believe that it works - because it passes the tests (and that's why we are asking you why it does not work for you and what problems does it solve).

Maybe there is a good reason it does not work for your Windows 11 desktop and it would be good if you could investigate why.

And yes - maybe the conclusion is that it should be changed to use run but if we do, then this PR should also add similar step to our docker-compose tests to make sure that what we have in our docs is also tested in our CI.,

@potiuk
Copy link
Member

potiuk commented Jan 5, 2024

Also (not included in this PR), the message after initialization for me looked like this (Docker-Desktop on Windows 11):

[2024-01-05T09:51:47.658+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
[2024-01-05T09:51:47.934+0000] {override.py:1458} INFO - Added user airflow
User "airflow" created with role "Admin"
2.8.0

Yes. Because that's how run works. The output you see in the docs is from up command. The run command is more like docker run command and it will generally not start the service as such but it will run container that belongs to the service and usually run a custom command in it (for example it will not open ports associated with the service) - so output of run command is very different than output from up command. This is quite expected.

@potiuk
Copy link
Member

potiuk commented Jan 5, 2024

And just a general command @jtommi - the fact that something does not work for you - is not always a reason to change things - because it maybe YOU have specific problem which needs to be investigating. But when you develop product you have to think of many people and whenever you do any change you need to consider impact it might on others - that's why we generally have tests for such things and that's why try to do in most portable ways - that we believe is good for others. That's why in order to make such change it's not enough "it works for me" - we need to know why it did not work the other way (Which it should and should be tested) and only if we know why - we should cahnge it (and adapt the tests).

@potiuk
Copy link
Member

potiuk commented Jan 5, 2024

And also another comment - I am not 100% sure it works now - it might it be it does not and our tests simply did not catch it, it's just we need to understand what's going on (and also fix the tests if that's the case).

@potiuk
Copy link
Member

potiuk commented Jan 5, 2024

BTW. By a quick look it is quite possible it does not work.

The aiflow-init container runs airflow version which I think COULD in the past initialize the DB as side effect but this could have changed since. It's quite possible that airflow version does not initialize the DB now (because if it did, then it was a side-effect).

If that's the case then the right solution will be to change this command to one that actually initializes the DB rather than changing the documentation. Because it means that the airflow-init service is simply quite badly defined now.

And it seems that our tests do not actually run airflow-init - so the right fix here will be in this case to change our tests to run the init, then verify if the database have been created and only then run the whole compose setup.

@jtommi
Copy link
Author

jtommi commented Jan 5, 2024

Thanks for all those insights, very helpful.
I'm really sorry to have wasted both of your time @potiuk and @Taragolis
I was pretty confident because doing docker-compose up airflow-init literally said

airflow_init_1  | ERROR: You need to initialize the database. Please run `airflow db init`. Make sure the command is run using Airflow version 2.5.3.
airflow_init_1  | 2.5.3

... and looking into the command and /entrypoint, nowhere it ran db init (which in hindsight makes sense if airflow version does initialize the DB.

I was testing around with 2.5.3 and 2.8.0 thinking I was modifying the relevant things in the docker-compose file, spoiler, I wasn't.
In the end I wasn't able to reproduce the issue in 2.8.0 and for 2.5.3 it failed because I was using _AIRFLOW_DB_MIGRATE: 'true' instead of _AIRFLOW_DB_UPGRADE: 'true'.

That said, the output is still different, but not sure it's worth modifying it, I got:

airflow_init_1  | [2024-01-05T12:52:27.886+0000] {override.py:1458} INFO - Added user airflow
airflow_init_1  | User "airflow" created with role "Admin"
airflow_init_1  | 2.8.0
airflow_init_1  exited with code 0

And also, if all Airflow containers in the docker-compose are dependent on airflow-init, why does the doc specify that you should run it first, since it will run first anyway when you'll do docker compose up

@potiuk
Copy link
Member

potiuk commented Jan 5, 2024

Those are all good questions - and I think they need proper fixes. I think "why" it's not a good question. Better question is "how we can improve it" because as you see - there are some inconsistencies and this is a good opportunity to propose to improve it (but more comepletely, not by just changing up to run). Would you like to take on that task @jtommi ?

@Taragolis
Copy link
Contributor

The aiflow-init container runs airflow version which I think COULD in the past initialize the DB as side effect but this could have changed since.

I guess it still work, and just simple step for operate with airflow's entrypoint environment variables: _AIRFLOW_DB_MIGRATE and
_AIRFLOW_WWW_USER_CREATE, _AIRFLOW_WWW_USER_USERNAME, _AIRFLOW_WWW_USER_PASSWORD

And it seems that our tests do not actually run airflow-init - so the right fix here will be in this case to change our tests to run the init, then verify if the database have been created and only then run the whole compose setup.

It should not explicit run docker-compose up airflow-init because this command required for very old docker compose, something like below 1.29.0 which doesn't support service_completed_successfully depends on instruction

In our tests we run docker compose up -d --wait

run_command([*compose_command, "config"])
run_command([*compose_command, "down", "--volumes", "--remove-orphans"])
run_command([*compose_command, "up", "-d", "--wait"])

And according to the dependencies it should run initialise DB after Postgres started and before scheduler/webserver/worker/triggerer

graph TD;
    postgres-->airflow-init;
    redis-->airflow-init;

    airflow-init-->airflow-webserver;
    airflow-init-->airflow-scheduler;
    airflow-init-->airflow-worker;
    airflow-init-->airflow-triggerer;
    airflow-init-->airflow-cli;
    airflow-init-->flower;

    postgres-->airflow-webserver;
    postgres-->airflow-scheduler;
    postgres-->airflow-worker;
    postgres-->airflow-triggerer;
    postgres-->airflow-cli;
    postgres-->flower;

    redis-->airflow-webserver;
    redis-->airflow-scheduler;
    redis-->airflow-worker;
    redis-->airflow-triggerer;
    redis-->airflow-cli;
    postgres-->flower;
Loading

@potiuk
Copy link
Member

potiuk commented Jan 5, 2024

And according to the dependencies it should run initialise DB after Postgres started and before scheduler/webserver/worker/triggerer

Ah ... So looks like I am still stuck in that very old docker-compose of ours - so should we just .... remove the separate step?

@Taragolis
Copy link
Contributor

I guess so because we have requirements for Docker Compose V2 now.

My personal docker compose file for testing something internal on RC versions of Airflow/Providers also a bit out-date 🤣

BTW, I've tested on on docker compose which released with 2.5.3 and 2.8.0 and it works without any issues with specific this versions.

Airflow 2.5.3 from scratch

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.3/docker-compose.yaml'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11576  100 11576    0     0  54362      0 --:--:-- --:--:-- --:--:-- 54603docker compose down --volumes --remove-orphans
[+] Running 5/5
 ✔ Container airflow-native-airflow-init-1   Removed                                                                                                                            0.0s 
 ✔ Container airflow-native-postgres-1       Removed                                                                                                                            0.3s 
 ✔ Container airflow-native-redis-1          Removed                                                                                                                            0.3s 
 ✔ Volume airflow-native_postgres-db-volume  Removed                                                                                                                            0.1s 
 ✔ Network airflow-native_default            Removed                                                                                                                            0.1s docker compose up airflow-init
[+] Building 0.0s (0/0)                                                                                                                                         docker:desktop-linux
[+] Running 5/4
 ✔ Network airflow-native_default              Created                                                                                                                          0.0s 
 ✔ Volume "airflow-native_postgres-db-volume"  Created                                                                                                                          0.0s 
 ✔ Container airflow-native-redis-1            Created                                                                                                                          0.0s 
 ✔ Container airflow-native-postgres-1         Created                                                                                                                          0.0s 
 ✔ Container airflow-native-airflow-init-1     Created                                                                                                                          0.0s 
Attaching to airflow-native-airflow-init-1
...
airflow-native-airflow-init-1  | [2024-01-05 19:17:11,067] {manager.py:562} INFO - Added Permission can delete on DAGs to role Admin
airflow-native-airflow-init-1  | [2024-01-05 19:17:11,147] {manager.py:212} INFO - Added user airflow
airflow-native-airflow-init-1  | User "airflow" created with role "Admin"
airflow-native-airflow-init-1  | 2.5.3
airflow-native-airflow-init-1 exited with code 0

Upgrade from 2.5.3 to 2.8.0

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10940  100 10940    0     0  65989      0 --:--:-- --:--:-- --:--:-- 65903docker compose up airflow-init
[+] Building 0.0s (0/0)                                                                                                                                         docker:desktop-linux
[+] Running 3/3
 ✔ Container airflow-native-redis-1         Running                                                                                                                             0.0s 
 ✔ Container airflow-native-postgres-1      Running                                                                                                                             0.0s 
 ✔ Container airflow-native-airflow-init-1  Recreated                                                                                                                           0.5s 
Attaching to airflow-native-airflow-init-1
airflow-native-airflow-init-1  | The container is run as root user. For security, consider using a regular user account.
airflow-native-airflow-init-1  | 
airflow-native-airflow-init-1  | DB: postgresql+psycopg2://airflow:***@postgres/airflow
airflow-native-airflow-init-1  | Performing upgrade to the metadata database postgresql+psycopg2://airflow:***@postgres/airflow
airflow-native-airflow-init-1  | [2024-01-05T19:18:36.404+0000] {migration.py:213} INFO - Context impl PostgresqlImpl.
airflow-native-airflow-init-1  | [2024-01-05T19:18:36.405+0000] {migration.py:216} INFO - Will assume transactional DDL.
airflow-native-airflow-init-1  | [2024-01-05T19:18:36.409+0000] {db.py:1615} INFO - Creating tables
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Will assume transactional DDL.
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 290244fb8b83 -> 6abdffdd4815, add dttm index on log table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 6abdffdd4815 -> 98ae134e6fff, Increase length of user identifier columns in ``ab_user`` and ``ab_register_user`` tables
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 98ae134e6fff -> c804e5c76e3e, Add ``onupdate`` cascade to ``task_map`` table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade c804e5c76e3e -> 937cbd173ca1, Add index to task_instance table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 937cbd173ca1 -> 788397e78828, Add custom_operator_name column
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 788397e78828 -> 405de8318b3a, add include_deferred column to pool
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 405de8318b3a -> 375a816bbbf4, add new field 'clear_number' to dagrun
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade 375a816bbbf4 -> f7bf2a57d0a6, Add owner_display_name to (Audit) Log table
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade f7bf2a57d0a6 -> bd5dfbe21f88, Make connection login/password TEXT
airflow-native-airflow-init-1  | INFO  [alembic.runtime.migration] Running upgrade bd5dfbe21f88 -> 10b52ebd31f7, Add processor_subdir to ImportError.
airflow-native-airflow-init-1  | Database migrating done!
airflow-native-airflow-init-1  | /home/airflow/.local/lib/python3.8/site-packages/flask_limiter/extension.py:336 UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
airflow-native-airflow-init-1  | airflow already exist in the db
airflow-native-airflow-init-1  | 2.8.0
airflow-native-airflow-init-1 exited with code 0

Airflow 2.8.0 from scratch

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10940  100 10940    0     0  75015      0 --:--:-- --:--:-- --:--:-- 75448docker compose down --volumes --remove-orphans
[+] Running 5/5
 ✔ Container airflow-native-airflow-init-1   Removed                                                                                                                            0.0s 
 ✔ Container airflow-native-redis-1          Removed                                                                                                                            0.3s 
 ✔ Container airflow-native-postgres-1       Removed                                                                                                                            0.2s 
 ✔ Volume airflow-native_postgres-db-volume  Removed                                                                                                                            0.1s 
 ✔ Network airflow-native_default            Removed                                                                                                                            0.1s docker compose up airflow-init
[+] Building 0.0s (0/0)                                                                                                                                         docker:desktop-linux
[+] Running 5/4
 ✔ Network airflow-native_default              Created                                                                                                                          0.0s 
 ✔ Volume "airflow-native_postgres-db-volume"  Created                                                                                                                          0.0s 
 ✔ Container airflow-native-redis-1            Created                                                                                                                          0.1s 
 ✔ Container airflow-native-postgres-1         Created                                                                                                                          0.1s 
 ✔ Container airflow-native-airflow-init-1     Created                                                                                                                          0.0s 
Attaching to airflow-native-airflow-init-1
...
airflow-native-airflow-init-1  | [2024-01-05T19:21:40.257+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
airflow-native-airflow-init-1  | [2024-01-05T19:21:40.852+0000] {override.py:1458} INFO - Added user airflow
airflow-native-airflow-init-1  | User "airflow" created with role "Admin"
airflow-native-airflow-init-1  | 2.8.0
airflow-native-airflow-init-1 exited with code 0

@potiuk
Copy link
Member

potiuk commented Jan 5, 2024

So yeah. in this case I'd simply remove the "init" instructions.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Feb 20, 2024
@github-actions github-actions bot closed this Feb 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:documentation stale Stale PRs per the .github/workflows/stale.yml policy file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants