Fix docker compose command in "Running Airflow in Docker"#36611
Fix docker compose command in "Running Airflow in Docker"#36611jtommi wants to merge 1 commit intoapache:mainfrom
Conversation
Running docker compose up airflow-init with the provided docker-compose.yml does not initialize the database since it will execute the command provided in the docker-compose followed by /entrypoint, none of which do initialize the database.
What worked for me was to docker compose run airflow-init instead ("run" instead of "up")
Taragolis
left a comment
There was a problem hiding this comment.
Also (not included in this PR), the message after initialization for me looked like this (Docker-Desktop on Windows 11):
I guess output was changed couple versions ago and documentation not reflect this. However the internals should be remaining the same.
| .. code-block:: bash | ||
|
|
||
| docker compose up airflow-init | ||
| docker compose run airflow-init |
There was a problem hiding this comment.
docker compose run has a different purpose (run a command) if compare to the docker compose up (run a service)
There is no any problem to the start container with up command at least in the CI and docker-compose sample tests as well as in Linux/MacOS, in addition this service would also use later for services dependencies check.
There was a problem hiding this comment.
I'm running this on Windows 11 with Docker Desktop and docker compose up airflow-init does not initialize the DB, but docker compose run airflow-init does.
in addition this service would also use later for services dependencies check.
That will be run anyway by the depends_on of the webserver and scheduler, so my change in the doc doesn't influence that.
The doc specifically mentions to initialize the DB "Before starting Airflow for the first time", so this is a one-time manual process.
There was a problem hiding this comment.
I am not against changing it to run.
It quite matches the purpose of run vs. up as described in https://docs.docker.com/compose/faq/#whats-the-difference-between-up-run-and-start
But I think you'd ne need to investigate why up does not work for you. It should. Can you explain HOW it does not work for you? What happens?
Also run has some properties that are different than up - for example it will leave the container behind when you use run, so generally you should also use --rm flag if you want to use run becaause (unlike up) it will create a new container every time you run it , where up will use the same project as the main project you use when you start other services in it.
Also any change here should be acompanied by updating our tests.
We are running docker compose quick start tests in our CI
https://github.com/apache/airflow/blob/main/.github/workflows/ci.yml#L1929C34-L1929C34
and thanks to that both me and @Taragolis believe that it works - because it passes the tests (and that's why we are asking you why it does not work for you and what problems does it solve).
Maybe there is a good reason it does not work for your Windows 11 desktop and it would be good if you could investigate why.
And yes - maybe the conclusion is that it should be changed to use run but if we do, then this PR should also add similar step to our docker-compose tests to make sure that what we have in our docs is also tested in our CI.,
Yes. Because that's how |
|
And just a general command @jtommi - the fact that something does not work for |
|
And also another comment - I am not 100% sure it works now - it might it be it does not and our tests simply did not catch it, it's just we need to understand what's going on (and also fix the tests if that's the case). |
|
BTW. By a quick look it is quite possible it does not work. The If that's the case then the right solution will be to change this command to one that actually initializes the DB rather than changing the documentation. Because it means that the And it seems that our tests do not actually run |
|
Thanks for all those insights, very helpful. ... and looking into the command and /entrypoint, nowhere it ran I was testing around with 2.5.3 and 2.8.0 thinking I was modifying the relevant things in the docker-compose file, spoiler, I wasn't. That said, the output is still different, but not sure it's worth modifying it, I got: And also, if all Airflow containers in the docker-compose are dependent on airflow-init, why does the doc specify that you should run it first, since it will run first anyway when you'll do |
|
Those are all good questions - and I think they need proper fixes. I think "why" it's not a good question. Better question is "how we can improve it" because as you see - there are some inconsistencies and this is a good opportunity to propose to improve it (but more comepletely, not by just changing |
I guess it still work, and just simple step for operate with airflow's entrypoint environment variables:
It should not explicit run In our tests we run airflow/docker_tests/test_docker_compose_quick_start.py Lines 140 to 142 in 5b4e950 And according to the dependencies it should run initialise DB after Postgres started and before scheduler/webserver/worker/triggerer graph TD;
postgres-->airflow-init;
redis-->airflow-init;
airflow-init-->airflow-webserver;
airflow-init-->airflow-scheduler;
airflow-init-->airflow-worker;
airflow-init-->airflow-triggerer;
airflow-init-->airflow-cli;
airflow-init-->flower;
postgres-->airflow-webserver;
postgres-->airflow-scheduler;
postgres-->airflow-worker;
postgres-->airflow-triggerer;
postgres-->airflow-cli;
postgres-->flower;
redis-->airflow-webserver;
redis-->airflow-scheduler;
redis-->airflow-worker;
redis-->airflow-triggerer;
redis-->airflow-cli;
postgres-->flower;
|
Ah ... So looks like I am still stuck in that very old docker-compose of ours - so should we just .... remove the separate step? |
|
I guess so because we have requirements for Docker Compose V2 now. My personal docker compose file for testing something internal on RC versions of Airflow/Providers also a bit out-date 🤣 BTW, I've tested on on docker compose which released with 2.5.3 and 2.8.0 and it works without any issues with specific this versions. Airflow 2.5.3 from scratch ❯ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.5.3/docker-compose.yaml'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 11576 100 11576 0 0 54362 0 --:--:-- --:--:-- --:--:-- 54603
❯ docker compose down --volumes --remove-orphans
[+] Running 5/5
✔ Container airflow-native-airflow-init-1 Removed 0.0s
✔ Container airflow-native-postgres-1 Removed 0.3s
✔ Container airflow-native-redis-1 Removed 0.3s
✔ Volume airflow-native_postgres-db-volume Removed 0.1s
✔ Network airflow-native_default Removed 0.1s
❯ docker compose up airflow-init
[+] Building 0.0s (0/0) docker:desktop-linux
[+] Running 5/4
✔ Network airflow-native_default Created 0.0s
✔ Volume "airflow-native_postgres-db-volume" Created 0.0s
✔ Container airflow-native-redis-1 Created 0.0s
✔ Container airflow-native-postgres-1 Created 0.0s
✔ Container airflow-native-airflow-init-1 Created 0.0s
Attaching to airflow-native-airflow-init-1
...
airflow-native-airflow-init-1 | [2024-01-05 19:17:11,067] {manager.py:562} INFO - Added Permission can delete on DAGs to role Admin
airflow-native-airflow-init-1 | [2024-01-05 19:17:11,147] {manager.py:212} INFO - Added user airflow
airflow-native-airflow-init-1 | User "airflow" created with role "Admin"
airflow-native-airflow-init-1 | 2.5.3
airflow-native-airflow-init-1 exited with code 0Upgrade from 2.5.3 to 2.8.0 ❯ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 10940 100 10940 0 0 65989 0 --:--:-- --:--:-- --:--:-- 65903
❯ docker compose up airflow-init
[+] Building 0.0s (0/0) docker:desktop-linux
[+] Running 3/3
✔ Container airflow-native-redis-1 Running 0.0s
✔ Container airflow-native-postgres-1 Running 0.0s
✔ Container airflow-native-airflow-init-1 Recreated 0.5s
Attaching to airflow-native-airflow-init-1
airflow-native-airflow-init-1 | The container is run as root user. For security, consider using a regular user account.
airflow-native-airflow-init-1 |
airflow-native-airflow-init-1 | DB: postgresql+psycopg2://airflow:***@postgres/airflow
airflow-native-airflow-init-1 | Performing upgrade to the metadata database postgresql+psycopg2://airflow:***@postgres/airflow
airflow-native-airflow-init-1 | [2024-01-05T19:18:36.404+0000] {migration.py:213} INFO - Context impl PostgresqlImpl.
airflow-native-airflow-init-1 | [2024-01-05T19:18:36.405+0000] {migration.py:216} INFO - Will assume transactional DDL.
airflow-native-airflow-init-1 | [2024-01-05T19:18:36.409+0000] {db.py:1615} INFO - Creating tables
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Will assume transactional DDL.
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade 290244fb8b83 -> 6abdffdd4815, add dttm index on log table
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade 6abdffdd4815 -> 98ae134e6fff, Increase length of user identifier columns in ``ab_user`` and ``ab_register_user`` tables
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade 98ae134e6fff -> c804e5c76e3e, Add ``onupdate`` cascade to ``task_map`` table
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade c804e5c76e3e -> 937cbd173ca1, Add index to task_instance table
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade 937cbd173ca1 -> 788397e78828, Add custom_operator_name column
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade 788397e78828 -> 405de8318b3a, add include_deferred column to pool
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade 405de8318b3a -> 375a816bbbf4, add new field 'clear_number' to dagrun
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade 375a816bbbf4 -> f7bf2a57d0a6, Add owner_display_name to (Audit) Log table
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade f7bf2a57d0a6 -> bd5dfbe21f88, Make connection login/password TEXT
airflow-native-airflow-init-1 | INFO [alembic.runtime.migration] Running upgrade bd5dfbe21f88 -> 10b52ebd31f7, Add processor_subdir to ImportError.
airflow-native-airflow-init-1 | Database migrating done!
airflow-native-airflow-init-1 | /home/airflow/.local/lib/python3.8/site-packages/flask_limiter/extension.py:336 UserWarning: Using the in-memory storage for tracking rate limits as no storage was explicitly specified. This is not recommended for production use. See: https://flask-limiter.readthedocs.io#configuring-a-storage-backend for documentation about configuring the storage backend.
airflow-native-airflow-init-1 | airflow already exist in the db
airflow-native-airflow-init-1 | 2.8.0
airflow-native-airflow-init-1 exited with code 0Airflow 2.8.0 from scratch ❯ curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.8.0/docker-compose.yaml'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 10940 100 10940 0 0 75015 0 --:--:-- --:--:-- --:--:-- 75448
❯ docker compose down --volumes --remove-orphans
[+] Running 5/5
✔ Container airflow-native-airflow-init-1 Removed 0.0s
✔ Container airflow-native-redis-1 Removed 0.3s
✔ Container airflow-native-postgres-1 Removed 0.2s
✔ Volume airflow-native_postgres-db-volume Removed 0.1s
✔ Network airflow-native_default Removed 0.1s
❯ docker compose up airflow-init
[+] Building 0.0s (0/0) docker:desktop-linux
[+] Running 5/4
✔ Network airflow-native_default Created 0.0s
✔ Volume "airflow-native_postgres-db-volume" Created 0.0s
✔ Container airflow-native-redis-1 Created 0.1s
✔ Container airflow-native-postgres-1 Created 0.1s
✔ Container airflow-native-airflow-init-1 Created 0.0s
Attaching to airflow-native-airflow-init-1
...
airflow-native-airflow-init-1 | [2024-01-05T19:21:40.257+0000] {override.py:1820} INFO - Added Permission menu access on Permission Pairs to role Admin
airflow-native-airflow-init-1 | [2024-01-05T19:21:40.852+0000] {override.py:1458} INFO - Added user airflow
airflow-native-airflow-init-1 | User "airflow" created with role "Admin"
airflow-native-airflow-init-1 | 2.8.0
airflow-native-airflow-init-1 exited with code 0 |
|
So yeah. in this case I'd simply remove the "init" instructions. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
On the page Running Airflow in Docker
Running
docker compose up airflow-initwith the provided docker-compose.yaml does not initialize the database.It will execute the command provided in the docker-compose.yaml followed by /entrypoint, none of which do initialize the database.
What worked for me was to
docker compose run airflow-init("run" instead of "up")Also (not included in this PR), the message after initialization for me looked like this (Docker-Desktop on Windows 11):
not like the doc mentions:
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.