Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better diagnostics and self-healing of docker-compose #17484

Merged
merged 1 commit into from Aug 9, 2021

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Aug 7, 2021

There are several ways people might get the quick-start
docker-compose running messed up (especially on linux):

  1. they do not run initialization steps and run docker-compose-up
  2. they do not run docker-compose-init first

Also on MacOS/Windows default memory/disk settings are not
enough to run Airflow via docker-compose and people are reporting
"Airflow not working" where they simply do not allocate enough
resources.

Finally the docker compose does not support all versions of airflow
and various problems might occur when you use this
docker compose with old version of airflow.

This change adds the following improvements:

  • automated check of minimum version of airflow supported
  • mkdir -p in the directories creation in instructions
  • automated checking if AIRFLOW_UID has been set (and printing
    error and instruction link in case it is not)
  • prints warning about too-low memory, cpu, disk allocation
    and instruction link where to read about it
  • automated fixing of ownership of the directories created in
    case they were not created initially and ended up owned by
    root user

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@potiuk
Copy link
Member Author

potiuk commented Aug 7, 2021

This should help in cases like #16325

There were many more examples like that. Rather than continue explaining people, I've added diagnostics of such problems and sent people to documentation in such cases. And added self-healing if someone manages to create directories owned by root.

Screenshot 2021-08-07 15 42 44

@potiuk potiuk added this to the Airflow 2.1.3 milestone Aug 7, 2021
@potiuk potiuk force-pushed the self-healing-docker-compose branch from bbfc1a8 to 95f6790 Compare August 7, 2021 14:28
@github-actions
Copy link

github-actions bot commented Aug 7, 2021

The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the okay to merge It's ok to merge this PR as it does not require more tests label Aug 7, 2021
@mik-laj
Copy link
Member

mik-laj commented Aug 7, 2021

I will check it, test it and review it tomorrow. I need some time to get Linux machine.

@potiuk
Copy link
Member Author

potiuk commented Aug 8, 2021

Just use GCloud one :)

@potiuk potiuk force-pushed the self-healing-docker-compose branch 2 times, most recently from 336aeb5 to 4a049ce Compare August 8, 2021 10:46
@potiuk
Copy link
Member Author

potiuk commented Aug 8, 2021

I actually upgraded it quite a bit . We also recently had MANY problems with people not having enough resources (mainly memory) to run docker compose (especially for MacOS/Windows). So I copied the diagnostics we have in Breeze and added it to the same init script. I also updated the script and put it in YAML with minimum escaping needed (just $$ in case of env variables) and proper indenting, so that it's easier to reason about and modify / copy&paste for testing.

The message is also slightly updated to print what is wrong (including missing AIRFLOW_UID) followed by instructions in white and link in separate line for easier ctrl+clicking/copying.

image

Same with resource warnings. They are in Yellow (and do not exit with error) but they also print instruction link in separate line:

image

@potiuk potiuk force-pushed the self-healing-docker-compose branch from 4a049ce to 7df9bc6 Compare August 8, 2021 10:50
@potiuk potiuk force-pushed the self-healing-docker-compose branch 2 times, most recently from 80f5d7b to 40ad85b Compare August 8, 2021 19:59
@potiuk
Copy link
Member Author

potiuk commented Aug 8, 2021

I just double-checked and indeed Airflow 2.0.0 an 2.0.1 would not work. I have added adjustment and min-version check that will work for all versions. So I've added an extra check that returns error message in such case:

image

@potiuk
Copy link
Member Author

potiuk commented Aug 8, 2021

Seems the version check works nice - also (as a side effect) the checks for resources /IUID/GID happen before the database initialization and it will make it faster to give feedback

@potiuk potiuk force-pushed the self-healing-docker-compose branch from 40ad85b to 37def2f Compare August 8, 2021 21:25
@potiuk
Copy link
Member Author

potiuk commented Aug 8, 2021

Also added some instructions on how to cleanup the docker-compose environment.

@mik-laj
Copy link
Member

mik-laj commented Aug 8, 2021

This docker-compose file support at least Airflow 2.1, because we use airflow jobs check command.

Airflow 2.1.0, 2021-05-21

- Add ``airflow jobs check`` CLI command to check health of jobs (Scheduler etc) (#14519)

@potiuk
Copy link
Member Author

potiuk commented Aug 9, 2021

This docker-compose file support at least Airflow 2.1, because we use airflow jobs check command.

Aha - so that's one more reason why version check should be there. I saw a question from user who used our docker compose and containers were marked as unhealthy - I was not able to help then, but now when you are explaining that - I think it could be that the user used 2.0.2. I checked that 2.0.2 was starting with the docker-compose but likely then health check would not work.

So explicit check of version is definitely good. I will bump it to 2.1.0 minimum then

@potiuk potiuk force-pushed the self-healing-docker-compose branch 3 times, most recently from e1b15dd to 7b0dd6c Compare August 9, 2021 07:50
There are several ways people might get the quick-start
docker-compose running messed up (especially on linux):

1) they do not run initialization steps and run docker-compose-up
2) they do not run docker-compose-init first

Also on MacOS/Windows default memory/disk settings are not
enough to run Airflow via docker-compose and people are reporting
"Airflow not working" where they simply do not allocate enough
resources.

Finally the docker compose does not support all versions of airflow
and various problems might occur when you use this
docker compose with old version of airflow.

This change adds the following improvements:

* automated check of minimum version of airflow supported
* mkdir -p in the directories creation in instructions
* automated checking if AIRFLOW_UID has been set (and printing
  error and instruction link in case it is not)
* prints warning about too-low memory, cpu, disk allocation
  and instruction link where to read about it
* automated fixing of ownership of the directories created in
  case they were not created initially and ended up owned by
  root user
@potiuk potiuk force-pushed the self-healing-docker-compose branch from 7b0dd6c to d6607a4 Compare August 9, 2021 07:53
@potiuk potiuk merged commit 763860c into apache:main Aug 9, 2021
@potiuk potiuk deleted the self-healing-docker-compose branch August 9, 2021 08:41
potiuk added a commit that referenced this pull request Aug 9, 2021
There are several ways people might get the quick-start
docker-compose running messed up (especially on linux):

1) they do not run initialization steps and run docker-compose-up
2) they do not run docker-compose-init first

Also on MacOS/Windows default memory/disk settings are not
enough to run Airflow via docker-compose and people are reporting
"Airflow not working" where they simply do not allocate enough
resources.

Finally the docker compose does not support all versions of airflow
and various problems might occur when you use this
docker compose with old version of airflow.

This change adds the following improvements:

* automated check of minimum version of airflow supported
* mkdir -p in the directories creation in instructions
* automated checking if AIRFLOW_UID has been set (and printing
  error and instruction link in case it is not)
* prints warning about too-low memory, cpu, disk allocation
  and instruction link where to read about it
* automated fixing of ownership of the directories created in
  case they were not created initially and ended up owned by
  root user

(cherry picked from commit 763860c)
jhtimmins pushed a commit that referenced this pull request Aug 11, 2021
There are several ways people might get the quick-start
docker-compose running messed up (especially on linux):

1) they do not run initialization steps and run docker-compose-up
2) they do not run docker-compose-init first

Also on MacOS/Windows default memory/disk settings are not
enough to run Airflow via docker-compose and people are reporting
"Airflow not working" where they simply do not allocate enough
resources.

Finally the docker compose does not support all versions of airflow
and various problems might occur when you use this
docker compose with old version of airflow.

This change adds the following improvements:

* automated check of minimum version of airflow supported
* mkdir -p in the directories creation in instructions
* automated checking if AIRFLOW_UID has been set (and printing
  error and instruction link in case it is not)
* prints warning about too-low memory, cpu, disk allocation
  and instruction link where to read about it
* automated fixing of ownership of the directories created in
  case they were not created initially and ended up owned by
  root user

(cherry picked from commit 763860c)
kaxil pushed a commit that referenced this pull request Aug 17, 2021
There are several ways people might get the quick-start
docker-compose running messed up (especially on linux):

1) they do not run initialization steps and run docker-compose-up
2) they do not run docker-compose-init first

Also on MacOS/Windows default memory/disk settings are not
enough to run Airflow via docker-compose and people are reporting
"Airflow not working" where they simply do not allocate enough
resources.

Finally the docker compose does not support all versions of airflow
and various problems might occur when you use this
docker compose with old version of airflow.

This change adds the following improvements:

* automated check of minimum version of airflow supported
* mkdir -p in the directories creation in instructions
* automated checking if AIRFLOW_UID has been set (and printing
  error and instruction link in case it is not)
* prints warning about too-low memory, cpu, disk allocation
  and instruction link where to read about it
* automated fixing of ownership of the directories created in
  case they were not created initially and ended up owned by
  root user

(cherry picked from commit 763860c)
jhtimmins pushed a commit that referenced this pull request Aug 17, 2021
There are several ways people might get the quick-start
docker-compose running messed up (especially on linux):

1) they do not run initialization steps and run docker-compose-up
2) they do not run docker-compose-init first

Also on MacOS/Windows default memory/disk settings are not
enough to run Airflow via docker-compose and people are reporting
"Airflow not working" where they simply do not allocate enough
resources.

Finally the docker compose does not support all versions of airflow
and various problems might occur when you use this
docker compose with old version of airflow.

This change adds the following improvements:

* automated check of minimum version of airflow supported
* mkdir -p in the directories creation in instructions
* automated checking if AIRFLOW_UID has been set (and printing
  error and instruction link in case it is not)
* prints warning about too-low memory, cpu, disk allocation
  and instruction link where to read about it
* automated fixing of ownership of the directories created in
  case they were not created initially and ended up owned by
  root user

(cherry picked from commit 763860c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:documentation okay to merge It's ok to merge this PR as it does not require more tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants