Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Scheduler crashloopbackoff when using hostname_callable #24999

Merged

Conversation

V0lantis
Copy link
Contributor

@V0lantis V0lantis commented Jul 12, 2022

Fix a scheduler crashLoopBackOff when hostname_callable was defined with a function which returned something different than the hostname command from linux.

Context

In our production Airflow, I updated the hostname_callable config with airflow.utils.net.get_host_ip_address so that we could read the pod logs produced by our celery workers. But this new config was making kubernetes to restart the scheduler pods because the command defined in helm was returning a "No alive jobs found."
I therefore updated the cli command jobs to allow the use of --use-hostname-callable in:

airflow jobs check --job-type SchedulerJob --use-hostname-callable

to use the hostname_callable that the user defined in its Airflow conf. Also updated the helm chart to with this new conf since I think it will be more reliable than to use linux hostname command.

airflow/cli/commands/jobs_command.py Outdated Show resolved Hide resolved
chart/templates/_helpers.yaml Outdated Show resolved Hide resolved
@@ -897,6 +897,13 @@ def string_lower_type(val):
help="The hostname of job(s) that will be checked.",
)

ARG_JOB_HOSTNAME_CALLABLE_FILTER = Arg(
("--use-hostname-callable",),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I love use-hostname-callable. Maybe local-only? @dstandish @ephraimbuddy

Copy link
Contributor Author

@V0lantis V0lantis Jul 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice suggestion @jedcunningham. I don't like so much use-hostname-callable either, but I am not sure local-only would be very telling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about not adding a new arg but having the hostname arg default to get_hostname?
i.e

ARG_JOB_HOSTNAME_FILTER = Arg(
    ("--hostname",),
    default=get_hostname(),
    type=str,
    help="The hostname of job(s) that will be checked.",
)

The effect of this would be that the CLI check command would always check with hostname instead of none.

As per the name, what do you think about --use-configured-hostname or --use-default-hostname

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds very good, thanks for the proposal. Going to implement this

@V0lantis V0lantis force-pushed the fix/crashloopbackoff_on_scheduler_k8s branch from 5c88862 to dd34bd2 Compare July 16, 2022 17:01
@V0lantis
Copy link
Contributor Author

Thanks both for your review. I finally decided to implement the great suggestion of @ephraimbuddy, giving the get_hostname default argument to the check jobs command. I also created a if ... else condition to use this new functionality only from Airflow 2.4.0 release

@jedcunningham
Copy link
Member

Just adding a default for hostname is a good idea, but it does break backwards compatibility. At the very least we'd need a way do not filter by hostname, and even then I'm not sure about this.

@V0lantis
Copy link
Contributor Author

V0lantis commented Jul 24, 2022

Just adding a default for hostname is a good idea, but it does break backwards compatibility. At the very least we'd need a way do not filter by hostname, and even then I'm not sure about this.

Tank you @jedcunningham. So am I going back with the previous argument I implemented at first (use-hostname-callable local-only or whatever we want to call it)?

@jedcunningham
Copy link
Member

Yeah, I think we need to. I'm having trouble coming up with anything better than --local though.

@V0lantis V0lantis force-pushed the fix/crashloopbackoff_on_scheduler_k8s branch 2 times, most recently from e41c302 to 07f2cc3 Compare August 18, 2022 20:03
@V0lantis
Copy link
Contributor Author

@jedcunningham, I rebased on main, remove the default param on ARG_JOB_HOSTNAME_FILTER. Hope it will suit you.

@V0lantis V0lantis force-pushed the fix/crashloopbackoff_on_scheduler_k8s branch from db3f314 to 6472e54 Compare August 18, 2022 20:07
@V0lantis V0lantis force-pushed the fix/crashloopbackoff_on_scheduler_k8s branch 3 times, most recently from 0e80b7f to 1265559 Compare August 26, 2022 19:53
@V0lantis V0lantis force-pushed the fix/crashloopbackoff_on_scheduler_k8s branch from 1265559 to fb2590f Compare September 3, 2022 07:18
Fix a scheduler crashLoopBackOff when `hostname_callable` was defined
with a function which returned something different than the hostname
command from linux.
@V0lantis V0lantis force-pushed the fix/crashloopbackoff_on_scheduler_k8s branch from fb2590f to 7992f75 Compare September 3, 2022 07:25
@V0lantis V0lantis requested review from jedcunningham and ephraimbuddy and removed request for dstandish and jedcunningham September 3, 2022 07:25
airflow/cli/cli_parser.py Outdated Show resolved Hide resolved
airflow/cli/cli_parser.py Outdated Show resolved Hide resolved
airflow/cli/cli_parser.py Outdated Show resolved Hide resolved
chart/templates/_helpers.yaml Outdated Show resolved Hide resolved
chart/templates/_helpers.yaml Outdated Show resolved Hide resolved
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
@V0lantis
Copy link
Contributor Author

V0lantis commented Oct 3, 2022

Thank you @jedcunningham

@jedcunningham jedcunningham merged commit 5604121 into apache:main Oct 3, 2022
@jedcunningham jedcunningham added the type:improvement Changelog: Improvements label Oct 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:CLI area:helm-chart Airflow Helm Chart type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants