Add config to control Kubernetes Client retry behaviour by hterik · Pull Request #26710 · apache/airflow

hterik · 2022-09-27T14:40:49Z

Occasionally, a request to the Kubernetes API fails because of a temporary network glitch. By default, such requests are retried 3 times, with no delay between attempts.
On the final failure, the entire scheduler crashes.

This configuration allows the urllib3 retry behaviour to be adjusted, mainly to add some backoff between retries, giving the network time to recover before the final attempt.
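
To illustrate why a backoff helps, here is a rough sketch of the delay schedule that urllib3's `Retry` is documented to apply for a given `backoff_factor`. The helper name `backoff_schedule` is hypothetical, the 120-second cap is assumed from the urllib3 default, and the exact behaviour may differ between urllib3 versions.

```python
from __future__ import annotations


def backoff_schedule(retries: int, backoff_factor: float, backoff_max: float = 120.0) -> list[float]:
    """Approximate sleep time in seconds before each retry.

    Assumption: mirrors the formula in the urllib3 docs,
    backoff_factor * 2 ** (consecutive_errors - 1), with no sleep
    before the first retry and a cap of backoff_max.
    """
    delays = []
    for attempt in range(1, retries + 1):
        if attempt == 1:
            # The first retry is issued immediately.
            delays.append(0.0)
        else:
            delays.append(min(backoff_max, backoff_factor * 2 ** (attempt - 1)))
    return delays


# No backoff: all three retries fire back to back.
print(backoff_schedule(3, 0.0))
# With a backoff factor, later retries wait progressively longer.
print(backoff_schedule(3, 0.5))
```

With three immediate retries, a glitch lasting even a second can exhaust every attempt; a non-zero factor spreads the attempts over a longer window.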

Fixes #24748

hterik

I still need to add tests and run the existing ones. I haven't done that yet; I would first like some feedback on the design.

hterik · 2022-09-27T14:44:25Z

airflow/kubernetes/kube_client.py

-        else:
-            configuration = Configuration()
-        configuration.verify_ssl = False
-        Configuration.set_default(configuration)


Is it ok to remove this set_default and only rely on every other code path going through get_kube_client?

I see that pod_generator and TaskInstance create ApiClient in many places, but only use it for offline operations.

It's also created by hooks.kubernetes; I haven't looked into what that does yet.

Maybe it's safer to keep it this way and incorporate the new config using this same method?

hterik · 2022-09-27T14:46:21Z

airflow/kubernetes/kube_client.py

+
+    retryparams = conf.getjson('kubernetes', 'client_retry_configuration_kwargs', fallback={})
+    if retryparams != {}:
+        client_config.retries = urllib3.util.Retry(**retryparams)


Is this level of configuration granularity good? Or is it enough to expose only the backoff and the number of retries?
I could even go as far as saying some kind of backoff should be enabled by default, without configuration.
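
For comparison, a narrower variant that exposes only the retry count and the backoff could look roughly like the sketch below. The helper `parse_retry_kwargs` and the allowlist `ALLOWED_RETRY_KEYS` are hypothetical names and are not part of this PR; the sketch only parses and validates the JSON value, leaving the construction of the `Retry` object to the code in the diff.

```python
import json

# Hypothetical allowlist: only the retry count and the backoff are exposed.
ALLOWED_RETRY_KEYS = {"total", "backoff_factor"}


def parse_retry_kwargs(raw: str) -> dict:
    """Parse a JSON config string into keyword arguments meant for urllib3.util.Retry."""
    # An empty setting means "keep the client defaults".
    params = json.loads(raw) if raw.strip() else {}
    if not isinstance(params, dict):
        raise ValueError("retry configuration must be a JSON object")
    unknown = set(params) - ALLOWED_RETRY_KEYS
    if unknown:
        raise ValueError(f"unsupported retry options: {sorted(unknown)}")
    return params


kwargs = parse_retry_kwargs('{"total": 5, "backoff_factor": 0.5}')
print(kwargs)
# With urllib3 available, the result could be applied as in the diff:
#   client_config.retries = urllib3.util.Retry(**kwargs)
```

The trade-off is the one raised in the question: passing arbitrary kwargs keeps every `Retry` option available, while an allowlist gives a smaller surface that is easier to document and validate.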

github-actions · 2022-11-12T00:14:23Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

hterik · 2022-11-14T06:26:35Z

stale ping

github-actions · 2022-12-31T00:10:38Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

dinedal · 2023-06-08T17:11:38Z

Hi, this is a problem for us in production still, can we make forward progress here?

hterik · 2023-10-05T05:33:57Z

> Hi, this is a problem for us in production still, can we make forward progress here?

It was fixed in #29809 instead

boring-cyborg bot added the provider:cncf-kubernetes Kubernetes (k8s) provider related issues label Sep 27, 2022

hterik mentioned this pull request Oct 7, 2022

Kubernetes scheduler crashes on transient Kubernets API 500 errors. #21465

Closed

2 tasks

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 12, 2022

github-actions bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Nov 15, 2022

github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Dec 31, 2022

github-actions bot closed this Jan 5, 2023

hterik wants to merge 1 commit into apache:main from hterik:k8sretrybackoff
