
Fix KubernetesPodOperator loses connection to worker pod #52866

Closed
vmtuan12 wants to merge 2 commits into apache:main from vmtuan12:feature/fix-kubernetes-executor-pod-event

Conversation

vmtuan12 (Contributor) commented Jul 4, 2025

KubernetesPodOperator sometimes raises urllib3.exceptions.ProtocolError when the pod runs for a long time. Adding a wait-and-retry mechanism fixes this, usually after just 1 or 2 attempts. Details are discussed in #52865.
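
For readers following along, here is a minimal sketch of what a wait-and-retry wrapper of this kind could look like. It is not the code from this PR: `call_with_retry`, `max_attempts`, `wait_seconds`, and the injectable `sleep` parameter are hypothetical names chosen for illustration, and the fallback `ProtocolError` class exists only so the sketch runs when urllib3 is not installed.

```python
import time
from typing import Callable, TypeVar

try:
    from urllib3.exceptions import ProtocolError
except ImportError:  # stand-in so the sketch runs without urllib3 installed
    class ProtocolError(Exception):
        pass

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ProtocolError (hypothetical helper, not Airflow API)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ProtocolError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(wait_seconds)  # wait before trying the connection again
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is only there so the wait can be replaced in tests; a real change would wrap whichever call in the operator loses the connection, which this sketch does not attempt to identify.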

Closes: #52865

Test evidence:

I have implemented a KubernetesPodOperator based on the original one, with a wait-and-retry mechanism, and it works like a charm.

Before implementing wait & retry:

image

After implementing wait & retry:

image



boring-cyborg bot added the area:providers and provider:cncf-kubernetes (Kubernetes (k8s) provider related issues) labels Jul 4, 2025
vmtuan12 force-pushed the feature/fix-kubernetes-executor-pod-event branch from 35b5a05 to 9b4d842 on July 4, 2025 12:18
potiuk (Member) commented Jul 6, 2025

Yep. Adding a test case would be good.

vmtuan12 (Contributor, Author) commented Jul 6, 2025

> Yep. Adding a test case would be good.

Thanks! I have added a test.

vmtuan12 (Contributor, Author) commented Jul 7, 2025

This problem occurs intermittently with no clear pattern, so reproducing the exception is not really practical, which makes adding a test for it quite hard.
cc @potiuk @jedcunningham @hussein-awala

potiuk (Member) commented Jul 8, 2025

> This problem occurs intermittently with no clear pattern, so reproducing the exception is not really practical, which makes adding a test for it quite hard.
> cc @potiuk @jedcunningham @hussein-awala

You can mock it.
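
As a rough illustration of the mocking suggestion, the sketch below forces the intermittent error with `unittest.mock` instead of trying to reproduce it against a real cluster. Everything here is an assumption for demonstration: `wait_for_pod` and `read_pod_phase` are hypothetical stand-ins for the operator code and the Kubernetes API call, not names from the provider, and the fallback `ProtocolError` class is only used when urllib3 is not installed.

```python
import time
import unittest
from unittest import mock

try:
    from urllib3.exceptions import ProtocolError
except ImportError:  # stand-in so the sketch runs without urllib3 installed
    class ProtocolError(Exception):
        pass


def wait_for_pod(read_pod_phase, max_attempts=3, wait_seconds=5.0):
    """Hypothetical stand-in for the operator code that polls the pod."""
    for attempt in range(1, max_attempts + 1):
        try:
            return read_pod_phase()
        except ProtocolError:
            if attempt == max_attempts:
                raise  # give up and surface the error
            time.sleep(wait_seconds)


class TestWaitForPodRetry(unittest.TestCase):
    @mock.patch("time.sleep")  # keep the test fast: no real waiting
    def test_recovers_after_two_errors(self, mock_sleep):
        # The mock raises twice, then returns a value, simulating a flaky connection.
        read_pod_phase = mock.Mock(
            side_effect=[
                ProtocolError("Connection broken"),
                ProtocolError("Connection broken"),
                "Succeeded",
            ]
        )
        self.assertEqual(wait_for_pod(read_pod_phase), "Succeeded")
        self.assertEqual(read_pod_phase.call_count, 3)
        self.assertEqual(mock_sleep.call_count, 2)

    @mock.patch("time.sleep")
    def test_reraises_when_attempts_exhausted(self, mock_sleep):
        # The mock raises on every call, so the retries run out.
        read_pod_phase = mock.Mock(side_effect=ProtocolError("Connection broken"))
        with self.assertRaises(ProtocolError):
            wait_for_pod(read_pod_phase, max_attempts=2)
        self.assertEqual(read_pod_phase.call_count, 2)
```

The tests can be run with `python -m unittest` or collected by pytest. Setting `side_effect` to a list makes the failure deterministic, which sidesteps the reproducibility problem raised above.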

@github-actions

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions bot added the stale (Stale PRs per the .github/workflows/stale.yml policy file) label Aug 23, 2025
github-actions bot closed this Aug 28, 2025
