Prevent KubernetesPodOperator from finding pods with the wrong name #25882
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
a53a349 to 9580170
Can you also add a unit test please?
Happy to! I am playing around with the testing suite for the first time. I would like to acknowledge a logical flaw in this PR. The name gets mutated with a hash string at the end, so we will need to account for that. If I create a pod like this:

    KubernetesPodOperator(
        *args,
        name="example",
        **kwargs,
    )

The resulting pod name may look like this:
I wonder if this mutated name is accessible somewhere (need to keep digging ⛏), or I can just
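To illustrate the concern about the mutated name, here is a minimal sketch. It assumes the suffix is a random 32-character hex string joined with a hyphen, which is consistent with the pod names quoted in the PR description but is not confirmed to be how the operator builds it. The helpers `make_unique_pod_name` and `name_matches` are hypothetical names used only for this illustration.

```python
import uuid


def make_unique_pod_name(base_name: str) -> str:
    """Hypothetical helper: append a random 32-character hex suffix to the base name."""
    return f"{base_name}-{uuid.uuid4().hex}"


def name_matches(pod_name: str, base_name: str) -> bool:
    """Prefix check: the random suffix is unknown in advance, so compare only the start."""
    return pod_name.startswith(base_name)


# The generated name differs on every call, but always starts with the base name.
pod_name = make_unique_pod_name("example")
matches_own_name = name_matches(pod_name, "example")
matches_other_env = name_matches(pod_name, "stg-example")
```

A plain prefix check would also accept a longer base name that begins with the same characters (for example `example-other`), so a stricter comparison may be worth considering.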
Looking at the existing test code below, it looks like we are safe to just use
ae746db to 5b99dde
Not yet tested, as I am still learning breeze
@RNHTTR I wrote a draft of some Kubernetes test code. I have not tried running it yet, because I am still learning how to use the testing suite. I went ahead and pushed a commit, so others could see and maybe help me out. I piggy-backed on another similar test (
Here is my pseudocode of the essential functionality that should be tested:
I'm not super familiar with the inner workings of KPO, but it mostly looks good at first glance. A couple nits:
elif num_pods == 1 and pod_list[0].metadata.name.startswith(self.name):
    pod = pod_list[0]
    self.log.info("Found matching pod %s with labels %s", pod.metadata.name, pod.metadata.labels)
    self.log.info("`try_number` of task_instance: %s", context['ti'].try_number)
We should also log something if a pod is found but under the wrong name so the user can understand the context better when debugging.
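One way the suggested logging could look is sketched below. This is not the code from the PR: `pick_single_pod` and `fake_pod` are hypothetical names, the pods are stand-in objects rather than real Kubernetes client models, and the warning text is only an assumption about what a useful message might contain.

```python
import logging
from types import SimpleNamespace
from typing import Optional, Sequence

log = logging.getLogger(__name__)


def pick_single_pod(pod_list: Sequence, expected_name: str) -> Optional[object]:
    """Return the only pod if its name starts with expected_name; otherwise return None.

    A name mismatch is logged so the user can see why the pod was ignored.
    """
    if len(pod_list) != 1:
        return None
    pod = pod_list[0]
    if pod.metadata.name.startswith(expected_name):
        log.info("Found matching pod %s with labels %s", pod.metadata.name, pod.metadata.labels)
        return pod
    log.warning(
        "Found pod %s with matching labels, but its name does not start with %r; ignoring it",
        pod.metadata.name,
        expected_name,
    )
    return None


def fake_pod(name: str, labels: Optional[dict] = None) -> SimpleNamespace:
    """Stand-in for a pod object, exposing only metadata.name and metadata.labels."""
    return SimpleNamespace(metadata=SimpleNamespace(name=name, labels=labels or {}))


found = pick_single_pod([fake_pod("prd-example-abc")], "prd-example")
ignored = pick_single_pod([fake_pod("stg-example-abc")], "prd-example")
```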
@@ -370,11 +370,11 @@ def find_pod(
             label_selector=label_selector,
         ).items

-        pod = None
+        pod: Optional[k8s.V1Pod] = None
         num_pods = len(pod_list)
         if num_pods > 1:
Does this also mean that, if more than one pod is found here, we could use the name to identify the pod we want and then proceed, rather than raising an exception?
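The idea raised in this question could be sketched roughly as follows. It is an illustration only, not behavior the PR implements: `select_pod_by_name` is a hypothetical function, the pods are stand-in objects, and raising `RuntimeError` when several names still match is an assumed choice.

```python
from types import SimpleNamespace
from typing import List, Optional


def select_pod_by_name(pod_list: List, expected_name: str) -> Optional[object]:
    """Narrow a label-matched pod list down to the pod whose name has the expected prefix."""
    matches = [pod for pod in pod_list if pod.metadata.name.startswith(expected_name)]
    if len(matches) > 1:
        # The name did not disambiguate, so fail loudly rather than guess.
        raise RuntimeError(f"More than one pod matches name prefix {expected_name!r}")
    return matches[0] if matches else None


def fake_pod(name: str) -> SimpleNamespace:
    """Stand-in for a pod object, exposing only metadata.name."""
    return SimpleNamespace(metadata=SimpleNamespace(name=name))


pods = [fake_pod("stg-example-111"), fake_pod("prd-example-222")]
chosen = select_pod_by_name(pods, "prd-example")
missing = select_pod_by_name(pods, "dev-example")
```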
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
Did this make it in? What is blocking it? I can take it to completion if nobody is working on it actively @XD-DENG
Changes
This PR adds a condition to the method find_pod to verify that the name of the pod matches the name defined by the user. This ensures that two pods with different names but identical context do not collide.

Issue
Two (or more) pods created with KubernetesPodOperator from identical context (namespace, dag_id, task_id, run_id, and map_index) across different Airflow environments (e.g. staging + production) collide, causing race conditions. This happens because the method find_pod in KubernetesPodOperator sees these pods as identical.

At first, I thought this was a name collision, so I prepended the string stg- or prd- (based on an Airflow Variable) to the pod name. This did not fix my problem; however, it made the problem easier to debug.

As you can see in the logs below, instead of creating the pod prd-example-9ba5a14b73ab41988c1c57afe5ec81f4, KPO found the "matching" pod stg-example-ce40ace00c704b87a56241b87bc4ff47.

I would like to see the find_pod method fail to find a pod in this situation, because the name does not match what the user defined.

Test
Run two Airflow projects with identical DAGs (same dag_id and schedule args) with an identical task (same task_id) like below:

Change the argument name, so it is unique across your Airflow instances.

Trigger these DAGs at approximately the same time for the same date (ensure run_id is the same). Then observe logs for race condition behavior described above.

Astronomer Support
This relates to Astronomer Support ticket #11647.
Notes
I am a new contributor, so feel free to educate me on contribution best practices.
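
For readers who want to see why the collision described under Issue occurs, here is a small sketch. It assumes the pod lookup is driven by labels derived from dag_id, task_id, run_id, and map_index; the function `build_label_selector`, the label keys, and all context values are hypothetical and do not reproduce the operator's actual label format.

```python
def build_label_selector(context: dict) -> str:
    """Hypothetical: join context-derived labels into a comma-separated selector string."""
    keys = ("dag_id", "task_id", "run_id", "map_index")
    return ",".join(f"{key}={context[key]}" for key in keys)


# Two environments running the same DAG and task for the same run (illustrative values).
staging = {
    "dag_id": "example_dag",
    "task_id": "example_task",
    "run_id": "run-1",
    "map_index": -1,
    "name": "stg-example",
}
production = {**staging, "name": "prd-example"}

# The pod name is not part of the selector, so both environments produce the same one.
same_selector = build_label_selector(staging) == build_label_selector(production)
names_differ = staging["name"] != production["name"]
```

Because the selector ignores the name, a lookup in a shared namespace can return the other environment's pod, which is the situation the name check in find_pod is meant to rule out.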