Pod name incorrect in RenderedTaskInstanceFields #28186

Open

hterik opened this issue Dec 7, 2022 · 9 comments

Comments

@hterik
Contributor

hterik commented Dec 7, 2022

Apache Airflow version

2.5.0

What happened

  1. Start a task running with KubernetesExecutor
  2. Open the /rendered-k8s?dag_id=...&task_id=... endpoint in the web UI
  3. Copy the metadata.name
  4. Run kubectl logs pod/$name
  5. Observe the error: Error from server (NotFound): pods "example-sla-dag-sleep-20-1bc7f8d6892849eaac4642c4177e2eab" not found

By inspecting the running pods in Kubernetes, one can see that the pod executing the task does exist, but under a different name than the one shown in the rendered spec.
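
For illustration, the same check can be done from code with the kubernetes Python client. This is a minimal sketch, assuming cluster access and that worker pods run in the default namespace (both assumptions for illustration, not part of the original report):

    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Name copied from the /rendered-k8s view (example from this issue).
    rendered_name = "example-sla-dag-sleep-20-1bc7f8d6892849eaac4642c4177e2eab"

    try:
        v1.read_namespaced_pod(name=rendered_name, namespace="default")
    except client.exceptions.ApiException as exc:
        # 404, same as kubectl: the pod that actually ran the task has a different name.
        print(exc.status, exc.reason)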

What you think should happen instead

The pod name presented in the Rendered K8s Pod Spec should match the pod name used for the running task. Otherwise, the pod name should not be presented at all.

This is not only a problem for users of the UI; having the real pod_id available in the code would also be useful.
I'm in the process of troubleshooting strange errors with task adoption, where orphaned tasks are frequently found. Currently there appears to be no way to go from a TI to its pod_id; the adoption process instead goes the other way, using pod labels to find TIs. So for any TIs that were not found during adoption, one cannot tell whether they ever had matching pods at all.
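
For context, the direction that does exist today (pod → TI) looks roughly like this: listing pods by task-instance labels, the way adoption matches pods back to TIs. This is a hedged sketch; the label keys below are assumptions based on how KubernetesExecutor labels its worker pods, and the exact keys and value sanitization may differ between Airflow versions.

    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Assumed label keys; Airflow sanitizes label values before applying them.
    selector = "dag_id=example-sla-dag,task_id=sleep-20"

    pods = v1.list_namespaced_pod(namespace="default", label_selector=selector)
    for pod in pods.items:
        print(pod.metadata.name)  # the real pod name(s), unlike the rendered one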

How to reproduce

No response

Operating System

Ubuntu 22.04

Versions of Apache Airflow Providers

No response

Deployment

Other Docker-based deployment

Deployment details

No response

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


hterik added the area:core and kind:bug labels on Dec 7, 2022
@eladkal
Contributor

eladkal commented Dec 7, 2022

Could it be that you can't find the pod because it was deleted when it reached its final state? Did you set is_delete_operator_pod=False?

@hterik
Contributor Author

hterik commented Dec 7, 2022

The pod executing the task is not deleted; it is visible in Kubernetes, but with a different name.

Pod creation happens in the scheduler, in kubernetes_executor.run_pod_async. This is where the real name that is sent to Kubernetes is generated.

However, if I understand the flow correctly, the RenderedTaskInstanceFields record is later created inside the pod itself, via the local executor when it calls taskinstance._execute_task_with_callbacks:

            # Written from inside the already-running worker pod, so the real pod
            # name generated by the scheduler is no longer available at this point.
            rtif = RenderedTaskInstanceFields(ti=self, render_templates=False)
            RenderedTaskInstanceFields.write(rtif)
            RenderedTaskInstanceFields.delete_old_records(self.task_id, self.dag_id)

RenderedTaskInstanceFields calls ti.render_k8s_pod_yaml(), which calls construct_pod, where the pod name is regenerated using make_unique_pod_id, which, as the name suggests, produces a new random, unique value each time :)
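
A simplified sketch of that behaviour (not the real implementation, just the shape of it): each call appends a fresh random suffix, so the rendered name can never match the name the scheduler actually submitted.

    from uuid import uuid4

    MAX_POD_NAME_LEN = 253  # Kubernetes object-name length limit

    def make_unique_pod_id_sketch(pod_id: str) -> str:
        """Rough stand-in for make_unique_pod_id: truncate and add a random suffix."""
        suffix = uuid4().hex
        return f"{pod_id[:MAX_POD_NAME_LEN - len(suffix) - 1]}-{suffix}"

    print(make_unique_pod_id_sketch("example-sla-dag-sleep-20"))
    print(make_unique_pod_id_sketch("example-sla-dag-sleep-20"))  # different every call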

@eladkal
Contributor

eladkal commented Dec 8, 2022

So it seems that this affects KubernetesExecutor, but not KPO used standalone.
cc @jedcunningham @dstandish

eladkal added the provider:cncf-kubernetes and affected_version:2.5 labels on Dec 8, 2022
@csp33
Contributor

csp33 commented Dec 23, 2022

I found out that the correct pod name is shown under Task Instance Details -> Hostname.
BTW, the issue also affects Airflow 2.4.3.
[screenshots: Task Instance Details → Hostname showing the actual pod name, alongside the K8s Pod Spec tab]

@eladkal
Contributor

eladkal commented Dec 24, 2022

Ah right, so this is not a bug. Maybe we just need to clarify, for users of the Kubernetes executor, what pod spec is shown in the render tab.

eladkal added the kind:documentation and good first issue labels and removed the kind:bug, area:core, and affected_version:2.5 labels on Dec 24, 2022
@eladkal
Contributor

eladkal commented Dec 24, 2022

@csp33 would you like to raise a PR?

@csp33
Contributor

csp33 commented Dec 27, 2022

@eladkal sure! However, I still do not understand what the pod name listed under the "K8s Pod Spec" tab refers to.
I think its value should match the task hostname.

@jedcunningham
Member

That rendered pod spec is essentially the result of regenerating the pod spec on the fly. You'll notice that if you refresh, you get a new pod name every time.

Similarly, if you look at the pod spec for an old, historical task, you see what the pod would look like if it were run now, not the spec it used when it ran. For example, say you updated Airflow versions/images since it ran: you'd see the new image, not the old image, in the spec.

@dstandish
Contributor

I guess the issue is that RTIF gets stamped within _run_raw_task, and by the time it gets there, we're already within the pod, so we no longer have the actual pod details.

This pod spec probably does not belong in RTIF (it's not a templated field, after all); it was probably just a convenient/easy way to implement this feature, notwithstanding the limitations pointed out here.

I suppose one thing we could do is create the RTIF record earlier for KE (just with the pod spec), then update it with the rendered fields in _run_raw_task.

Another option would be to put more KE info (or, if you want to make an abstraction, executor info) on the TI record. We could include the spec there. Adding namespace and pod name to the TI record would also help with certain logging scenarios.

Another option would be to add a separate table like TIExecutorInfo, which is 1-1 with TI and holds arbitrary executor info such as this.
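
A rough, purely illustrative sketch of that last option: a hypothetical TIExecutorInfo table keyed 1-1 to task_instance. The model, column names, and types below are assumptions for illustration, not existing Airflow code.

    from sqlalchemy import Column, Integer, String, Text
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class TIExecutorInfo(Base):
        """Hypothetical 1-1 companion to task_instance holding executor details."""

        __tablename__ = "ti_executor_info"

        # Same composite key as task_instance, giving the 1-1 relationship.
        dag_id = Column(String(250), primary_key=True)
        task_id = Column(String(250), primary_key=True)
        run_id = Column(String(250), primary_key=True)
        map_index = Column(Integer, primary_key=True, default=-1)

        # Arbitrary executor info; for KubernetesExecutor this could be the
        # actual pod name, namespace, and the pod spec that was submitted.
        pod_name = Column(String(253))
        namespace = Column(String(63))
        pod_spec = Column(Text)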
