
Handle Istio containers with Kubernetes Executor Pod adoption #1318

Merged
merged 3 commits into v2-0-0 from handle-istio-proxy-pod on Jul 12, 2021

Conversation

@kaxil commented Jul 11, 2021

closes https://github.com/astronomer/issues/issues/3030

This edge case deals specifically with task instances that ended in the UP_FOR_RETRY state when a scheduler is adopting orphaned tasks. Generally, this issue does not affect OSS Airflow since the templated Kubernetes worker pods that are spawned don't have additional containers that would prevent the pod from reaching the Succeeded pod state. Pods in the Succeeded state are handled by the scheduler's adoption process in _adopt_completed_pods().

Since Astronomer's Kubernetes worker pods have an additional container (istio-proxy), they sit in the NotReady state when tasks are not killed, and they are not eligible for adoption.

This can also happen for "completed" pods that have sidecars. Same process, just a slightly different scenario: if a worker finishes while not being watched by a scheduler, it never gets adopted by another scheduler in _adopt_completed_pods() because the pod is still 'Running', but the TI also isn't in a resettable state, so scheduler_job never asks the executor to adopt it! It's in limbo: "complete" in Airflow's view (based on TI state) but "Running" in k8s view (since the sidecar is still running).

This commit re-uses the existing Istio code to handle those pods.
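For illustration only, here is a minimal sketch (not this PR's actual code) of the "limbo" condition described above, assuming the kubernetes Python client's V1Pod model and a sidecar container named istio-proxy:

```python
def only_sidecar_still_running(pod) -> bool:
    """Return True if the pod is Running but every container except istio-proxy has terminated."""
    statuses = pod.status.container_statuses or []
    proxies = [s for s in statuses if s.name == "istio-proxy"]
    others = [s for s in statuses if s.name != "istio-proxy"]
    return (
        pod.status.phase == "Running"
        and bool(others)
        and all(s.state.terminated for s in others)
        and any(s.state.running for s in proxies)
    )
```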


"""
kwargs = {
'field_selector': "status.phase=Running",
'label_selector': 'kubernetes_executor=True',
kaxil (Author):

We could, and probably should, filter this further to only check pods where the label/annotation for scheduler_job_id != the current job id.

jedcunningham (Member):

I think we'd rather want to look at any scheduler_job_id that isn't still running, given HA. Unless I'm mistaken, we'd need to loop to do that, and over time it's probably faster to simply iterate over all running pods instead.

This does raise the question of a possible race between this adoption logic handling the sidecar and the normal executor, though. It makes me wonder if both sides need to be in a try/except (though I think it is more likely to happen on this side). Also, with HA, more than one scheduler could try handling these zombies at nearly the same time.

kaxil (Author):

Agreed
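For reference, a small sketch of the filtering idea discussed in this thread. It assumes (hypothetically) that worker pods carry the launching scheduler's job id in an airflow-worker label; the actual label/annotation name may differ, and as noted above a simple != check doesn't cover schedulers that have since died under HA:

```python
current_scheduler_job_id = 42  # the adopting scheduler's own job id
kwargs = {
    'field_selector': "status.phase=Running",
    'label_selector': f'kubernetes_executor=True,airflow-worker!={current_scheduler_job_id}',
}
```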

@@ -607,6 +607,7 @@ def try_adopt_task_instances(self, tis: List[TaskInstance]) -> List[TaskInstance]
         for pod in pod_list.items:
             self.adopt_launched_task(kube_client, pod, pod_ids)
         self._adopt_completed_pods(kube_client)
+        self._handle_zombied_istio_pods(kube_client)
jedcunningham (Member):

This does work, but it does take an extra adoption cycle before it actually handles these pods. Probably fine.
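For context, a rough sketch of what _handle_zombied_istio_pods might look like with the selectors shown in this hunk; _stop_istio_sidecar is a hypothetical name for the existing Istio helper, and the kube_config namespace attribute is an assumption:

```python
def _handle_zombied_istio_pods(self, kube_client) -> None:
    """Sketch only: find Running executor pods whose task container finished and
    hand them to the existing Istio shutdown code."""
    kwargs = {
        'field_selector': "status.phase=Running",
        'label_selector': 'kubernetes_executor=True',
    }
    pod_list = kube_client.list_namespaced_pod(self.kube_config.kube_namespace, **kwargs)
    for pod in pod_list.items:
        if only_sidecar_still_running(pod):  # check sketched in the PR description above
            self._stop_istio_sidecar(pod)    # hypothetical name for the existing helper
```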

"""
kwargs = {
'field_selector': "status.phase=Running",
'label_selector': 'kubernetes_executor=True',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd rather want to look at any scheduler_job_id that isn't still running given HA. Unless I'm mistaken we'd need to loop to do that, and over time it's probably faster to simply iterate all running pods instead.

This does raise the question of a possible race between this adoption logic handling the sidecar and the normal executor though. Makes me wonder if both sides need to be in a try:except (though I think it is more likely to happen on this side). Also with HA, more than 1 could try handling these zombies at nearly the same time too.

@github-actions

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@jedcunningham (Member)

Manually tested the Success scenario and it looks good. (This should also fix UP_FOR_RETRY, but I didn't explicitly test it)

@kaxil kaxil merged commit 3f309b0 into v2-0-0 Jul 12, 2021
@kaxil kaxil deleted the handle-istio-proxy-pod branch July 12, 2021 23:55
kaxil added a commit that referenced this pull request Jul 12, 2021
closes astronomer/issues#3030


(cherry picked from commit 3f309b0)
kaxil added a commit that referenced this pull request Jul 12, 2021
closes astronomer/issues#3030


(cherry picked from commit 3f309b0)
kaxil added a commit that referenced this pull request Jul 13, 2021
…#1318)

closes astronomer/issues#3030


(cherry picked from commit 3f309b0)
kaxil added a commit that referenced this pull request Aug 17, 2021
closes astronomer/issues#3030


(cherry picked from commit 3f309b0)
(cherry picked from commit 58cfc68)
danielhoherd pushed a commit that referenced this pull request Jan 5, 2023
Istio service mesh is not compatible by default with Kubernetes Jobs.
The normal behavior is that a Job will be started, get an istio-proxy
sidecar attached to it via the istio mutating webhook, run until
completion, then the 'main' container in the pod stops, but istio-proxy
hangs around indefinitely. This change handles cleanly exiting the
Istio sidecar 'istio-proxy' when a Kubernetes Executor task completes.

(cherry picked from commit 84fa48f)
(cherry picked from commit 6ed59bf)
(cherry picked from commit ba60ede)
(cherry picked from commit 80ac218)
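As an illustration of "cleanly exiting" istio-proxy, one common approach (an assumption here, not necessarily what this code does) is to exec into the sidecar and POST to pilot-agent's /quitquitquit endpoint, which listens on port 15020 by default:

```python
from kubernetes.client import CoreV1Api
from kubernetes.stream import stream

def shutdown_istio_sidecar(kube_client: CoreV1Api, name: str, namespace: str) -> None:
    """Ask pilot-agent inside the istio-proxy container to exit."""
    command = ["/bin/sh", "-c", "curl -fsS -X POST http://127.0.0.1:15020/quitquitquit"]
    stream(
        kube_client.connect_get_namespaced_pod_exec,
        name,
        namespace,
        container="istio-proxy",
        command=command,
        stderr=True,
        stdin=False,
        stdout=True,
        tty=False,
    )
```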

Handle Istio containers with Kubernetes Executor Pod adoption (#1318)

closes https://github.com/astronomer/issues/issues/3030


(cherry picked from commit 3f309b0)
(cherry picked from commit 58cfc68)
(cherry picked from commit 92a8289)

[astro] Fix istio sidecar shutdown on newer GKE

Newer GKE versions have started to emit multiple running events for a
given pod with the sidecar still being shown as running. We will put
retries around shutting down the sidecar and also check the current
status of the sidecar, not just the status at the time of the event.

e.g.: GKE > 1.18.20.901

(cherry picked from commit cbd50ef)
(cherry picked from commit d1025e1)
(cherry picked from commit d56ba74)
(cherry picked from commit 11a80ae)
(cherry picked from commit 1f0e8be)
(cherry picked from commit 20b0bad)
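A rough sketch of the retry idea described in that last message, reusing the hypothetical shutdown_istio_sidecar helper above; the attempt count and delay are invented for illustration:

```python
import time

def shutdown_sidecar_with_retries(kube_client, name, namespace, attempts=3, delay=5):
    """Re-read the pod before each attempt rather than trusting the watch event,
    since newer GKE can report Running while only the sidecar lingers."""
    for attempt in range(1, attempts + 1):
        pod = kube_client.read_namespaced_pod(name, namespace)  # current state, not the event payload
        statuses = pod.status.container_statuses or []
        proxy = next((s for s in statuses if s.name == "istio-proxy"), None)
        if proxy is None or not proxy.state.running:
            return  # sidecar already gone
        try:
            shutdown_istio_sidecar(kube_client, name, namespace)
        except Exception:
            if attempt == attempts:
                raise
        time.sleep(delay)
```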