Check whether the task finishes before deferring the task for KubernetesPodOperatorAsync #1104

Merged
Lee-W merged 3 commits into main from KubernetesPodOperatorAsync-check-before-deferring-task on Jun 7, 2023

Conversation

Lee-W
Contributor

@Lee-W Lee-W commented May 16, 2023

As in SnowflakeOperatorAsync, we check whether the submitted job has already completed before deferring the task. If it has already finished, we skip the deferral entirely.
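
A minimal sketch of the check-before-defer idea described above, not the code from this PR. `TERMINAL_PHASES`, `execute`, `finish` and `defer` are hypothetical names, and the pod objects are plain stand-ins rather than Kubernetes client objects. In the real operator the two callbacks would roughly correspond to `trigger_reentry` and deferring to the trigger.

```python
from types import SimpleNamespace
from typing import Callable

# Assumption: the phases treated as "already finished".
# Kubernetes documents Succeeded and Failed as terminal pod phases.
TERMINAL_PHASES = frozenset({"Succeeded", "Failed"})


def execute(pod, finish: Callable[[str], str], defer: Callable[[], str]) -> str:
    """Finish right away if the pod is already done; otherwise defer."""
    phase = pod.status.phase
    if phase in TERMINAL_PHASES:
        # Already finished: no need to hand the task to the triggerer.
        return finish(phase)
    return defer()


# Stand-in pods exposing only the attribute the sketch reads.
done_pod = SimpleNamespace(status=SimpleNamespace(phase="Succeeded"))
running_pod = SimpleNamespace(status=SimpleNamespace(phase="Running"))

result_done = execute(done_pod, finish=lambda p: f"finished:{p}", defer=lambda: "deferred")
result_running = execute(running_pod, finish=lambda p: f"finished:{p}", defer=lambda: "deferred")
```

Note that this sketch reads `pod.status.phase` directly, so it assumes `pod.status` is always populated.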

@Lee-W
Contributor Author

Lee-W commented May 16, 2023

image

Lee-W force-pushed the KubernetesPodOperatorAsync-check-before-deferring-task branch from e6c08cb to f867f2d on May 17, 2023 02:19
@codecov

codecov bot commented May 17, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (ebb00d0) 98.58% compared to head (75841ed) 98.58%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1104   +/-   ##
=======================================
  Coverage   98.58%   98.58%           
=======================================
  Files          90       90           
  Lines        5377     5383    +6     
=======================================
+ Hits         5301     5307    +6     
  Misses         76       76           
Impacted Files Coverage Δ
...viders/cncf/kubernetes/operators/kubernetes_pod.py 94.11% <100.00%> (+0.46%) ⬆️
...oviders/cncf/kubernetes/triggers/wait_container.py 98.64% <100.00%> (+0.01%) ⬆️


Lee-W marked this pull request as ready for review on May 17, 2023 02:40
Lee-W force-pushed the KubernetesPodOperatorAsync-check-before-deferring-task branch from f867f2d to a2dc9aa on May 17, 2023 02:45
Lee-W force-pushed the KubernetesPodOperatorAsync-check-before-deferring-task branch from a2dc9aa to 9bda075 on May 30, 2023 04:25
    "status": "failed",
    "message": self.pod.status.message,
}
return self.trigger_reentry(context=context, event=event)
Collaborator

Is the event payload that we send to trigger_reentry handled correctly? I am not sure it is consistent with the payload we return from the triggerer.

Contributor Author

Thanks for the reminder. The original implementation used the logic from OSS Airflow, but yesterday I changed it to use the WaitContainerTrigger logic instead.

This has been tested as well.

image
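
One way to keep the payload consistent between the two paths, sketched under assumptions: the operator's early-exit path and the trigger both build their event through a single helper, so `trigger_reentry` only ever sees one shape. `build_event` and `handle_event` are hypothetical names, and the `"success"` status value is an assumption. Only the `"status"` and `"message"` keys and the `"failed"` value appear in the diff above.

```python
from typing import Optional


def build_event(status: str, message: Optional[str] = None) -> dict:
    """Single place that defines the event shape (hypothetical helper)."""
    return {"status": status, "message": message}


def handle_event(event: dict) -> str:
    """Stand-in for the re-entry handler: fail on "failed", else pass through."""
    if event["status"] == "failed":
        raise RuntimeError(event.get("message") or "pod failed")
    return event["status"]


# Both producers go through the same helper, so the keys cannot drift apart.
event_from_operator = build_event("failed", "container exited with an error")
event_from_trigger = build_event("success")
same_shape = event_from_operator.keys() == event_from_trigger.keys()
```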

Lee-W force-pushed the KubernetesPodOperatorAsync-check-before-deferring-task branch from 9bda075 to 2503200 on June 2, 2023 09:49
Lee-W requested a review from pankajkoti on June 2, 2023 23:42
self.pod_request_obj = self.build_pod_request_obj(context)
self.pod: k8s.V1Pod = self.get_or_create_pod(self.pod_request_obj, context)
pod_status = self.pod.status.phase
if pod_status in PodPhase.terminal_states or not container_is_running(
    pod=self.pod, container_name=self.BASE_CONTAINER_NAME
Collaborator

@pankajkoti pankajkoti Jun 6, 2023

Will every pod always have a base container? If so, what does it do? If a pod has multiple containers, shouldn't we check the running status of all of them? Or is the base container meant to keep track of the running status of the other containers in the pod?

If it's not possible to check the status of all containers, I think we could just remove the `or` condition that checks the container status, and then the PR looks good to me.
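
To make the question concrete, here is a rough sketch contrasting a single-container check with an all-containers check. It is not the provider's `container_is_running`. The helper names are hypothetical, and the stand-in objects only mirror the attribute layout of the Kubernetes Python client (`pod.status.container_statuses`, each entry with `name` and `state.running`).

```python
from types import SimpleNamespace


def _statuses(pod) -> list:
    """Return the container statuses, tolerating a missing status block."""
    status = pod.status
    if status is None or status.container_statuses is None:
        return []
    return status.container_statuses


def _is_running(container_status) -> bool:
    state = container_status.state
    return state is not None and state.running is not None


def named_container_running(pod, name: str) -> bool:
    """True if the container called `name` is currently running."""
    return any(s.name == name and _is_running(s) for s in _statuses(pod))


def any_container_running(pod) -> bool:
    """True if at least one container in the pod is still running."""
    return any(_is_running(s) for s in _statuses(pod))


def _fake_status(name: str, running: bool):
    # Stand-in for a container status entry; `running` is an object or None.
    state = SimpleNamespace(running=SimpleNamespace() if running else None)
    return SimpleNamespace(name=name, state=state)


pod = SimpleNamespace(
    status=SimpleNamespace(
        container_statuses=[_fake_status("base", False), _fake_status("sidecar", True)]
    )
)
base_running = named_container_running(pod, "base")
something_running = any_container_running(pod)
```

With a stopped `base` container and a running sidecar, the two checks disagree, which is the situation the comment above asks about.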

Contributor Author

Collaborator

I am afraid the implementation there is wrong with respect to the questions above. It's done a bit differently in the OSS provider, no?

Contributor Author

yep, I think we implement the logic in different ways

Lee-W force-pushed the KubernetesPodOperatorAsync-check-before-deferring-task branch from 2503200 to 75841ed on June 6, 2023 07:58
Collaborator

@pankajkoti pankajkoti left a comment

I don't have a strong reservation on this. Since this is consistent with the existing trigger implementation, other issues would be outside the scope of the PR.

@Lee-W
Contributor Author

Lee-W commented Jun 7, 2023

Thanks! I'll merge it for now. As deferrable mode has been added in OSS Airflow, I think we'll eventually do something like #1169

@Lee-W Lee-W merged commit 89ccc7e into main Jun 7, 2023
10 checks passed
@Lee-W Lee-W deleted the KubernetesPodOperatorAsync-check-before-deferring-task branch June 7, 2023 02:10
pankajkoti added a commit that referenced this pull request Jun 22, 2023
…peratorAsync (#1104)"

This reverts commit 89ccc7e.

PR #1104 adds logic to poke for Pod status before putting the status
check on deferral. However, it is observed that our DAG fails always
when we go on checking the status.phase on pods as Pod.status is None
before the pod gets scheduled on a node. So, in most scenarios it
does not make sense to check for the pod status immediately to verify
that the pod has completed its desired execution and hence, we revert
this poke in case of Kubernetes Pod operator.
pankajkoti added a commit that referenced this pull request Jun 22, 2023
…peratorAsync (#1104)" (#1209)

This reverts commit 89ccc7e.

PR #1104 adds logic to poke for Pod status before putting the status 
check on deferral. However, it is observed that our DAG fails always 
when we go on checking the status.phase on pods as Pod.status is None 
before the pod gets scheduled on a node. So, in most scenarios it does 
not make sense to check for the pod status immediately to verify that the 
pod has completed its desired execution and hence, we revert this poke 
in case of Kubernetes Pod operator.

closes: #1208
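
The failure described in the revert message comes from reading `status.phase` while `Pod.status` is still None. The revert simply removes the early check. For comparison only, a defensive variant might look like the sketch below, where `safe_phase`, `should_defer` and `TERMINAL_PHASES` are hypothetical names and the pods are plain stand-in objects.

```python
from types import SimpleNamespace
from typing import Optional

# Assumption: phases treated as finished, as in the Kubernetes pod lifecycle.
TERMINAL_PHASES = frozenset({"Succeeded", "Failed"})


def safe_phase(pod) -> Optional[str]:
    """Return the pod phase, or None when the status is not populated yet."""
    status = getattr(pod, "status", None)
    if status is None:
        return None
    return getattr(status, "phase", None)


def should_defer(pod) -> bool:
    """Defer unless the pod is known to be in a terminal phase."""
    return safe_phase(pod) not in TERMINAL_PHASES


unscheduled_pod = SimpleNamespace(status=None)  # not yet scheduled on a node
finished_pod = SimpleNamespace(status=SimpleNamespace(phase="Succeeded"))

defer_unscheduled = should_defer(unscheduled_pod)
defer_finished = should_defer(finished_pod)
```

An unknown status falls back to deferring, so the early check can only ever skip the deferral and never raises on a pod that has not been scheduled.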
3 participants