Expectation / Proposal
As with agents in Prefect 1, when using Prefect 2 Kubernetes workers, make the Kubernetes events that occur before the pod starts visible via Prefect.
At the moment all we get via Prefect is:
Worker 'KubernetesWorker edf7261c-e388-46f8-a0c5-9eb7de8c7c0f' submitting flow run 'ee662a6b-5303-430f-9ab7-acd5468f5d22' 02:20:25 PM prefect.flow_runs.worker
Creating Kubernetes job... 02:20:26 PM prefect.flow_runs.worker
Completed submission of flow run 'ee662a6b-5303-430f-9ab7-acd5468f5d22' 02:20:26 PM prefect.flow_runs.worker
Job 'beige-stingray-tzkrv': Pod has status 'Pending'. 02:20:27 PM prefect.flow_runs.worker
Job 'beige-stingray-tzkrv': Pod never started. 02:21:26 PM prefect.flow_runs.worker
Reported flow run 'ee662a6b-5303-430f-9ab7-acd5468f5d22' as crashed: Flow run infrastructure exited with non-zero status code -1. 02:21:27 PM prefect.flow_runs.worker
It would be useful to surface these Kubernetes events so that we can diagnose the failure from Prefect without having to reach for kubectl (see the sketch below).
prefect-kubernetes 0.2.8
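One possible shape for this, purely as a sketch with the official kubernetes Python client (not existing prefect-kubernetes code; the function name, the job-name label lookup, and printing instead of using the flow run logger are all assumptions): while the pod is Pending, the worker could list the events attached to the Job's pods and echo them into the flow run logs.

```python
# Sketch only: list Kubernetes events for the pods created by a Job so they
# could be echoed into the Prefect flow run logs while the pod is Pending.
# Names here are illustrative, not existing prefect-kubernetes code.
from kubernetes import client, config


def log_job_pod_events(job_name: str, namespace: str) -> None:
    config.load_incluster_config()  # or config.load_kube_config() outside the cluster
    core = client.CoreV1Api()

    # The Job controller labels the pods it creates with job-name=<job name>.
    pods = core.list_namespaced_pod(namespace, label_selector=f"job-name={job_name}")
    for pod in pods.items:
        events = core.list_namespaced_event(
            namespace,
            field_selector=f"involvedObject.name={pod.metadata.name}",
        )
        for event in events.items:
            # A worker would forward this to the flow run logger rather than stdout.
            print(f"{event.type} {event.reason}: {event.message}")
```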
Traceback / Example
Example events that are available via kubectl but not Prefect (a rough programmatic equivalent follows the list):
17m Normal SuccessfulCreate job/beige-stingray-tzkrv Created pod: beige-stingray-tzkrv-vglpm
16m Normal TriggeredScaleUp pod/beige-stingray-tzkrv-vglpm pod triggered scale-up: [{gpu-accelerated-us-east-1a 1->2 (max: 5)}]
16m Warning FailedScheduling pod/beige-stingray-tzkrv-vglpm 0/51 nodes are available: 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 1 node(s) were unschedulable, 45 Insufficient cpu, 49 Insufficient nvidia.com/gpu, 7 Insufficient memory.
16m Warning FailedScheduling pod/beige-stingray-tzkrv-vglpm 0/51 nodes are available: 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 1 node(s) were unschedulable, 45 Insufficient cpu, 49 Insufficient nvidia.com/gpu, 6 Insufficient memory.
15m Warning FailedScheduling pod/beige-stingray-tzkrv-vglpm 0/51 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 45 Insufficient cpu, 49 Insufficient nvidia.com/gpu, 6 Insufficient memory.
15m Warning FailedScheduling pod/beige-stingray-tzkrv-vglpm 0/51 nodes are available: 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 45 Insufficient cpu, 50 Insufficient nvidia.com/gpu, 6 Insufficient memory.
14m Normal Scheduled pod/beige-stingray-tzkrv-vglpm Successfully assigned awesome-app/beige-stingray-tzkrv-vglpm to ip-10-144-171-199.ec2.internal
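For comparison, a rough equivalent of the kubectl output above, again only a sketch using the kubernetes Python client's watch API (the pod name is assumed to be known; the timeout and output formatting are arbitrary choices):

```python
# Sketch: stream events for one pod, roughly what
# `kubectl get events --field-selector involvedObject.name=<pod>` shows.
from kubernetes import client, config, watch


def stream_pod_events(pod_name: str, namespace: str, timeout_seconds: int = 60) -> None:
    config.load_kube_config()
    core = client.CoreV1Api()
    for item in watch.Watch().stream(
        core.list_namespaced_event,
        namespace,
        field_selector=f"involvedObject.name={pod_name}",
        timeout_seconds=timeout_seconds,
    ):
        event = item["object"]
        print(f"{event.type}\t{event.reason}\t{event.message}")
```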
I would like to help contribute a pull request to resolve this!
Hi @tekumara - I don't exactly recall how this worked in Prefect 1, but this seems useful! We'll need some internal feedback on how and which events should be obtained and displayed.