This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Replicate kube events to Prefect that occur before the pod starts #86

Closed · 1 task
tekumara opened this issue Aug 21, 2023 · 3 comments

Comments

tekumara commented Aug 21, 2023

Expectation / Proposal

As with agents in Prefect 1, when using Prefect 2 Kubernetes workers, make kube events that occur before the pod starts visible via Prefect.

At the moment all we get via Prefect is:

Worker 'KubernetesWorker edf7261c-e388-46f8-a0c5-9eb7de8c7c0f' submitting flow run 'ee662a6b-5303-430f-9ab7-acd5468f5d22' 02:20:25 PM prefect.flow_runs.worker
Creating Kubernetes job... 02:20:26 PM prefect.flow_runs.worker
Completed submission of flow run 'ee662a6b-5303-430f-9ab7-acd5468f5d22' 02:20:26 PM prefect.flow_runs.worker
Job 'beige-stingray-tzkrv': Pod has status 'Pending'. 02:20:27 PM prefect.flow_runs.worker
Job 'beige-stingray-tzkrv': Pod never started. 02:21:26 PM prefect.flow_runs.worker
Reported flow run 'ee662a6b-5303-430f-9ab7-acd5468f5d22' as crashed: Flow run infrastructure exited with non-zero status code -1. 02:21:27 PM prefect.flow_runs.worker

It would be useful to surface these kube events so that we can diagnose the failure from Prefect, without having to reach for kubectl etc.

prefect-kubernetes 0.2.8

Traceback / Example

Example events that are available via kubectl but not Prefect:

17m         Normal    SuccessfulCreate    job/beige-stingray-tzkrv              Created pod: beige-stingray-tzkrv-vglpm
16m         Normal    TriggeredScaleUp    pod/beige-stingray-tzkrv-vglpm        pod triggered scale-up: [{gpu-accelerated-us-east-1a 1->2 (max: 5)}]
16m         Warning   FailedScheduling    pod/beige-stingray-tzkrv-vglpm        0/51 nodes are available: 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 1 node(s) were unschedulable, 45 Insufficient cpu, 49 Insufficient nvidia.com/gpu, 7 Insufficient memory.
16m         Warning   FailedScheduling    pod/beige-stingray-tzkrv-vglpm        0/51 nodes are available: 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 1 node(s) were unschedulable, 45 Insufficient cpu, 49 Insufficient nvidia.com/gpu, 6 Insufficient memory.
15m         Warning   FailedScheduling    pod/beige-stingray-tzkrv-vglpm        0/51 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate, 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 45 Insufficient cpu, 49 Insufficient nvidia.com/gpu, 6 Insufficient memory.
15m         Warning   FailedScheduling    pod/beige-stingray-tzkrv-vglpm        0/51 nodes are available: 1 node(s) had taint {sandbox: true}, that the pod didn't tolerate, 45 Insufficient cpu, 50 Insufficient nvidia.com/gpu, 6 Insufficient memory.
14m         Normal    Scheduled           pod/beige-stingray-tzkrv-vglpm        Successfully assigned awesome-app/beige-stingray-tzkrv-vglpm to ip-10-144-171-199.ec2.internal
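For illustration, a rough sketch of one way a worker could surface these, assuming the official kubernetes Python client (this is not how prefect-kubernetes is implemented, and log_pod_events is a hypothetical helper): list events scoped to the pod it created and forward them to the flow run logs.

```python
from kubernetes import client, config


def log_pod_events(namespace: str, pod_name: str) -> None:
    """Hypothetical helper: list kube events for a pod and emit them
    (print is a stand-in for the Prefect flow run logger)."""
    config.load_incluster_config()  # use config.load_kube_config() when running outside the cluster
    core_v1 = client.CoreV1Api()
    events = core_v1.list_namespaced_event(
        namespace,
        field_selector=f"involvedObject.name={pod_name}",
    )
    for event in events.items:
        print(f"{event.last_timestamp} {event.type} {event.reason}: {event.message}")
```

e.g. log_pod_events("awesome-app", "beige-stingray-tzkrv-vglpm") would show the TriggeredScaleUp / FailedScheduling events above alongside the worker's own logs.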
zzstoatzz (Collaborator) commented Aug 21, 2023

Hi @tekumara - I don't exactly recall how this worked in Prefect 1, but this seems useful! We'll need some internal feedback on how / which events should be obtained and displayed.

** redacted my previously confused comments 🙂 **

tekumara (Author)

It's possible this is resolved by #90 ... I'd be happy to test this; is there a new release planned soon?

desertaxle (Member)
Addressed by #91
