@dongjoon-hyun dongjoon-hyun commented Aug 26, 2024

What changes were proposed in this pull request?

This PR aims to use podCreationTimeout instead of podAllocationDelay when getReusablePVCs excludes the newly created PVCs of previous batches.

Why are the changes needed?

Pod creation in the K8s control plane can be delayed for unknown reasons. So, podAllocationDelay (default: 1s) is not long enough to conclude that the previous allocation batch's pods have been created with their PVCs. It is safer to wait until podCreationTimeout.
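
The age check described here can be sketched as follows. This is a minimal illustration in Python, not Spark's Scala implementation: the `Pvc` record, the `reusable_pvcs` helper, and the 600-second timeout value are assumptions made for the example; only the 1-second podAllocationDelay default comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pvc:
    """Hypothetical stand-in for a PersistentVolumeClaim."""
    name: str
    created_ms: int  # creation timestamp in epoch milliseconds


def reusable_pvcs(pvcs, in_use, now_ms, min_age_ms):
    """Keep PVCs that no known pod uses and that are older than min_age_ms."""
    return [
        p for p in pvcs
        if p.name not in in_use and now_ms - p.created_ms > min_age_ms
    ]


POD_ALLOCATION_DELAY_MS = 1_000    # default quoted in the description
POD_CREATION_TIMEOUT_MS = 600_000  # assumed value, for illustration only

now_ms = 10_000_000
pvcs = [
    Pvc("pvc-old", now_ms - 3_600_000),  # created an hour ago
    Pvc("pvc-new", now_ms - 5_000),      # created 5 s ago; its pod is still pending
]
in_use = set()  # the pending pod is not yet visible, so no PVC looks "in use"

before = [p.name for p in reusable_pvcs(pvcs, in_use, now_ms, POD_ALLOCATION_DELAY_MS)]
after = [p.name for p in reusable_pvcs(pvcs, in_use, now_ms, POD_CREATION_TIMEOUT_MS)]
print(before)
print(after)
```

With the shorter threshold, `pvc-new` is offered for reuse even though a pending pod from the previous batch may still claim it; with the longer threshold it is held back.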

Does this PR introduce any user-facing change?

This affects only the initial set of executors because the baseline is the PVC's getCreationTimestamp. So, this fixes only a buggy situation where a PVC is shared by two executors because an executor pod has been pending for a long time.
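
One way to read this: the old and new thresholds disagree only for a PVC whose age lies between them. The sketch below assumes a hypothetical helper `decision_changes` and an illustrative 600-second timeout; neither is taken from the PR.

```python
def decision_changes(pvc_age_ms, old_threshold_ms, new_threshold_ms):
    """True when a PVC passed the old age check but fails the new one."""
    return old_threshold_ms < pvc_age_ms <= new_threshold_ms


OLD_MS = 1_000    # podAllocationDelay default from the description
NEW_MS = 600_000  # assumed podCreationTimeout value, for illustration only

recent = decision_changes(5_000, OLD_MS, NEW_MS)           # PVC created 5 s ago
long_lived = decision_changes(3_600_000, OLD_MS, NEW_MS)   # PVC created 1 h ago
print(recent, long_lived)
```

Under this reading, a PVC that has existed for longer than the timeout is classified the same way by both checks, so executors requested later in a long-running application see no difference.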

How was this patch tested?

Pass the CIs with newly updated test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

Thank you, @HyukjinKwon. Merged to master/3.5/3.4.

dongjoon-hyun added a commit that referenced this pull request Aug 26, 2024
…instead of `podAllocationDelay`

This PR aims to use `podCreationTimeout` instead of `podAllocationDelay` when `getReusablePVCs` excludes the newly created PVCs of previous batches.

Pod creation in the K8s control plane can be delayed for unknown reasons. So, `podAllocationDelay` (default: 1s) is not long enough to conclude that the previous allocation batch's pods have been created with their PVCs. It is safer to wait until `podCreationTimeout`.

This affects only the initial set of executors because the baseline is the PVC's `getCreationTimestamp`. So, this fixes only a buggy situation where a PVC is shared by two executors because an executor pod has been pending for a long time.

Pass the CIs with newly updated test cases.

No.

Closes #47867 from dongjoon-hyun/SPARK-49385.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit f596079)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@dongjoon-hyun dongjoon-hyun deleted the SPARK-49385 branch August 26, 2024 03:43