[SPARK-54010] Support applicationTolerations.restartConfig.restartCounterResetMillis
#405
### What changes were proposed in this pull request?

This PR adds support for automatically resetting the restart counter based on application attempt duration. It introduces a new `restartCounterResetMillis` field in `RestartConfig` that allows the restart counter to be reset if an application runs successfully for a specified duration before terminating.

It also adds a unit test and enhances the existing test `assertGeneratedCRDMatchesHelmChart` to show a diff for readability.

### Why are the changes needed?

With this feature, users can distinguish between persistent failures (quick consecutive crashes) and applications that run for long periods between failures.

### Does this PR introduce _any_ user-facing change?

A new optional configuration field, `restartCounterResetMillis`, is added to the `RestartConfig` spec.

### How was this patch tested?

Added a unit test that validates the restart counter works as expected.

### Was this patch authored or co-authored using generative AI tooling?

No
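To make the described behavior concrete, here is a minimal sketch of how a duration-based counter reset could work. It is not the operator's actual implementation: the class `RestartCounterSketch`, both method names, the `maxRestartAttempts` parameter, and the convention that a negative `restartCounterResetMillis` disables the reset are all assumptions made for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative only: these names are hypothetical, not the operator's real classes. */
final class RestartCounterSketch {
    private RestartCounterSketch() {}

    /**
     * Returns the restart count to carry forward after an attempt ends.
     * Assumption: a negative restartCounterResetMillis disables the reset.
     */
    static long effectiveRestartCount(
            long currentCount,
            Instant attemptStart,
            Instant attemptEnd,
            long restartCounterResetMillis) {
        if (restartCounterResetMillis < 0) {
            return currentCount; // reset disabled (assumed convention)
        }
        long attemptMillis = Duration.between(attemptStart, attemptEnd).toMillis();
        // An attempt that ran at least as long as the threshold is treated as healthy,
        // so earlier failures no longer count against the application.
        return attemptMillis >= restartCounterResetMillis ? 0L : currentCount;
    }

    /** Assumed policy check: restart only while the count is below the limit. */
    static boolean mayRestart(long effectiveCount, long maxRestartAttempts) {
        return effectiveCount < maxRestartAttempts;
    }
}
```

Under these assumptions, with a limit of 3 restarts and a threshold of 60,000 ms, an application that has crashed three times within seconds each time is no longer restarted, while one that ran for an hour before failing has its counter reset to 0 and may restart again.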
cc @peter-toth, can you please help review this?
Sorry for the delay, @jiangzho. I can review this PR Thursday or Friday.
peter-toth
left a comment
Looks good to me; I just have minor comments.
spark-operator-api/src/main/java/org/apache/spark/k8s/operator/status/ApplicationStatus.java
@dongjoon-hyun, https://issues.apache.org/jira/browse/SPARK-54010 is under the 0.7.0 umbrella. Would you like us to wait before merging this, or do you think we can move it to 0.6.0?
Feel free to move, @peter-toth.
Thank you @jiangzho for the fix! Merged to
What changes were proposed in this pull request?
This PR adds support for automatic restart counter reset based on application attempt duration. The feature introduces a new `restartCounterResetMillis` field in RestartConfig that allows the restart counter to be reset if an application runs successfully for a specified duration before terminating.
Also added unit test.
Why are the changes needed?
With this feature, users can distinguish between persistent failures (quick consecutive crashes) and applications that run for long periods between failures.
Does this PR introduce any user-facing change?
A new optional configuration field, restartCounterResetMillis, is added to the RestartConfig spec.
How was this patch tested?
Added a unit test that validates the restart counter works as expected.
Was this patch authored or co-authored using generative AI tooling?
No