[K8S] Support to cleanup the spark driver pod after application terminates for retain period #5714

Closed
turboFei wants to merge 3 commits from the kill_k8s_pod branch

@turboFei turboFei commented Nov 16, 2023

🔍 Description

Describe Your Solution 🔧

As the title says, this adds support for cleaning up the Spark driver pod once the application has been terminated for the retain period.
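
A minimal sketch of the retain-period check, assuming the server records when each application terminated. The object and function names are hypothetical and not taken from the Kyuubi code base; only the idea of an on/off switch combined with a retain period comes from this PR.

```scala
import java.time.{Duration, Instant}

// Hypothetical helper, for illustration only.
object RetainPeriodSketch {

  // True once the application has been terminated for at least `retainPeriod`.
  def retainPeriodElapsed(
      terminatedAt: Instant,
      retainPeriod: Duration,
      now: Instant): Boolean =
    !now.isBefore(terminatedAt.plus(retainPeriod))

  // `terminatedAt` is None while the application is still running,
  // in which case the driver pod is never deleted.
  def shouldDeleteDriverPod(
      enabled: Boolean,
      terminatedAt: Option[Instant],
      retainPeriod: Duration,
      now: Instant): Boolean =
    enabled && terminatedAt.exists(t => retainPeriodElapsed(t, retainPeriod, now))
}
```

Passing `now` as a parameter keeps the check free of side effects, so it can be exercised without waiting for a real retain period to pass.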

Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

Test Plan 🧪

Test locally.


Checklists

📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [x] No license issues.
- [x] Milestone correctly set?
- [ ] Test coverage is ok
- [x] Assignees are selected.
- [x] Minimum number of approvals
- [ ] No changes are requested

Be nice. Be informative.

@turboFei turboFei force-pushed the kill_k8s_pod branch 2 times, most recently from 7a98c92 to f7191aa Compare November 16, 2023 09:32
@turboFei turboFei self-assigned this Nov 16, 2023
@turboFei turboFei added this to the v1.8.1 milestone Nov 16, 2023
@pan3793 pan3793 changed the title [KUBERNETES] Support to cleanup the application pod after application terminates for retain period [K8S] Support to cleanup the application pod after application terminates for retain period Nov 16, 2023
@turboFei turboFei changed the title [K8S] Support to cleanup the application pod after application terminates for retain period [K8S] Support to cleanup the spark driver pod after application terminates for retain period Nov 16, 2023
@codecov-commenter

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (23f32cf) 61.43% compared to head (1e07878) 61.38%.

| Files | Patch % | Lines |
|---|---|---|
| ...kyuubi/engine/KubernetesApplicationOperation.scala | 13.04% | 20 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #5714      +/-   ##
============================================
- Coverage     61.43%   61.38%   -0.06%     
  Complexity       23       23              
============================================
  Files           607      607              
  Lines         35735    35755      +20     
  Branches       4896     4900       +4     
============================================
- Hits          21955    21948       -7     
- Misses        11402    11415      +13     
- Partials       2378     2392      +14     


@turboFei turboFei closed this in 88fae49 Nov 16, 2023
turboFei added a commit that referenced this pull request Nov 16, 2023
…plication terminates for retain period

# 🔍 Description

## Describe Your Solution 🔧

As the title says, this adds support for cleaning up the Spark driver pod once the application has been terminated for the retain period.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

Test locally.

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [x] No license issues.
- [x] Milestone correctly set?
- [ ] Test coverage is ok
- [x] Assignees are selected.
- [x] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes #5714 from turboFei/kill_k8s_pod.

Closes #5714

1e07878 [fwang12] doc
0c9ff1a [fwang12] cleanup pod
ab95d4c [fwang12] save

Authored-by: fwang12 <fwang12@ebay.com>
Signed-off-by: fwang12 <fwang12@ebay.com>
(cherry picked from commit 88fae49)
Signed-off-by: fwang12 <fwang12@ebay.com>
@turboFei turboFei deleted the kill_k8s_pod branch November 16, 2023 11:33
@turboFei

thanks, merged to master and branch-1.8

buildConf("kyuubi.kubernetes.spark.deleteDriverPodOnTermination.enabled")
  .doc("If set to true then Kyuubi server will delete the spark driver pod after " +
    s"the application terminates for ${KUBERNETES_TERMINATED_APPLICATION_RETAIN_PERIOD.key}.")
  .version("1.8.1")
Member

On second thought, this feature may not be mature enough for 1.8.1:

  • My previous suggestion was a bad one: it makes the configuration name misleading about the moment at which the Pod is deleted.
  • Users may want to delete only the Pods that completed with exit code 0 and keep those that exited abnormally, so that they have more information to debug later.

Member

Let me think it over again and send a follow-up PR tomorrow (or over the weekend).

Contributor

Users may want to delete only the Pods that completed with exit code 0 and keep those that exited abnormally, so that they have more information to debug later.

I agree with this. Maybe we can change this boolean config into an enum policy config with values such as NONE, ONLY_COMPLETED, and ALL.
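
The enum idea could be sketched roughly as follows. Only the three value names NONE, ONLY_COMPLETED and ALL come from the comment above; the object, the `parse` helper and the use of the driver exit code as input are assumptions made for illustration, not the Kyuubi implementation.

```scala
import java.util.Locale

// Hypothetical sketch of an enum-style cleanup policy.
object CleanupPolicySketch {
  sealed trait CleanupPolicy
  case object NONE extends CleanupPolicy           // never delete the driver pod
  case object ONLY_COMPLETED extends CleanupPolicy // delete only pods that exited with code 0
  case object ALL extends CleanupPolicy            // delete every terminated driver pod

  // Parses a config value case-insensitively and rejects unknown values.
  def parse(value: String): CleanupPolicy =
    value.trim.toUpperCase(Locale.ROOT) match {
      case "NONE" => NONE
      case "ONLY_COMPLETED" => ONLY_COMPLETED
      case "ALL" => ALL
      case other =>
        throw new IllegalArgumentException(s"Unknown cleanup policy: $other")
    }

  // `exitCode` is None when the exit code is unknown; such pods are kept
  // under ONLY_COMPLETED so they remain available for debugging.
  def shouldDelete(policy: CleanupPolicy, exitCode: Option[Int]): Boolean =
    policy match {
      case NONE => false
      case ONLY_COMPLETED => exitCode.contains(0)
      case ALL => true
    }
}
```

A sealed trait lets the compiler warn when a `match` on the policy misses a case, which helps if more policies are added later.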

Member Author

I see. We also need to check whether ApplicationInfo.name is null.

Member Author

For NOT_FOUND, it might cause an NPE.

Member Author

For the NPE, see #5718.
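
A null-safe guard of the kind discussed in this thread might look like the sketch below. `AppInfo` is a simplified, hypothetical stand-in for `ApplicationInfo`, and `deletePod` is a caller-supplied placeholder; this is not the change made in #5718.

```scala
// Hypothetical sketch: skip pod deletion when no pod name is known.
object NullSafeDeleteSketch {

  // Simplified stand-in; `name` may be null, e.g. when the application was not found.
  final case class AppInfo(id: String, name: String, state: String)

  // Wraps the possibly-null name in an Option instead of dereferencing it.
  def podNameToDelete(info: AppInfo): Option[String] =
    Option(info).flatMap(i => Option(i.name)).filter(_.nonEmpty)

  // Calls `deletePod` only when a usable name exists; returns whether it was called.
  def deleteIfNamed(info: AppInfo)(deletePod: String => Unit): Boolean =
    podNameToDelete(info) match {
      case Some(name) =>
        deletePod(name)
        true
      case None => false
    }
}
```

`Option(x)` evaluates to `None` when `x` is null, so neither a missing `AppInfo` nor a missing name reaches the delete call.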

turboFei added a commit that referenced this pull request Nov 17, 2023
… pod

# 🔍 Description
#5714 followup to prevent NPE when deleting pod.
## Issue References 🔗

This pull request fixes #

## Describe Your Solution 🔧

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

## Types of changes 🔖

- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [ ] I have performed a self-review
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [ ] No license issues.
- [ ] Milestone correctly set?
- [ ] Test coverage is ok
- [ ] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes #5718 from turboFei/npe_fix.

Closes #5714

74349a3 [fwang12] nit
7093ed8 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/KubernetesApplicationOperation.scala
558381e [fwang12] fix npe
454be88 [fwang12] prevent npe

Lead-authored-by: fwang12 <fwang12@ebay.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: fwang12 <fwang12@ebay.com>
turboFei added a commit that referenced this pull request Nov 17, 2023
(cherry picked from commit d4fa6fd)
Signed-off-by: fwang12 <fwang12@ebay.com>
pan3793 pushed a commit that referenced this pull request Nov 20, 2023
…ified cleanup strategy

# 🔍 Description

## Describe Your Solution 🔧

#5714 introduced a feature that lets Kyuubi clean up Spark driver pods automatically, but it cleans up every pod regardless of the application's terminated state.
This PR lets users choose which pods are deleted by configuring a cleanup strategy.

## Types of changes 🔖

- [ ] Bugfix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

Test locally.

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [ ] No license issues.
- [ ] Milestone correctly set?
- [ ] Test coverage is ok
- [ ] Assignees are selected.
- [ ] Minimum number of approvals
- [ ] No changes are requested

**Be nice. Be informative.**

Closes #5728 from liaoyt/master.

Closes #5731

d2cc8cb [yeatsliao] regenerate docs
4caf8b1 [yeatsliao] rename conf 'KUBERNETES_SPARK_DELETE_DRIVER_POD_ON_TERMINATION' to 'KUBERNETES_SPARK_CLEANUP_TERMINATED_DRIVER_POD'
4d970fa [yeatsliao] [K8S] Support to cleanup the spark driver pod with specified clean up strategy

Authored-by: yeatsliao <liaoyt66066@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 pushed a commit that referenced this pull request Nov 20, 2023
(cherry picked from commit dc03687)
Signed-off-by: Cheng Pan <chengpan@apache.org>