Commit 8814cf1

feedback2

jpbetz committed
1 parent 0f7dde5

1 file changed (+14, -6 lines)
  • keps/sig-api-machinery/5366-graceful-leader-transition

keps/sig-api-machinery/5366-graceful-leader-transition/README.md

@@ -293,19 +293,26 @@ Risk 1: Resource exhaustion: Memory leaks may exist in the processes that were
 previously masked by doing a full shutdown and restart loop.
 
 - Severity: Medium high
-- Controllers will continue to function (potentially in degraded state due to lack of resources), and may be restarted frequently. However, cluster should continue to function.
+- Controllers will continue to function (potentially in degraded state due to
+lack of resources), and may be restarted frequently. However, cluster should
+continue to function.
 
-Risk 2: Wedged KCM: There is a risk that controllers and the
-scheduler are not properly respecting context shutdowns. This can either result in multiple instances of controllers running or no instances running despite the lock being held.
+Risk 2: Wedged KCM: There is a risk that controllers and the scheduler are not
+properly respecting context shutdowns. This can either result in multiple
+instances of controllers running or no instances running despite the lock being
+held.
 
-- Severity: Extreme
-- Breaking mutual exclusion guarantees can put the cluster into a non-desirable state. A manual user intervention is possible but if the problem is triggered due to a problematic component, the issue will resurface and the best path for mitigation is to turn off the feature.
+- Severity: High
+- Breaking mutual exclusion guarantees can put the cluster into a non-desirable
+state. A manual user intervention is possible but if the problem is triggered
+due to a problematic component, the issue will resurface and the best path for
+mitigation is to turn off the feature.
 
 Risk 3: Futureproofing: An additional risk is that even if all the current code
 is safe and respects shutting down gracefully, new controllers/modifications to
 kcm or scheduler could create subtle problems in shutdown and transition.
 
-- Severity: Medium
+- Severity: High
 - Leads to either risk 1 or 2.
 
 
@@ -447,6 +454,7 @@ Will test that feature enablement will still result in a functional cluster.
 #### Beta
 
 - e2e tests
+- Address how to minimize risks of putting KCM or scheduler in a "wedged" state
 
 #### GA
 