Update Kafka PDB to use live Pods instead of the CR Spec #770

Merged
merged 2 commits on Mar 22, 2022

Conversation

alungu
Contributor

@alungu commented Feb 11, 2022

Q A
Bug fix? yes
New feature? no
API breaks? no
Deprecations? no
Related tickets #769
License Apache 2.0

What's in this PR?

The Kafka PDB logic is updated to consider the total number of healthy Kafka brokers to be the number of Running Kafka pods (instead of the number of Kafka brokers specified in the Kafka CR).
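
As a rough illustration of the approach (not the exact koperator code; the helper name, pod labels, and the budget of one allowed disruption are assumptions):

```go
// Illustrative sketch: count Running Kafka pods and derive the PDB minAvailable
// from that count, instead of from the broker count in the KafkaCluster Spec.
package kafka

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// computeMinAvailable is an assumed helper name; the label selector is an assumption.
func computeMinAvailable(ctx context.Context, c client.Client, namespace, clusterName string) (intstr.IntOrString, error) {
	podList := &corev1.PodList{}
	if err := c.List(ctx, podList,
		client.InNamespace(namespace),
		client.MatchingLabels{"app": "kafka", "kafka_cr": clusterName},
	); err != nil {
		return intstr.IntOrString{}, err
	}

	// Count only pods that are actually Running.
	running := 0
	for _, pod := range podList.Items {
		if pod.Status.Phase == corev1.PodRunning {
			running++
		}
	}

	// Tolerate one voluntary disruption among the currently healthy brokers.
	if running < 1 {
		return intstr.FromInt(0), nil
	}
	return intstr.FromInt(running - 1), nil
}
```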

Why?

The previous implementation led to a faulty PDB during scaling (both in and out), since adding or removing brokers takes some time.

Additional context

Checklist

  • Implementation tested
  • Error handling code meets the guideline
  • Logging code meets the guideline
  • User guide and development docs updated (if needed)

@alungu requested a review from a team as a code owner on February 11, 2022 at 10:10
@alungu force-pushed the kafka-pdb branch 2 times, most recently from 3a3f558 to 8855602 on February 11, 2022 at 12:40
Contributor

@bartam1 left a comment

Thank you for the PR!

pkg/resources/kafka/poddisruptionbudget.go: review comment (outdated, resolved)
@stoader
Member

stoader commented Mar 2, 2022

@alungu relying on the number of Kafka pods might not be the best approach. For example, if you have a 6-broker cluster and all 6 broker pods are up and running, the PDB minAvailable will be 5. When a broker pod is deleted for whatever reason outside of koperator, that event will trigger a reconcile in koperator, which will update the PDB minAvailable to 4, as only 5 running broker pods are found before the missing pod is recreated. After the missing pod comes back up, the next reconcile flow will set the PDB minAvailable back to 5.

If the PDB handling https://github.com/banzaicloud/koperator/blob/master/pkg/resources/kafka/kafka.go#L162 is moved to be the last step, after https://github.com/banzaicloud/koperator/blob/master/pkg/resources/kafka/kafka.go#L287, then it might not be an issue.

What I'm thinking of is to determine the number of brokers for PDB handling from the Status, as that contains the list of running brokers and the brokers pending graceful upscale/downscale. In this case the PDB handling could stay where it is now: https://github.com/banzaicloud/koperator/blob/master/pkg/resources/kafka/kafka.go#L162

What do you think?

@alungu
Contributor Author

alungu commented Mar 18, 2022

@stoader Thank you for mentioning this issue!
I performed a few more tests with all the options (Spec vs Status vs Pod List). All of them have blind spots, but I agree with the issue you highlighted for the "Pod List" option.
As such, I will update this PR to use the KafkaCluster Status instead of listing the Kafka pods.
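
For illustration, a minimal sketch of what that could look like (hypothetical, not the actual patch; the import path, the BrokersState field, the helper name, and the one-disruption budget are assumptions based on the koperator API):

```go
// Hypothetical sketch: derive the PDB minAvailable from the KafkaCluster
// Status rather than from the CR Spec or a live pod listing.
package kafka

import (
	"k8s.io/apimachinery/pkg/util/intstr"

	banzaiv1beta1 "github.com/banzaicloud/koperator/api/v1beta1"
)

// minAvailableFromStatus is an assumed helper name. Status.BrokersState tracks
// running brokers plus brokers pending graceful upscale/downscale, so it is
// more stable across single pod restarts than a live pod count.
func minAvailableFromStatus(cluster *banzaiv1beta1.KafkaCluster) intstr.IntOrString {
	brokers := len(cluster.Status.BrokersState)
	if brokers < 1 {
		return intstr.FromInt(0)
	}
	// Allow one voluntary disruption.
	return intstr.FromInt(brokers - 1)
}
```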

@alungu
Contributor Author

alungu commented Mar 18, 2022

@bartam1 @stoader could you review the new solution, please? Thanks!
