The example/default configuration file lists 5 servers:

kubernetes-kafka/zookeeper/10zookeeper-config.yml, line 27 at 727899a

This looks like a mistake. It happens to work because five servers are defined and, with three nodes in the StatefulSet, ZooKeeper still counts three out of five as a quorum. However, it is extremely fragile: an outage of any single node drops the ensemble to two of five, below quorum, and takes the ZK cluster (and hence the Kafka deployment) hard down; e.g. bootstrap times out:
$ kafkacat -b k8s.internal.example.com:32401 -L
% ERROR: Failed to acquire metadata: Local: Broker transport failure
Observed log lines from zookeeper:
Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running (org.apache.zookeeper.server.NIOServerCnxn)
I will be testing this theory out soon by removing these two lines and seeing if zk stays happy with a single statefulset node failure.
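For illustration, a minimal sketch of the idea, with placeholder hostnames rather than the actual entries in 10zookeeper-config.yml. With five entries the quorum is three, so all three running pods must stay up; with only the three real entries the quorum is two and a single pod failure is survivable:

```
# sketch of a 5-entry ensemble where only 3 pods actually exist (hostnames are placeholders)
server.1=zookeeper-0.zookeeper:2888:3888
server.2=zookeeper-1.zookeeper:2888:3888
server.3=zookeeper-2.zookeeper:2888:3888
# the two entries below have no matching pod; removing them brings the quorum down to 2
server.4=zookeeper-3.zookeeper:2888:3888
server.5=zookeeper-4.zookeeper:2888:3888
```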
While I'm here, the StatefulSets should be defined with a Parallel pod management policy so that if e.g. broker 0 goes down, the StatefulSet doesn't hold back the recreation of brokers 1+ waiting on broker 0, and the system can recover from multi-node failures faster.
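As a sketch of what that change looks like on a broker StatefulSet (the podManagementPolicy field is standard apps/v1; the names and image below are illustrative, not the repo's actual manifest):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka            # illustrative; the repo's broker StatefulSet name may differ
spec:
  serviceName: broker    # illustrative headless service name
  replicas: 3
  # default is OrderedReady, which creates and recovers pods one at a time;
  # Parallel lets the controller act on all pods at once
  podManagementPolicy: Parallel
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: broker
          image: kafka:placeholder   # placeholder image reference
```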
Does this have to do with #34? Do you have two statefulsets or one? I agree with the argument for parallel pod management, but the dependence on zones through volumes is the essential backing for that argument. #191 is an alternative approach. And given the age of this repository I have to accept some legacy.
My mistake; I searched for a 'zoo' deployment but didn't see it because it had been removed from our fork, so yes, I only had one ZooKeeper StatefulSet. Thanks for the pointers!
Yeah, this is where templating would do wonders in this repo, generating the server.X list in the ZooKeeper config. I also thought of generating it in the init container by looking up the scale of the two StatefulSets, but that would introduce a lot more complexity. Helm would really obscure things, and I haven't yet had a proper look at Kustomize. I'd very much appreciate a contribution.
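For reference, a rough sketch of what the init-container approach could look like, generating the server.X list from the replica counts of two hypothetical StatefulSets named pzoo and zoo (the names, output path, and required RBAC are assumptions, not the repo's actual setup):

```sh
#!/bin/sh
# Look up desired replica counts (needs a service account allowed to get statefulsets)
PZOO_REPLICAS=$(kubectl get statefulset pzoo -o jsonpath='{.spec.replicas}')
ZOO_REPLICAS=$(kubectl get statefulset zoo -o jsonpath='{.spec.replicas}')

# Append one server.N entry per expected pod to the generated config
{
  i=1
  n=0
  while [ "$n" -lt "$PZOO_REPLICAS" ]; do
    echo "server.$i=pzoo-$n.pzoo:2888:3888"
    i=$((i + 1)); n=$((n + 1))
  done
  n=0
  while [ "$n" -lt "$ZOO_REPLICAS" ]; do
    echo "server.$i=zoo-$n.zoo:2888:3888"
    i=$((i + 1)); n=$((n + 1))
  done
} >> /etc/zookeeper/zoo.cfg   # path is a guess; use wherever the init script writes zoo.cfg
```

This illustrates the extra moving parts (kubectl in the init image, RBAC) that templating the list once per environment would avoid.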