Fix Zookeeper persistence #227

Merged 6 commits into master on Dec 2, 2018

Conversation

solsson (Contributor) commented Dec 1, 2018

Topic information, though not the contents kept in Kafka, would be lost if all ZooKeeper pods went down at the same time: only the snapshots were actually saved to the persistent volume.

According to https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#sc_dataFileManagement "ZooKeeper can recover using this snapshot".

The regression probably dates back to ccb9e5d, which was released with v2.0.0.

Fixes #89: the "logs", which are actually data, would end up outside the mount.

ZooKeeper's startup log is clearer than the property file entries:
INFO Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/log/version-2 snapdir /var/lib/zookeeper/data/version-2
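
Note that the "datadir" in that line is the transaction log directory (dataLogDir) and "snapdir" is the snapshot directory (dataDir). As a rough sketch of the layout this aims for (inferred from the startup log above, not copied from the actual diff), with the persistent volume mounted at /var/lib/zookeeper so that both directories land on it:

zookeeper.properties: |-
  ...
  dataDir=/var/lib/zookeeper/data
  dataLogDir=/var/lib/zookeeper/log
  ...
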
solsson (Author) commented Dec 1, 2018

It looks like a change in mount path cannot be kubectl apply'd, which means that an upgrade can't proceed pod by pod. Upgrades therefore probably have to preserve data under the new mount path. I've tested the following flow once:

kubectl replace -f zookeeper/10zookeeper-config.yml
kubectl -n kafka exec zoo-1 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl -n kafka exec zoo-0 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl delete -f zookeeper/51zoo.yml && kubectl apply -f zookeeper/51zoo.yml

kubectl -n kafka exec pzoo-2 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl -n kafka exec pzoo-1 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl -n kafka exec pzoo-0 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl delete -f zookeeper/50pzoo.yml && kubectl apply -f zookeeper/50pzoo.yml

It could be a good idea to stop all Kafka brokers before doing this. I found no way to stop ZooKeeper that didn't trigger a pod restart.
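
One way to sanity-check a pod after migration (a suggestion, not part of the flow tested above) is to confirm that both directories now live on the mounted volume and that myid survived:

kubectl -n kafka exec pzoo-0 -- ls /var/lib/zookeeper/data /var/lib/zookeeper/log
kubectl -n kafka exec pzoo-0 -- cat /var/lib/zookeeper/data/myid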

solsson (Author) commented Dec 1, 2018

@pavel-agarkov Care to test the above?

pavel-agarkov commented:

Sure! But probably tomorrow since it is already midnight in my timezone.

solsson (Author) commented Dec 1, 2018

The upgrade path would have been smoother if the log dir had been put inside the snapshot dir, but that is recommended against in https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#sc_dataFileManagement. Some setups might instead add a separate volume for /var/lib/zookeeper/log.

Maybe the safest way is to back up /var/lib/zookeeper, add a sleep infinity at the top of init.sh, and restart. Backing up from inside the cluster is probably preferable, but kubectl -n kafka cp zoo-1:/var/lib/zookeeper zoo-1 is also possible, as sketched below. Also note that it's best to experiment with the zoo statefulset first, since it holds a minority of the nodes.
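
A rough sketch of that backup step, assuming the pod names used in this repo and enough local disk space:

for p in pzoo-0 pzoo-1 pzoo-2 zoo-0 zoo-1; do
  kubectl -n kafka cp $p:/var/lib/zookeeper ./$p
done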

solsson changed the base branch from master to 4.3.x on December 2, 2018 13:11
solsson changed the base branch from 4.3.x to master on December 2, 2018 13:12
solsson merged commit e1d3f6f into master on Dec 2, 2018
pavel-agarkov commented Dec 3, 2018

It took me a while to make it work on my single-node setup.
I don't know why it worked previously, but now it didn't work until I changed

# maxClientCnxns changed from 1 to 2
zookeeper.properties: |-
  ...
  maxClientCnxns=2
  ...

and some other fixes, probably also related to the single-node setup.

But I will only know for sure how it works after a few days of natural node killing 😅

EDIT: looks like you have removed this line but it somehow reappeared in my fork after the merge...

solsson (Author) commented Dec 3, 2018

@pavel-agarkov

maxClientCnxns=2

The line is still there, see #230. I should make the change you suggested and release again... but did you see any error message that was specific about hitting this limit? That's what I wanted to see for myself before raising the limit.

pavel-agarkov commented Dec 3, 2018

Yes, the whole ZooKeeper log was filled with:

[2018-12-03 08:39:29,911] WARN Too many connections from /10.40.21.8 - max is 1 (org.apache.zookeeper.server.NIOServerCnxnFactory)

Here is what came right before that warning:

[2018-12-03 08:31:33,892] INFO Established session 0x100024b03900000 with negotiated timeout 6000 for client /10.40.21.8:44568 (org.apache.zookeeper.server.ZooKeeperServer)
[2018-12-03 08:31:43,680] INFO Got user-level KeeperException when processing sessionid:0x100024b03900000 type:setData cxid:0x41 zxid:0x1e txntype:-1 reqpath:n/a Error Path:/config/topics/__consumer_offsets Error:KeeperErrorCode = NoNode for /config/topics/__consumer_offsets (org.apache.zookeeper.server.PrepRequestProcessor)

solsson (Author) commented Dec 3, 2018

Do you know which kind of pod that came from, e.g. 10.40.21.8 in your log output?
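If the pod is still around, something like this should show which pod owns that address:

kubectl get pods --all-namespaces -o wide | grep 10.40.21.8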

pavel-agarkov commented:

I didn't investigate it at the time, and I couldn't find it in the logs now.

pavel-agarkov commented:

It still works well after a week of pod killing. Topics are not being lost any more.
Thank you.
