Fix Zookeeper persistence #227

Merged 6 commits into master on Dec 2, 2018

Conversation

solsson (Contributor) commented Dec 1, 2018

Topic information, though not the contents kept in Kafka, would be lost if all ZooKeeper pods went down at the same time: only the snapshots were actually saved to the persistent volume.

According to https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#sc_dataFileManagement "ZooKeeper can recover using this snapshot".

The regression probably dates back to ccb9e5d, which was released with v2.0.0.

Fixes #89: the "logs", which are actually data, would end up outside the mount.

ZooKeeper's startup log is clearer than the property file entries:
INFO Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/log/version-2 snapdir /var/lib/zookeeper/data/version-2
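
Note that the "datadir" in that line is the transaction log directory (dataLogDir) and "snapdir" is the snapshot directory (dataDir). As a rough sketch of the layout this aims for (inferred from the startup log above, not copied from the actual diff), with the persistent volume mounted at /var/lib/zookeeper so that both directories land on it:

zookeeper.properties: |-
  ...
  dataDir=/var/lib/zookeeper/data
  dataLogDir=/var/lib/zookeeper/log
  ...
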
solsson (Author) commented Dec 1, 2018

It looks like a change in mount path cannot be kubectl apply'd, which means that an upgrade can't proceed pod by pod. Upgrades therefore probably have to preserve data under the new mount path. I've tested the following flow once:

kubectl replace -f zookeeper/10zookeeper-config.yml
kubectl -n kafka exec zoo-1 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl -n kafka exec zoo-0 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl delete -f zookeeper/51zoo.yml && kubectl apply -f zookeeper/51zoo.yml

kubectl -n kafka exec pzoo-2 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl -n kafka exec pzoo-1 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl -n kafka exec pzoo-0 -- bash -c 'cd /var/lib/zookeeper && mkdir data/data && mv data/myid data/version-2 data/data/ && cp -r log data/'
kubectl delete -f zookeeper/50pzoo.yml && kubectl apply -f zookeeper/50pzoo.yml

It could be a good idea to stop all Kafka brokers before doing this. I found no way to stop ZooKeeper that didn't trigger a pod restart.
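
One way to sanity-check a pod after migration (a suggestion, not part of the flow tested above) is to confirm that both directories now live on the mounted volume and that myid survived:

kubectl -n kafka exec pzoo-0 -- ls /var/lib/zookeeper/data /var/lib/zookeeper/log
kubectl -n kafka exec pzoo-0 -- cat /var/lib/zookeeper/data/myid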

solsson (Author) commented Dec 1, 2018

@pavel-agarkov Care to test the above?

pavel-agarkov commented:

Sure! But probably tomorrow since it is already midnight in my timezone.

solsson (Author) commented Dec 1, 2018

The upgrade path would have been smoother if the log dir had been put inside the snapshot dir, but that is recommended against in https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html#sc_dataFileManagement. Some setups might instead add a separate volume for /var/lib/zookeeper/log.

Maybe the safest way is to back up /var/lib/zookeeper, add a sleep infinity at the top of init.sh, and restart. Backing up from inside the cluster is probably preferable, but kubectl -n kafka cp zoo-1:/var/lib/zookeeper zoo-1 is also possible, as sketched below. Also note that it's best to experiment with the zoo statefulset first, since it holds a minority of the nodes.
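
A rough sketch of that backup step, assuming the pod names used in this repo and enough local disk space:

for p in pzoo-0 pzoo-1 pzoo-2 zoo-0 zoo-1; do
  kubectl -n kafka cp $p:/var/lib/zookeeper ./$p
done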

solsson changed the base branch from master to 4.3.x on December 2, 2018 13:11
solsson changed the base branch from 4.3.x to master on December 2, 2018 13:12
solsson merged commit e1d3f6f into master on Dec 2, 2018
pavel-agarkov commented Dec 3, 2018

It took me a while to make it work on my single-node setup.
I don't know why it worked previously, but now it didn't work until I changed

# maxClientCnxns changed from 1 to 2
zookeeper.properties: |-
  ...
  maxClientCnxns=2
  ...

and some other fixes, probably also related to the single-node setup.

But I will only know for sure how it works after a few days of natural node killing 😅

EDIT: looks like you have removed this line but it somehow reappeared in my fork after the merge...

solsson (Author) commented Dec 3, 2018

@pavel-agarkov

maxClientCnxns=2

The line is still there, see #230. I should make the change you suggested and release again... but did you see any error message that was specific about hitting this limit? That's what I wanted to see for myself before raising the limit.

pavel-agarkov commented Dec 3, 2018

Yes, the whole ZooKeeper log was filled with:

[2018-12-03 08:39:29,911] WARN Too many connections from /10.40.21.8 - max is 1 (org.apache.zookeeper.server.NIOServerCnxnFactory)

Here is what came right before that warning:

[2018-12-03 08:31:33,892] INFO Established session 0x100024b03900000 with negotiated timeout 6000 for client /10.40.21.8:44568 (org.apache.zookeeper.server.ZooKeeperServer)
[2018-12-03 08:31:43,680] INFO Got user-level KeeperException when processing sessionid:0x100024b03900000 type:setData cxid:0x41 zxid:0x1e txntype:-1 reqpath:n/a Error Path:/config/topics/__consumer_offsets Error:KeeperErrorCode = NoNode for /config/topics/__consumer_offsets (org.apache.zookeeper.server.PrepRequestProcessor)

solsson (Author) commented Dec 3, 2018

Do you know which kind of pod that came from, e.g. 10.40.21.8 in your log output?
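If the pod is still around, something like this should show which pod owns that address:

kubectl get pods --all-namespaces -o wide | grep 10.40.21.8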

pavel-agarkov commented:

I didn't investigate it at the time, and I couldn't find it in the logs now.

pavel-agarkov commented:

It still works well after a week of pod killing. Topics are not being lost any more.
Thank you.
