
ZOOKEEPER-2700 add JMX takeSnapshot method and Jetty Admin snap command to take snapshot #180

Closed
wants to merge 1 commit

Conversation

@flier flier commented Feb 17, 2017

When doing a cold backup or a remote offline sync of ZooKeeper instances, we need the latest snapshot.

Add a four-letter snap command to force ZooKeeper to generate a snapshot.

@flier flier force-pushed the ZOOKEEPER-2700 branch 2 times, most recently from cba1a0b to d6b8e10 Compare February 17, 2017 14:26
@flier flier changed the title ZOOKEEPER-2700 add command to take snapshot ZOOKEEPER-2700 add snap command to take snapshot Feb 17, 2017
if (!isZKServerRunning()) {
pw.println(ZK_NOT_SERVING);
} else {
Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {

Wouldn't this also be a potential DoS attack? Especially with launching a new thread in the background for the snapshot. Could we put in some sort of throttling on taking the snapshot, so that we don't do it too frequently?
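The throttling being asked for could look roughly like the minimal sketch below. The class and method names (`SnapshotThrottle`, `tryAcquire`) are hypothetical illustrations, not part of the patch: a compare-and-set on the last-allowed timestamp lets at most one caller per interval start a snapshot.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical rate limiter for manual snapshot requests; identifiers are
// illustrative and do not appear in the actual PR.
class SnapshotThrottle {
    private final long minIntervalMs;
    private final AtomicLong lastAllowedMs = new AtomicLong(Long.MIN_VALUE);

    SnapshotThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true if a snapshot may start now; false if throttled. */
    boolean tryAcquire(long nowMs) {
        long last = lastAllowedMs.get();
        // MIN_VALUE means "never snapshotted"; otherwise enforce the interval.
        if (last != Long.MIN_VALUE && nowMs - last < minIntervalMs) {
            return false;
        }
        // CAS so two concurrent callers cannot both pass the check.
        return lastAllowedMs.compareAndSet(last, nowMs);
    }
}
```

A guard like this would make the background snapshot thread cheap to refuse, instead of spawning a thread per request.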

Contributor

Since this PR is targeting master, I suggest considering the option of adding a snap API to ZooKeeperAdmin, which was recently introduced to harden security around dynamic reconfiguration. ZooKeeperAdmin supports all sorts of authentication built into ZK, and we can extend it such that only an admin (or any user explicitly granted admin access to the cluster) can issue the snap command.

Contributor

Also, a general comment about adding features to ZK: when you add a new feature, please also add tests :)

Contributor

@hanm FYI, I suggested adding this as a JMX option too.

Contributor

@eribeiro Yes that is one option - another would be Jetty AdminServer.

Author

@revans2 Now it only generates a snapshot when the server is idle and the last zxid has changed.

@hanm I think snap should be a command that affects only one instance, not an administrative task applied to every instance in the cluster.

@eribeiro Added a takeSnapshot method to JMX.

Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {
public void run() {
try {
zkServer.takeSnapshot();
Contributor
@eribeiro eribeiro Feb 17, 2017

Couldn't this call potentially race with the snippet below?

snapInProcess = new ZooKeeperThread("Snapshot Thread") {
public void run() {
try {
zks.takeSnapshot();
} catch(Exception e) {
LOG.warn("Unexpected exception", e);
}
}
};
snapInProcess.start();

I guess we would need some way of returning early if there's a snapshot process already in progress.
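The early return being suggested could be sketched with an `AtomicBoolean` along these lines. `SnapshotGuard` and `runExclusive` are illustrative names, not code from the PR: `compareAndSet` lets exactly one caller proceed, and the flag is released in a `finally` block so a failed snapshot does not wedge the guard.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of an "at most one snapshot at a time" guard; identifiers are
// hypothetical stand-ins, not the actual ZooKeeper code.
class SnapshotGuard {
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    /** Runs the snapshot unless one is already running; returns whether it ran. */
    boolean runExclusive(Runnable snapshot) {
        // compareAndSet succeeds for exactly one caller; losers return early.
        if (!inProgress.compareAndSet(false, true)) {
            return false;
        }
        try {
            snapshot.run();
            return true;
        } finally {
            inProgress.set(false); // always release, even on failure
        }
    }
}
```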

Contributor

I agree. It may be nice to have some check that generally makes sure multiple snapshots cannot happen at the same time, as a sanity check.

Author

I have wrapped the manual requests from the snap command or JMX in a tryTakeSnapshot method, which skips the action when the server is busy. If you believe it is necessary, we could merge the check into takeSnapshot, though that may impact the entire server workflow.

@flier flier force-pushed the ZOOKEEPER-2700 branch 2 times, most recently from a9aa636 to ad60d28 Compare February 20, 2017 03:53
@flier flier changed the title ZOOKEEPER-2700 add snap command to take snapshot ZOOKEEPER-2700 add JMX takeSnapshot method and Jetty Admin snap command to take snapshot Feb 20, 2017
@flier
Author

flier commented Feb 20, 2017

According to the review comments and ZOOKEEPER-1729, I have submitted another patch adding a JMX takeSnapshot method and a Jetty Admin snap command to take snapshots, with test cases.

@@ -126,6 +125,9 @@
private final ZooKeeperServerListener listener;
private ZooKeeperServerShutdownHandler zkShutdownHandler;

private volatile long lastSnapshotZxid;
private AtomicInteger isGeneratingSnapshot = new AtomicInteger(0);
Contributor

You could use an AtomicBoolean here, right?

Make isGeneratingSnapshot final, please.

Author

In fact, I'm not sure whether ZooKeeper's background threads may take snapshots in parallel, so I chose an AtomicInteger to protect manual calls to takeSnapshot.

}
}

public boolean tryTakeSnapshot() {
Contributor

In Java codebases we usually replace the "try" prefix with "maybe". ;)

So it becomes maybeTakeSnapshot().

Author

hmm... I haven't worked in Java for years, forgive me :)

@flier flier force-pushed the ZOOKEEPER-2700 branch 7 times, most recently from 2a982d3 to cb84192 Compare February 20, 2017 16:20

@revans2 revans2 left a comment


I am not an expert on the code, but I am a bit concerned about possible data corruption, and about the real possibility of multiple snapshots being taken at the same time.

}
}

public boolean maybeTakeSnapshot() {

Could you add some javadocs here? It would be nice to explain the difference between takeSnapshot and maybeTakeSnapshot.
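For illustration, the distinction the javadoc should draw might read like the sketch below. The method names `takeSnapshot`/`maybeTakeSnapshot` come from the PR, but the class body here is a simplified stand-in, not the actual ZooKeeperServer implementation:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in showing how the two methods could be documented and
// related; not the real ZooKeeperServer code.
class SnapshotApi {
    private final AtomicBoolean generating = new AtomicBoolean(false);
    int snapshotsTaken = 0; // stand-in for the real snapshot side effect

    /**
     * Unconditionally writes a snapshot. Internal callers (e.g. the sync
     * pipeline) are assumed to already serialize their own calls.
     */
    void takeSnapshot() {
        snapshotsTaken++;
    }

    /**
     * Best-effort variant for manual triggers (JMX/Admin): takes a snapshot
     * only if none is currently in progress.
     *
     * @return true if a snapshot was started, false if one was already running
     */
    boolean maybeTakeSnapshot() {
        if (!generating.compareAndSet(false, true)) {
            return false;
        }
        try {
            takeSnapshot();
            return true;
        } finally {
            generating.set(false);
        }
    }
}
```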

@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {

public void takeSnapshot(){

We now have a lot of potential paths to take a snapshot. JMX, SyncRequestProcessor, Admin, and as part of the sync stage in ZABP. In the past it was just the SyncRequestProcessor and ZABP that would trigger snapshots. We didn't have much concern about collisions, because SyncRequestProcessor would only run as edits were being sent, and the edits would only be sent after the ZABP sync phase had completed. Now we have Admin and JMX possibly doing a snapshot in the background.

Now, if a snapshot request comes in during the ZABP sync phase, after we have cleared out the in-memory DB but not yet applied the new snapshot, and we then crash before we can write out the new snapshot, we could end up with data corruption. This should be extremely rare, so I don't know if we care all that much about it, but I think it is something we can fix.

I am not an expert on the code, so I am not sure of the cleanest way to fix this, but it feels like maybeTakeSnapshot should be part of SyncRequestProcessor instead of ZooKeeperServer. I don't know how simple it is to get to the SyncRequestProcessor from JMX/Admin, but if you can, then we can reset the edit count so we are not taking a lot of snapshots all at once, one right after another. It also means we can truly avoid having more than one snapshot taken at any time.

Contributor

+1. This is the reason why I advocated the use of AtomicBoolean instead of AtomicInteger to provide mutual exclusion on this code snippet.

Author

If we don't expect snapshots to be taken at the same time, the easiest way is to use an AtomicBoolean to protect takeSnapshot on all code paths, which may block the SyncRequestProcessor for a while if a manual task is ongoing. My current code assumes the background SyncRequestProcessor has higher priority.


@eribeiro An AtomicBoolean is not enough, as the SyncRequestProcessor ignores it; only the JMX and admin commands use it.

@flier there are several things going on here and I don't know the reasoning for all of them but I can guess.

  1. Only take one snapshot at a time. I don't think this is super critical, because it is not a correctness problem. Having multiple snapshots happening at the same time should work correctly even in the face of crashes, but it becomes a performance problem. There is a limited amount of CPU, Memory, and most importantly disk bandwidth and IOps. The last successful snapshot wins. So having multiple snapshots all going at the same time means we are likely to be doing a lot of wasted work if everything does happen successfully. If we can schedule the snapshots so there is only one going on at a time then the full bandwidth and IOps can go to that snapshot. Even better would be to space them out, so if we force a snapshot it makes SyncRequestProcessor reset its counter so we don't take another one until X more transactions have been processed.

  2. Taking a snapshot at the wrong time and then crashing can corrupt the DB on that node. This is potentially critical. The probability of this happening is very very small, but if we can fix it so it can never happen, I would be much happier.

I think we can fix both issues at the same time by making JMX and Admin go through SyncRequestProcessor instead of going to the ZooKeeperServer directly. If no SyncRequestProcessor is ready, we can report that to the user so they know the node is not ready to take a snapshot.
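The "reset the edit counter on a forced snapshot" idea could be sketched roughly as follows. All identifiers here are hypothetical stand-ins, not the real SyncRequestProcessor fields: a forced snapshot clears the same counter the periodic path uses, so the next periodic snapshot is pushed back by a full interval.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of funneling manual snapshots through the same
// counter the periodic path uses; names are illustrative only.
class SyncCounterSketch {
    private final int snapCount;                      // snapshot every N txns
    private final AtomicInteger logCount = new AtomicInteger(0);
    int snapshots = 0;                                // stand-in for snapshot I/O

    SyncCounterSketch(int snapCount) {
        this.snapCount = snapCount;
    }

    /** Called per logged transaction; snapshots when the counter fills. */
    void onTxnLogged() {
        if (logCount.incrementAndGet() >= snapCount) {
            snapshot();
        }
    }

    /** Manual trigger (JMX/Admin) routed through the same processor. */
    void forceSnapshot() {
        snapshot();
    }

    private void snapshot() {
        logCount.set(0); // resetting here spaces periodic snapshots out
        snapshots++;
    }
}
```

With both paths funneled through one place, "only one snapshot at a time" and "don't snapshot right after a forced one" fall out of the same counter.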

I also really would like to understand the use case for this command, because I just don't see much value to a user to force another snapshot if one is already in progress. I also would like to understand when you would want to take a snapshot and if these changes would too severely limit that.

Author
@flier flier Feb 21, 2017

I have several scenarios that need a snapshot, for example:

First, our main ZooKeeper cluster is deployed in an AWS zone, with observers running in a dozen IDCs. We use this topology because a ZooKeeper cluster is not friendly to multi-IDC deployment. Besides, our ZooKeeper snapshots and transaction logs are huge, because of some incorrect client usage that is hard to fix in the short term :(

Sometimes, when we plan maintenance on the main cluster, we have to start a mirror cluster in the same DC and switch from the main cluster to the mirror. If we do it fast enough, the observers and clients will not notice the change. That's why we need to take a snapshot to speed up the migration. If something goes wrong, we can switch back to the old cluster; losing some transactions is better than the whole system going down.

Second, our backup policy requires daily/hourly offline backups to AWS S3 or another DC. I would like to take and upload a fresh, clean snapshot instead of tarring an old snapshot together with a pile of transaction logs.

Third, sometimes we need to deploy a new observer or a testing cluster in a different DC, and we have to copy the latest snapshot offline, because observer sync can become very slow: the TCP window can drop to 10-20 KB/s at a 40-60% packet loss rate.

Contributor

Cross-DC deployment is an interesting topic, and ZooKeeper does not intrinsically support it very well. I'm not saying you don't need your snap command (I understand it is a quick and dirty way to get things working for your case), but here is some design work you might find useful for your deployment:

https://www.usenix.org/system/files/conference/atc16/atc16_paper-lev-ari.pdf
The basic idea of this is to partition your data across multiple ZK ensembles (this loses global strong consistency) and then patch up global consistency on the client side. The client library is open sourced somewhere.

https://issues.apache.org/jira/browse/ZOOKEEPER-892
This is an old issue that no one is driving at the moment, but it sounds like a good fit for your use case.


@flier Having SyncRequestProcessor look at the number of snapshot files feels like a bit of a hack, but I am no expert here. I would be fine with having SyncRequestProcessor delegate all throttling of snapshots to ZooKeeperServer, but then we need some sort of synchronization to prevent a snapshot from being taken when the server is not in a proper state (during the ZABP sync phase).

Author
@flier flier Feb 22, 2017

@hanm Thanks for your advice :)

We are using a structure similar to "2.2 Alternative 2 – Learners" in the paper; it is good enough for most online scenarios. I don't think it is worth introducing another layer, because we give up write operations on all observers and just use them as a read-only view.

For remote replication, I suspect it would also be throttled by the packet loss rate, like the Observer. On the other hand, we have an internal project named zkpipe: it reads the ZooKeeper snapshot/binlog and sends it to a Kafka topic, and our clients can choose to rebuild the transactions or subscribe to the changes. I believe this is better than hacking ZooKeeper itself. If you are interested, I could push it to GitHub later.

@revans2 OK, let me find some way to involve SyncRequestProcessor later.

Contributor

we have an internal project named zkpipe: it reads the ZooKeeper snapshot/binlog and sends it to a Kafka topic, and our clients can choose to rebuild the transactions or subscribe to the changes. I believe this is better than hacking ZooKeeper itself. If you are interested, I could push it to GitHub later.

@flier This sounds interesting. I am sure there are users of ZooKeeper that could benefit from this, because ZooKeeper does not work out of the box for such backup scenarios. If you are OK with (and allowed to) open source this work, I recommend putting it on GitHub.

Author

@hanm Sure, please check the zkpipe project; we are using it to trace and audit ZooKeeper operations.

@revans2

revans2 commented Feb 21, 2017

Oh, I also had the idea that it might be nice to provide feedback from the snap command, for both JMX and the Admin command. You could provide more than just the last zxid; you could also indicate whether we are going to take the snapshot or not, and why.

@flier
Author

flier commented Feb 21, 2017

@revans2 Added a generating field to the response of the TakeSnapshotCommand, and the JMX takeSnapshot will return a boolean for the same reason.

6 participants