
ZOOKEEPER-2700 add JMX takeSnapshot method and Jetty Admin snap command to take snapshot #180

Closed
wants to merge 1 commit

Conversation

@flier flier commented Feb 17, 2017

When doing a cold backup or a remote offline sync of ZooKeeper instances, we need the latest snapshot.

Add a four-letter snap command to force ZooKeeper to generate a snapshot.

@flier flier force-pushed the ZOOKEEPER-2700 branch 2 times, most recently from cba1a0b to d6b8e10 Compare February 17, 2017 14:26
@flier flier changed the title ZOOKEEPER-2700 add command to take snapshot ZOOKEEPER-2700 add snap command to take snapshot Feb 17, 2017
if (!isZKServerRunning()) {
pw.println(ZK_NOT_SERVING);
} else {
Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {

Wouldn't this also be a potential DoS attack? Especially with launching a new thread in the background for the snapshot. Could we put in some sort of throttling on taking the snapshot, so that we don't do it too frequently?
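The throttling being asked for could look roughly like the minimal sketch below. The class and method names (`SnapshotThrottle`, `tryAcquire`) are hypothetical illustrations, not part of the patch: a compare-and-set on the last-allowed timestamp lets at most one caller per interval start a snapshot.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical rate limiter for manual snapshot requests; identifiers are
// illustrative and do not appear in the actual PR.
class SnapshotThrottle {
    private final long minIntervalMs;
    private final AtomicLong lastAllowedMs = new AtomicLong(Long.MIN_VALUE);

    SnapshotThrottle(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    /** Returns true if a snapshot may start now; false if throttled. */
    boolean tryAcquire(long nowMs) {
        long last = lastAllowedMs.get();
        // MIN_VALUE means "never snapshotted"; otherwise enforce the interval.
        if (last != Long.MIN_VALUE && nowMs - last < minIntervalMs) {
            return false;
        }
        // CAS so two concurrent callers cannot both pass the check.
        return lastAllowedMs.compareAndSet(last, nowMs);
    }
}
```

A guard like this would make the background snapshot thread cheap to refuse, instead of spawning a thread per request.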

Contributor

Since this PR is targeting master, I suggest considering the option of adding a snap API to ZooKeeperAdmin, which was recently introduced to harden security around dynamic reconfiguration. ZooKeeperAdmin supports all sorts of authentication built into ZK, and we can extend it such that only an admin (or any user explicitly granted admin access to the cluster) can issue the snap command.

Contributor

Also, a general comment about adding features to ZK: when you add a new feature, please also add tests :)

Contributor

@hanm FYI, I suggested adding this as a JMX option too.

Contributor

@eribeiro Yes that is one option - another would be Jetty AdminServer.

Author

@revans2 Now it only generates a snapshot when the server is idle and the last zxid has changed.

@hanm I think snap should be a command that affects only one instance, not an administrative task applied to every instance in the cluster.

@eribeiro Added a takeSnapshot method to JMX.

Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {
public void run() {
try {
zkServer.takeSnapshot();
Contributor
@eribeiro eribeiro Feb 17, 2017

Couldn't this call potentially race with the snippet below?

snapInProcess = new ZooKeeperThread("Snapshot Thread") {
public void run() {
try {
zks.takeSnapshot();
} catch(Exception e) {
LOG.warn("Unexpected exception", e);
}
}
};
snapInProcess.start();

I guess we would need some way of returning early if there's a snapshot process already in progress.
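The early return being suggested could be sketched with an `AtomicBoolean` along these lines. `SnapshotGuard` and `runExclusive` are illustrative names, not code from the PR: `compareAndSet` lets exactly one caller proceed, and the flag is released in a `finally` block so a failed snapshot does not wedge the guard.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of an "at most one snapshot at a time" guard; identifiers are
// hypothetical stand-ins, not the actual ZooKeeper code.
class SnapshotGuard {
    private final AtomicBoolean inProgress = new AtomicBoolean(false);

    /** Runs the snapshot unless one is already running; returns whether it ran. */
    boolean runExclusive(Runnable snapshot) {
        // compareAndSet succeeds for exactly one caller; losers return early.
        if (!inProgress.compareAndSet(false, true)) {
            return false;
        }
        try {
            snapshot.run();
            return true;
        } finally {
            inProgress.set(false); // always release, even on failure
        }
    }
}
```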

Contributor

I agree. It may be nice to have some check that generally makes sure multiple snapshots cannot happen at the same time, as a sanity check.

Author

I have wrapped the manual requests from the snap command or JMX in a tryTakeSnapshot method, which skips the action when the server is busy. If you believe it is necessary, we could merge the check into takeSnapshot, though that may impact the entire server workflow.

@flier flier force-pushed the ZOOKEEPER-2700 branch 2 times, most recently from a9aa636 to ad60d28 Compare February 20, 2017 03:53
@flier flier changed the title ZOOKEEPER-2700 add snap command to take snapshot ZOOKEEPER-2700 add JMX takeSnapshot method and Jetty Admin snap command to take snapshot Feb 20, 2017
@flier
Author

flier commented Feb 20, 2017

According to the review comments and ZOOKEEPER-1729, I have submitted another patch adding a JMX takeSnapshot method and a Jetty Admin snap command to take snapshots, with test cases.

@@ -126,6 +125,9 @@
private final ZooKeeperServerListener listener;
private ZooKeeperServerShutdownHandler zkShutdownHandler;

private volatile long lastSnapshotZxid;
private AtomicInteger isGeneratingSnapshot = new AtomicInteger(0);
Contributor

You could use an AtomicBoolean here, right?

Make isGeneratingSnapshot final, please.

Author

In fact, I'm not sure whether ZooKeeper's background threads may take snapshots in parallel, so I chose an AtomicInteger to protect manual calls to takeSnapshot.

}
}

public boolean tryTakeSnapshot() {
Contributor

In Java codebases we usually replace the "try" prefix with "maybe". ;)

So it becomes maybeTakeSnapshot().

Author

hmm... I haven't worked in Java for years, forgive me :)

@flier flier force-pushed the ZOOKEEPER-2700 branch 7 times, most recently from 2a982d3 to cb84192 Compare February 20, 2017 16:20

@revans2 revans2 left a comment


I am not an expert on the code, but I am a bit concerned about possible data corruption, and about the real possibility of multiple snapshots being taken at the same time.

}
}

public boolean maybeTakeSnapshot() {

Could you add some javadocs here? It would be nice to explain the difference between takeSnapshot and maybeTakeSnapshot.
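For illustration, the distinction the javadoc should draw might read like the sketch below. The method names `takeSnapshot`/`maybeTakeSnapshot` come from the PR, but the class body here is a simplified stand-in, not the actual ZooKeeperServer implementation:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified stand-in showing how the two methods could be documented and
// related; not the real ZooKeeperServer code.
class SnapshotApi {
    private final AtomicBoolean generating = new AtomicBoolean(false);
    int snapshotsTaken = 0; // stand-in for the real snapshot side effect

    /**
     * Unconditionally writes a snapshot. Internal callers (e.g. the sync
     * pipeline) are assumed to already serialize their own calls.
     */
    void takeSnapshot() {
        snapshotsTaken++;
    }

    /**
     * Best-effort variant for manual triggers (JMX/Admin): takes a snapshot
     * only if none is currently in progress.
     *
     * @return true if a snapshot was started, false if one was already running
     */
    boolean maybeTakeSnapshot() {
        if (!generating.compareAndSet(false, true)) {
            return false;
        }
        try {
            takeSnapshot();
            return true;
        } finally {
            generating.set(false);
        }
    }
}
```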

@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {

public void takeSnapshot(){

We now have a lot of potential paths to take a snapshot. JMX, SyncRequestProcessor, Admin, and as part of the sync stage in ZABP. In the past it was just the SyncRequestProcessor and ZABP that would trigger snapshots. We didn't have much concern about collisions, because SyncRequestProcessor would only run as edits were being sent, and the edits would only be sent after the ZABP sync phase had completed. Now we have Admin and JMX possibly doing a snapshot in the background.

Now, if a snapshot request comes in during the ZABP sync phase, after we have cleared out the in-memory DB but not yet applied the new snapshot, and we then crash before we can write out the new snapshot, we could end up with data corruption. This should be extremely rare, so I don't know if we care all that much about it, but I think it is something we can fix.

I am not an expert on the code, so I am not sure of the cleanest way to fix this, but it feels like maybeTakeSnapshot should be part of SyncRequestProcessor instead of ZooKeeperServer. I don't know how simple it is to get to the SyncRequestProcessor from JMX/Admin, but if you can, then we can reset the edit count so we are not taking a lot of snapshots all at once, one right after another. It also means we can truly avoid having more than one snapshot taken at any time.

Contributor

+1. This is the reason why I advocated the use of AtomicBoolean instead of AtomicInteger to provide mutual exclusion on this code snippet.

Author

If we don't expect snapshots to be taken at the same time, the easiest way is to use an AtomicBoolean to protect takeSnapshot on all code paths, which may block the SyncRequestProcessor for a while if a manual task is ongoing. My current code assumes the background SyncRequestProcessor has higher priority.


@eribeiro An AtomicBoolean is not enough, as the SyncRequestProcessor ignores it; only the JMX and admin commands use it.

@flier there are several things going on here and I don't know the reasoning for all of them but I can guess.

  1. Only take one snapshot at a time. I don't think this is super critical, because it is not a correctness problem. Having multiple snapshots happening at the same time should work correctly even in the face of crashes, but it becomes a performance problem. There is a limited amount of CPU, Memory, and most importantly disk bandwidth and IOps. The last successful snapshot wins. So having multiple snapshots all going at the same time means we are likely to be doing a lot of wasted work if everything does happen successfully. If we can schedule the snapshots so there is only one going on at a time then the full bandwidth and IOps can go to that snapshot. Even better would be to space them out, so if we force a snapshot it makes SyncRequestProcessor reset its counter so we don't take another one until X more transactions have been processed.

  2. Taking a snapshot at the wrong time and then crashing can corrupt the DB on that node. This is potentially critical. The probability of this happening is very very small, but if we can fix it so it can never happen, I would be much happier.

I think we can fix both issues at the same time by making JMX and Admin go through SyncRequestProcessor instead of going to the ZooKeeperServer directly. If no SyncRequestProcessor is ready, we can report that to the user so they know the node is not ready to take a snapshot.
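The "reset the edit counter on a forced snapshot" idea could be sketched roughly as follows. All identifiers here are hypothetical stand-ins, not the real SyncRequestProcessor fields: a forced snapshot clears the same counter the periodic path uses, so the next periodic snapshot is pushed back by a full interval.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of funneling manual snapshots through the same
// counter the periodic path uses; names are illustrative only.
class SyncCounterSketch {
    private final int snapCount;                      // snapshot every N txns
    private final AtomicInteger logCount = new AtomicInteger(0);
    int snapshots = 0;                                // stand-in for snapshot I/O

    SyncCounterSketch(int snapCount) {
        this.snapCount = snapCount;
    }

    /** Called per logged transaction; snapshots when the counter fills. */
    void onTxnLogged() {
        if (logCount.incrementAndGet() >= snapCount) {
            snapshot();
        }
    }

    /** Manual trigger (JMX/Admin) routed through the same processor. */
    void forceSnapshot() {
        snapshot();
    }

    private void snapshot() {
        logCount.set(0); // resetting here spaces periodic snapshots out
        snapshots++;
    }
}
```

With both paths funneled through one place, "only one snapshot at a time" and "don't snapshot right after a forced one" fall out of the same counter.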

I also really would like to understand the use case for this command, because I just don't see much value to a user to force another snapshot if one is already in progress. I also would like to understand when you would want to take a snapshot and if these changes would too severely limit that.

Author
@flier flier Feb 21, 2017

I have several scenarios that need a snapshot, for example:

First, our main ZooKeeper cluster is deployed in an AWS zone, with observers running in a dozen IDCs. We use this topology because a ZooKeeper cluster is not friendly to multi-IDC deployment. Besides, our ZooKeeper snapshots and transaction logs are huge, because of some incorrect client usage that is hard to fix in the short term :(

Sometimes, when we plan maintenance on the main cluster, we have to start a mirror cluster in the same DC and switch from the main cluster to the mirror. If we do it fast enough, the observers and clients will not notice the change. That's why we need to take a snapshot to speed up the migration. If something goes wrong, we can switch back to the old cluster; losing some transactions is better than the whole system going down.

Second, our backup policy requires daily/hourly offline backups to AWS S3 or another DC. I would like to take and upload a fresh, clean snapshot instead of tarring an old snapshot together with a pile of transaction logs.

Third, sometimes we need to deploy a new observer or a testing cluster in a different DC, and we have to copy the latest snapshot offline, because observer sync can become very slow: the TCP window can drop to 10-20 KB/s at a 40-60% packet loss rate.

Contributor

Cross-DC deployment is an interesting topic, and ZooKeeper does not intrinsically support it very well. I'm not saying you don't need your snap command (I understand it is a quick and dirty way to get things working for your case), but here is some design work you might find useful for your deployment:

https://www.usenix.org/system/files/conference/atc16/atc16_paper-lev-ari.pdf
The basic idea of this is to partition your data across multiple ZK ensembles (this loses global strong consistency) and then patch up global consistency on the client side. The client library is open sourced somewhere.

https://issues.apache.org/jira/browse/ZOOKEEPER-892
This is an old issue that no one is driving at the moment, but it sounds like a good fit for your use case.


@flier Having SyncRequestProcessor look at the number of snapshot files feels like a bit of a hack, but I am no expert here. I would be fine with having SyncRequestProcessor delegate all throttling of snapshots to ZooKeeperServer, but then we need some sort of synchronization to prevent a snapshot from being taken when the server is not in a proper state (during the ZABP sync phase).

Author
@flier flier Feb 22, 2017

@hanm Thanks for your advice :)

We are using a structure similar to "2.2 Alternative 2 – Learners" in the paper; it is good enough for most online scenarios. I don't think it is worth introducing another layer, because we give up write operations on all observers and just use them as a read-only view.

For remote replication, I suspect it would also be throttled by the packet loss rate, like the Observer. On the other hand, we have an internal project named zkpipe: it reads the ZooKeeper snapshot/binlog and sends it to a Kafka topic, and our clients can choose to rebuild the transactions or subscribe to the changes. I believe this is better than hacking ZooKeeper itself. If you are interested, I could push it to GitHub later.

@revans2 OK, let me find some way to involve SyncRequestProcessor later.

Contributor

we have an internal project named zkpipe: it reads the ZooKeeper snapshot/binlog and sends it to a Kafka topic, and our clients can choose to rebuild the transactions or subscribe to the changes. I believe this is better than hacking ZooKeeper itself. If you are interested, I could push it to GitHub later.

@flier This sounds interesting. I am sure there are users of ZooKeeper that could benefit from this, because ZooKeeper does not work out of the box for such backup scenarios. If you are OK with (and allowed to) open source this work, I recommend putting it on GitHub.

Author

@hanm Sure, please check the zkpipe project; we are using it to trace and audit ZooKeeper operations.

@revans2

revans2 commented Feb 21, 2017

Oh, I also had the idea that it might be nice to provide feedback from the snap command, for both JMX and the Admin command. You could provide more than just the last zxid; you could also indicate whether we are going to take the snapshot or not, and why.

@flier
Author

flier commented Feb 21, 2017

@revans2 Added a generating field to the response of the TakeSnapshotCommand, and the JMX takeSnapshot will return a boolean for the same reason.

6 participants