ZOOKEEPER-2700 add JMX takeSnapshot method and Jetty Admin snap command to take snapshot #180
Conversation
Force-pushed from cba1a0b to d6b8e10 (snap command to take snapshot).
if (!isZKServerRunning()) {
    pw.println(ZK_NOT_SERVING);
} else {
    Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {
Wouldn't this also be a potential DoS attack? Especially with launching a new thread in the background for the snapshot. Could we put in some sort of throttling on taking the snapshot, so that we don't do it too frequently?
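For illustration, a minimal throttle along those lines might look like the sketch below. The method name requestManualSnapshot and the 60-second interval are assumptions, not part of the patch, and the surrounding class is assumed to provide zkServer and LOG:

// Hypothetical throttle sketch: reject manual snapshot requests that
// arrive within a minimum interval of the previous accepted one.
private final AtomicLong lastManualSnapshotMs = new AtomicLong(0);
private static final long MIN_SNAPSHOT_INTERVAL_MS = 60_000; // assumed value

public boolean requestManualSnapshot() {
    long now = System.currentTimeMillis();
    long last = lastManualSnapshotMs.get();
    // Only one caller can win the CAS per interval; the rest are throttled.
    if (now - last < MIN_SNAPSHOT_INTERVAL_MS
            || !lastManualSnapshotMs.compareAndSet(last, now)) {
        return false;
    }
    new ZooKeeperThread("Snapshot Thread") {
        public void run() {
            try {
                zkServer.takeSnapshot();
            } catch (Exception e) {
                LOG.warn("Unexpected exception", e);
            }
        }
    }.start();
    return true;
}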
Since this PR is targeting master, I suggest considering the option of adding a snap API to ZooKeeperAdmin, which was recently introduced to harden security around dynamic reconfiguration. ZooKeeperAdmin supports all the authentication mechanisms built into ZK, and we could extend it so that only an admin (or any user explicitly granted admin access to the cluster) can issue the snap command.
Also, a general comment about adding features to ZK: when you add a new feature, please also add tests :)
@hanm FYI, I suggested adding it as a JMX option too.
@eribeiro Yes, that is one option; another would be the Jetty AdminServer.
Thread snapInProcess = new ZooKeeperThread("Snapshot Thread") {
    public void run() {
        try {
            zkServer.takeSnapshot();
Couldn't this call potentially race with the snippet below?

zookeeper/src/java/main/org/apache/zookeeper/server/SyncRequestProcessor.java, lines 132 to 141 in ec20c54:
snapInProcess = new ZooKeeperThread("Snapshot Thread") {
    public void run() {
        try {
            zks.takeSnapshot();
        } catch (Exception e) {
            LOG.warn("Unexpected exception", e);
        }
    }
};
snapInProcess.start();
I guess we would need some way of returning early if there's a snapshot process already in progress.
I agree. It may be nice to have some check that makes sure multiple snapshots cannot happen at the same time, as a sanity check.
I have wrapped the manual requests from the snap command or JMX with a tryTakeSnapshot method, which skips the action when a snapshot is already in progress. If you believe it is necessary, we could merge the check into takeSnapshot itself, though that could impact the whole server workflow.
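A minimal sketch of what such a guard could look like, using an AtomicInteger flag named isGeneratingSnapshot (which the diff further down introduces); the exact patch may differ:

// Sketch of the guard described above (assumed shape, not the exact patch):
// manual callers skip the snapshot when one is already being generated.
public boolean tryTakeSnapshot() {
    // isGeneratingSnapshot is an AtomicInteger field; 0 = idle, 1 = busy
    if (!isGeneratingSnapshot.compareAndSet(0, 1)) {
        return false; // a snapshot is already in progress, skip
    }
    try {
        takeSnapshot();
    } finally {
        isGeneratingSnapshot.set(0);
    }
    return true;
}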
Force-pushed from a9aa636 to ad60d28.
Changed the title from "snap command to take snapshot" to "takeSnapshot method and Jetty Admin snap command to take snapshot".
According to the review comments and ZOOKEEPER-1729, I have submitted another patch to add a JMX takeSnapshot method.
@@ -126,6 +125,9 @@
    private final ZooKeeperServerListener listener;
    private ZooKeeperServerShutdownHandler zkShutdownHandler;

    private volatile long lastSnapshotZxid;
    private AtomicInteger isGeneratingSnapshot = new AtomicInteger(0);
You could use an AtomicBoolean here, right? Make isGeneratingSnapshot final, please.
In fact, I'm not sure whether ZooKeeper background threads may take snapshots in parallel, so I chose an AtomicInteger to protect the manual takeSnapshot calls.
    }
}

public boolean tryTakeSnapshot() {
In Java codebases we usually replace the "try" prefix with "maybe". ;) So it becomes maybeTakeSnapshot().
hmm... I haven't worked in Java for years, forgive me :)
Force-pushed from 2a982d3 to cb84192.
I am not an expert on the code, but I am a bit concerned about possible data corruption, and the real possibility of multiple snapshots being taken at the same time.
    }
}

public boolean maybeTakeSnapshot() {
Could you add some javadocs here? It would be nice to explain the difference between takeSnapshot and maybeTakeSnapshot.
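A possible javadoc along the requested lines; the wording below is only a suggestion:

/**
 * Best-effort variant of takeSnapshot() for manual requests (JMX or
 * the Admin "snap" command): if a snapshot is already being generated,
 * it does nothing and returns false. By contrast, takeSnapshot()
 * always writes a snapshot and is intended for the server's own
 * workflow (SyncRequestProcessor and the ZAB sync phase).
 *
 * @return true if a snapshot was taken, false if one was in progress
 */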
@@ -303,15 +305,38 @@ public void loadData() throws IOException, InterruptedException {

    public void takeSnapshot(){
We now have a lot of potential paths to take a snapshot: JMX, SyncRequestProcessor, Admin, and as part of the sync stage in ZABP. In the past it was just SyncRequestProcessor and ZABP that would trigger snapshots. We didn't have much concern about collisions, because SyncRequestProcessor would only run as edits were being sent, and the edits would only be sent after the ZABP sync phase had completed. Now we have Admin and JMX possibly doing a snapshot in the background.
Now, if a snapshot request comes in during the ZABP sync phase, after we have cleared out the in-memory DB and not yet applied the new snapshot, and we then crash before we can write out the new snapshot, we could end up with data corruption. This should be super, super rare, so I don't really know if we care all that much about it, but I think it is something that we can fix.
I am not an expert on the code, so I am not sure of the cleanest way to fix this, but it feels like maybeTakeSnapshot should be part of SyncRequestProcessor instead of ZooKeeperServer. I don't know how simple it is to get to the SyncRequestProcessor from JMX/Admin, but if you can, then it means we can reset the edit count so we are not taking a lot of snapshots all at once, one right after another. It also means that we can truly avoid having more than one snapshot taken at any time.
+1. This is the reason why I advocated the use of AtomicBoolean instead of AtomicInteger: to provide mutual exclusion in this code snippet.
If we don't expect snapshots to be taken at the same time, the easiest way is to use an AtomicBoolean to protect takeSnapshot on all the code paths, which may block the SyncRequestProcessor for a while if a manual task is ongoing. My current code assumes the background SyncRequestProcessor has higher priority.
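A sketch of that AtomicBoolean variant, where the background path waits briefly while manual paths give up; the field and method names here are assumptions, not from the patch:

// Assumed flag shared by all snapshot code paths.
private final AtomicBoolean snapshotInProgress = new AtomicBoolean(false);

// SyncRequestProcessor path: wait until any manual snapshot finishes.
public void takeSnapshotBlocking() throws InterruptedException {
    while (!snapshotInProgress.compareAndSet(false, true)) {
        Thread.sleep(10); // a manual snapshot is running; back off briefly
    }
    try {
        takeSnapshot();
    } finally {
        snapshotInProgress.set(false);
    }
}

// JMX/Admin path: skip instead of waiting.
public boolean takeSnapshotIfIdle() {
    if (!snapshotInProgress.compareAndSet(false, true)) {
        return false;
    }
    try {
        takeSnapshot();
        return true;
    } finally {
        snapshotInProgress.set(false);
    }
}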
@eribeiro An AtomicBoolean is not enough, as the SyncRequestProcessor ignores it; only the JMX and admin commands use it.
@flier there are several things going on here, and I don't know the reasoning for all of them, but I can guess.
1. Only take one snapshot at a time. I don't think this is super critical, because it is not a correctness problem. Having multiple snapshots happening at the same time should work correctly even in the face of crashes, but it becomes a performance problem. There is a limited amount of CPU, memory, and, most importantly, disk bandwidth and IOPS. The last successful snapshot wins, so having multiple snapshots all going at the same time means we are likely to be doing a lot of wasted work even if everything does happen successfully. If we can schedule the snapshots so there is only one going on at a time, then the full bandwidth and IOPS can go to that snapshot. Even better would be to space them out, so that forcing a snapshot makes SyncRequestProcessor reset its counter and we don't take another one until X more transactions have been processed.
2. Taking a snapshot at the wrong time and then crashing can corrupt the DB on that node. This is potentially critical. The probability of this happening is very, very small, but if we can fix it so it can never happen, I would be much happier.
I think we can fix both issues at the same time by making JMX and Admin use SyncRequestProcessor instead of going to the ZooKeeperServer directly (see the sketch after this comment). If no SyncRequestProcessor is ready, then we can return that to the user, so they know the node is not ready to take a snapshot.
I also really would like to understand the use case for this command, because I just don't see much value to a user in forcing another snapshot if one is already in progress. I also would like to understand when you would want to take a snapshot, and whether these changes would too severely limit that.
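A rough sketch of that delegation idea, simplified from the real SyncRequestProcessor: zks and snapCount exist in that class, but the flag and method names here are hypothetical:

// JMX/Admin set a request flag instead of snapshotting directly.
private final AtomicBoolean snapshotRequested = new AtomicBoolean(false);
private int logCount = 0; // edits logged since the last snapshot

public boolean requestSnapshot() {
    // false means a manual request is already pending
    return snapshotRequested.compareAndSet(false, true);
}

// Called from the processor's single request loop, so snapshots are
// naturally serialized; resetting logCount spaces them out.
private void maybeSnapshot() {
    if (snapshotRequested.getAndSet(false) || logCount > snapCount) {
        logCount = 0; // don't take another snapshot right away
        zks.takeSnapshot();
    }
}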
I have several scenarios that need a manual snapshot, for example:
First, our major ZooKeeper cluster is deployed in an AWS zone, with observers running in a dozen IDCs. We use this topology because a ZooKeeper cluster is not friendly to multi-IDC deployment. Besides, our ZooKeeper snapshots and transaction logs are huge, because of some wrong client usage that is hard to fix in the short term :(
Sometimes, when we plan to do maintenance on the major cluster, we have to start a mirror cluster in the same DC and switch from the major cluster to the mirror. If we do it fast enough, the observers and clients will not notice the change. That's why we need to take a snapshot to speed up the migration. If something goes wrong, we can switch back to the old cluster; losing some transactions is better than the whole system going down.
Second, our backup policy needs a daily/hourly offline backup to AWS S3 or another DC. I would like to take and upload a fresh, clean snapshot instead of tarring an old snapshot together with a number of transaction logs.
Third, sometimes we need to deploy a new observer or a testing cluster in a different DC, and we have to copy the latest snapshot offline, because the ZooKeeper observer sync can become very slow; the TCP window can drop to 10-20 KB/s at a 40-60% packet loss rate.
Cross-DC deployment is an interesting topic, and ZooKeeper does not intrinsically support it very well. I'm not saying that you don't need your snap command (I understand it is a quick and dirty way to get things working for your case), but here is some design that you might find useful for your deployment:
https://www.usenix.org/system/files/conference/atc16/atc16_paper-lev-ari.pdf
The basic idea is to partition your data across multiple ZK ensembles (this loses global strong consistency) and then patch global consistency back in on the client side. The client library is open sourced somewhere.
https://issues.apache.org/jira/browse/ZOOKEEPER-892
This is an old issue that no one is driving at the moment, but it sounds like a good fit for your use case.
@flier Having SyncRequestProcessor look at the number of snapshot files feels like a bit of a hack, but I am no expert here. I would be fine with having SyncRequestProcessor delegate all throttling of snapshots to ZooKeeperServer, but then we need some sort of synchronization to prevent a snapshot from being taken when the server is not in a proper state (during the ZABP sync phase).
@hanm Thanks for your advice :)
We are using a structure similar to "2.2 Alternative 2 – Learners" in the paper; it is good enough for most online scenarios. I don't think it is worth introducing another layer, because we give up write operations on all observers and just use them as a read-only view.
As for remote replication, I suspect it would also be blocked by the packet loss rate, like the Observer. On the other hand, we have an internal project named zkpipe; it reads ZooKeeper snapshots/binlogs and sends them to a Kafka topic, and our clients can choose to rebuild the transactions or subscribe to the changes. I believe this is better than hacking ZooKeeper itself. If you are interested, I could push it to GitHub later.
@revans2 OK, let me find some way to involve SyncRequestProcessor later.
> we have an internal project named zkpipe; it reads ZooKeeper snapshots/binlogs and sends them to a Kafka topic, and our clients can choose to rebuild the transactions or subscribe to the changes. I believe this is better than hacking ZooKeeper itself. If you are interested, I could push it to GitHub later.

@flier This sounds interesting. I am sure there are users of ZooKeeper who could benefit from this, because ZooKeeper does not work out of the box for such backup scenarios. If you are OK with (and allowed to) open source this work, I recommend putting it on GitHub.
Oh, I also had the idea that it might be nice to provide feedback from the snap command, for both JMX and the Admin command. You could provide more than just the last zxid; you could also indicate whether we are going to do the snapshot or not, and why.
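On the Admin side, that richer response might look roughly like this inside the command's run method. CommandResponse and initializeResponse come from the AdminServer framework, but getLastSnapshotZxid, maybeTakeSnapshot, and the field names are assumptions:

// Sketch of a more informative "snap" response.
CommandResponse response = initializeResponse();
response.put("last_zxid", Long.toHexString(zkServer.getLastSnapshotZxid()));
boolean started = zkServer.maybeTakeSnapshot();
response.put("snapshot_started", started);
if (!started) {
    // tell the caller why nothing happened
    response.put("reason", "a snapshot is already being generated");
}
return response;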
Force-pushed from cb84192 to ef18512.
@revans2 Added a generating field to the response of the snap command.
Force-pushed from ef18512 to 57dee3a.
Force-pushed from 57dee3a to 5c83438.
When doing a cold backup or a remote offline sync of ZooKeeper instances, we need the latest snapshot.
Add a four-letter snap command to force ZooKeeper to generate a snapshot.