Fix to maintain correctness when out-of-order ZK updates are received #1161
I'm curious if this is at all related to #1109
@drcrallen Looks like #1109 is solving a different problem.
@himanshug Are you saying that Curator is not properly updating its cache?
@himanshug Can you share which version of ZooKeeper you are using? ZooKeeper should have certain guarantees with respect to ordering, but there have also been some fixes related to message ordering in recent versions.
@xvrl It has been observed on ZK 3.4.5. Also, I'm not entirely sure this is a ZooKeeper issue alone; Curator might cause it in corner cases, for example when the connection goes down and it retries. So it is a good idea not to rely on cached data but to get the latest from ZK when an event is observed.
@himanshug Thanks for this fix. This particular situation has been a white whale for me. It is extremely difficult for us to reproduce in production, but it has occurred. The logs around the Curator inventory manager were specifically added to catch this problem.
@fjy I think the logs certainly helped, as it is so difficult to reproduce and understand otherwise. Thanks.
@xvrl @himanshug I think we should file a bug with either Curator or ZooKeeper to let them know we have this particular problem.
@himanshug Can we add a comment explaining that this is just a workaround? Also, it would be great to address the following to make sure we track down the root cause:
This seems like a Curator bug, or possibly a ZK bug expressing itself in Curator (probably not ZOOKEEPER-1667 though, since that sounds like it affects watcher kind but not order). I think we should try to fix Curator rather than working around the problem in Druid. Curator already keeps a path-to-version cache, so it should be possible to do @himanshug's first suggestion there (refuse to apply out-of-order updates) without increasing memory use.
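To make the "refuse to apply out-of-order updates" idea concrete, here is a minimal sketch in plain Java. `VersionGuard`, `shouldApply` and `forget` are hypothetical names for illustration only; they are not part of Curator or Druid, and this is not how Curator's own path-to-version cache is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not Curator or Druid code): remembers the highest
// znode data version applied per path and rejects anything not newer.
final class VersionGuard {
    private final Map<String, Integer> lastApplied = new HashMap<>();

    // Returns true only if this version is newer than the last one applied for the path.
    synchronized boolean shouldApply(String path, int version) {
        Integer previous = lastApplied.get(path);
        if (previous != null && version <= previous) {
            return false; // duplicate or out-of-order update, skip it
        }
        lastApplied.put(path, version);
        return true;
    }

    // Assumption: call this on CHILD_REMOVED, because a re-created znode
    // starts again at data version 0.
    synchronized void forget(String path) {
        lastApplied.remove(path);
    }
}
```

Fed the version sequence from the logs in this PR's description (3, 5, 4, 5, 6, 6, 7, 8, 7, 9, 8, 10), this guard would apply only 3, 5, 6, 7, 8, 9 and 10. It costs one integer per tracked path if kept outside Curator, which is the memory concern raised above.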
@xvrl I've updated the pull request description.
@gianm I agree that it should be fixed in Curator/ZooKeeper, but it is OK to have this patch in Druid.
@himanshug Are you saying you have observed the problem on 3.4.6 as well, or are you saying that it is hard to test and reproduce on 3.4.5 and therefore you have not tried upgrading yet?
@xvrl We have tried, and have seen it only on 3.4.5. None of our systems are running 3.4.6 yet. We can't upgrade ZooKeeper on our Druid cluster (for various other reasons, and my team doesn't own the Druid cluster in question), and it is very difficult to reproduce on a test setup with ZK 3.4.6.
Hey guys, in an attempt to move forward, I propose we merge this PR as it solves a fairly critical problem and open a new issue to investigate the root cause in Curator/ZK. I agree the fix should probably be there, but it does take time to interact with their communities to get the problem resolved. How does that sound to everyone?
@@ -257,9 +276,19 @@ public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) th
        case CHILD_UPDATED:
          synchronized (lock) {
            final ChildData child = event.getData();

            byte[] data = getZkDataForNode(child.getPath());
To make sure we always get the latest data, should we perform a sync on the path first?
"curatorFramework.getData().decompressed().forPath(path)" should always get latest data from ZK.
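For readers following the thread, the approach in the diff above (ignore the payload carried by the event and re-read the node) can be sketched without a ZooKeeper dependency. `FreshReadInventory`, `onChildUpdated` and the `readLatest` function are hypothetical stand-ins, not the actual `CuratorInventoryManager` code. In the real patch the read would be the Curator call quoted above, which throws a checked exception and would need wrapping before it fits a `Function`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the inventory cache, for illustration only.
final class FreshReadInventory {
    // Assumption: wraps something like getData().decompressed().forPath(path).
    private final Function<String, byte[]> readLatest;
    private final Map<String, byte[]> inventory = new HashMap<>();
    private final Object lock = new Object();

    FreshReadInventory(Function<String, byte[]> readLatest) {
        this.readLatest = readLatest;
    }

    // Handles a CHILD_UPDATED event. The event payload is deliberately ignored
    // because it may be stale if events arrive duplicated or out of order.
    void onChildUpdated(String path, byte[] eventPayload) {
        synchronized (lock) {
            byte[] latest = readLatest.apply(path);
            if (latest != null) {
                inventory.put(path, latest);
            }
        }
    }

    byte[] get(String path) {
        synchronized (lock) {
            return inventory.get(path);
        }
    }
}
```

The trade-off is the one raised elsewhere in this conversation: every update event costs an extra read against ZooKeeper, in exchange for the cache converging on the current node data regardless of event order.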
According to this thread this may also be an issue with 3.4.6, so I am on board with working around this. I also agree with @gianm that addressing it in curator would be cleaner and may also avoid the added network calls to get the node data every time. As long as we:
then I am on board with merging this as a temporary fix. @himanshug Would you mind adding a comment and leading the effort to submit a patch to Curator that addresses the problem as you described in the first solution you had in mind?
+1 on @xvrl's comment. I think we can merge this if we add a code comment saying the workaround should be removed once we update to a Curator version with the fix, and file a new issue to do the update.
…tead of taking it from the event which might be stale due to event coming out of order etc
Backport #1161: Fix to maintain correctness when out-of-order ZK updates are received
CuratorInventoryManager can receive ZK path events that are duplicated and out of order. On the broker, this corrupts the mapping from historical servers to served segments (when the "batch" segment announcer is used) and leads to the following error when queries are sent:
"No partition chunk found for [SegmentDescriptor{interval=2015-01-27T00:00:00.000Z/2015-01-28T00:00:00.000Z, version='2015-01-28T10:37:48.739Z', partitionNumber=43}]! Looks like segments were dropped while queries were still in queue"
Our quick workaround was to restart the broker and coordinator periodically to refresh and correct the cache.
This patch should fix the issue (as long as events are not missed and the ADD event comes before REMOVE). Note that, in an ideal world, we wouldn't have this problem if Curator/ZK did not give us out-of-order and duplicate events. So another way of looking at this patch is as a workaround for that problem.
We thought of 2 options
Some evidence from the logs regarding out-of-order and duplicate events...
2015-02-16 11:18:32,412 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[3]
2015-02-16 11:19:04,773 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[5]
2015-02-16 11:20:09,039 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[4]
2015-02-16 11:20:41,420 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[5]
2015-02-16 11:20:41,437 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[6]
2015-02-16 11:21:13,737 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[6]
2015-02-16 11:22:51,089 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[7]
2015-02-16 11:23:23,400 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[8]
2015-02-16 11:27:08,895 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[7]
2015-02-16 11:27:41,591 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[9]
2015-02-16 11:27:41,705 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[8]
2015-02-16 11:32:32,751 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[10]
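As a rough aid for reading logs like the ones above, the following sketch labels each `CHILD_UPDATED` line relative to the highest version seen so far. `VersionLogScan` and `classify` are hypothetical names, the regular expression assumes the exact message format shown in these log lines, and the scan assumes all lines refer to a single node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log scanner, for illustration only.
final class VersionLogScan {
    // Matches "CHILD_UPDATED[<node>] with version[<n>]" as it appears in the log above.
    private static final Pattern VERSION =
        Pattern.compile("CHILD_UPDATED\\[[^\\]]*\\] with version\\[(\\d+)\\]");

    // Labels each matching line as new, duplicate (equal to the highest
    // version seen so far) or stale (lower than the highest seen so far).
    static List<String> classify(List<String> logLines) {
        List<String> labels = new ArrayList<>();
        int highest = Integer.MIN_VALUE;
        for (String line : logLines) {
            Matcher m = VERSION.matcher(line);
            if (!m.find()) {
                continue; // not a CHILD_UPDATED line
            }
            int version = Integer.parseInt(m.group(1));
            if (version > highest) {
                labels.add(version + ":new");
                highest = version;
            } else if (version == highest) {
                labels.add(version + ":duplicate");
            } else {
                labels.add(version + ":stale");
            }
        }
        return labels;
    }
}
```

Applied to the twelve lines above, this labelling gives seven new versions (3, 5, 6, 7, 8, 9, 10), two duplicates (5 and 6) and three stale arrivals (4, 7 and 8).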