Fix to maintain correctness when out-of-order ZK updates are received #1161

himanshug · 2015-03-01T18:17:36Z

It is possible that CuratorInventoryManager receives ZK path events which are duplicate and out-of-order. At broker, this corrupts the information about historical server to served segment mapping(when "batch" segment announcer is used) and leads to following errors when queries are sent...

"No partition chunk found for [SegmentDescriptor{interval=2015-01-27T00:00:00.000Z/2015-01-28T00:00:00.000Z, version='2015-01-28T10:37:48.739Z', partitionNumber=43}]! Looks like segments were dropped while queries were still in queue"

Our quick workaround was to keep restarting broker and coordinator periodically to refresh/correct the cache time to time.
This patch should fix the issue (as long as events are not missed and ADD event comes before REMOVE). Note that, in the ideal world, we wouldn't have this problem if curator/zk did not give us out-of-order and duplicate events. So, another way of looking at this patch is to consider it being a workaround that problem.

We thought of 2 options

Keep a path to version cache inside CuratorInventoryManager (where cache keys are invalidated after configured amount of time, say 60 minutes) and ignore the event if higher version of same path event has been seen already.This would put more memory pressure on the broker and coordinator.
When an event arrives, don't use the cached data from the event but get latest data from ZK directly. Then, even if events come out of order, data used is always latest. This option is easier to implement, does not put additional memory requirement and We chose to use it to solve the problem.

Some evidence from the logs regarding out-of-order and duplicate events...

2015-02-16 11:18:32,412 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[3]
2015-02-16 11:19:04,773 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[5]
2015-02-16 11:20:09,039 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[4]
2015-02-16 11:20:41,420 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[5]
2015-02-16 11:20:41,437 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[6]
2015-02-16 11:21:13,737 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[6]
2015-02-16 11:22:51,089 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[7]
2015-02-16 11:23:23,400 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[8]
2015-02-16 11:27:08,895 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[7]
2015-02-16 11:27:41,591 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[9]
2015-02-16 11:27:41,705 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[8]
2015-02-16 11:32:32,751 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[10]

drcrallen · 2015-03-02T02:59:56Z

I'm curious if this is at all related to #1109

himanshug · 2015-03-02T16:35:12Z

@drcrallen looks like #1109 is solving a different problem.

xvrl · 2015-03-02T17:49:50Z

@himanshug are you saying that curator is not properly updating its cache?

xvrl · 2015-03-02T17:56:36Z

@himanshug can you share which version of ZooKeeper you are using? ZooKeeper should have certain guarantees with respect to ordering, but there also have been some fixes related to message ordering in recent versions.

himanshug · 2015-03-02T18:38:32Z

@xvrl it has been observed on ZK 3.4.5 . Also, I'm not entirely sure if this is a zookeeper issue alone, curator might cause this in corner cases when connection etc goes down and it retries stuff.

so, it is a good idea to not rely on cached data but get latest from zk when event is observed.

fjy · 2015-03-02T18:39:19Z

@himanshug Thanks for this fix. This particular situation has been a white whale for me. It is extremely difficult to us to reproduce in production but it has occurred. The logs around curator inventory manager were specifically added to catch this problem.

himanshug · 2015-03-02T18:41:08Z

@fjy I think the logs certainly helped as it is so difficult to reproduce and understand otherwise. thanks.

fjy · 2015-03-02T18:57:08Z

@xvrl @himanshug I think we should file a bug with either Curator or Zookeeper to let them know we have this particular problem.

xvrl · 2015-03-02T19:01:49Z

@himanshug Can we add a comment explaining that this is just a workaround?

Also, it would be great to address the following to make sure we track down the root cause:

Check if this is still an issue with ZooKeeper 3.4.6, there seem to be bugfixes related to child updates in 3.4.6, such as this one
File a corresponding issue with Curator, since PathChildrenCache should not update its cache if it sees an older version.

gianm · 2015-03-02T19:16:47Z

This seems like a Curator bug, or possibly a ZK bug expressing itself in Curator (probably not ZOOKEEPER-1667 though, since that sounds like it affects watcher kind but not order).

I think we should try to fix Curator rather than working around the problem in Druid. Curator already keeps a path-to-version cache so it should be possible to do @himanshug's first suggestion there (refuse to apply out of order updates) without increasing memory use.

himanshug · 2015-03-02T20:12:42Z

@xvrl I've updated the pull request description.
It is really difficult to test and reproduce it on 3.4.6 though.
also, created an issue with curator https://issues.apache.org/jira/browse/CURATOR-191

himanshug · 2015-03-02T20:15:52Z

@gianm I agree that it should be fixed in curator/zookeeper but it is OK to have this patch in druid
(1) for safeguarding ourselves
(2) to have some fix while it is being corrected in curator.

xvrl · 2015-03-02T20:31:27Z

@himanshug Are you saying you have observed the problem on 3.4.6 as well, or are you saying that it is hard to test and reproduce on 3.4.5 and therefore you have not tried upgrading yet?

himanshug · 2015-03-02T20:34:53Z

@xvrl we have tried and seen only on 3.4.5 . None of our systems are running 3.4.6 yet. we can't upgrade zookeeper on our druid cluster (due to various other reasons and my team doesn't own the druid cluster in question).

and, it is very difficult to reproduce on a test setup with zk-3.4.6 .

fjy · 2015-03-03T18:17:08Z

Hey guys, in an attempt to move forward, I propose we merge this PR as it solves a fairly critical problem and open a new issue to investigate the root cause in Curator/ZK. I agree the fix should probably be there, but it does take time to interact with their communities to get the problem resolved. How does that sound to everyone?

xvrl · 2015-03-03T18:42:26Z

server/src/main/java/io/druid/curator/inventory/CuratorInventoryManager.java

@@ -257,9 +276,19 @@ public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) th
        case CHILD_UPDATED:
          synchronized (lock) {
            final ChildData child = event.getData();
+
+            byte[] data = getZkDataForNode(child.getPath());


to make sure we always get the latests data, should we perform a sync on the path first?

"curatorFramework.getData().decompressed().forPath(path)" should always get latest data from ZK.

xvrl · 2015-03-03T19:41:20Z

According to this thread this may also be an issue with 3.4.6, so I am on board with working around this. I also agree with @gianm that addressing it in curator would be cleaner and may also avoid the added network calls to get the node data every time.

As long as we:

actively seeking a fix / workaround at a lower level, and
add a comment in the code to indicate this is a temporary workaround

then I am on board with merging this as a temporary fix.

@himanshug would you mind adding a comment and leading the effort to try to submit a patch to curator to address the problem as you described in the first solution you had in mind?

cheddar · 2015-03-03T22:03:53Z

apache/curator#68

himanshug · 2015-03-03T22:10:17Z

@xvrl got pulled into some stuff today and cudn't reply msgs but @cheddar is way faster it seems :)

fjy · 2015-03-03T22:14:54Z

+1 on @xvrl's comment. I think we can merge this if we comment the code that we should remove the workaround once we update to a curator version with the fix and file a new issue to do the update.

…tead of taking it from the event which might be stale due to event coming out of order etc

himanshug · 2015-03-03T22:23:08Z

@xvrl @fjy added the comment with curator bug and info that its a temporary workaround.

Fix to maintain correctness when out-of-order ZK updates are received

Backport #1161: Fix to maintain correctness when out-of-order ZK updates are received

himanshug changed the title ~~Workaround to maintain correctness when out-of-order ZK updates are received~~ Fix to maintain correctness when out-of-order ZK updates are received Mar 1, 2015

himanshug force-pushed the zk_ood_updates branch from b426f10 to 59c350a Compare March 3, 2015 02:42

xvrl reviewed Mar 3, 2015
View reviewed changes

on receiving ZK ADD/UPDATE events, get latest data from zookeeper ins…

dda2a62

…tead of taking it from the event which might be stale due to event coming out of order etc

himanshug force-pushed the zk_ood_updates branch from 59c350a to dda2a62 Compare March 3, 2015 22:22

fjy added a commit that referenced this pull request Mar 3, 2015

Merge pull request #1161 from himanshug/zk_ood_updates

d8e199a

Fix to maintain correctness when out-of-order ZK updates are received

fjy merged commit d8e199a into apache:master Mar 3, 2015

fjy mentioned this pull request Mar 3, 2015

Remove Druid workaround code for out of order events in Curator #1168

Closed

xvrl added a commit that referenced this pull request Mar 5, 2015

Merge pull request #1169 from druid-io/backport-zk-fix

24ab451

Backport #1161: Fix to maintain correctness when out-of-order ZK updates are received

xvrl added this to the 0.7.1 milestone Mar 13, 2015

himanshug deleted the zk_ood_updates branch May 16, 2015 22:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix to maintain correctness when out-of-order ZK updates are received #1161

Fix to maintain correctness when out-of-order ZK updates are received #1161

himanshug commented Mar 1, 2015

drcrallen commented Mar 2, 2015

himanshug commented Mar 2, 2015

xvrl commented Mar 2, 2015

xvrl commented Mar 2, 2015

himanshug commented Mar 2, 2015

fjy commented Mar 2, 2015

himanshug commented Mar 2, 2015

fjy commented Mar 2, 2015

xvrl commented Mar 2, 2015

gianm commented Mar 2, 2015

himanshug commented Mar 2, 2015

himanshug commented Mar 2, 2015

xvrl commented Mar 2, 2015

himanshug commented Mar 2, 2015

fjy commented Mar 3, 2015

xvrl Mar 3, 2015

himanshug Mar 3, 2015

xvrl commented Mar 3, 2015

cheddar commented Mar 3, 2015

himanshug commented Mar 3, 2015

fjy commented Mar 3, 2015

himanshug commented Mar 3, 2015

Fix to maintain correctness when out-of-order ZK updates are received #1161

Fix to maintain correctness when out-of-order ZK updates are received #1161

Conversation

himanshug commented Mar 1, 2015

drcrallen commented Mar 2, 2015

himanshug commented Mar 2, 2015

xvrl commented Mar 2, 2015

xvrl commented Mar 2, 2015

himanshug commented Mar 2, 2015

fjy commented Mar 2, 2015

himanshug commented Mar 2, 2015

fjy commented Mar 2, 2015

xvrl commented Mar 2, 2015

gianm commented Mar 2, 2015

himanshug commented Mar 2, 2015

himanshug commented Mar 2, 2015

xvrl commented Mar 2, 2015

himanshug commented Mar 2, 2015

fjy commented Mar 3, 2015

xvrl Mar 3, 2015

Choose a reason for hiding this comment

himanshug Mar 3, 2015

Choose a reason for hiding this comment

xvrl commented Mar 3, 2015

cheddar commented Mar 3, 2015

himanshug commented Mar 3, 2015

fjy commented Mar 3, 2015

himanshug commented Mar 3, 2015