Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to maintain correctness when out-of-order ZK updates are received #1161

Merged
merged 1 commit into from
Mar 3, 2015

Conversation

himanshug
Copy link
Contributor

It is possible that CuratorInventoryManager receives ZK path events which are duplicate and out-of-order. At broker, this corrupts the information about historical server to served segment mapping(when "batch" segment announcer is used) and leads to following errors when queries are sent...

"No partition chunk found for [SegmentDescriptor{interval=2015-01-27T00:00:00.000Z/2015-01-28T00:00:00.000Z, version='2015-01-28T10:37:48.739Z', partitionNumber=43}]! Looks like segments were dropped while queries were still in queue"

Our quick workaround was to keep restarting broker and coordinator periodically to refresh/correct the cache time to time.
This patch should fix the issue (as long as events are not missed and ADD event comes before REMOVE). Note that, in the ideal world, we wouldn't have this problem if curator/zk did not give us out-of-order and duplicate events. So, another way of looking at this patch is to consider it being a workaround that problem.

We thought of 2 options

  1. Keep a path to version cache inside CuratorInventoryManager (where cache keys are invalidated after configured amount of time, say 60 minutes) and ignore the event if higher version of same path event has been seen already.This would put more memory pressure on the broker and coordinator.
  2. When an event arrives, don't use the cached data from the event but get latest data from ZK directly. Then, even if events come out of order, data used is always latest. This option is easier to implement, does not put additional memory requirement and We chose to use it to solve the problem.

Some evidence from the logs regarding out-of-order and duplicate events...

2015-02-16 11:18:32,412 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[3]
2015-02-16 11:19:04,773 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[5]
2015-02-16 11:20:09,039 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[4]
2015-02-16 11:20:41,420 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[5]
2015-02-16 11:20:41,437 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[6]
2015-02-16 11:21:13,737 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[6]
2015-02-16 11:22:51,089 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[7]
2015-02-16 11:23:23,400 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[8]
2015-02-16 11:27:08,895 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[7]
2015-02-16 11:27:41,591 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[9]
2015-02-16 11:27:41,705 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[8]
2015-02-16 11:32:32,751 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - CHILD_UPDATED[2015-02-11T22:01:43.441Z3] with version[10]

@himanshug himanshug changed the title Workaround to maintain correctness when out-of-order ZK updates are received Fix to maintain correctness when out-of-order ZK updates are received Mar 1, 2015
@drcrallen
Copy link
Contributor

I'm curious if this is at all related to #1109

@himanshug
Copy link
Contributor Author

@drcrallen looks like #1109 is solving a different problem.

@xvrl
Copy link
Member

xvrl commented Mar 2, 2015

@himanshug are you saying that curator is not properly updating its cache?

@xvrl
Copy link
Member

xvrl commented Mar 2, 2015

@himanshug can you share which version of ZooKeeper you are using? ZooKeeper should have certain guarantees with respect to ordering, but there also have been some fixes related to message ordering in recent versions.

@himanshug
Copy link
Contributor Author

@xvrl it has been observed on ZK 3.4.5 . Also, I'm not entirely sure if this is a zookeeper issue alone, curator might cause this in corner cases when connection etc goes down and it retries stuff.

so, it is a good idea to not rely on cached data but get latest from zk when event is observed.

@fjy
Copy link
Contributor

fjy commented Mar 2, 2015

@himanshug Thanks for this fix. This particular situation has been a white whale for me. It is extremely difficult to us to reproduce in production but it has occurred. The logs around curator inventory manager were specifically added to catch this problem.

@himanshug
Copy link
Contributor Author

@fjy I think the logs certainly helped as it is so difficult to reproduce and understand otherwise. thanks.

@fjy
Copy link
Contributor

fjy commented Mar 2, 2015

@xvrl @himanshug I think we should file a bug with either Curator or Zookeeper to let them know we have this particular problem.

@xvrl
Copy link
Member

xvrl commented Mar 2, 2015

@himanshug Can we add a comment explaining that this is just a workaround?

Also, it would be great to address the following to make sure we track down the root cause:

  • Check if this is still an issue with ZooKeeper 3.4.6, there seem to be bugfixes related to child updates in 3.4.6, such as this one
  • File a corresponding issue with Curator, since PathChildrenCache should not update its cache if it sees an older version.

@gianm
Copy link
Contributor

gianm commented Mar 2, 2015

This seems like a Curator bug, or possibly a ZK bug expressing itself in Curator (probably not ZOOKEEPER-1667 though, since that sounds like it affects watcher kind but not order).

I think we should try to fix Curator rather than working around the problem in Druid. Curator already keeps a path-to-version cache so it should be possible to do @himanshug's first suggestion there (refuse to apply out of order updates) without increasing memory use.

@himanshug
Copy link
Contributor Author

@xvrl I've updated the pull request description.
It is really difficult to test and reproduce it on 3.4.6 though.
also, created an issue with curator https://issues.apache.org/jira/browse/CURATOR-191

@himanshug
Copy link
Contributor Author

@gianm I agree that it should be fixed in curator/zookeeper but it is OK to have this patch in druid
(1) for safeguarding ourselves
(2) to have some fix while it is being corrected in curator.

@xvrl
Copy link
Member

xvrl commented Mar 2, 2015

@himanshug Are you saying you have observed the problem on 3.4.6 as well, or are you saying that it is hard to test and reproduce on 3.4.5 and therefore you have not tried upgrading yet?

@himanshug
Copy link
Contributor Author

@xvrl we have tried and seen only on 3.4.5 . None of our systems are running 3.4.6 yet. we can't upgrade zookeeper on our druid cluster (due to various other reasons and my team doesn't own the druid cluster in question).

and, it is very difficult to reproduce on a test setup with zk-3.4.6 .

@fjy
Copy link
Contributor

fjy commented Mar 3, 2015

Hey guys, in an attempt to move forward, I propose we merge this PR as it solves a fairly critical problem and open a new issue to investigate the root cause in Curator/ZK. I agree the fix should probably be there, but it does take time to interact with their communities to get the problem resolved. How does that sound to everyone?

@@ -257,9 +276,19 @@ public void childEvent(CuratorFramework client, PathChildrenCacheEvent event) th
case CHILD_UPDATED:
synchronized (lock) {
final ChildData child = event.getData();

byte[] data = getZkDataForNode(child.getPath());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to make sure we always get the latests data, should we perform a sync on the path first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"curatorFramework.getData().decompressed().forPath(path)" should always get latest data from ZK.

@xvrl
Copy link
Member

xvrl commented Mar 3, 2015

According to this thread this may also be an issue with 3.4.6, so I am on board with working around this. I also agree with @gianm that addressing it in curator would be cleaner and may also avoid the added network calls to get the node data every time.

As long as we:

  • actively seeking a fix / workaround at a lower level, and
  • add a comment in the code to indicate this is a temporary workaround

then I am on board with merging this as a temporary fix.

@himanshug would you mind adding a comment and leading the effort to try to submit a patch to curator to address the problem as you described in the first solution you had in mind?

@cheddar
Copy link
Contributor

cheddar commented Mar 3, 2015

apache/curator#68

@himanshug
Copy link
Contributor Author

@xvrl got pulled into some stuff today and cudn't reply msgs but @cheddar is way faster it seems :)

@fjy
Copy link
Contributor

fjy commented Mar 3, 2015

+1 on @xvrl's comment. I think we can merge this if we comment the code that we should remove the workaround once we update to a curator version with the fix and file a new issue to do the update.

…tead of taking it from the event which might be stale due to event coming out of order etc
@himanshug
Copy link
Contributor Author

@xvrl @fjy added the comment with curator bug and info that its a temporary workaround.

fjy added a commit that referenced this pull request Mar 3, 2015
Fix to maintain correctness when out-of-order ZK updates are received
@fjy fjy merged commit d8e199a into apache:master Mar 3, 2015
xvrl added a commit that referenced this pull request Mar 5, 2015
Backport #1161: Fix to maintain correctness when out-of-order ZK updates are received
@xvrl xvrl added this to the 0.7.1 milestone Mar 13, 2015
@xvrl xvrl added this to the 0.7.1 milestone Mar 13, 2015
@himanshug himanshug deleted the zk_ood_updates branch May 16, 2015 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants