New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CURATOR-447] TreeCache: Improve memory usage and concurrent update logic #250
Conversation
also simplify by having TreeNode extend AtomicReference<ChildData>
cd747d6
to
5bc92c0
Compare
@dragonsinth @Randgalt any interest in these updates? FWIW I've had them running in production for some time. |
I'll defer to @dragonsinth |
Hi @njhill just got back to work from vacation, and this is a little more than I can just eyeball, I gotta think through it a bit when I have more time. I think I would like to see the removal of It's not clear to me that Have you done memory use measurements? I would expect the memory footprint of |
Thanks @dragonsinth, there's of course no rush at all! I agree that The
Good point. The description of CURATOR-374 implied this might have some mem overhead benefit for very large caches, but removal of the |
and add TreeNode serial version id
(add back AtomicReferenceFieldUpdater for this)
@dragonsinth I've removed the |
@dragonsinth any interest in these changes? Maybe I should rename the PR since as you pointed out the memory saving aspect isn't very significant (I measured about 2% on a cache of 50k empty nodes), but hopefully the state and update logic simplification is beneficial? |
Sorry, been busy then out of town. I'll look tomorrow. |
I'm sure this is probably right, but the old code has a lot of miles on it. I have to read through super carefully and convince myself that there are no behavioral changes. |
wasCreated(); | ||
} | ||
break; | ||
case CHILDREN: | ||
if ( event.getResultCode() == KeeperException.Code.OK.intValue() ) | ||
{ | ||
ChildData oldChildData = childData; | ||
if ( oldChildData != null && oldChildData.getStat().getMzxid() == newStat.getMzxid() ) | ||
//TODO consider doing update of cversion, pzxid, numChildren only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems reasonable to me
{ | ||
childMap = children; | ||
} | ||
if ( childrenUpdater.compareAndSet(this, null, childMap) ) break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why you turned this into a loop (nice catch), but I think it reads a little clearly and more deliberateif you revert back to the original code and just change the if
-> while
. Ideally if Curator moves wholesale to Java 1.8, funcs, like:
ConcurrentMap<String, TreeNode> childMap = childrenUpdater.updateAndGet(this, (oldChildren) -> {
return oldChildren != null ? oldChildren : Maps.newConcurrentMap();
});
@@ -704,8 +687,7 @@ private TreeNode find(String findPath) | |||
{ | |||
TreeNode childNode = entry.getValue(); | |||
ChildData childData = childNode.childData; | |||
// Double-check liveness after retreiving data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of liked this comment, although I misspelled "retrieving" here and below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I removed it is that unlike before it's not really "double" checking anymore against the separate nodeState
field, there's only a single check for liveness which is now implicit in the value of childData
. But I could well have misinterpreted the reason for comment, no problem with leaving it in!
{ | ||
return null; | ||
} | ||
ChildData result = node.childData; | ||
// Double-check liveness after retreiving data. | ||
return node.nodeState == NodeState.LIVE ? result : null; | ||
return result != DEAD ? result : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this be isLive
for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reason was that isLive
also checks for null
which isn't necessary here (since we're returning null
anyhow). But no harm in making it consistent.
// Ordinary nodes are not allowed to transition from dead -> live; | ||
// make sure this isn't a delayed response that came in after death. | ||
|| (parent != null && oldChildData == DEAD) ) { | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we break up these cases for clarity?
final ChildData oldChildData = childData;
// Ignore this event if we've already processed a newer update for this node.
if ( (isLive(oldChildData) && newStat.getMzxid() <= oldChildData.getStat().getMzxid()) )
{
break;
}
// Non-root nodes are not allowed to transition from dead -> live;
// make sure this isn't a delayed response that came in after death.
if ( parent != null && oldChildData == DEAD )
{
break;
}
{ | ||
publishEvent(TreeCacheEvent.Type.NODE_UPDATED, toPublish); | ||
} | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the second check is now superfluous since we checked above.
if ( childDataUpdater.compareAndSet(this, oldChildData, toUpdate) )
{
publishEvent(isLive(oldChildData) ? TreeCacheEvent.Type.NODE_UPDATED : TreeCacheEvent.Type.NODE_ADDED, toPublish);
break;
}
@njhill this looks good to me with a few changes. I pushed one extra commit onto https://github.com/apache/curator/tree/CURATOR-447 for you to look at. If you agree with my changes (some of them are to conform to the project style) please sync your branch to mine and I'll merge this. |
Thanks alot @dragonsinth for going through this so carefully. I like your additional simplifications, and sorry for the project style inconsistencies that I missed! I've pulled in your commit and added one more just to remove some parentheses no longer needed after splitting those conditions into separate |
if ( childData != null && childNode.nodeState == NodeState.LIVE ) | ||
ChildData childData = entry.getValue().childData; | ||
// Double-check liveness after retrieving data. | ||
if ( isLive(childData) ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's subtle, but the point I'm trying to make is that you have to read .childData
exactly once, then check the liveness, then use only that value going forward. It would be an error to remove the local variable here and reference entry.getValue().childData
twice.
It's a "double" check because generally there shouldn't be any non-LIVE children in the parent children
map that we're iterating. But due to volatility we have to "double check" that child nodes read from the map are still alive after pulling .childData
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM will merge
Jira https://issues.apache.org/jira/browse/CURATOR-374 reduced per-node memory usage in
TreeCache
. It can be improved further via removal of thenodeState
field - itsLIVE
state corresponds exactly to the adjacentchildData
field being non-null, and a sentinelChildData
value can be used for theDEAD
state. This simplification also reduces the room for bugs and state inconsistencies.Other improvements included:
A further simplification to haveTreeNode
extendAtomicReference
, which obviates the need for an explicitchildData
fieldAvoid overhead of incrementing/decrementing theoutstandingOps
atomic integer post-initializationSee corresponding JIRA: CURATOR-447