Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CURATOR-447] TreeCache: Improve memory usage and concurrent update logic #250

Merged
merged 5 commits into from Feb 13, 2018

Conversation

njhill
Copy link
Contributor

@njhill njhill commented Jan 2, 2018

Jira https://issues.apache.org/jira/browse/CURATOR-374 reduced per-node memory usage in TreeCache. It can be improved further via removal of the nodeState field - its LIVE state corresponds exactly to the adjacent childData field being non-null, and a sentinel ChildData value can be used for the DEAD state. This simplification also reduces the room for bugs and state inconsistencies.

Other improvements included:

  • More robust cache update logic (in get-children and get-data event callbacks)
  • A further simplification to have TreeNode extend AtomicReference, which obviates the need for an explicit childData field
  • Avoid overhead of incrementing/decrementing the outstandingOps atomic integer post-initialization

See corresponding JIRA: CURATOR-447

also simplify by having TreeNode extend AtomicReference<ChildData>
@njhill njhill changed the title [CURATOR-447] TreeCache: Improve memory usage and thread safety [CURATOR-447] TreeCache: Improve memory usage and concurrent update logic Jan 3, 2018
@njhill njhill force-pushed the CURATOR-447 branch 3 times, most recently from cd747d6 to 5bc92c0 Compare January 3, 2018 12:36
@njhill
Copy link
Contributor Author

njhill commented Jan 8, 2018

@dragonsinth @Randgalt any interest in these updates? FWIW I've had them running in production for some time.

@Randgalt
Copy link
Member

Randgalt commented Jan 8, 2018

I'll defer to @dragonsinth

@dragonsinth
Copy link
Contributor

Hi @njhill just got back to work from vacation, and this is a little more than I can just eyeball, I gotta think through it a bit when I have more time. I think I would like to see the removal of nodeState in isolation, perhaps with an explicit PENDING sentinel rather than just null.

It's not clear to me that TreeNode extends AtomicReference and @Nullable outstandingOps are wins. The first one doesn't really save any memory, and for the second one, the ongoing CPU cost to update outstandingOps is basically nothing, it's almost certainly immeasurably tiny, and you have to introduce racy field reads to accomplish it.

Have you done memory use measurements? I would expect the memory footprint of ConcurrentMap<String, TreeNode> children to mostly dominate the memory overhead of our own classes, not to mention all the Watchers.

@njhill
Copy link
Contributor Author

njhill commented Jan 8, 2018

Thanks @dragonsinth, there's of course no rush at all!

I agree that TreeNode extends AtomicReference isn't a "win" as such, it just looked cleaner to me than using another AtomicReferenceFieldUpdater. But it's likely subjective and certainly a minor thing.

The outstandingOps change seems clearer to me - the field has no use for most of the life of the cache and yet is incremented/decremented on every single change. It's not the CPU cycle overhead but more the (CPU) cache coherency overhead since it's an AtomicLong. I didn't think the change introduces any raciness - the only times when outstandingOps needs to be modified (during initialization or re-initialization) it will be guaranteed to be non-null.

I would expect the memory footprint of ConcurrentMap<String, TreeNode> children to mostly dominate the memory overhead of our own classes

Good point. The description of CURATOR-374 implied this might have some mem overhead benefit for very large caches, but removal of the nodeState field was motivated more by state simplification and consolidation of atomic updates. I'll do some quick measurements but I acknowledge percentage-wise the saving might not be significant.

and add TreeNode serial version id
(add back AtomicReferenceFieldUpdater for this)
@njhill
Copy link
Contributor Author

njhill commented Jan 9, 2018

@dragonsinth I've removed the outstandingOps change commit from the PR, and added a new commit which reverts the TreeNode extends AtomicReference<ChildData> change. Hopefully that helps to make this more palatable!

@njhill
Copy link
Contributor Author

njhill commented Feb 6, 2018

@dragonsinth any interest in these changes? Maybe I should rename the PR since as you pointed out the memory saving aspect isn't very significant (I measured about 2% on a cache of 50k empty nodes), but hopefully the state and update logic simplification is beneficial?

@dragonsinth
Copy link
Contributor

Sorry, been busy then out of town. I'll look tomorrow.

@dragonsinth
Copy link
Contributor

I'm sure this is probably right, but the old code has a lot of miles on it. I have to read through super carefully and convince myself that there are no behavioral changes.

wasCreated();
}
break;
case CHILDREN:
if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
{
ChildData oldChildData = childData;
if ( oldChildData != null && oldChildData.getStat().getMzxid() == newStat.getMzxid() )
//TODO consider doing update of cversion, pzxid, numChildren only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable to me

{
childMap = children;
}
if ( childrenUpdater.compareAndSet(this, null, childMap) ) break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see why you turned this into a loop (nice catch), but I think it reads a little clearly and more deliberateif you revert back to the original code and just change the if -> while. Ideally if Curator moves wholesale to Java 1.8, funcs, like:

                    ConcurrentMap<String, TreeNode> childMap = childrenUpdater.updateAndGet(this, (oldChildren) -> {
                        return oldChildren != null ? oldChildren : Maps.newConcurrentMap();
                    });

@@ -704,8 +687,7 @@ private TreeNode find(String findPath)
{
TreeNode childNode = entry.getValue();
ChildData childData = childNode.childData;
// Double-check liveness after retreiving data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of liked this comment, although I misspelled "retrieving" here and below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I removed it is that unlike before it's not really "double" checking anymore against the separate nodeState field, there's only a single check for liveness which is now implicit in the value of childData. But I could well have misinterpreted the reason for comment, no problem with leaving it in!

{
return null;
}
ChildData result = node.childData;
// Double-check liveness after retreiving data.
return node.nodeState == NodeState.LIVE ? result : null;
return result != DEAD ? result : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be isLive for consistency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason was that isLive also checks for null which isn't necessary here (since we're returning null anyhow). But no harm in making it consistent.

// Ordinary nodes are not allowed to transition from dead -> live;
// make sure this isn't a delayed response that came in after death.
|| (parent != null && oldChildData == DEAD) ) {
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we break up these cases for clarity?

                        final ChildData oldChildData = childData;
                        // Ignore this event if we've already processed a newer update for this node.
                        if ( (isLive(oldChildData) && newStat.getMzxid() <= oldChildData.getStat().getMzxid()) )
                        {
                            break;
                        }
                        // Non-root nodes are not allowed to transition from dead -> live;
                        // make sure this isn't a delayed response that came in after death.
                        if ( parent != null && oldChildData == DEAD )
                        {
                            break;
                        }

{
publishEvent(TreeCacheEvent.Type.NODE_UPDATED, toPublish);
}
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the second check is now superfluous since we checked above.

                        if ( childDataUpdater.compareAndSet(this, oldChildData, toUpdate) )
                        {
                            publishEvent(isLive(oldChildData) ? TreeCacheEvent.Type.NODE_UPDATED : TreeCacheEvent.Type.NODE_ADDED, toPublish);
                            break;
                        }

@dragonsinth
Copy link
Contributor

@njhill this looks good to me with a few changes. I pushed one extra commit onto https://github.com/apache/curator/tree/CURATOR-447 for you to look at. If you agree with my changes (some of them are to conform to the project style) please sync your branch to mine and I'll merge this.

@njhill
Copy link
Contributor Author

njhill commented Feb 12, 2018

Thanks alot @dragonsinth for going through this so carefully. I like your additional simplifications, and sorry for the project style inconsistencies that I missed!

I've pulled in your commit and added one more just to remove some parentheses no longer needed after splitting those conditions into separate ifs.

if ( childData != null && childNode.nodeState == NodeState.LIVE )
ChildData childData = entry.getValue().childData;
// Double-check liveness after retrieving data.
if ( isLive(childData) )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's subtle, but the point I'm trying to make is that you have to read .childData exactly once, then check the liveness, then use only that value going forward. It would be an error to remove the local variable here and reference entry.getValue().childData twice.

It's a "double" check because generally there shouldn't be any non-LIVE children in the parent children map that we're iterating. But due to volatility we have to "double check" that child nodes read from the map are still alive after pulling .childData

Copy link
Contributor

@dragonsinth dragonsinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM will merge

@asfgit asfgit merged commit 2cc3f1f into apache:master Feb 13, 2018
@njhill njhill deleted the CURATOR-447 branch February 13, 2018 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants