[CURATOR-447] TreeCache: Improve memory usage and concurrent update logic #250

njhill · 2018-01-02T20:36:44Z

Jira https://issues.apache.org/jira/browse/CURATOR-374 reduced per-node memory usage in TreeCache. It can be improved further via removal of the nodeState field - its LIVE state corresponds exactly to the adjacent childData field being non-null, and a sentinel ChildData value can be used for the DEAD state. This simplification also reduces the room for bugs and state inconsistencies.

Other improvements included:

More robust cache update logic (in get-children and get-data event callbacks)
~~A further simplification to have TreeNode extend AtomicReference, which obviates the need for an explicit childData field~~
~~Avoid overhead of incrementing/decrementing the outstandingOps atomic integer post-initialization~~

See corresponding JIRA: CURATOR-447

also simplify by having TreeNode extend AtomicReference<ChildData>

njhill · 2018-01-08T15:48:13Z

@dragonsinth @Randgalt any interest in these updates? FWIW I've had them running in production for some time.

Randgalt · 2018-01-08T15:52:49Z

I'll defer to @dragonsinth

dragonsinth · 2018-01-08T21:03:25Z

Hi @njhill just got back to work from vacation, and this is a little more than I can just eyeball, I gotta think through it a bit when I have more time. I think I would like to see the removal of nodeState in isolation, perhaps with an explicit PENDING sentinel rather than just null.

It's not clear to me that TreeNode extends AtomicReference and @Nullable outstandingOps are wins. The first one doesn't really save any memory, and for the second one, the ongoing CPU cost to update outstandingOps is basically nothing, it's almost certainly immeasurably tiny, and you have to introduce racy field reads to accomplish it.

Have you done memory use measurements? I would expect the memory footprint of ConcurrentMap<String, TreeNode> children to mostly dominate the memory overhead of our own classes, not to mention all the Watchers.

njhill · 2018-01-08T23:54:19Z

Thanks @dragonsinth, there's of course no rush at all!

I agree that TreeNode extends AtomicReference isn't a "win" as such, it just looked cleaner to me than using another AtomicReferenceFieldUpdater. But it's likely subjective and certainly a minor thing.

The outstandingOps change seems clearer to me - the field has no use for most of the life of the cache and yet is incremented/decremented on every single change. It's not the CPU cycle overhead but more the (CPU) cache coherency overhead since it's an AtomicLong. I didn't think the change introduces any raciness - the only times when outstandingOps needs to be modified (during initialization or re-initialization) it will be guaranteed to be non-null.

I would expect the memory footprint of ConcurrentMap<String, TreeNode> children to mostly dominate the memory overhead of our own classes

Good point. The description of CURATOR-374 implied this might have some mem overhead benefit for very large caches, but removal of the nodeState field was motivated more by state simplification and consolidation of atomic updates. I'll do some quick measurements but I acknowledge percentage-wise the saving might not be significant.

and add TreeNode serial version id

(add back AtomicReferenceFieldUpdater for this)

njhill · 2018-01-09T15:22:54Z

@dragonsinth I've removed the outstandingOps change commit from the PR, and added a new commit which reverts the TreeNode extends AtomicReference<ChildData> change. Hopefully that helps to make this more palatable!

njhill · 2018-02-06T05:10:18Z

@dragonsinth any interest in these changes? Maybe I should rename the PR since as you pointed out the memory saving aspect isn't very significant (I measured about 2% on a cache of 50k empty nodes), but hopefully the state and update logic simplification is beneficial?

dragonsinth · 2018-02-06T07:01:53Z

Sorry, been busy then out of town. I'll look tomorrow.

dragonsinth · 2018-02-07T18:12:10Z

I'm sure this is probably right, but the old code has a lot of miles on it. I have to read through super carefully and convince myself that there are no behavioral changes.

dragonsinth · 2018-02-12T19:57:44Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java

                    wasCreated();
                }
                break;
            case CHILDREN:
                if ( event.getResultCode() == KeeperException.Code.OK.intValue() )
                {
                    ChildData oldChildData = childData;
-                    if ( oldChildData != null && oldChildData.getStat().getMzxid() == newStat.getMzxid() )
+                    //TODO consider doing update of cversion, pzxid, numChildren only


seems reasonable to me

dragonsinth · 2018-02-12T20:11:06Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java

-                        {
-                            childMap = children;
-                        }
+                        if ( childrenUpdater.compareAndSet(this, null, childMap) ) break;


I see why you turned this into a loop (nice catch), but I think it reads a little clearly and more deliberateif you revert back to the original code and just change the if -> while. Ideally if Curator moves wholesale to Java 1.8, funcs, like:

ConcurrentMap<String, TreeNode> childMap = childrenUpdater.updateAndGet(this, (oldChildren) -> { return oldChildren != null ? oldChildren : Maps.newConcurrentMap(); });

dragonsinth · 2018-02-12T20:14:05Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java

@@ -704,8 +687,7 @@ private TreeNode find(String findPath)
            {
                TreeNode childNode = entry.getValue();
                ChildData childData = childNode.childData;
-                // Double-check liveness after retreiving data.


I kind of liked this comment, although I misspelled "retrieving" here and below

The reason I removed it is that unlike before it's not really "double" checking anymore against the separate nodeState field, there's only a single check for liveness which is now implicit in the value of childData. But I could well have misinterpreted the reason for comment, no problem with leaving it in!

dragonsinth · 2018-02-12T20:15:00Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java

        {
            return null;
        }
        ChildData result = node.childData;
-        // Double-check liveness after retreiving data.
-        return node.nodeState == NodeState.LIVE ? result : null;
+        return result != DEAD ? result : null;


shouldn't this be isLive for consistency?

Reason was that isLive also checks for null which isn't necessary here (since we're returning null anyhow). But no harm in making it consistent.

dragonsinth · 2018-02-12T20:25:40Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java

+                                // Ordinary nodes are not allowed to transition from dead -> live;
+                                // make sure this isn't a delayed response that came in after death.
+                                || (parent != null && oldChildData == DEAD) ) {
+                            break;


Can we break up these cases for clarity?

final ChildData oldChildData = childData; // Ignore this event if we've already processed a newer update for this node. if ( (isLive(oldChildData) && newStat.getMzxid() <= oldChildData.getStat().getMzxid()) ) { break; } // Non-root nodes are not allowed to transition from dead -> live; // make sure this isn't a delayed response that came in after death. if ( parent != null && oldChildData == DEAD ) { break; }

dragonsinth · 2018-02-12T20:30:26Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java

+                            {
+                                publishEvent(TreeCacheEvent.Type.NODE_UPDATED, toPublish);
+                            }
+                            break;


I think the second check is now superfluous since we checked above.

if ( childDataUpdater.compareAndSet(this, oldChildData, toUpdate) ) { publishEvent(isLive(oldChildData) ? TreeCacheEvent.Type.NODE_UPDATED : TreeCacheEvent.Type.NODE_ADDED, toPublish); break; }

dragonsinth · 2018-02-12T20:40:22Z

@njhill this looks good to me with a few changes. I pushed one extra commit onto https://github.com/apache/curator/tree/CURATOR-447 for you to look at. If you agree with my changes (some of them are to conform to the project style) please sync your branch to mine and I'll merge this.

njhill · 2018-02-12T23:24:27Z

Thanks alot @dragonsinth for going through this so carefully. I like your additional simplifications, and sorry for the project style inconsistencies that I missed!

I've pulled in your commit and added one more just to remove some parentheses no longer needed after splitting those conditions into separate ifs.

dragonsinth · 2018-02-13T00:51:48Z

curator-recipes/src/main/java/org/apache/curator/framework/recipes/cache/TreeCache.java

-                if ( childData != null && childNode.nodeState == NodeState.LIVE )
+                ChildData childData = entry.getValue().childData;
+                // Double-check liveness after retrieving data.
+                if ( isLive(childData) )


It's subtle, but the point I'm trying to make is that you have to read .childData exactly once, then check the liveness, then use only that value going forward. It would be an error to remove the local variable here and reference entry.getValue().childData twice.

It's a "double" check because generally there shouldn't be any non-LIVE children in the parent children map that we're iterating. But due to volatility we have to "double check" that child nodes read from the map are still alive after pulling .childData

dragonsinth

LGTM will merge

remove nodeState field, improve thread safety of cache updates

1c7be85

also simplify by having TreeNode extend AtomicReference<ChildData>

njhill changed the title ~~[CURATOR-447] TreeCache: Improve memory usage and thread safety~~ [CURATOR-447] TreeCache: Improve memory usage and concurrent update logic Jan 3, 2018

njhill force-pushed the CURATOR-447 branch 3 times, most recently from cd747d6 to 5bc92c0 Compare January 3, 2018 12:36

njhill added 2 commits January 9, 2018 14:52

remove superfluous braces, unused private method

939d7dc

and add TreeNode serial version id

revert TreeNode extends AtomicReference<ChildData> change

043270a

(add back AtomicReferenceFieldUpdater for this)

njhill force-pushed the CURATOR-447 branch from 5bc92c0 to 043270a Compare January 9, 2018 15:18

dragonsinth reviewed Feb 12, 2018

View reviewed changes

dragonsinth and others added 2 commits February 12, 2018 15:12

Minor cleanup and formatting

99acafe

tiny formatting change: remove superfluous parentheses

2cc3f1f

dragonsinth reviewed Feb 13, 2018

View reviewed changes

dragonsinth approved these changes Feb 13, 2018

View reviewed changes

asfgit merged commit 2cc3f1f into apache:master Feb 13, 2018

njhill deleted the CURATOR-447 branch February 13, 2018 01:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CURATOR-447] TreeCache: Improve memory usage and concurrent update logic #250

[CURATOR-447] TreeCache: Improve memory usage and concurrent update logic #250

njhill commented Jan 2, 2018 •

edited

njhill commented Jan 8, 2018

Randgalt commented Jan 8, 2018

dragonsinth commented Jan 8, 2018

njhill commented Jan 8, 2018

njhill commented Jan 9, 2018

njhill commented Feb 6, 2018

dragonsinth commented Feb 6, 2018

dragonsinth commented Feb 7, 2018

dragonsinth Feb 12, 2018

dragonsinth Feb 12, 2018

dragonsinth Feb 12, 2018

njhill Feb 12, 2018

dragonsinth Feb 12, 2018

njhill Feb 12, 2018

dragonsinth Feb 12, 2018

dragonsinth Feb 12, 2018

dragonsinth commented Feb 12, 2018

njhill commented Feb 12, 2018

dragonsinth Feb 13, 2018

dragonsinth left a comment

[CURATOR-447] TreeCache: Improve memory usage and concurrent update logic #250

[CURATOR-447] TreeCache: Improve memory usage and concurrent update logic #250

Conversation

njhill commented Jan 2, 2018 • edited

njhill commented Jan 8, 2018

Randgalt commented Jan 8, 2018

dragonsinth commented Jan 8, 2018

njhill commented Jan 8, 2018

njhill commented Jan 9, 2018

njhill commented Feb 6, 2018

dragonsinth commented Feb 6, 2018

dragonsinth commented Feb 7, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dragonsinth commented Feb 12, 2018

njhill commented Feb 12, 2018

Choose a reason for hiding this comment

dragonsinth left a comment

Choose a reason for hiding this comment

njhill commented Jan 2, 2018 •

edited