ZOOKEEPER-3425: ranking the ttl by expireTime asc for the performance#1010
ZOOKEEPER-3425: ranking the ttl by expireTime asc for the performance#1010maoling wants to merge 2 commits intoapache:masterfrom
Conversation
eolivelli
left a comment
There was a problem hiding this comment.
I feel I am missing some part of this change.
We are changing behaviour and how we are serializing data.
Don't we need backward compatibility tests?
Are you an user of TTL nodes and you saw a bottle neck?
| containers.add(path); | ||
| } else if (ephemeralType == EphemeralType.TTL) { | ||
| ttls.add(path); | ||
| ttls.add(new TTLNode(path, node.stat.getCtime() + ephemeralType.getValue(node.stat.getEphemeralOwner()))); |
There was a problem hiding this comment.
How do we deal with backward compatibility?
There was a problem hiding this comment.
ttlsis a set only in the memory without any serializing into disk, so no backward compatibility.
Are you an user of TTL nodes and you saw a bottle neck?
- Look at the github description firstly
- the ttl is a very import feature for a key-value storage system.Currently it's rusted.Look at the ZOOKEEPER-3428 which wants to improve the ttl further.
3a83e4e to
76e1c9c
Compare
f6e537a to
bc21d01
Compare
bc21d01 to
0e71766
Compare
|
TTL nodes must be able to have children. I'm -1 on this part of the PR - it's a non-starter TBH and completely changes what a TTL node is and how it's used. |
| // VisibleForTesting | ||
| protected long getElapsed(DataNode node) { | ||
| return Time.currentWallTime() - node.stat.getMtime(); | ||
| return Time.currentWallTime() - node.stat.getCtime(); |
There was a problem hiding this comment.
This is a breaking change and is incorrect. The concept of a TTL node is that it expires after non-use. If the TTL is continually updated it should not be deleted.
|
Heavy -1 on this. This would break users such as Elastic search cloud. TTL nodes were designed to have children. |
0e71766 to
0ee5799
Compare
0ee5799 to
c428c20
Compare
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
| */ | ||
| private final Set<String> ttls = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>()); | ||
| public final Set<TTLNode> ttls = new TreeSet<>( | ||
| (t1, t2) -> { |
There was a problem hiding this comment.
can be:
public final Set<TTLNode> ttls = new TreeSet<>(
Comparator.comparingLong(TTLNode::getExpireTime)
.thenComparing(TTLNode::getPath)
);
| public Set<String> getTtls() { | ||
| return new HashSet<String>(ttls); | ||
| public Set<TTLNode> getTtls() { | ||
| return new LinkedHashSet<>(ttls); |
There was a problem hiding this comment.
Is the desire here not to allocate a potentially large internally array when there are many TTL nodes? I think we shouldn't even bother with the defensive copy here. Returning Collections.unmodifiableSet(ttls) is fine and, given that this is only an internally used object, just the internal ttls would be OK with an appropriate comment (and maybe renaming the method).
| */ | ||
| static class TTLNode { | ||
|
|
||
| private String path; |
|
|
||
| private String path; | ||
|
|
||
| private long expireTime; |
There was a problem hiding this comment.
should be volatile - but can be final given my other comment
| nodes.postChange(path, n); | ||
| EphemeralType ephemeralType = EphemeralType.get(n.stat.getEphemeralOwner()); | ||
| if (ephemeralType == EphemeralType.TTL) { | ||
| TTLNode ttlNode = ttlMap.get(path); |
There was a problem hiding this comment.
This get is not needed. Just remove from ttlMap and create a new TTLNode to add to ttls and ttlMap. Then, expireTime in TTLNode can be made final and the setter removed. i.e.
TTLNode ttlNode = ttlMap.remove(path);
if (ttlNode != null) {
ttls.remove(ttlNode);
}
ttlNode = new TTLNode(path, n.stat.getMtime() + ephemeralType.getValue(n.stat.getEphemeralOwner()));
ttlMap.put(path, ttlNode);
ttls.add(ttlNode);
| */ | ||
| private final Set<String> containers = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>()); | ||
|
|
||
| private final Map<String, TTLNode> ttlMap = new HashMap<>(); |
There was a problem hiding this comment.
This addition of this map and the change of ttls from a concurrent map to a non-concurrent TreeSet breaks thread safety. The synchronization in DataTree is based on individual nodes yet these maps/sets belong to the DataTree itself. This is a serious latent bug and must be corrected or the implementation must be re-considered.
There was a problem hiding this comment.
I wonder if ConcurrentSkipListMap can be used (I'm not very familiar with it though).
|
Here's an alternate implementation that's simpler (in my view). However, I need to think more about the maintenance of the two maps. I'm not certain this is thread safe. Finally, is this optimization needed? This smells like a premature optimization to me. Has anyone seen issues with this in production installations? The TTLs are searched in a background thread and shouldn't affect foreground performance very much. master...Randgalt:ZOOKEEPER-3425-order-ttls-for-performance update: I've verified the thread safety of the two maps and updated the code to make sure it's properly synchronized. |
|
This new TTLManager solves thread safety however it hurts concurrency. In the master implementation, the TTL map is concurrent (via the ConcurrentMap) and allows for multi-thread mutation. This class synchronizes the entire TTL map making access serial for all the threads. I'm, in general, still highly -1 on this change as it feels like a premature optimization. |
|
Refer to this link for build results (access rights to CI server needed): |
|
|
If this patch is purely performance optimisation (I absolutely support that a single patch should focus on a single thing at a time), I believe we should see some test cases and numbers to prove the benefits. |
|
I second @Randgalt 's concern: I don't like that we have to synchronise on entire |
|
On the other hand - and I'm not sure about this -, but if writes in ZooKeeper are serialised, do we need to care about synchronisation? Which threads are possibly accessing |
I've wondered about that myself. The Watcher Manager has a bunch of synchronization (and other parts of the code too). TBH - I don't know enough about it and have always followed other synchronized blocks in the ZK code. |
Ranking the ttl by expireTime asc will get the following two benefits(Overall using space to swap time):
getCandidates(). If we have 1000+(even the 10000+) ttl nodes which will disappear two hours later, every iteration it will check whether all the ttls node has expired. With ranking, it will break/jump quickly.A UT:
testTTLTreeSetCollectionwas added to assert the behaviour of the TreeSet Collection:ttls.BTW, let's revisit the TimeTask's
scheduleAtFixedRatestrategy. the TimeTask will run every minute(with the defalutcheckIntervalMsvalue). when the time current round spends is more than one minute, it will waiting for the current round being finished, then starting another round. That's OK.more details in the ZOOKEEPER-3425