Skip to content

ZOOKEEPER-3425: ranking the ttl by expireTime asc for the performance#1010

Open
maoling wants to merge 2 commits intoapache:masterfrom
maoling:ZOOKEEPER-3425
Open

ZOOKEEPER-3425: ranking the ttl by expireTime asc for the performance#1010
maoling wants to merge 2 commits intoapache:masterfrom
maoling:ZOOKEEPER-3425

Conversation

@maoling
Copy link
Member

@maoling maoling commented Jul 4, 2019

  • Ranking the ttl by expireTime asc will get the following two benefits(Overall using space to swap time):

    • the guarantee of timing order, t1 will disappear before the t2 when t1's expired timestamp < t2's in the same round.
    • avoid the inefficient search in the getCandidates(). If we have 1000+(even the 10000+) ttl nodes which will disappear two hours later, every iteration it will check whether all the ttls node has expired. With ranking, it will break/jump quickly.
    • this design was inspired by the hadoop's client LeaseManager:
  • A UT:testTTLTreeSetCollection was added to assert the behaviour of the TreeSet Collection:ttls.

  • BTW, let's revisit the TimeTask's scheduleAtFixedRate strategy. the TimeTask will run every minute(with the defalut checkIntervalMs value). when the time current round spends is more than one minute, it will waiting for the current round being finished, then starting another round. That's OK.

  • more details in the ZOOKEEPER-3425

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel I am missing some part of this change.
We are changing behaviour and how we are serializing data.
Don't we need backward compatibility tests?

Are you an user of TTL nodes and you saw a bottle neck?

containers.add(path);
} else if (ephemeralType == EphemeralType.TTL) {
ttls.add(path);
ttls.add(new TTLNode(path, node.stat.getCtime() + ephemeralType.getValue(node.stat.getEphemeralOwner())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we deal with backward compatibility?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ttls is a set only in the memory without any serializing into disk, so no backward compatibility.

Are you an user of TTL nodes and you saw a bottle neck?

  • Look at the github description firstly
  • the ttl is a very import feature for a key-value storage system.Currently it's rusted.Look at the ZOOKEEPER-3428 which wants to improve the ttl further.

@maoling maoling force-pushed the ZOOKEEPER-3425 branch 2 times, most recently from 3a83e4e to 76e1c9c Compare July 30, 2019 11:10
@maoling maoling force-pushed the ZOOKEEPER-3425 branch 2 times, most recently from f6e537a to bc21d01 Compare August 12, 2019 03:12
@anmolnar
Copy link
Contributor

@maoling I like the improvement part: sorting TTLs by expiry time makes sense to me, but why do you want to disable children creation?

@Randgalt You might want to take a look at this patch.

@Randgalt
Copy link
Member

TTL nodes must be able to have children. I'm -1 on this part of the PR - it's a non-starter TBH and completely changes what a TTL node is and how it's used.

// VisibleForTesting
protected long getElapsed(DataNode node) {
return Time.currentWallTime() - node.stat.getMtime();
return Time.currentWallTime() - node.stat.getCtime();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change and is incorrect. The concept of a TTL node is that it expires after non-use. If the TTL is continually updated it should not be deleted.

@Randgalt
Copy link
Member

Heavy -1 on this. This would break users such as Elastic search cloud. TTL nodes were designed to have children.

@maoling
Copy link
Member Author

maoling commented Nov 19, 2019

@anmolnar @Randgalt
Yep, we should not break the compatibility, I have deleted the changes about the semantics of TTL and focus on the performance issue by a force update. PTAL.

@maoling maoling changed the title ZOOKEEPER-3425:ttl node should not have the children and ranking the ttl by expireTime asc for the performance ZOOKEEPER-3425: ranking the ttl by expireTime asc for the performance Nov 23, 2019
@asf-ci
Copy link

asf-ci commented Nov 23, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1648/

@asf-ci
Copy link

asf-ci commented Nov 23, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1649/

*/
private final Set<String> ttls = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
public final Set<TTLNode> ttls = new TreeSet<>(
(t1, t2) -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be:

    public final Set<TTLNode> ttls = new TreeSet<>(
        Comparator.comparingLong(TTLNode::getExpireTime)
        .thenComparing(TTLNode::getPath)
    );

public Set<String> getTtls() {
return new HashSet<String>(ttls);
public Set<TTLNode> getTtls() {
return new LinkedHashSet<>(ttls);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the desire here not to allocate a potentially large internally array when there are many TTL nodes? I think we shouldn't even bother with the defensive copy here. Returning Collections.unmodifiableSet(ttls) is fine and, given that this is only an internally used object, just the internal ttls would be OK with an appropriate comment (and maybe renaming the method).

*/
static class TTLNode {

private String path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be final


private String path;

private long expireTime;
Copy link
Member

@Randgalt Randgalt Nov 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be volatile - but can be final given my other comment

nodes.postChange(path, n);
EphemeralType ephemeralType = EphemeralType.get(n.stat.getEphemeralOwner());
if (ephemeralType == EphemeralType.TTL) {
TTLNode ttlNode = ttlMap.get(path);
Copy link
Member

@Randgalt Randgalt Nov 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This get is not needed. Just remove from ttlMap and create a new TTLNode to add to ttls and ttlMap. Then, expireTime in TTLNode can be made final and the setter removed. i.e.

TTLNode ttlNode = ttlMap.remove(path);
if (ttlNode != null) {
    ttls.remove(ttlNode);
}
ttlNode = new TTLNode(path, n.stat.getMtime() + ephemeralType.getValue(n.stat.getEphemeralOwner()));
ttlMap.put(path, ttlNode);
ttls.add(ttlNode);

*/
private final Set<String> containers = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

private final Map<String, TTLNode> ttlMap = new HashMap<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This addition of this map and the change of ttls from a concurrent map to a non-concurrent TreeSet breaks thread safety. The synchronization in DataTree is based on individual nodes yet these maps/sets belong to the DataTree itself. This is a serious latent bug and must be corrected or the implementation must be re-considered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if ConcurrentSkipListMap can be used (I'm not very familiar with it though).

@Randgalt
Copy link
Member

Randgalt commented Nov 23, 2019

Here's an alternate implementation that's simpler (in my view). However, I need to think more about the maintenance of the two maps. I'm not certain this is thread safe. Finally, is this optimization needed? This smells like a premature optimization to me. Has anyone seen issues with this in production installations? The TTLs are searched in a background thread and shouldn't affect foreground performance very much.

master...Randgalt:ZOOKEEPER-3425-order-ttls-for-performance

update: I've verified the thread safety of the two maps and updated the code to make sure it's properly synchronized.

@Randgalt
Copy link
Member

Randgalt commented Nov 25, 2019

This new TTLManager solves thread safety however it hurts concurrency. In the master implementation, the TTL map is concurrent (via the ConcurrentMap) and allows for multi-thread mutation. This class synchronizes the entire TTL map making access serial for all the threads.

I'm, in general, still highly -1 on this change as it feels like a premature optimization.

@asf-ci
Copy link

asf-ci commented Nov 25, 2019

Refer to this link for build results (access rights to CI server needed):
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build-maven/1651/

@maoling
Copy link
Member Author

maoling commented Nov 25, 2019

@Randgalt

  • Great review work and insight. I take a close look at the patch you provide, learn a lot from it.
  • I don't insist on this improvement, not sure whether it's a premature optimization. But I believe this PR has opened an interesting discussion :). I really like this atmosphere.

@anmolnar
Copy link
Contributor

anmolnar commented Dec 7, 2019

If this patch is purely performance optimisation (I absolutely support that a single patch should focus on a single thing at a time), I believe we should see some test cases and numbers to prove the benefits.

@anmolnar
Copy link
Contributor

anmolnar commented Dec 7, 2019

I second @Randgalt 's concern: I don't like that we have to synchronise on entire TTLManager. Hence I'd like to see some perf tests on different workloads: I suspect this implementation performs worse with clients that create / delete lots of TTLs.

@anmolnar
Copy link
Contributor

anmolnar commented Dec 7, 2019

On the other hand - and I'm not sure about this -, but if writes in ZooKeeper are serialised, do we need to care about synchronisation? Which threads are possibly accessing TTLManager at the same time?

@Randgalt
Copy link
Member

Randgalt commented Dec 7, 2019

if writes in ZooKeeper are serialised, do we need to care about synchronisation?

I've wondered about that myself. The Watcher Manager has a bunch of synchronization (and other parts of the code too). TBH - I don't know enough about it and have always followed other synchronized blocks in the ZK code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants