Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix huge number of watches in zk issue #6683

Closed
wants to merge 1 commit into from

Conversation

kaijianding
Copy link
Contributor

@kaijianding kaijianding commented Nov 29, 2018

fix #6647 .
I guess the origin Announcer.java is better performance for segment announcement, so I just keep it.

I guess the root cause of this issue is that each child path has a watch on zookeeper server even we only add listener at the parent path. When the path is like /druid/prod/announces/$host, the number of watches is n * n (each server watches every host including itself). Plus the path for lookup, the number is n * n * 2.

I test it in my product environment, the number of watches is reduced as expected.

transaction = transaction.delete().forPath(parent).and();
}
catch (Exception e) {
log.info(e, "Unable to delete parent[%s], boooo.", parent);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.error maybe

((CuratorTransactionFinal) transaction).commit();
}
catch (Exception e) {
log.info(e, "Unable to commit transaction. Please feed the hamsters");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log.error maybe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a production Druid cluster has issues I don't think cluster operators will be happy to see this humour in the logs.

import java.util.concurrent.CopyOnWriteArrayList;

/**
* Announces single node on Zookeeper and only watch this node, which is different with Announcer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Announcer should be a javadoc link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"which is different with Announcer" the idea is not complete. Please elaborate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is different with Announcer means NodeAnnouncer announces single node on Zookeeper and only watch this node, while Announcer watches all child path, not only this node

private final List<Announceable> toUpdate = new ArrayList<>();
private final ConcurrentMap<String, NodeCache> listeners = new ConcurrentHashMap<>();
private final ConcurrentMap<String, byte[]> announcements = new ConcurrentHashMap<>();
private final List<String> parentsIBuilt = new CopyOnWriteArrayList<String>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name and the purpose of this field is not clear

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is clearer to call it pathsCreatedInThisAnnouncer? Is there anything specific about them being "parents"? They are parents of what?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list is iterated only once. There is no win in using CopyOnWriteArrayList rather than a simple ArrayList protected by a lock.

private final CuratorFramework curator;

private final List<Announceable> toAnnounce = new ArrayList<>();
private final List<Announceable> toUpdate = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from toAnnounce?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toUpdate is added when update() called before zk is connected, the path to update may or may not be added to toAnnounce before.
Most part of NodeAnnouncer's code logic is following the original Announcer.java

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please annotate @GuadedBy("toAnnounce").

if (removeParentsIfCreated) {
parentsIBuilt.add(parentPath);
}
log.debug("Created parentPath[%s], %s remove on stop.", parentPath, removeParentsIfCreated ? "will" : "will not");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "stop" mean here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stop is the stop() method annotated with LifecycleStop

catch (KeeperException.NoNodeException e) {
log.info("node[%s] didn't exist anyway...", path);
}
catch (Exception e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the curator.xxx.forPath(parentPath) throws Exception in method signature, so we have to catch Exception

log.info("node[%s] didn't exist anyway...", path);
}
catch (Exception e) {
throw Throwables.propagate(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

try {
cache.start();
}
catch (Exception e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the NodeCache.start() throws Exception in method signature, so we have to catch Exception

}
catch (Exception e) {
CloseQuietly.close(cache);
throw Throwables.propagate(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@gianm
Copy link
Contributor

gianm commented Dec 20, 2018

From #6647,

If there are 10000 realtime tasks(not single realtime job, but like 50 jobs), then each task will create 2 * 10000 watches, finally we will get 200M watches!!

This seems strange & unnecessary, do you know why the tasks are creating watches for each other? They shouldn't be watching each other, I think, since they don't have a good reason to. Only the coordinators and brokers should be watching tasks, since they're the only ones that care about them (for lookups & segment discovery purposes).

Does Announcer set up too many watches, for some reason?

@kaijianding
Copy link
Contributor Author

@gianm I'm not familiar with curator code, but it seems because we watch the parent path to get child events, curator adds watch for each child path

@jihoonson
Copy link
Contributor

jihoonson commented Jan 2, 2019

Hi @kaijianding, I still don't understand why you have such many watchers. In Druid, only master nodes (brokers, coordinator, overlord if you use zk-based remoteTaskRunner) watch such announcements. For example, AbstractCuratorServerInventoryView watches /some/zk/path/announcements which is used only in brokers and the coordinator. Similarly, /some/zk/path/listeners/lookups/ is watched only by the coordinator (LookupCoordinatorManager). Are you seeing that other modules like tasks are also watching those zk paths?

@kaijianding
Copy link
Contributor Author

@jihoonson Announcer.java is also used in realtime node to re-announce itself when temporary disconnecting to zk, Then a realtime node also watches /some/zk/path/announcements

@leventov
Copy link
Member

@kaijianding most answers to my questions that you given should be transformed into code comments, or the code should be altered in some other way. The objective is to approach the state that if I read the code for the first time, none of my questions were arisen.

@jihoonson
Copy link
Contributor

@kaijianding thanks. Would you please fix the conflicts and check @leventov's last comment?

@jon-wei jon-wei removed this from the 0.14.0 milestone Feb 12, 2019
@kaijianding
Copy link
Contributor Author

@jihoonson @leventov @gianm conflicts are resolved and code comments are updated, do you have any more comment?

@leventov
Copy link
Member

@kaijianding
Copy link
Contributor Author

@leventov I added comments to toAnnounce, toUpdate, parentsIBuilt, and modified some other comments to help better understanding. Could you tell me which part of comments you think can improve?


private final CuratorFramework curator;

// incase a path is added to `toAnnounce` in announce() before zk is connected,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Javadoc /** ... */ comments for fields.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"In case", missing space.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right that it's implied in this comment that announce() may be called concurrently by somebody? How is this possible before start()? Could you please add concurrent control flow documentation to announce() and start() methods?

// incase a path is added to `toAnnounce` in announce() before zk is connected,
// should remember the path and do announce in start() later
private final List<Announceable> toAnnounce = new ArrayList<>();
// incase a path is added to `toUpdate` in update() before zk is connected,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"In case"


// incase a path is added to `toAnnounce` in announce() before zk is connected,
// should remember the path and do announce in start() later
private final List<Announceable> toAnnounce = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's clearer to call it toAnnounceInStart?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please annotate @GuadedBy("toAnnounce"). Could use create a separate lock object in a field called "lock"?

* NodeAnnouncer announces single node on Zookeeper and only watches this node,
* while {@link Announcer} watches all child paths, not only this node
*/
public class NodeAnnouncer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeAnnouncer and Announcer share a lot of code. Is it possible to extract a common base class and either extend it as Announcer and NodeAnnouncer or just add a flag to that basic class's state and use something like BaseAnnouncer(watchAll=true) as Announcer and BaseAnnouncer(watchAll=true) as NodeAnnouncer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add concurrent control flow documentation to this class? From what threads (thread pools, executors) each method may be called?

private final List<Announceable> toUpdate = new ArrayList<>();
private final ConcurrentMap<String, NodeCache> listeners = new ConcurrentHashMap<>();
private final ConcurrentMap<String, byte[]> announcedPaths = new ConcurrentHashMap<>();
// only who creates the parent path can drop the parent path, so should remmeber the created parents
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"only who creates the parent path can drop the parent path" - "Who" here is a thread? A class? Is this rule imposed by the Curator framework?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"remember"

private final CuratorFramework curator;

private final List<Announceable> toAnnounce = new ArrayList<>();
private final List<Announceable> toUpdate = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please annotate @GuadedBy("toAnnounce").

synchronized (toAnnounce) {
try {
if (!Arrays.equals(oldBytes, bytes)) {
announcedPaths.put(path, bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a race condition is possible between announcedPaths.get() and announcedPaths.put(), is that benign? If not, please rewrite the code using map.compute().


boolean created = false;
synchronized (toAnnounce) {
if (started) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that started flag is already checked at the beginning of this method, why it is checked again here?

}
);

if (started) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that started flag is already checked at the beginning of this method, why it is checked again here?

}
CloseQuietly.close(closer);

for (String announcementPath : announcedPaths.keySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entries are removed via map.remove() while iterating over the map. I cannot find proof that this is safe; it would be better to remove them via iterator.remove().

@fjy fjy added this to the 0.15.0 milestone Mar 11, 2019
@leventov
Copy link
Member

@kaijianding do you plan to continue this work?

@jihoonson
Copy link
Contributor

@kaijianding would you please fix the conflicts and check @leventov's latest comments? BTW, I'm untagging milestone since this issue is not necessarily a release blocker. Feel free to let me know if you think this should be.

@jihoonson jihoonson removed this from the 0.15.0 milestone May 7, 2019
@stale
Copy link

stale bot commented Jul 6, 2019

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Jul 6, 2019
@stale
Copy link

stale bot commented Jul 13, 2019

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Jul 13, 2019
@pzhdfy
Copy link
Contributor

pzhdfy commented Jan 11, 2020

image
We use this patch in our production environment,the watch count significantly decreased .
And anything works fine, so I raise for this pr

@stale
Copy link

stale bot commented Jan 11, 2020

This pull request/issue is no longer marked as stale.

@stale stale bot removed the stale label Jan 11, 2020
@asdf2014 asdf2014 mentioned this pull request Jan 13, 2020
9 tasks
@stale
Copy link

stale bot commented Mar 14, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Mar 14, 2020
@stale
Copy link

stale bot commented Apr 11, 2020

This pull request/issue has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@stale stale bot closed this Apr 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

huge number of watch in zookeeper cause zookeeper full gc
7 participants