Skip to content

fix #13837 the Invalid log : worker worker have not received the hear…#13838

Closed
lxorc wants to merge 3 commits intoapache:devfrom
lxorc:patch-2
Closed

fix #13837 the Invalid log : worker worker have not received the hear…#13838
lxorc wants to merge 3 commits intoapache:devfrom
lxorc:patch-2

Conversation

@lxorc
Copy link
Contributor

@lxorc lxorc commented Mar 30, 2023

fix #13837 the Invalid log : worker worker have not received the hearbet

@ruanwenjun
Copy link
Member

ruanwenjun commented Mar 31, 2023

We don't need to add this check, this case might be due to we registering an empty node? Could you please describe how to reproduce this.

@lxorc
Copy link
Contributor Author

lxorc commented Mar 31, 2023

We don't need to add this check, this case might be due to we registering an empty node? Could you please describe how to reproduce this.

Yes, by default a worker node will be loaded in registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener());

registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener()); 
String[] parts = path.split("/");  // REGISTRY_DOLPHINSCHEDULER_WORKERS = /nodes/worker
final String workerAddress = parts[parts.length - 1];  // workerAddress  = worker
if (type == Type.UPDATE) // the Type.UPDATE is  Node /nodes/worker update ,   case NODE_UPDATED: type(Type.UPDATE);
syncSingleWorkerNodeInfo(workerAddress, JSONUtils.parseObject(data, WorkerHeartBeat.class));  //  ???
workerNodeInfo.put(workerAddress, info);  // put "worker" key into the workerNodeInfo

The effect is that the addr of getHostWeight will be the worker and heartBeat == null.

so

        if (heartBeat == null) {
            logger.warn("worker {} in work group {} have not received the heartbeat", addr, workerGroup);
            return Optional.empty();
        }

@ruanwenjun
Copy link
Member

ruanwenjun commented Mar 31, 2023

We don't need to add this check, this case might be due to we registering an empty node? Could you please describe how to reproduce this.

Yes, by default a worker node will be loaded in registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener());

registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener()); 
String[] parts = path.split("/");  // REGISTRY_DOLPHINSCHEDULER_WORKERS = /nodes/worker
final String workerAddress = parts[parts.length - 1];  // workerAddress  = worker
if (type == Type.UPDATE) // the Type.UPDATE is  Node /nodes/worker update ,   case NODE_UPDATED: type(Type.UPDATE);
syncSingleWorkerNodeInfo(workerAddress, JSONUtils.parseObject(data, WorkerHeartBeat.class));  //  ???
workerNodeInfo.put(workerAddress, info);  // put "worker" key into the workerNodeInfo

The effect is that the addr of getHostWeight will be the worker and heartBeat == null.

so

        if (heartBeat == null) {
            logger.warn("worker {} in work group {} have not received the heartbeat", addr, workerGroup);
            return Optional.empty();
        }

So we only need to subscribe the event of REGISTRY_DOLPHINSCHEDULER_WORKERS's child path

@lxorc
Copy link
Contributor Author

lxorc commented Mar 31, 2023

We don't need to add this check, this case might be due to we registering an empty node? Could you please describe how to reproduce this.

Yes, by default a worker node will be loaded in registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener());

registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener()); 
String[] parts = path.split("/");  // REGISTRY_DOLPHINSCHEDULER_WORKERS = /nodes/worker
final String workerAddress = parts[parts.length - 1];  // workerAddress  = worker
if (type == Type.UPDATE) // the Type.UPDATE is  Node /nodes/worker update ,   case NODE_UPDATED: type(Type.UPDATE);
syncSingleWorkerNodeInfo(workerAddress, JSONUtils.parseObject(data, WorkerHeartBeat.class));  //  ???
workerNodeInfo.put(workerAddress, info);  // put "worker" key into the workerNodeInfo

The effect is that the addr of getHostWeight will be the worker and heartBeat == null.
so

        if (heartBeat == null) {
            logger.warn("worker {} in work group {} have not received the heartbeat", addr, workerGroup);
            return Optional.empty();
        }

So we only need to subscribe the event of REGISTRY_DOLPHINSCHEDULER_WORKERS's child path

Please help to review my code to see if there are other problems. If there is a better implementation, please reply to me. And I have fixed it in my production environment and it works fine.

@ruanwenjun
Copy link
Member

We don't need to add this check, this case might be due to we registering an empty node? Could you please describe how to reproduce this.

Yes, by default a worker node will be loaded in registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener());

registryClient.subscribe(REGISTRY_DOLPHINSCHEDULER_WORKERS, new WorkerDataListener()); 
String[] parts = path.split("/");  // REGISTRY_DOLPHINSCHEDULER_WORKERS = /nodes/worker
final String workerAddress = parts[parts.length - 1];  // workerAddress  = worker
if (type == Type.UPDATE) // the Type.UPDATE is  Node /nodes/worker update ,   case NODE_UPDATED: type(Type.UPDATE);
syncSingleWorkerNodeInfo(workerAddress, JSONUtils.parseObject(data, WorkerHeartBeat.class));  //  ???
workerNodeInfo.put(workerAddress, info);  // put "worker" key into the workerNodeInfo

The effect is that the addr of getHostWeight will be the worker and heartBeat == null.
so

        if (heartBeat == null) {
            logger.warn("worker {} in work group {} have not received the heartbeat", addr, workerGroup);
            return Optional.empty();
        }

So we only need to subscribe the event of REGISTRY_DOLPHINSCHEDULER_WORKERS's child path

Please help to review my code to see if there are other problems. If there is a better implementation, please reply to me. And I have fixed it in my production environment and it works fine.

Does master path will have this problem? could you please change the master together?

@ruanwenjun ruanwenjun added 3.2.0 for 3.2.0 version bug Something isn't working labels Apr 1, 2023
@ruanwenjun ruanwenjun added this to the 3.2.0 milestone Apr 1, 2023
@ruanwenjun
Copy link
Member

Please fix the code style by mvn spotless:apply

…in/java/org/apache/dolphinscheduler/registry/api/RegistryClient.java

Co-authored-by: Gallardot <gallardot@apache.org>
@zhongjiajie zhongjiajie modified the milestones: 3.2.0, 3.3.0 Aug 30, 2023
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity for 120 days. It will be closed in 7 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Jan 13, 2024
@github-actions
Copy link

This pull request has been closed because it has not had recent activity. You could reopen it if you try to continue your work, and anyone who are interested in it are encouraged to continue work on this pull request.

@github-actions github-actions bot closed this Jan 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.2.0 for 3.2.0 version backend bug Something isn't working Stale

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Master] worker worker have not received the heartbeat

4 participants