feat(server): support single computer do schedule task#2676
feat(server): support single computer do schedule task#2676shirleyStorage wants to merge 5 commits into
Conversation
Signed-off-by: shirley118146 <shirley118146@gmail.com>
Signed-off-by: shirley118146 <shirley118146@gmail.com>
Signed-off-by: shirley118146 <shirley118146@gmail.com>
|
@imbajin PTAL~ |
|
@imbajin I updated the test case to the PR description. PTAL ~~ |
|
@dosu how to rerun "HugeGraph-Server CI / build-server (hbase, 11)" |
2024-10-22.22.40.08.mov |
javeme
left a comment
There was a problem hiding this comment.
Thanks for your contribution.
Can I know in what scenario you need to do this?
|
|
||
| boolean singleNode = !hasWorkerNode||computerNodeCount==1; | ||
| if (singleNode != this.onlySingleNode) { | ||
| LOG.info("Switch only_single_node 02 to {}", singleNode); |
| } | ||
| if (server.role().master()) { | ||
| continue; | ||
| }else if (server.role().computer()){ |
There was a problem hiding this comment.
add a space after '}', or remove 'else'
| return this.selfNodeRole() != null && this.selfNodeRole().master(); | ||
| public boolean selfIsMasterOrSingleComputer() { | ||
| boolean isMaster=this.selfNodeRole() != null && this.selfNodeRole().master(); | ||
| boolean isSingleComputer=isStandAloneComputer(); |
| return this.selfNodeRole() != null && this.selfNodeRole().computer(); | ||
| } | ||
|
|
||
| public boolean isStandAloneComputer(){ |
|
Hello, I'm trying to disable the server-role option(https://github.com/hugegraph/hugegraph/pull/85). I'd like to ask what blocked this pr merge before? |
There was a problem hiding this comment.
Pull request overview
This PR aims to let a standalone computer role node schedule and execute tasks in HugeGraph’s task scheduling flow.
Changes:
- Allows master checks to pass for
selfIsMasterOrSingleComputer(). - Adds standalone-computer detection in
ServerInfoManager. - Refreshes single-node state during scheduler heartbeat flow.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/task/TaskManager.java |
Updates scheduling gate to include standalone computer nodes. |
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/task/StandardTaskScheduler.java |
Allows standard task scheduling/cancel master checks for standalone computer nodes. |
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/task/ServerInfoManager.java |
Adds standalone-computer detection and recalculates single-node state. |
hugegraph-server/hugegraph-core/src/main/java/org/apache/hugegraph/task/DistributedTaskScheduler.java |
Updates distributed scheduler requirement checks for standalone computer nodes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| public boolean selfIsMasterOrSingleComputer() { | ||
| boolean isMaster=this.selfNodeRole() != null && this.selfNodeRole().master(); | ||
| boolean isSingleComputer=isStandAloneComputer(); | ||
| return isMaster||isSingleComputer; | ||
| } | ||
|
|
||
| public boolean selfIsComputer() { | ||
| return this.selfNodeRole() != null && this.selfNodeRole().computer(); | ||
| } | ||
|
|
||
| public boolean isStandAloneComputer(){ | ||
| return this.onlySingleNode() && this.selfIsComputer(); |
| protected synchronized void updateIsSingleNode(){ | ||
| Collection<HugeServerInfo> servers=this.allServerInfos(); | ||
| boolean hasWorkerNode = false; | ||
| long now = DateUtil.now().getTime(); | ||
| int computerNodeCount=0; | ||
|
|
||
| // Iterate servers to find suitable one | ||
| for (HugeServerInfo server : servers) { | ||
| if (!server.alive()) { | ||
| continue; | ||
| } | ||
| if (server.role().master()) { | ||
| continue; | ||
| }else if (server.role().computer()){ | ||
| computerNodeCount++; | ||
| } | ||
| hasWorkerNode = true; | ||
| } | ||
|
|
||
| boolean singleNode = !hasWorkerNode||computerNodeCount==1; | ||
| if (singleNode != this.onlySingleNode) { | ||
| LOG.info("Switch only_single_node 02 to {}", singleNode); | ||
| this.onlySingleNode = singleNode; | ||
| } | ||
| } |
imbajin
left a comment
There was a problem hiding this comment.
I found two remaining behavior issues around standalone computer scheduling. The existing current-head Copilot comments for checkstyle and the mixed worker/computer onlySingleNode case are valid, so I added +1 reactions there instead of duplicating them.
2 CI/status checks failed on the latest head; please check the failed jobs 🔗 https://github.com/apache/hugegraph/actions/runs/16592586432/job/46931607687.
| // Update server heartbeat | ||
| serverManager.heartbeat(); | ||
|
|
||
| serverManager.updateIsSingleNode(); |
There was a problem hiding this comment.
onlySingleNode is first refreshed here, but selfIsMasterOrSingleComputer() is already used by StandardTaskScheduler.checkOnMasterNode() before this periodic loop runs. Since TaskManager starts this loop with a 10 * SCHEDULE_PERIOD initial delay, a standalone server.role=computer node can still reject the first task request after startup with Can't schedule task on non-master server. Please initialize this state during server-info bootstrap, or compute the standalone-computer condition on demand instead of waiting for the background scheduler.
| return; | ||
| } | ||
| if (this.selfIsMaster()) { | ||
| if (this.selfIsMasterOrSingleComputer()) { |
There was a problem hiding this comment.
ServerInfo entry is missing. Before this patch, non-master nodes recreated the missing record locally via saveServerInfo(...); after this change, a standalone computer returns here and waits for re-init like a master. In standalone server.role=computer mode there is no master to recreate that row, so once the server-info vertex is lost the node cannot recover itself anymore. Please keep the local re-save path for standalone computers, or add a dedicated re-init path for them.
imbajin
left a comment
There was a problem hiding this comment.
I added +1 reactions to the existing Copilot comments that already cover the checkstyle breakage and the single-computer misclassification bug. One additional gap remains around missing coverage.
2 CI/status checks failed; please check the failed jobs: 🔗 https://github.com/apache/hugegraph/actions/runs/16592586432/job/46931607687
| return !this.closed && this.graph.started() && this.graph.initialized(); | ||
| } | ||
|
|
||
| protected synchronized void updateIsSingleNode(){ |
There was a problem hiding this comment.
This scheduler role change needs automated coverage. The PR changes when a computer node can pass master-only scheduling checks and recalculates onlySingleNode, but no test files are changed, so the standalone-computer path and the master + worker + single-computer case are unguarded. Please add tests that cover a single computer node scheduling a task and a mixed cluster where one computer node must not make the whole deployment behave as single-node.
imbajin
left a comment
There was a problem hiding this comment.
One additional standalone-computer scheduling gap remains in the distributed scheduler path.
2 CI/status checks failed on the latest head; please check the failed jobs 🔗 https://github.com/apache/hugegraph/actions/runs/16592586432/job/46931607687.
| @Override | ||
| public void checkRequirement(String op) { | ||
| if (!this.serverManager().selfIsMaster()) { | ||
| if (!this.serverManager().selfIsMasterOrSingleComputer()) { |
There was a problem hiding this comment.
DistributedTaskScheduler now also depends on selfIsMasterOrSingleComputer(), but this scheduler never refreshes the new onlySingleNode flag. TaskManager.scheduleOrExecuteJobForGraph() only calls updateIsSingleNode() inside the StandardTaskScheduler branch, while each scheduler owns its own ServerInfoManager initialized with onlySingleNode = false. In scheduler.type=distributed, a standalone server.role=computer node therefore still fails this requirement check permanently instead of passing as a standalone computer.
Purpose of the PR
Main Changes
Verifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need