Describe the new feature
I suggest adding two concepts for managing specific service instances within a task:
- Commands service <name> freeze / unfreeze:
  - freeze: Completely stops the instance and marks it with an internal flag that prohibits any automatic startup by the TickLoop.
  - Core Logic: While an instance is frozen, the internal minServiceCount check (located in DefaultTickLoop#startService) must count it as "occupied/present". This ensures the system does not attempt to create or start a new server to replace the frozen one.
- Command service <name> autoStart <true/false>:
  - Allows overriding the auto-start behavior for a specific static instance without modifying the global settings of the ServiceTask.
  - When set to false, the service should be ignored by the automatic service selection logic in CloudServiceManager, making it startable only through a manual command or API call.
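To make the intended counting rule concrete, here is a minimal sketch of how the check could behave. It does not use CloudNet's real classes: LifeCycle, Instance, occupiedSlots, autoStartCandidates and missingServices are hypothetical names invented for illustration, and the actual DefaultTickLoop code will look different.

```java
import java.util.List;

final class FreezeAwareCount {

  // Hypothetical stand-ins for illustration; these are not CloudNet's real types.
  enum LifeCycle { PREPARED, RUNNING, STOPPED }

  record Instance(String name, LifeCycle lifeCycle, boolean frozen, boolean autoStart) {}

  // A frozen instance occupies a slot even though it is not running.
  static long occupiedSlots(List<Instance> instances) {
    return instances.stream()
      .filter(i -> i.lifeCycle() == LifeCycle.RUNNING || i.frozen())
      .count();
  }

  // Only instances that are not running, not frozen and not opted out
  // of auto-start may be picked by the automatic start logic.
  static List<Instance> autoStartCandidates(List<Instance> instances) {
    return instances.stream()
      .filter(i -> i.lifeCycle() != LifeCycle.RUNNING)
      .filter(i -> !i.frozen() && i.autoStart())
      .toList();
  }

  // Number of services the automation would still consider missing.
  static long missingServices(int minServiceCount, List<Instance> instances) {
    return Math.max(0, minServiceCount - occupiedSlots(instances));
  }
}
```

This sketch assumes that only frozen instances count as occupied. Whether an instance with autoStart set to false should also count towards minServiceCount is left open here and would need to be decided in the real implementation.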
Why do you need this feature?
As a DevOps engineer managing large-scale networks (200–1500+ players) with dozens of static servers, "targeted" maintenance is a frequent necessity. Tasks like wiping a specific SMP instance, performing seasonal map cleanups, or manually updating playerdata require the server to be completely offline to prevent file corruption.
Currently, CloudNet's minServiceCount mechanism makes this impossible to automate safely. As soon as a static instance is stopped for maintenance, the TickLoop detects a "shortage" and immediately tries to restart the same instance or spawn a replacement.
This leads to:
- Persistent file access conflicts.
- Data corruption due to concurrent writes.
- Unnecessary resource load on the nodes.
Changing the minServiceCount for the whole task or putting the entire task into maintenance mode is not viable, as it affects all other healthy servers in that group.
Default Task Config Examples
This implementation would allow DevOps engineers to fully automate maintenance (via Ansible/API) for specific servers without "fighting" the CloudNet automation.
Technical reference:
The issue resides in the current implementation of eu.cloudnetservice.node.impl.tick.DefaultTickLoop#startService, where only services with ServiceLifeCycle.RUNNING are counted towards the minimum service requirement.
Configuration Example (ServiceTask properties):
To support this per-instance, these flags could be stored in the properties of the ServiceTask or directly in the service's own configuration/snapshot. For a task managing multiple static instances, it might look like this:
{
  "name": "FFA",
  "minServiceCount": 3,
  "staticServices": true,
  "properties": {
    "instanceOverrides": {
      "FFA-1": {
        "autoStart": false,
        "frozen": true,
        "maintenance": true
      },
      "FFA-2": {
        "autoStart": true,
        "frozen": false,
        "maintenance": false
      },
      "FFA-3": {
        "autoStart": false,
        "frozen": false,
        "maintenance": true
      }
    }
  }
}
- FFA-1: Never started automatically; the minServiceCount check counts it as occupied, and it cannot be started manually until it is unfrozen.
- FFA-2: Operates as a standard automated service.
- FFA-3: Exists and is ready, but will never be started by the TickLoop automation; it requires a manual service FFA-3 start command.
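The three cases above can be expressed as a small lookup with defaults. The following is only a sketch under assumptions: InstanceOverrides, Flags, mayAutoStart and mayManualStart are hypothetical names, the defaults for unlisted instances (autoStart true, frozen false, maintenance false) are my assumption, and the maintenance flag is carried along without being interpreted because the proposal does not define its effect.

```java
import java.util.Map;

final class InstanceOverrides {

  // Hypothetical per-instance flags mirroring the "instanceOverrides" entries.
  record Flags(boolean autoStart, boolean frozen, boolean maintenance) {
    // Assumed defaults for instances without an override entry.
    static final Flags DEFAULT = new Flags(true, false, false);
  }

  private final Map<String, Flags> overrides;

  InstanceOverrides(Map<String, Flags> overrides) {
    this.overrides = Map.copyOf(overrides);
  }

  // Falls back to the defaults when no override exists for the service.
  Flags of(String serviceName) {
    return overrides.getOrDefault(serviceName, Flags.DEFAULT);
  }

  // The automation may only start instances that opted in and are not frozen.
  boolean mayAutoStart(String serviceName) {
    Flags flags = of(serviceName);
    return flags.autoStart() && !flags.frozen();
  }

  // A manual start is blocked only while the instance is frozen.
  boolean mayManualStart(String serviceName) {
    return !of(serviceName).frozen();
  }
}
```

With the example configuration, FFA-1 allows neither automatic nor manual starts, FFA-2 allows both, and FFA-3 allows only a manual start.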
Alternatives
I have considered solving this via a custom module, but it is technically ineffective. While a module can cancel the CloudServicePreLifecycleEvent, it cannot intervene in the DefaultTickLoop counting logic. The core will still see "insufficient services" and attempt a restart every second, resulting in an endless loop of cancelled events and console spam. This functionality needs to be native to the core's lifecycle management.
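The retry loop described above can be illustrated with a toy model. This is not CloudNet code: CancelLoopDemo and startAttempts are hypothetical, and the model assumes one start attempt per tick for a single service name. It only shows that cancelling the start never changes the count the loop is checking.

```java
import java.util.function.Predicate;

final class CancelLoopDemo {

  // Counts how many start attempts a toy tick loop makes over the given
  // number of ticks when a listener may cancel each attempt.
  static int startAttempts(int ticks, int minServiceCount, Predicate<String> cancelStart) {
    int running = 0;
    int attempts = 0;
    for (int tick = 0; tick < ticks; tick++) {
      // The loop only looks at the running count, not at why a start failed.
      if (running < minServiceCount) {
        attempts++;
        if (!cancelStart.test("FFA-1")) {
          running++;
        }
      }
    }
    return attempts;
  }
}
```

When the listener always cancels, the number of attempts equals the number of ticks. When it never cancels, the loop stops trying as soon as the minimum is reached.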
Other
No response
Issue uniqueness