rename maintenance mode to decommission #7154
Changes from 9 commits
```diff
@@ -783,8 +783,8 @@ A sample Coordinator dynamic config JSON object is shown below:
   "replicationThrottleLimit": 10,
   "emitBalancingStats": false,
   "killDataSourceWhitelist": ["wikipedia", "testDatasource"],
-  "historicalNodesInMaintenance": ["localhost:8182", "localhost:8282"],
-  "nodesInMaintenancePriority": 7
+  "decommissioningNodes": ["localhost:8182", "localhost:8282"],
+  "decommissioningVelocity": 7
 }
```
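For illustration, a dynamic config using the renamed fields could be built like this. This is a sketch, not part of the PR: the field values simply echo the sample diff above, and actually sending the request (Druid's Coordinator dynamic config is updated by POSTing JSON to `/druid/coordinator/v1/config`) is left out since the host and port are deployment-specific.

```python
import json

# Sketch of a Coordinator dynamic config using the renamed fields
# from the diff above (values are the sample values, not recommendations).
dynamic_config = {
    "replicationThrottleLimit": 10,
    "emitBalancingStats": False,
    "killDataSourceWhitelist": ["wikipedia", "testDatasource"],
    "decommissioningNodes": ["localhost:8182", "localhost:8282"],
    "decommissioningVelocity": 7,
}

# This payload would be POSTed to the Coordinator's dynamic config
# endpoint, /druid/coordinator/v1/config, on a live cluster.
payload = json.dumps(dynamic_config, indent=2)
print(payload)
```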
```diff
@@ -803,9 +803,9 @@ Issuing a GET request at the same URL will return the spec that is currently in
 |`killDataSourceWhitelist`|List of dataSources for which kill tasks are sent if property `druid.coordinator.kill.on` is true. This can be a list of comma-separated dataSources or a JSON array.|none|
 |`killAllDataSources`|Send kill tasks for ALL dataSources if property `druid.coordinator.kill.on` is true. If this is set to true then `killDataSourceWhitelist` must not be specified or be empty list.|false|
 |`killPendingSegmentsSkipList`|List of dataSources for which pendingSegments are _NOT_ cleaned up if property `druid.coordinator.kill.pendingSegments.on` is true. This can be a list of comma-separated dataSources or a JSON array.|none|
-|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" processes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of processes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
-|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|
+|`decommissioningVelocity`| Decommissioning velocity determines the maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning (that is, active) servers during one Coordinator's run. This value is relative to the total maximum segment movements allowed during one run which is determined by the `maxSegmentsToMove` configuration. Specifically, the maximum is `ceil(maxSegmentsToMove * (velocity / 10))`. For example, if `decommissioningVelocity` is 5, no more than `ceil(maxSegmentsToMove * 0.5)` segments may be moved away from 'decommissioning' servers. If `decommissioningVelocity` is 0, segments will neither be moved from _or to_ 'decommissioning' servers, effectively putting them in a sort of 'maintenance' mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By leveraging the velocity an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 10.|7|
```
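The `ceil(maxSegmentsToMove * (velocity / 10))` cap in the proposed description can be checked with a small sketch. The function name `decommissioning_move_cap` is ours for illustration, not a Druid identifier, and 5 is used only as an example `maxSegmentsToMove` value:

```python
import math

def decommissioning_move_cap(max_segments_to_move: int, velocity: int) -> int:
    """Maximum segments moved away from 'decommissioning' servers in one
    Coordinator run, per the ceil(maxSegmentsToMove * velocity / 10) rule."""
    if not 0 <= velocity <= 10:
        raise ValueError("decommissioningVelocity should be between 0 and 10")
    return math.ceil(max_segments_to_move * (velocity / 10))

print(decommissioning_move_cap(5, 7))    # -> 4  (ceil(5 * 0.7) = ceil(3.5))
print(decommissioning_move_cap(100, 5))  # -> 50 (half of the move budget)
print(decommissioning_move_cap(100, 0))  # -> 0  ('maintenance'-like mode)
```

Note how the ceiling makes the cap non-zero for any velocity above 0, even with a small `maxSegmentsToMove`.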
Sorry for asking you to do and re-do renames and docs, but I think we had better use "percent" than "velocity". The reason is that after fixing #7159, there should be a single cap. Now, there is an observation that moving segments away from decommissioning nodes looks very much like a temporary "drop" rule "for servers". For this reason, I want the configurations that specify the minimum guaranteed balancing quota and the maximum 'decommissioning' movement quota to use the same units. The other reason why percent may be preferable is that we don't need to explain what the units are. Specifically, I suggest this configuration be called "maxPercentOfDecommissioningMoves". It doesn't follow the prefix principle. "decommissioningMaxPercentOfMoves" is probably also acceptable, but because of the strange word order, it's less understandable.

I fully agree, I was considering this as well and think it is a lot more intuitive.

I suggest "decommissioningMaxPercentInMaxSegmentsToMove", because there are two different maximums in play here:

Would 'of' instead of 'in' be better?
To view the audit history of Coordinator dynamic config, issue a GET request to the URL -
I think it's also worth mentioning, either in this configuration description or in the description of `decommissioningVelocity`, that if `decommissioningVelocity` is 0 then the Coordinator not only doesn't move segments away from 'decommissioning' servers per se, but also abstains from making any balancing movements involving 'decommissioning' servers. In this case, 'decommissioning' nodes indeed are in a sort of "maintenance" mode, as per the former config naming. I think the current descriptions don't make this clear enough.
Or, to put it more generally, the Coordinator always abstains from movement decisions involving 'decommissioning' servers other than moving segments away from 'decommissioning' to non-decommissioning servers (specifically, the Coordinator abstains from making movement decisions between 'decommissioning' servers and from active to 'decommissioning' servers), and tries to move segments away from 'decommissioning' servers up to the limit imposed by `decommissioningVelocity`.

Ah, that is a good point and definitely worth documenting, thanks for the catch.

It sounds odd that the servers are referred to as "these servers" and then "the 'decommissioning' servers" in the same sentence. The opposite should be better, I think.
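The movement rules worked out in the review discussion above can be summarized in a small sketch. This is a simplified model for illustration, not Druid's actual balancer code; the function name and parameters are ours:

```python
def move_allowed(source: str, target: str,
                 decommissioning: set,
                 moves_from_decommissioning: int,
                 cap: int) -> bool:
    """Simplified model of whether the Coordinator may move a segment
    from `source` to `target` during one run, per the rules discussed
    above. `cap` is ceil(maxSegmentsToMove * velocity / 10)."""
    if target in decommissioning:
        # Never move a segment *to* a 'decommissioning' server, whether
        # from an active server or from another 'decommissioning' one.
        return False
    if source in decommissioning:
        # Moves *away from* 'decommissioning' servers are allowed only
        # up to the per-run cap; with velocity 0 the cap is 0, so these
        # servers sit out of balancing entirely ('maintenance' mode).
        return moves_from_decommissioning < cap
    # Ordinary balancing between active servers is unrestricted here.
    return True

decom = {"localhost:8182"}
print(move_allowed("localhost:8182", "localhost:8083", decom, 0, 4))  # True
print(move_allowed("localhost:8083", "localhost:8182", decom, 0, 4))  # False
print(move_allowed("localhost:8182", "localhost:8083", decom, 4, 4))  # False, cap reached
```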