rename maintenance mode to decommission #7154

clintropolis · 2019-02-27T23:53:44Z

New coordinator config properties are:

...
  "decommissioningNodes": ["localhost:8182", "localhost:8282"],
  "decommissioningMaxPercentOfMaxSegmentsToMove": 70
...

Minor adjustments to documentation and javadocs.
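For readers skimming the thread, here is a minimal sketch of how the two renamed properties could be held and validated. The class `DecommissioningSettings` is hypothetical and is not Druid's `CoordinatorDynamicConfig`; the 0 to 100 range check is an assumption based on the property being described as a percent.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical holder for the two renamed properties (illustration only).
final class DecommissioningSettings {
    private final Set<String> decommissioningNodes;
    private final int decommissioningMaxPercentOfMaxSegmentsToMove;

    DecommissioningSettings(Set<String> nodes, int maxPercent) {
        // Assumption: a percent outside [0, 100] is rejected.
        if (maxPercent < 0 || maxPercent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100], got " + maxPercent);
        }
        // Defensive copy so later changes to the caller's set do not leak in.
        this.decommissioningNodes = Collections.unmodifiableSet(new LinkedHashSet<>(nodes));
        this.decommissioningMaxPercentOfMaxSegmentsToMove = maxPercent;
    }

    // True if the given "host:port" is listed as decommissioning.
    boolean isDecommissioning(String hostAndPort) {
        return decommissioningNodes.contains(hostAndPort);
    }

    int getMaxPercent() {
        return decommissioningMaxPercentOfMaxSegmentsToMove;
    }
}
```

With the values from the description, `new DecommissioningSettings(nodes, 70)` would treat `localhost:8182` and `localhost:8282` as decommissioning.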

leventov · 2019-02-28T11:45:51Z

docs/content/configuration/index.md

-  "historicalNodesInMaintenance": ["localhost:8182", "localhost:8282"],
-  "nodesInMaintenancePriority": 7
+  "decommissionNodes": ["localhost:8182", "localhost:8282"],
+  "decommissionPriority": 7


Perhaps better to call it "decommissionVelocity" as @drcrallen suggested here: #6349 (comment) because "priority" is a super overloaded term.

Makes sense to me, though I think the properties should be consistently named, so maybe I'll either rename the other to decommissioningNodes or call this one decommissionVelocity to match. Anyone have a preference?

Renamed to decommissioningNodes and decommissioningVelocity

leventov · 2019-02-28T11:54:24Z

server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java

@@ -694,7 +694,7 @@ public CoordinatorHistoricalManagerRunnable(final int startingLeaderCounter)
                }

                // Find all historical servers, group them by subType and sort by ascending usage
-                Set<String> nodesInMaintenance = params.getCoordinatorDynamicConfig().getHistoricalNodesInMaintenance();
+                Set<String> decommissioned = params.getCoordinatorDynamicConfig().getDecommissionNodes();


Suggested "decommissioningNodes"

leventov · 2019-02-28T11:54:38Z

server/src/main/java/org/apache/druid/server/coordinator/ServerHolder.java

@@ -32,18 +32,18 @@
  private static final Logger log = new Logger(ServerHolder.class);
  private final ImmutableDruidServer server;
  private final LoadQueuePeon peon;
-  private final boolean inMaintenance;
+  private final boolean isDecommissioned;


Suggested "isDecommissioning" or "beingDecommissioned"

leventov · 2019-02-28T11:58:42Z

server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorBalancer.java

    final List<ServerHolder> availableServers = partitions.get(false);
    log.info(
-        "Found %d servers in maintenance, %d available servers servers",
-        maintenanceServers.size(),
+        "Found %d decomissioned servers, %d available servers servers",


"%d decommissioning servers, %d active servers"

word "servers" is duplicated

Perhaps "active" is a better term in the course of this method, because servers to be decommissioned are available (in distributed systems terms) too.

Changed to 'active'

leventov · 2019-02-28T11:59:59Z

server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorBalancer.java

        availableServers.size()
    );

-    if (maintenanceServers.isEmpty()) {
+    if (decommssionedServers.isEmpty()) {
      if (availableServers.size() <= 1) {
        log.info("[%s]: %d available servers servers found.  Cannot balance.", tier, availableServers.size());


"servers" duplicated

leventov · 2019-02-28T12:05:08Z

server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorBalancer.java

      if (availableServers.size() <= 1) {
        log.info("[%s]: %d available servers servers found.  Cannot balance.", tier, availableServers.size());
      }
    } else if (availableServers.isEmpty()) {
-      log.info("[%s]: no available servers servers found during maintenance.  Cannot balance.", tier);
+      log.info("[%s]: no available servers servers found during decommissioning.  Cannot balance.", tier);


Suggested to change the message to something like "no active servers found, segments can't be moved off %d decommissioning servers"

Maybe it should be warn(), not info()?

Seems that this logging statement and the statement at line 125 duplicate the statement at line 117. Maybe this part of code should be restructured.

"no active servers found, segments can't be moved off %d decommissioning servers" leads me to the question: should we really give up decommission in this case? Maybe if segments are available on historical nodes from other tiers (or from the same tier, on other decommissioning servers or on full active servers -- see below), or actually always, we should continue decommission by just dropping segments from the decommissioning servers and not moving them anywhere?

Also, this condition doesn't cover the situation when there are active historical nodes in the tier, but they are either full or their load queues already hold maxSegmentsInNodeLoadingQueue segments. (Note that in the latter case, the balancing step won't be skipped entirely by the if (!currentlyMovingSegments.get(tier).isEmpty()) { branch above in the code, because historicals' segment loading queues may be filled up by LoadRule. LoadRule uses an independent mechanism, ReplicationThrottler, to implement the "make a balancing burst and then let all segments load before the next balancing burst" pattern.)

I elaborated on the problems related to the independence of DruidCoordinatorBalancer and LoadRule in #7159.

"no active servers found, segments can't be moved off %d decommissioning servers" leads me to the question: should we really give up decommission in this case? Maybe if segments are available on historical nodes from other tiers (or from the same tier, on other decommissioning servers or on full active servers -- see below), or actually always, we should continue decommission by just dropping segments from the decommissioning servers and not moving them anywhere?

@egor-ryashin, what's your take on this?

@leventov the initial idea was to keep the replication at the same level.

Consolidated the 'cannot balance' checks here and changed the message here to just log.warn that there is an insufficient number of active servers to do anything and added a return

the initial idea was to keep the replication at the same level.

This should be reflected in docs, as well as other corner cases related to decommissioning nodes.

I'll try to add docs that decommissioning can become stalled if there are no active servers available to move segments to.
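The stall conditions raised in this thread (no active servers, active servers that are full, or load queues already at `maxSegmentsInNodeLoadingQueue`) can be sketched as a single check. This is an illustration under stated assumptions, not the Coordinator's code: `StallCheck` and `ActiveServer` are hypothetical names, and a queue limit of 0 is treated as unbounded, as the configuration docs quoted later in this thread describe.

```java
import java.util.List;

// Hypothetical sketch: decide whether decommissioning would stall for one segment.
final class StallCheck {
    // Hypothetical, simplified view of an active historical server.
    static final class ActiveServer {
        final long availableBytes;
        final int queuedSegments;

        ActiveServer(long availableBytes, int queuedSegments) {
            this.availableBytes = availableBytes;
            this.queuedSegments = queuedSegments;
        }
    }

    // A server can take the segment if it has space and its load queue has room.
    // maxSegmentsInNodeLoadingQueue == 0 means the queue is unbounded.
    static boolean canAccept(ActiveServer server, long segmentBytes, int maxSegmentsInNodeLoadingQueue) {
        boolean hasSpace = server.availableBytes >= segmentBytes;
        boolean queueHasRoom = maxSegmentsInNodeLoadingQueue == 0
            || server.queuedSegments < maxSegmentsInNodeLoadingQueue;
        return hasSpace && queueHasRoom;
    }

    // Stalled means no active server can accept the segment (including the empty list).
    static boolean isStalled(List<ActiveServer> activeServers, long segmentBytes, int maxSegmentsInNodeLoadingQueue) {
        for (ActiveServer server : activeServers) {
            if (canAccept(server, segmentBytes, maxSegmentsInNodeLoadingQueue)) {
                return false;
            }
        }
        return true;
    }
}
```

Note that the check in the PR only covers the empty-list case; the full and saturated-queue cases are the ones described above as not covered.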

leventov · 2019-02-28T12:21:26Z

...r/src/test/java/org/apache/druid/server/coordinator/rules/BroadcastDistributionRuleTest.java

@@ -59,8 +59,8 @@
  private DataSegment smallSegment;
  private DruidCluster secondCluster;
  private ServerHolder generalServer;
-  private ServerHolder maintenanceServer2;
-  private ServerHolder maintenanceServer1;
+  private ServerHolder decommissionedServer2;


Suggested "decommissioning" prefix throughout the tests

leventov · 2019-02-28T12:22:42Z

server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorBalancer.java

+    int priority = params.getCoordinatorDynamicConfig().getDecommissionPriority();
+    int maxDecommissionedSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
+    log.info("Processing %d segments from decommissioned servers", maxDecommissionedSegmentsToMove);
+    Pair<Integer, Integer> decommissionedResult =


Suggested "decommissioningResult"

leventov · 2019-02-28T12:24:20Z

server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorBalancer.java

-    int maxGeneralSegmentsToMove = maxSegmentsToMove - maintenanceResult.lhs;
-    log.info("Processing %d segments from servers in general mode", maxGeneralSegmentsToMove);
+    int priority = params.getCoordinatorDynamicConfig().getDecommissionPriority();
+    int maxDecommissionedSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);


Suggested "maxSegmentsToMoveOffDecommissioningNodes"

I'd vote for smth ending in "segments".

leventov · 2019-02-28T12:28:23Z

server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorBalancerTest.java

@@ -251,7 +251,7 @@ public void testMoveMaintenancePriority()
  }

  @Test
-  public void testZeroMaintenancePriority()
+  public void testZeroDecommissionPriority()
  {
    DruidCoordinatorRuntimeParams params = setupParamsForMaintenancePriority(0);


Residual "Maintenance". Please search the whole codebase for this.

egor-ryashin · 2019-02-28T20:38:36Z

server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorBalancer.java

      if (availableServers.size() <= 1) {
        log.info("[%s]: %d available servers servers found.  Cannot balance.", tier, availableServers.size());
      }
    } else if (availableServers.isEmpty()) {
-      log.info("[%s]: no available servers servers found during maintenance.  Cannot balance.", tier);
+      log.info("[%s]: no available servers servers found during decommissioning.  Cannot balance.", tier);


@leventov the initial idea was to keep the replication at the same level.


clintropolis · 2019-02-28T22:32:55Z

Thanks for review @leventov and @egor-ryashin 👍

leventov · 2019-02-28T23:05:41Z

server/src/test/java/org/apache/druid/server/http/CoordinatorDynamicConfigTest.java

@@ -257,7 +257,7 @@ private void assertConfig(
      Set<String> expectedKillableDatasources,
      boolean expectedKillAllDataSources,
      int expectedMaxSegmentsInNodeLoadingQueue,
-      Set<String> decommissioned,
+      Set<String> decommissioning,
      int decommissionPriority


Residual 'priority'. Please search the codebase for other residuals

…rvers

…to-decommission

fjy · 2019-03-02T20:24:43Z

@leventov @egor-ryashin any more comments?

egor-ryashin · 2019-03-03T23:06:17Z

Aside from other comments, LGTM.

fjy · 2019-03-04T02:58:24Z

@egor-ryashin can you change your status to approval?

clintropolis · 2019-03-04T21:32:21Z

I think all comments should be addressed as of the last commit. I couldn't find any remaining related uses of 'maintenance' or 'priority', and docs have been updated.

leventov · 2019-03-04T22:43:51Z

docs/content/configuration/index.md

-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments from the servers at the rate specified by `decommissioningVelocity`.|none|
+|`decommissioningVelocity`| Decommissioning velocity indicates what proportion of balancer 'move' operations out of `maxSegmentsToMove` total will be spent towards 'decommissioning' servers by moving their segments to active servers, instead of normal 'balancing' moves. Coordinator takes ceil(maxSegmentsToMove * (velocity / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from decommissioning servers will be processed during balancing<br>5 - 50% segments from decommissioning servers<br>10 - 100% segments from decommissioning servers<br>By leveraging the velocity an operator can prevent general servers from overload or decrease decommissioning time instead. Decommissioning can become stalled if there are no available active servers to place the segments.|7|


I think this description may be confusing. I suggest

Decommissioning velocity determines the maximum number of segments that may be moved away from 'decommissioning' servers non-decommissioning (that is, active) servers during one Coordinator's run, relative to the total maximum segment movements allowed during one Coordinator's run (which, in turn, is determined by the maxSegmentsToMove configuration). Specifically, the maximum is ceil(maxSegmentsToMove * (velocity / 10)). For example, if decommissioningVelocity is 0 no segments will be moved away from 'decommissioning' servers. If decommissioningVelocity is 5 no more than ceil(maxSegmentsToMove * 0.5) segments may be moved away from 'decommissioning' servers. By leveraging the velocity an operator can prevent general servers from overload or decrease decommissioning time instead. Decommissioning can become stalled if there are no available active servers to place the segments. The value should be between 0 and 10.

The difference in language is that instead of saying "will be spent" and "will be processed", we say "maximum that may be moved", because Coordinator doesn't guarantee and doesn't need to fulfill the "quota". Also removed the phrase "spent towards 'decommissioning' servers" because it may be confusing: we don't do anything "towards" decommissioning servers, rather "away from".

Thanks, changed. I also modified

By leveraging the velocity an operator can prevent general servers from overload or decrease decommissioning time instead.

to

By leveraging the velocity an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead.

To switch from 'general' to 'active' terminology and try to clarify that preventing overload means prioritizing balance over decommissioning.
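A worked sketch of the velocity formula discussed above, `ceil(maxSegmentsToMove * (velocity / 10))`. The arithmetic mirrors the `Math.ceil(maxSegmentsToMove * priority / 10.0)` expression in the balancer diff earlier in this review; the class name and the range check are hypothetical additions for illustration.

```java
// Hypothetical helper illustrating the velocity-based cap.
final class VelocityCap {
    // Upper bound on segments moved off decommissioning servers in one Coordinator run.
    static int maxSegmentsToMoveOffDecommissioningNodes(int maxSegmentsToMove, int velocity) {
        // The docs state the value should be between 0 and 10; rejecting other values is an assumption.
        if (velocity < 0 || velocity > 10) {
            throw new IllegalArgumentException("velocity must be in [0, 10], got " + velocity);
        }
        // Multiply as integers first, then divide by 10.0 so the division is floating point.
        return (int) Math.ceil(maxSegmentsToMove * velocity / 10.0);
    }
}
```

With `maxSegmentsToMove = 5`, a velocity of 0 gives a cap of 0, a velocity of 5 gives ceil(2.5) = 3, the default of 7 gives ceil(3.5) = 4, and 10 gives 5. The cap is a maximum, not a quota: the Coordinator may move fewer segments.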

leventov · 2019-03-04T22:44:13Z

server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java

+   * Decommissioning velocity indicates what proportion of balancer 'move' operations out of
+   * {@link CoordinatorDynamicConfig#getMaxSegmentsToMove()} total will be spent towards 'decommissioning' servers
+   * by moving their segments to active servers, instead of normal 'balancing' segments between servers.
+   * Coordinator takes ceil(maxSegmentsToMove * (velocity / 10)) from servers in maitenance during balancing phase:


"maitenance", as well as in several other places in the PR.

oops, missed the typo :(

leventov · 2019-03-04T22:58:10Z

server/src/main/java/org/apache/druid/server/coordinator/helper/DruidCoordinatorBalancer.java

-      log.info("[%s]: no available servers servers found during maintenance.  Cannot balance.", tier);
+    if ((decommissioningServers.isEmpty() && activeServers.size() <= 1) || activeServers.isEmpty()) {
+      log.warn("[%s]: insufficient active servers. Cannot balance.", tier);
+      return;


Is it planned that we don't emit stats in this case? If yes, a comment should be left explicitly stating this.

I think it makes sense to include the numbers of active and decommissioning servers in this logging statement, so that the cases are distinguishable in logs.

Is it planned that we don't emit stats in this case? If yes, a comment should be left explicitly stating this.

I think it is pretty consistent with behavior of other 'cannot balance' conditions such as still having segments moving, not having any segments, and was changed to return here based on this comment #7154 (comment) which seemed reasonable to me.

I think it makes sense to include the numbers of active and decommissioning servers in this logging statement, so that the cases are distinguishable in logs.

I removed the count here because the log statement immediately before this one lists the count of 'active' and 'decommissioning' servers. Should these log messages be consolidated, or just print the count info twice?

At very minimum, a comment like "not emitting moved and unmoved counts on purpose here" is needed, so that readers of this code don't have to wonder whether the counts are omitted on purpose or by oversight. However, #7154 (comment) doesn't really explain anything to me.

I removed the count here because the log statement immediately before this one lists the count of 'active' and 'decommissioning' servers. Should these log messages be consolidated, or just print the count info twice?

Right, I missed that preceding logging statement.

At very minimum, a comment like "not emitting moved and unmoved counts on purpose here" is needed.

Hmm, currently, all of the return points in balanceTier are effectively suppressing emitting the stats if it can't find anything to do, but I'm unsure if the return points are to explicitly not emit stats or just an optimization to exit fast and not do extra work that just overlooked emitting 0 values for stats. Is this correct behavior? In other words, should balanceTier always emit these stats or is it sensible like it is?

Added comments for all places we exit where explicitly not emitting stats for now
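The consolidated guard from the diff above, `(decommissioningServers.isEmpty() && activeServers.size() <= 1) || activeServers.isEmpty()`, can be restated over plain counts to make the cases explicit. `BalanceGuard` is a hypothetical name used only for this sketch.

```java
// Hypothetical restatement of the consolidated "cannot balance" condition.
final class BalanceGuard {
    static boolean cannotBalance(int decommissioningServers, int activeServers) {
        // Nothing to decommission and at most one active server: no pair of servers to balance between.
        boolean nothingToBalance = decommissioningServers == 0 && activeServers <= 1;
        // No active servers at all: segments on decommissioning servers have nowhere to go.
        boolean noTargets = activeServers == 0;
        return nothingToBalance || noTargets;
    }
}
```

A single active server is enough to proceed when there are decommissioning servers, because segments can still be moved off them onto it.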

leventov · 2019-03-04T23:00:43Z

...r/src/test/java/org/apache/druid/server/coordinator/rules/BroadcastDistributionRuleTest.java

-   * maintenance2 | large segment
+   * name             | segments
+   * -----------------+--------------
+   * general          | large segment


Probably should be "active" to match the terminology in production code.

good point, fixed

leventov · 2019-03-04T23:00:49Z

...r/src/test/java/org/apache/druid/server/coordinator/rules/BroadcastDistributionRuleTest.java

-   * general      | large & small segments
-   * maintenance1 |
-   * maintenance2 | large segment
+   * general          | large & small segments


leventov · 2019-03-04T23:01:19Z

server/src/test/java/org/apache/druid/server/coordinator/rules/LoadRuleTest.java

@@ -1019,12 +1019,12 @@ private static LoadQueuePeon createOneCallPeonMock()
    return mockPeon2;
  }

-  private static ServerHolder createServerHolder(String tier, LoadQueuePeon mockPeon1, boolean maintenance)
+  private static ServerHolder createServerHolder(String tier, LoadQueuePeon mockPeon1, boolean decommission)


isDecommissioning

leventov · 2019-03-04T23:04:10Z

docs/content/configuration/index.md

-|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments from the servers at the rate specified by `decommissioningVelocity`.|none|


Suggested "... and moves segments away from the 'decommissioning' servers at the maximum rate specified by decommissioningVelocity"

sgtm, changed

fjy · 2019-03-05T06:19:31Z

@leventov any more comments?

leventov · 2019-03-05T22:31:52Z

docs/content/configuration/index.md

-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|
+|`decommissioningVelocity`| Decommissioning velocity determines the maximum number of segments that may be moved away from 'decommissioning' servers non-decommissioning (that is, active) servers during one Coordinator's run, relative to the total maximum segment movements allowed during one Coordinator's run (which, in turn, is determined by the maxSegmentsToMove configuration). Specifically, the maximum is ceil(maxSegmentsToMove * (velocity / 10)). For example, if decommissioningVelocity is 0 no segments will be moved away from 'decommissioning' servers. If decommissioningVelocity is 5 no more than ceil(maxSegmentsToMove * 0.5) segments may be moved away from 'decommissioning' servers. By leveraging the velocity an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. Decommissioning can become stalled if there are no available active servers to place the segments. The value should be between 0 and 10.|7|


moved away from 'decommissioning' servers to non-decommissioning

Missed preposition.

Also, backticks may be used at several places in this description.

Oops, thanks will fix. I'm also splitting that first sentence into 2 since it got a bit long.

leventov · 2019-03-05T22:34:57Z

docs/content/configuration/index.md

-|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|


I think it's also worth mentioning, either in this configuration description or in the description of decommissioningVelocity, that if decommissioningVelocity is 0 then the Coordinator not only doesn't move segments away from 'decommissioning' servers per se but also abstains from making any balancing movements involving 'decommissioning' servers. In this case, 'decommissioning' nodes are indeed in a sort of "maintenance" mode, as per the former config naming.

I think the current descriptions don't make this clear enough.

Or, to put it more generally, the Coordinator always abstains from movement decisions involving 'decommissioning' servers, other than moving segments away from 'decommissioning' to non-decommissioning servers. Specifically, the Coordinator abstains from making movement decisions between decommissioning servers and from active to decommissioning servers, and tries to move segments away from decommissioning servers up to the limit imposed by decommissioningVelocity.

Ah, that is a good point and definitely worth documenting, thanks for the catch

The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by decommissioningVelocity.

It sounds odd that the servers are referred to as "these servers" and then "the 'decommissioning' servers" in the same sentence. The opposite should be better, I think.
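The movement rules spelled out in this thread can be summarized as a small predicate. This is a simplified sketch, not the balancer's implementation: `MovePolicy` is a hypothetical name, and the velocity of 0 case is modeled as "no moves off decommissioning servers" because the cap computed from it is zero.

```java
// Hypothetical summary of which segment moves the Coordinator considers.
final class MovePolicy {
    static boolean isMoveAllowed(boolean sourceDecommissioning, boolean targetDecommissioning, int velocity) {
        // Never move a segment onto a decommissioning server, whether from an
        // active server or from another decommissioning server.
        if (targetDecommissioning) {
            return false;
        }
        // Decommissioning -> active is the only move involving a decommissioning
        // server, and with velocity 0 the cap on such moves is zero.
        if (sourceDecommissioning) {
            return velocity > 0;
        }
        // Active -> active is an ordinary balancing move.
        return true;
    }
}
```

With a velocity of 0, every move involving a decommissioning server is excluded, which is the "maintenance"-like behavior described above.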

leventov · 2019-03-05T22:36:58Z

server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorBalancerTest.java

@@ -632,7 +632,7 @@ private DruidCoordinatorRuntimeParams setupParamsForMaintenancePriority(int prio

    mockCoordinator(coordinator);

-    // either maintenance servers list or general ones (ie servers list is [2] or [1, 3])
+    // either decommissioning servers list or general ones (ie servers list is [2] or [1, 3])


leventov · 2019-03-05T22:37:45Z

...r/src/test/java/org/apache/druid/server/coordinator/rules/BroadcastDistributionRuleTest.java

@@ -59,8 +59,8 @@
  private DataSegment smallSegment;
  private DruidCluster secondCluster;
  private ServerHolder generalServer;


"generalServer" -> "activeServer". Clint, please search code.

leventov · 2019-03-06T21:44:50Z

docs/content/configuration/index.md

-|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|


The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by decommissioningVelocity.

It sounds odd that the servers are referred to as "these servers" and then "the 'decommissioning' servers" in the same sentence. The opposite should be better, I think.

leventov · 2019-03-06T22:03:19Z

docs/content/configuration/index.md

-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|
+|`decommissioningVelocity`| Decommissioning velocity determines the maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning (that is, active) servers during one Coordinator's run. This value is relative to the total maximum segment movements allowed during one run which is determined by the `maxSegmentsToMove` configuration. Specifically, the maximum is `ceil(maxSegmentsToMove * (velocity / 10))`. For example, if `decommissioningVelocity` is 5, no more than `ceil(maxSegmentsToMove * 0.5)` segments may be moved away from 'decommissioning' servers. If `decommissioningVelocity` is 0, segments will neither be moved from _or to_ 'decommissioning' servers, effectively putting them in a sort of 'maintenance' mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By leveraging the velocity an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 10.|7|


Sorry for asking you to do and redo renames and docs, but I think we should use "percent" rather than "velocity".

The reason is that after fixing #7159, there should be a single cap, maxSegmentsToMove. There should also be a configuration parameter that specifies what percent of that movement cap may be spent (at maximum) on segment loading and dropping (this is currently specified by a separate config replicationThrottleLimit). For that (future) configuration, I want to use "percent", (for example, "minGuaranteedBalancingMovesPercent"), because I think that 10% step (as in "velocity") might not be precise enough, and because "velocity" is simply not the right term for this situation.

Now, there is an observation that moving segments away from decommissioning nodes looks very much like a temporary "drop" rule "for servers". For this reason, I want the configurations that specify min guaranteed balancing quota and max 'decommissioning' movement quota to use the same units.

The other reason why percent may be preferable is that we don't need to explain what the units are with ceil, / 10, etc. Everybody knows what a percent is. So it's less likely that users specify a wrong number because they misinterpret the units (e.g. they specify decommissioningVelocity=10 because they think that the velocity is expressed in percent, when they actually wanted decommissioningVelocity=1).

Specifically, I suggest this configuration be called "maxPercentOfDecommissioningMoves". It doesn't follow the prefix principle. "decommissioningMaxPercentOfMoves" is probably also acceptable, but because of the strange word order, it's less understandable.
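
To make the unit difference concrete, here is a minimal Java sketch of the two calculations. The velocity formula is the one quoted in the docs diff above; the percent formula is an assumed analogue (the ceiling of maxSegmentsToMove * percent / 100), and the class and method names are hypothetical, not taken from the Druid codebase.

```java
// Hypothetical helper for illustration only; not taken from the Druid codebase.
final class DecommissioningQuotaSketch
{
  // Velocity scheme from the docs diff: ceil(maxSegmentsToMove * (velocity / 10)), velocity in [0, 10].
  // maxSegmentsToMove is assumed to be non-negative.
  static int maxMovesFromVelocity(int maxSegmentsToMove, int velocity)
  {
    checkRange(velocity, 10, "velocity");
    return ceilDiv((long) maxSegmentsToMove * velocity, 10);
  }

  // Assumed percent analogue: ceil(maxSegmentsToMove * (percent / 100)), percent in [0, 100].
  static int maxMovesFromPercent(int maxSegmentsToMove, int percent)
  {
    checkRange(percent, 100, "percent");
    return ceilDiv((long) maxSegmentsToMove * percent, 100);
  }

  // Integer ceiling division for non-negative operands; avoids floating-point rounding surprises.
  private static int ceilDiv(long numerator, long denominator)
  {
    return (int) ((numerator + denominator - 1) / denominator);
  }

  private static void checkRange(int value, int max, String name)
  {
    if (value < 0 || value > max) {
      throw new IllegalArgumentException(name + " must be in [0, " + max + "], got " + value);
    }
  }

  public static void main(String[] args)
  {
    int maxSegmentsToMove = 5;
    System.out.println(maxMovesFromVelocity(maxSegmentsToMove, 7)); // 4
    System.out.println(maxMovesFromPercent(maxSegmentsToMove, 70)); // 4
    System.out.println(maxMovesFromPercent(maxSegmentsToMove, 7));  // 1
  }
}
```

With maxSegmentsToMove = 5, a velocity of 7 and a percent of 70 both allow 4 moves, while a velocity-style 7 entered as a percent allows only 1, which illustrates how much a unit mix-up can change the cap.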

I fully agree, I was considering this as well and think it is a lot more intuitive.

I'll suggest decommissioningMaxSegmentsToMovePercent:

* keeps the decommissioning prefix
* has MaxSegmentsToMove in the name, showing that the parameters are related
* placement of Percent follows MaxSegmentsToMove to show what the percent applies to

Ah, decommissioningMaxSegmentsToMovePercent sounds good to me, will go with that.

I suggest "decommissioningMaxPercentInMaxSegmentsToMove", because there are two different maximums in play here:

* The maximum percent: the specified percent is "maximum" because it doesn't need to be satisfied. E.g. if there are no decommissioning nodes, the actual percentage will be zero.
* The maximum segments to move.

Would 'of' instead of 'in' be better: decommissioningMaxPercentOfMaxSegmentsToMove? Also, this seems a bit long, but then again it's not so much longer than decommissioningMaxSegmentsToMovePercent...

decommissioningMaxPercentOfMaxSegmentsToMove sounds fine to me, I'm not sure that there would be a good short name for the property

fjy · 2019-03-07T18:35:06Z

@leventov any more comments?

leventov · 2019-03-07T21:05:32Z

@fjy why do you repeatedly ping me in this issue less than 24 hours after my last response? Don't you see that I respond in this issue every day anyway?

Although the discussion in the dev mailing list about the minimum poking period didn't arrive at any conclusion, nobody even proposed the poking period to be less than 2-3 working days.

fjy · 2019-03-07T21:15:39Z

@leventov this is the last issue remaining before we can do a release, so we would appreciate your help in moving the release along.

leventov · 2019-03-07T21:06:47Z

docs/content/configuration/index.md

-  "historicalNodesInMaintenance": ["localhost:8182", "localhost:8282"],
-  "nodesInMaintenancePriority": 7
+  "decommissioningNodes": ["localhost:8182", "localhost:8282"],
+  "decommissioningMaxPercentOfMaxSegmentsToMove": 7


leventov · 2019-03-07T21:12:33Z

docs/content/configuration/index.md

-|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers,  and segments will be moved away from them to be placed on 'active' servers at the maximum rate specified by `decommissioningMaxPercentOfMaxSegmentsToMove`.|none|


Two nits:

* "servers, and": double space
* "active" is not a term in the same category as 'decommissioning', so I would not put it into quotes. Perhaps better to say non-decommissioning here.

Also a typo: 'decomissioning' should be 'decommissioning'.

leventov · 2019-03-07T21:14:07Z

docs/content/configuration/index.md

-|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
+|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
+|`decommissioningNodes`| List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers,  and segments will be moved away from them to be placed on 'active' servers at the maximum rate specified by `decommissioningMaxPercentOfMaxSegmentsToMove`.|none|
+|`decommissioningMaxPercentOfMaxSegmentsToMove`|  The maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning (that is, active) servers during one Coordinator's run. This value is relative to the total maximum segment movements allowed during one run which is determined by `maxSegmentsToMove`. If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, segments will neither be moved from _or to_ 'decommissioning' servers, effectively putting them in a sort of 'maintenance' mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By leveraging decommissioning percent, an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.|70|


I suggest the following (a few fixes highlighted in italics):

Determines the maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning (that is, active) servers during one Coordinator's run. The maximum number of segment movements away from 'decommissioning' servers is relative to the total maximum segment movements allowed during one run which is determined by maxSegmentsToMove. If decommissioningMaxPercentOfMaxSegmentsToMove is 0, segments will neither be moved from or to 'decommissioning' servers, effectively putting them in a sort of "maintenance" mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By leveraging the maximum percent of decommissioning segment movements, an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.

I think starting with The maximum number is pretty consistent with the other descriptions:

|`mergeBytesLimit`|The maximum total uncompressed size in bytes ...
|`mergeSegmentsLimit`|The maximum number of segments that can be ...
|`maxSegmentsToMove`|The maximum number of segments that can be ...
|`replicantLifetime`|The maximum number of Coordinator runs for ...
|`replicationThrottleLimit`|The maximum number of segments that ...

I also think that repeating nearly the same phrase in the 2nd sentence isn't really necessary, but I agree with the other changes 👍

I don't see why this kind of consistency at the beginnings of the doc descriptions is a good thing.

You can reword it to start with "The maximum percent", but then the whole sentence should be changed.

The problem with "This value" is that the percent value is not relative to maxSegmentsToMove. The result of applying the percent to maxSegmentsToMove is relative to maxSegmentsToMove (quite obviously).

Hmm, yeah, I see the problem; I think some things can be reworded. How about this, where I sort of combine the ideas of the first 2 sentences and also throw in a reference to decommissioningNodes:

The percent of maxSegmentsToMove that determines the maximum number of segments that may be moved away from 'decommissioning' servers (specified by decommissioningNodes) to non-decommissioning servers during one Coordinator balancer run. If decommissioningMaxPercentOfMaxSegmentsToMove is 0, segments will neither be moved from or to 'decommissioning' servers, effectively putting them in a sort of "maintenance" mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By adjusting decommissioningMaxPercentOfMaxSegmentsToMove, an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.

@clintropolis's proposed wording looks good to me.

👍 good wording.

leventov · 2019-03-07T21:20:13Z

server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java

-   * Historical nodes list in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves
-   * segments from those nodes according to a specified priority.
+   * List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers,
+   * and segments will be moved away from them to be placed on 'active' servers at the maximum rate specified by


Nit: I think we don't need quotes on active.

leventov · 2019-03-07T21:21:00Z

server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java

-   * 10 - 100% segments from servers in maintenance
-   * By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time
-   * instead.
+   * The maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning


Suggested "Determines the maximum number ..."

leventov · 2019-03-07T21:24:34Z

server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java

+   * If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, segments will neither be moved from _or to_ 'decommissioning'
+   * servers, effectively putting them in a sort of 'maintenance' mode that will not participate in balancing or
+   * assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place
+   * the segments. By leveraging decommissioning percent, an operator can prevent active servers from overload by


Suggested "By leveraging the maximum percent of decommissioning segment movements, ..."

leventov · 2019-03-07T21:25:32Z

server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java

+   * the segments. By leveraging decommissioning percent, an operator can prevent active servers from overload by
+   * prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.
+   *
+   * @return number in range [0, 100]


I see we also use @Min() and @Max() annotations in the codebase; perhaps we should use them here too.
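
Without relying on the annotations, the same [0, 100] constraint from the javadoc can be sketched as a plain range check. This is only an illustration under assumed names (the class and method are hypothetical), not how CoordinatorDynamicConfig actually validates the value.

```java
// Hypothetical, dependency-free illustration of the [0, 100] constraint; names are assumptions.
final class PercentRangeSketch
{
  // Returns the value unchanged if it lies in [0, 100], otherwise rejects it.
  static int checkPercent(int percent)
  {
    if (percent < 0 || percent > 100) {
      throw new IllegalArgumentException(
          "decommissioningMaxPercentOfMaxSegmentsToMove must be in range [0, 100], got " + percent
      );
    }
    return percent;
  }

  public static void main(String[] args)
  {
    System.out.println(checkPercent(70));
    try {
      checkPercent(101);
    }
    catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

A check like this fails fast when the config is constructed, whereas annotation-based validation depends on a validator actually being run over the object.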

leventov · 2019-03-07T21:28:44Z

server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java

@@ -231,32 +231,35 @@ public int getMaxSegmentsInNodeLoadingQueue()
  }

  /**
-   * Historical nodes list in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves
-   * segments from those nodes according to a specified priority.
+   * List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers,


typo: decommissioning

leventov · 2019-03-07T21:29:00Z

server/src/test/java/org/apache/druid/server/coordinator/rules/LoadRuleTest.java

   */
  @Test
-  public void testLoadReplicaDuringMaitenance()
+  public void testLoadReplicaDuringDecomissioning()


typo: decommissioning

leventov · 2019-03-07T21:30:07Z

server/src/test/java/org/apache/druid/server/coordinator/DruidCoordinatorBalancerTest.java

    params = new DruidCoordinatorBalancerTester(coordinator).run(params);
    Assert.assertEquals(1L, params.getCoordinatorStats().getTieredStat("movedCount", "normal"));
    Assert.assertThat(peon3.getSegmentsToLoad(), is(equalTo(ImmutableSet.of(segment2))));
  }

  /**
-   * Should balance segments as usual (ignoring priority) with empty maintenanceList.
+   * Should balance segments as usual (ignoring percent) with empty decommissioningList.


Perhaps meant decommissioningNodes, not decommissioningList

leventov · 2019-03-07T21:43:19Z

@clintropolis could you please also add "meitenance" and "decomission" as prohibited regex patterns in Checkstyle?

gianm · 2019-03-08T00:18:12Z

@clintropolis could you please also add "meitenance" and "decomission" as prohibited regex patterns in Checkstyle?

This doesn't need to be done in this PR, IMO. Especially given that we want to backport this PR to 0.14.0, it should be as small as possible.

gianm · 2019-03-08T00:24:46Z

This doesn't need to be done in this PR, IMO. Especially given that we want to backport this PR to 0.14.0, it should be as small as possible.

Actually, I don't think it should be done at all in this PR, even if we weren't going to backport it. The exact approach to take to detect misspellings (or whether it's worth doing at all) may be controversial and should be hashed out on #7209.
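
For reference only, the kind of check discussed here can be sketched in plain Java with java.util.regex. This is not a Checkstyle configuration, the class name is hypothetical, and the two patterns are the misspellings that appear in this PR's diffs ("maitenance" and "decomission"); whether such a check is worth adding is left to #7209.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not a Checkstyle rule and not part of this PR.
final class MisspellingScanSketch
{
  // Case-insensitive match for the two misspellings seen in this PR's diffs.
  private static final Pattern MISSPELLED =
      Pattern.compile("maitenance|decomission", Pattern.CASE_INSENSITIVE);

  // Returns every misspelled fragment found in the given text, in order of appearance.
  static List<String> findMisspellings(String text)
  {
    List<String> hits = new ArrayList<>();
    Matcher matcher = MISSPELLED.matcher(text);
    while (matcher.find()) {
      hits.add(matcher.group());
    }
    return hits;
  }

  public static void main(String[] args)
  {
    System.out.println(findMisspellings("testLoadReplicaDuringDecomissioning"));
    System.out.println(findMisspellings("decommissioningNodes"));
  }
}
```

Note that the correctly spelled "decommissioning" does not contain "decomission" as a substring, so it is not flagged.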

fjy · 2019-03-08T01:50:47Z

👍 This is ready after tests pass

gianm

LGTM; looks like all comments have been addressed, will merge.

* rename maintenance mode to decommission
* review changes
* missed one
* fix straggler, add doc about decommissioning stalling if no active servers
* fix missed typo, docs
* refine docs
* doc changes, replace generals
* add explicit comment to mention suppressed stats for balanceTier
* rename decommissioningVelocity to decommissioningMaxSegmentsToMovePercent and update docs
* fix precondition check
* decommissioningMaxPercentOfMaxSegmentsToMove
* fix test
* fix test
* fixes

rename maintenance mode to decommission

950b03f

clintropolis added this to the 0.14.0 milestone Feb 27, 2019

leventov reviewed Feb 28, 2019

leventov requested changes Feb 28, 2019

egor-ryashin requested changes Feb 28, 2019

review changes

ab31ac5

missed one

85f6ef6

leventov requested changes Feb 28, 2019

clintropolis added 2 commits February 28, 2019 15:18

fix straggler, add doc about decommissioning stalling if no active servers

5fd6ffe

Merge remote-tracking branch 'upstream/master' into maintenance-mode-to-decommission

acfea61

egor-ryashin approved these changes Mar 4, 2019

leventov requested changes Mar 4, 2019

clintropolis added 2 commits March 4, 2019 15:24

fix missed typo, docs

756ac62

refine docs

335a84a

leventov requested changes Mar 5, 2019

clintropolis added 2 commits March 5, 2019 16:09

doc changes, replace generals

fc54161

add explicit comment to mention suppressed stats for balanceTier

ce2555a

leventov requested changes Mar 6, 2019

clintropolis added 5 commits March 6, 2019 15:25

rename decommissioningVelocity to decommissioningMaxSegmentsToMovePercent and update docs

e9b1242

fix precondition check

1da2cb8

decommissioningMaxPercentOfMaxSegmentsToMove

5abb5da

fix test

4834037

fix test

4b5bd4c

leventov requested changes Mar 7, 2019

leventov mentioned this pull request Mar 7, 2019

Spell checking tool #7209

Open

fixes

be4daba

gianm approved these changes Mar 9, 2019

gianm merged commit a44df65 into apache:master Mar 9, 2019

jon-wei mentioned this pull request Mar 9, 2019

[Backport] rename maintenance mode to decommission (#7154) #7220

Merged

clintropolis deleted the maintenance-mode-to-decommission branch March 10, 2019 01:07

gianm mentioned this pull request Mar 11, 2019

0.14.0-incubating release notes #7126

Closed
