
maintenance mode for Historical #6349

Merged — 12 commits merged into apache:master on Feb 5, 2019

Conversation

@egor-ryashin (Contributor) commented Sep 19, 2018

With this feature, a Historical node can be put in maintenance mode by adding it to historicalNodesInMaintenance in the Druid Coordinator dynamic config. The Coordinator will then move segments off such nodes with the specified priority nodesInMaintenancePriority. This makes it possible to drain segments from nodes that are about to be replaced, avoiding the availability issues that an abrupt shutdown could cause.
Issue #3247
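For illustration, a minimal sketch of the relevant part of the Coordinator dynamic config payload. The host:port values are made-up examples, the priority of 7 is the default mentioned later in this review, and the real dynamic config contains many other fields omitted here:

    {
      "historicalNodesInMaintenance": ["historical-host-1:8083", "historical-host-2:8083"],
      "nodesInMaintenancePriority": 7
    }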

This design makes the Coordinator the owner of the maintenance list, which requires minimal changes to the current codebase (DruidCoordinatorBalancer, LoadRule).
An alternative design, in which Historical nodes carry their own maintenance flag, would require roughly twice as much code, I expect, because a separate thread would have to fetch that state from Historical nodes via an HTTP endpoint. Introducing the flag into DruidNode also seems awkward, as DruidNode is defined only at service startup, and changing that would require even more code and unit-test coverage.
So this design delivers the same functionality with less effort. The second approach is effectively an extension of this one and can be implemented at a later stage.
The initial design has one flaw: if a Historical node is restarted, the resource manager may assign it a different port, which forces the operator to update the maintenance list manually. This is rare but possible.

forbidden api fix, config deserialization fix

logging fix, unit tests
@himanshug (Contributor)

LGTM, it's a useful feature.

Have you tested this on a largish cluster with, say, 10%, 20%, 30%... of the nodes in maintenance?

@leventov leventov self-requested a review September 25, 2018 22:01
@egor-ryashin (Contributor Author)

@himanshug Not yet; I've done local tests, and I expect to test it in production with a new release.

@JsonProperty("maxSegmentsInNodeLoadingQueue") int maxSegmentsInNodeLoadingQueue
@JsonProperty("maxSegmentsInNodeLoadingQueue") int maxSegmentsInNodeLoadingQueue,
@JsonProperty("maintenanceList") Object maintenanceList,
@JsonProperty("maintenanceModeSegmentsPriority") int maintenanceModeSegmentsPriority
Member

Shouldn't it be nullable for backward compatibility? It's strange to see both the class and the builder with @JsonCreator

Contributor Author (@egor-ryashin, Sep 26, 2018)

If the JSON doesn't contain the new fields, the object gets an empty list and a priority of 7; with an empty list the Coordinator ignores these parameters.

@@ -51,6 +51,8 @@
private final boolean emitBalancingStats;
private final boolean killAllDataSources;
private final Set<String> killDataSourceWhitelist;
private final Set<String> maintenanceList;
Member

It's confusing that a Set is called a "list". Suggest calling it "hostsInMaintenanceMode" or something similar.

The same applies to killDataSourceWhitelist and killPendingSegmentsSkipList; the old names could be preserved in the @JsonProperty annotations only. Additionally, with these two fields it's very unclear that they store data source names.

Contributor Author (@egor-ryashin, Sep 28, 2018)

Changed to historicalNodesInMaintenance

@@ -33,14 +33,22 @@
private static final Logger log = new Logger(ServerHolder.class);
private final ImmutableDruidServer server;
private final LoadQueuePeon peon;
private final boolean maintenance;
Member

Suggested "inMaintenanceMode". Also a comment or a reference to the PR/issue is appreciated, it might be very non-obvious for the reader what does "maintenance" mean here.


if (toMoveTo.size() <= 1) {
log.info("[%s]: One or fewer servers found. Cannot balance.", tier);
if (general.size() == 0) {
Member

Probably should add || (general.size() == 1 && maintenance.isEmpty())


@@ -95,6 +100,8 @@ public CoordinatorDynamicConfig(
this.killDataSourceWhitelist = parseJsonStringOrArray(killDataSourceWhitelist);
this.killPendingSegmentsSkipList = parseJsonStringOrArray(killPendingSegmentsSkipList);
this.maxSegmentsInNodeLoadingQueue = maxSegmentsInNodeLoadingQueue;
this.maintenanceList = parseJsonStringOrArray(maintenanceList);
this.maintenanceModeSegmentsPriority = Math.min(10, Math.max(0, maintenanceModeSegmentsPriority));
Member (@leventov, Sep 26, 2018)

I think it should check the bounds rather than saturate, to avoid silently swallowing configuration errors.


.withDruidCluster(secondCluster)
.withSegmentReplicantLookup(SegmentReplicantLookup.make(secondCluster))
.withBalancerReferenceTimestamp(DateTimes.of("2013-01-01"))
.withAvailableSegments(Lists.newArrayList(
Member

Could add a withAvailableSegments(DataSegment... segments) method for convenience.

Contributor Author

I don't think we should extend a production class's method list for test purposes only.

Member

This is what should be done. Test-writing convenience and succinctness are important. There are a lot of methods marked @VisibleForTesting in production classes.


.times(2);
EasyMock.replay(throttler, mockPeon, mockBalancerStrategy);

LoadRule rule = createLoadRule(ImmutableMap.of(
Member

Unnecessary breakdown


DruidCoordinatorRuntimeParams params = DruidCoordinatorRuntimeParams.newBuilder()
.withDruidCluster(druidCluster)
.withSegmentReplicantLookup(
Member

Could break .newBuilder() on the next line to avoid this far right alignment


@@ -760,4 +964,36 @@ private static LoadQueuePeon createLoadingPeon(List<DataSegment> segments)

return mockPeon;
}

private static int serverId = 0;
Member

Suggested AtomicInteger just in case


}

@Test
public void testBuilderDefaults()
{

CoordinatorDynamicConfig defaultConfig = CoordinatorDynamicConfig.builder().build();
assertConfig(defaultConfig, 900000, 524288000, 100, 5, 15, 10, 1, false, ImmutableSet.of(), false, 0);
assertConfig(defaultConfig, 900000, 524288000, 100, 5, 15, 10, 1, false, ImmutableSet.of(), false, 0, ImmutableSet.of(),
7);
Member

According to the Druid formatting rules, you should break either all of the arguments or none of them.


@clintropolis (Member)

Neat idea! I'm just getting started on the review, but I was wondering whether there is any reason for this to be part of the balancer (which is already reasonably complicated on its own) instead of introducing a new coordinator runnable dedicated to migrating segments off historicals that have been added to the list?

@@ -346,6 +348,13 @@ public Builder withDataSources(Collection<ImmutableDruidDataSource> dataSourcesC
return this;
}

@VisibleForTesting
public Builder withAvailableSegments(DataSegment... availableSegmentsCollection)
Member

Suggested just "availableSegments"


predicate = predicate.and(s -> !s.isMaintenance());

return queue.stream().filter(predicate).collect(Collectors.toList());
Predicate<ServerHolder> general = s -> !s.isInMaintenance();
Member

Suggested "isGeneral" or "notInMaintenance"

Contributor Author

isGeneral is too abstract, and as for notInMaintenance, having a negation in a name is usually not a good idea.

Member

isGeneral is the same as the current general, just a more conventional name for a Predicate object.

Contributor Author

ah, I got it ✅

@@ -105,14 +105,17 @@ private void balanceTier(
}

Map<Boolean, List<ServerHolder>> partitions = servers.stream()
.collect(Collectors.partitioningBy(ServerHolder::isMaintenance));
.collect(Collectors.partitioningBy(ServerHolder::isInMaintenance));
Member

.collect() is not aligned with .stream(). Suggested to break the whole expression:

Map<Boolean, List<ServerHolder>> partitions =
    servers.stream().collect(Collectors.partitioningBy(ServerHolder::isInMaintenance));


private static ImmutableMap<String, DataSegment> segmentsToMap(DataSegment... segments)
{
return ImmutableMap.copyOf(
Member

Could be Maps.uniqueIndex(Arrays.asList(segments), DataSegment::getIdentifier)
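For reference, the suggested form written out as a complete method (a sketch using Guava's Maps.uniqueIndex and the Druid DataSegment class from this test, keyed by segment identifier as in the original helper):

    import com.google.common.collect.ImmutableMap;
    import com.google.common.collect.Maps;
    import java.util.Arrays;

    // Builds an identifier -> segment map; Maps.uniqueIndex throws if two segments share an identifier.
    private static ImmutableMap<String, DataSegment> segmentsToMap(DataSegment... segments)
    {
      return Maps.uniqueIndex(Arrays.asList(segments), DataSegment::getIdentifier);
    }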


@@ -965,10 +935,10 @@ private static LoadQueuePeon createLoadingPeon(List<DataSegment> segments)
return mockPeon;
}

private static int serverId = 0;
private static AtomicInteger serverId = new AtomicInteger();
Member

Please add final


this.maintenanceModeSegmentsPriority = Math.min(10, Math.max(0, maintenanceModeSegmentsPriority));
this.historicalNodesInMaintenance = parseJsonStringOrArray(historicalNodesInMaintenance);
Preconditions.checkArgument(
nodesInMaintenancePriority >= 0 && nodesInMaintenancePriority <= 10,
Member

Does the priority of 0 make sense?

Contributor Author (@egor-ryashin, Oct 4, 2018)

It allows maintenance to be suppressed for a while without clearing the maintenance list. Segments won't be loaded onto maintenance servers, but they won't be moved off them either, which could help if some general servers get overloaded.

@@ -100,8 +101,12 @@ public CoordinatorDynamicConfig(
this.killDataSourceWhitelist = parseJsonStringOrArray(killDataSourceWhitelist);
Member

Could you please rename the killDataSourceWhitelist and killPendingSegmentsSkipList parameters, fields, and getters so that they don't mention "list", and so that the latter mentions "dataSource" (its actual elements)? The JSON property names could stay the same for compatibility.

Contributor Author

I'm going to change killDataSourceWhitelist -> killableDatasources and killPendingSegmentsSkipList -> protectedPendingSegmentDatasources. Does that make sense?

Member

"DataSources". Some explanation of what do those variables mean would also be nice.

Contributor Author

That is actually documented at http://druid.io/docs/latest/configuration/index.html (the coordinator dynamic config table; screenshot omitted). I could copy it from there, but I think that would be redundant.

Member

Source code should be understandable and self-contained on its own. To understand it now, somebody has to find this doc and then work out how the mentioned property maps back to source code concepts; that's very difficult and time-consuming when somebody just wants to read and understand the code.

Member

Also see #4861

@@ -192,10 +195,14 @@ public boolean isKillAllDataSources()
return killAllDataSources;
}

/**
* List of dataSources for which pendingSegments are NOT cleaned up
* if property druid.coordinator.kill.pendingSegments.on is true.
Member

Could you please reference druid.coordinator.kill.pendingSegments.on in source code terms? Like a boolean field in some class?

Contributor Author

There is no specific variable; it conditionally binds the class, as I understand it:

            ConditionalMultibind.create(
                properties,
                binder,
                DruidCoordinatorHelper.class,
                CoordinatorIndexingServiceHelper.class
            ).addConditionBinding(
                "druid.coordinator.merge.on",
                Predicates.equalTo("true"),
                DruidCoordinatorSegmentMerger.class
            ).addConditionBinding(
                "druid.coordinator.conversion.on",
                Predicates.equalTo("true"),
                DruidCoordinatorVersionConverter.class
            ).addConditionBinding(
                "druid.coordinator.kill.on",
                Predicates.equalTo("true"),
                DruidCoordinatorSegmentKiller.class
            ).addConditionBinding(
                "druid.coordinator.kill.pendingSegments.on",
                Predicates.equalTo("true"),

Member (@leventov, Oct 9, 2018)

Then it could say "list of data sources skipped in {@link DruidCoordinatorCleanupPendingSegments}"

@JsonProperty
public Set<String> getKillDataSourceWhitelist()
/**
* List of dataSources for which kill tasks are sent if property druid.coordinator.kill.on is true.
Member

Same

@leventov leventov added this to the 0.13.0 milestone Oct 10, 2018
* By leveraging the priority an operator can prevent general nodes from overload or decrease maintenance time
* instead.
*
* @return number in range [0, 10]
Contributor

Why not just 1-100 if it represents a percentage?

Contributor Author

The initial term was a priority, which is usually not a big number (e.g., Java thread priority).
The percentage framing only emerged while writing that description. Moreover, I don't expect anyone to want to specify 65%, 75%, or even 67%, 23%, and so on; usually 10, 20, ..., 80 is good enough, and appending a 0 every time seems redundant. The priority term is more concise.
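For a concrete sense of the scale (a worked example based on the maintenanceSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0) calculation quoted later in this review): with maxSegmentsToMove = 100, a priority of 7 offers up to ceil(100 * 7 / 10) = 70 of the segment moves in a coordination run to draining maintenance servers, with the rest of the budget going to general balancing; a priority of 0 drains nothing, and 10 offers the whole budget to maintenance servers first.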

@drcrallen (Contributor)

I'm reviewing this

@drcrallen (Contributor) left a comment

This is a cool feature.

Can you please add some docs in the code and some in the README for the new configs?

Also, can you please comment in the main PR description on the design tradeoff of having the coordinator be the central point for controlling whether a node is under maintenance, instead of having the historical nodes themselves store their "I'm in maintenance" state?

@@ -92,11 +98,17 @@ public CoordinatorDynamicConfig(
this.balancerComputeThreads = Math.max(balancerComputeThreads, 1);
this.emitBalancingStats = emitBalancingStats;
this.killAllDataSources = killAllDataSources;
this.killDataSourceWhitelist = parseJsonStringOrArray(killDataSourceWhitelist);
this.killPendingSegmentsSkipList = parseJsonStringOrArray(killPendingSegmentsSkipList);
this.killableDatasources = parseJsonStringOrArray(killableDatasources);
Contributor

Any idea why this is using a function in the constructor instead of a Jackson parser?

Contributor Author

I assume it's so that it can handle "killableDatasources": ["a","b"] as well as "killableDatasources": "a, b".
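For illustration, a minimal self-contained sketch of what such a constructor-side parser could look like (a hypothetical stand-in, not the PR's actual parseJsonStringOrArray implementation):

    import java.util.Collection;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class JsonStringOrArrayExample
    {
      // Accepts either a JSON array (already deserialized by Jackson into a Collection)
      // or a single comma-separated string, and normalizes both into a Set of trimmed names.
      static Set<String> parseJsonStringOrArray(Object value)
      {
        Set<String> result = new LinkedHashSet<>();
        if (value == null) {
          return result;
        }
        if (value instanceof Collection) {
          for (Object o : (Collection<?>) value) {
            result.add(o.toString().trim());
          }
        } else {
          for (String s : value.toString().split(",")) {
            if (!s.trim().isEmpty()) {
              result.add(s.trim());
            }
          }
        }
        return result;
      }

      public static void main(String[] args)
      {
        // Both forms yield the same set: [a, b]
        System.out.println(parseJsonStringOrArray(java.util.Arrays.asList("a", "b")));
        System.out.println(parseJsonStringOrArray("a, b"));
      }
    }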


if (this.killAllDataSources && !this.killDataSourceWhitelist.isEmpty()) {
if (this.killAllDataSources && !this.killableDatasources.isEmpty()) {
Contributor

can you retain the WhiteList descriptor here?

Member

I asked to change this here: #6349 (comment)

Member

@egor-ryashin could you please rename it to killableDataSources, which is the more conventional spelling?


* @return list of host:port entries
*/
@JsonProperty
public Set<String> getHistoricalNodesInMaintenance()
Contributor

If these are host-and-port entries, can they use HostAndPort objects?

Contributor Author

It's possible, but this code works intensively with DruidServerMetadata, which stores a plain String, and HostAndPort doesn't provide an efficient toString():

  /** Rebuild the host:port string, including brackets if necessary. */
  @Override
  public String toString() {
    StringBuilder builder = new StringBuilder(host.length() + 7);
    if (host.indexOf(':') >= 0) {
      builder.append('[').append(host).append(']');
    } else {
      builder.append(host);
    }
    if (hasPort()) {
      builder.append(':').append(port);
    }
    return builder.toString();
  }

this(server, peon, false);
}

public ServerHolder(ImmutableDruidServer server, LoadQueuePeon peon, boolean inMaintenance)
Contributor

Looking forward, how is "in maintenance" different from a blacklist?

For example, let's say a historical is able to detect that it is unhealthy and wants to shed its load in as safe a way as possible, so it declares its desire to have its load shed to other nodes. Would inMaintenance be the proper kind of thing for that?

Or suppose another problem arises where a node has fractional problems, like it can only operate at about 50% of where it should. Is it proper to have the node be inMaintenance, or would a "fractional operating level" be appropriate here?

What I'm curious about is whether it makes sense for the initial implementation to have an int here which represents an arbitrary "weight capacity" for the node, with "in maintenance" simply setting that weight capacity to 0.

Such a thing would open up other methodologies of shedding load in the future, or allow coordinator rules to take a "weight" into account for a server, where "weight" is intended to be dynamic and not a static value for the server.

Contributor Author (@egor-ryashin, Oct 10, 2018)

For example, let's say a historical is able to detect that it is unhealthy and wants to shed its load in as safe a way as possible, so it declares its desire to have its load shed to other nodes. Would inMaintenance be the proper kind of thing for that?
Or suppose another problem arises where a node has fractional problems, like it can only operate at about 50% of where it should. Is it proper to have the node be inMaintenance, or would a "fractional operating level" be appropriate here?

Well, I would make two separate types, like inMaintenance and Unhealthy, to make it clear for the operator; otherwise it could be confusing to see a manually managed list extending itself at random. Also, we would need to define a detection algorithm for the Unhealthy status.

What I'm curious about is whether it makes sense for the initial implementation to have an int here which represents an arbitrary "weight capacity" for the node, with "in maintenance" simply setting that weight capacity to 0.

Yes, inMaintenance and Unhealthy could carry a weight number that way. I assume the priority for each type could differ too, so an operator could, for example, have manually listed nodes drained faster.

Map<Boolean, List<ServerHolder>> partitions =
servers.stream().collect(Collectors.partitioningBy(ServerHolder::isInMaintenance));
final List<ServerHolder> maintenance = partitions.get(true);
final List<ServerHolder> general = partitions.get(false);
Contributor

suggest availableServers or similar


int maintenanceSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
log.info("Processing %d segments from servers in maintenance mode", maintenanceSegmentsToMove);
Pair<Integer, Integer> maintenanceResult = balanceServers(params, maintenance, general, maintenanceSegmentsToMove);
int generalSegmentsToMove = maxSegmentsToMove - maintenanceResult.lhs;
Contributor

max?

Contributor

This also posted to a weird line; it looks like my comments are off.

Contributor

suggest adding the max prefix here


final int maxSegmentsToMove = Math.min(params.getCoordinatorDynamicConfig().getMaxSegmentsToMove(), numSegments);
int priority = params.getCoordinatorDynamicConfig().getNodesInMaintenancePriority();
int maintenanceSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
Contributor

max?

Contributor

I mean, changing the name would help make it clearer

Contributor

but I think I missed the comment by a line :-P

Contributor Author

You mean changing maintenanceSegmentsToMove -> maxMaintenanceSegmentsToMove?

Contributor

yes


)
{
final NavigableSet<ServerHolder> queue = druidCluster.getHistoricalsByTier(tier);
if (queue == null) {
log.makeAlert("Tier[%s] has no servers! Check your cluster configuration!", tier).emit();
return Collections.emptyList();
}

return queue.stream().filter(predicate).collect(Collectors.toList());
Predicate<ServerHolder> isGeneral = s -> !s.isInMaintenance();
Contributor

suggest changing name to isNotInMaintenance

Member

Why not isAvailable or something along those lines, since the implication is that these servers are available to assign segments to? That would also go along with your other variable rename suggestion in the balancer logic.

Contributor

IMHO, since this is specifically about maintenance, the boolean methods should stay related to maintenance.

That being said, I'm curious whether this predicate is in the wrong place. Should it just be part of the passed-in predicate, leaving no "extra" or surprising predicate logic in this method?

Contributor Author

getFilteredHolders is always called to find servers that can be a target location for a segment, so I think it should never return maintenance servers.
Changed to isNotInMaintenance.

return Response.status(Response.Status.BAD_REQUEST)
.entity(ImmutableMap.of("error", setResult.getException()))
.entity(ImmutableMap.<String, Object>of("error", e.getMessage()))
Contributor

I think there's a Jetty util for cleaning the exceptions floating around somewhere

Contributor Author

I suppose it's ServletResourceUtils.jsonize?

Contributor

ServletResourceUtils.sanitizeException I believe


@fjy fjy modified the milestones: 0.13.0, 0.13.1 Oct 10, 2018
@fjy (Contributor) commented Oct 10, 2018

@leventov Given this PR still requires quite a bit of review and testing, I think we should slate it for 0.13.1

@clintropolis (Member) left a comment

Overall LGTM 🤘

I agree with @drcrallen on variable rename suggestions to make code a little clearer

I also highly recommend updating the logic in the computeCost method of org.apache.druid.server.coordinator.CostBalancerStrategy to take into account whether a server is in maintenance and return Double.POSITIVE_INFINITY as the cost of that server, as a failsafe in case anything that calls into the balancer to pick a server omits this check externally.
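A minimal sketch of that failsafe (the placement inside computeCost and the exact method signature are assumed from context here, not copied from the PR):

    // Inside CostBalancerStrategy: treat servers in maintenance mode as infinitely
    // expensive so no caller can accidentally pick them as a move/assignment target.
    protected double computeCost(final DataSegment proposalSegment, final ServerHolder server, final boolean includeCurrentServer)
    {
      if (server.isInMaintenance()) {
        return Double.POSITIVE_INFINITY;
      }
      // ... the existing cost computation would follow here; returning 0 keeps the sketch compilable ...
      return 0.0;
    }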


@dclim (Contributor) commented Oct 11, 2018

@egor-ryashin will you have time to address @drcrallen 's comments in the next few hours?

@egor-ryashin (Contributor Author)

@dclim It's unclear; I'm confused by some of the comments and am waiting for a reply.

@dclim (Contributor) commented Oct 12, 2018

Okay, I'm starting the release process for 0.13.0 now so we can keep the 0.13.1 milestone for this one.

@egor-ryashin (Contributor Author)

@clintropolis @drcrallen I wonder if I answered all your questions?

@clintropolis (Member)

Oops, will have a look today

@clintropolis (Member) left a comment

Very sorry for the delay! I've had another look and I'm good with this patch after conflicts are fixed and travis passes 👍

@egor-ryashin (Contributor Author)

@clintropolis I've resolved merge conflicts

@clintropolis (Member)

👍, @drcrallen do you have any more comments on this PR?

@glasser (Contributor) commented Jan 15, 2019

This seems like a great feature for us to use to implement k8s node upgrades (we use local disks on our k8s machines for historicals, but sometimes do need to upgrade node pools). We would put one machine at a time into maintenance mode, wait for its segments to be fully replicated, then replace the machine, and move on to the next one.

In order to do this automatically I'm curious what the impact on metrics/API is. All it does is cause the coordinator to try to move segments around: it doesn't affect the response to metrics or the sys SQL entries, right? So I could wait until the historical's segment/count metric goes to zero, or until the SQL query {"query":"SELECT count(*) AS segments FROM sys.server_segments WHERE server = '10.48.27.16:8083'"} goes to 0?

@egor-ryashin (Contributor Author)

@glasser

So I could wait until the historical's segment/count metric goes to zero, or until the SQL query {"query":"SELECT count(*) AS segments FROM sys.server_segments WHERE server = '10.48.27.16:8083'"} goes to 0?

Exactly.

@clintropolis (Member)

@egor-ryashin sorry for the delay getting this merged, could you fix up conflicts again? @drcrallen any more comments on this PR? I think it would be nice to get in 0.14.

@egor-ryashin (Contributor Author)

@clintropolis Resolved.

@jon-wei (Contributor) commented Feb 5, 2019

Going to go ahead and merge this for inclusion in 0.14.0 release since it already has 2 approvals

@jon-wei jon-wei merged commit 97b6407 into apache:master Feb 5, 2019
@glasser (Contributor) commented Feb 5, 2019

One thing that's slightly awkward about storing this list as an array in the dynamic config: you can't atomically add or remove an element from this list. If implementing something that uses this feature dynamically (say from a k8s historical's pre-stop hook) it would be nice if they could send a single request that is guaranteed to add them to the list without accidentally removing something else from the list.

Do you think it would be reasonable to add an optional query param to the /druid/coordinator/v1/config POST handler to allow you to specify the expected previous contents, so you can do a compare-and-swap write? One caveat is that you'd need to be sure that the value you sent is exactly the value stored in the DB, not corrupted by being round-tripped through Jackson...

Another option would be to add a specific HTTP endpoint to add an element to a list in the dynamic config. (There'd still need to be a SQL-level compare-and-swap.)

@clintropolis (Member)

@glasser I think that would be a useful follow up, I like the 2nd approach better myself since it could be done without first fetching the config and presumably you already know the host you want to add to the list. It would also lend itself to something like adding an http endpoint directly to the historical servers that could be used to have them add themselves to the maintenance list, similar in spirit to what /druid/worker/v1/enable and /druid/worker/v1/disable provide for middle managers.

@egor-ryashin (Contributor Author)

No argument there, as the PR is intended as an MVP. I plan to improve it after I've thoroughly tested it in a production environment, at least.

@@ -46,8 +46,9 @@ public CoordinatorStats run(DruidCoordinator coordinator, DruidCoordinatorRuntim
} else {
params.getDruidCluster().getAllServers().forEach(
eachHolder -> {
if (colocatedDataSources.stream()
.anyMatch(source -> eachHolder.getServer().getDataSource(source) != null)) {
if (!eachHolder.isInMaintenance()
Member

Shouldn't this !eachHolder.isInMaintenance() condition be in an outer if, rather than in this if-else chain? As it is written, if a node is in maintenance mode, the coordinator will delete segments from it rather than leave the node alone.
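A minimal sketch of the suggested restructuring (the names are taken from the diff context above; the branch bodies are placeholders for the rule's existing logic):

    params.getDruidCluster().getAllServers().forEach(
        eachHolder -> {
          if (eachHolder.isInMaintenance()) {
            // Leave nodes in maintenance mode alone entirely: neither load nor drop.
            return;
          }
          if (colocatedDataSources.stream()
                                  .anyMatch(source -> eachHolder.getServer().getDataSource(source) != null)) {
            // existing branch for servers that host a co-located data source
          } else {
            // existing branch for servers that do not (the one that drops segments)
          }
        }
    );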

@leventov leventov deleted the feature-3247 branch February 20, 2019 14:08
@leventov (Member) commented Feb 20, 2019

In retrospect, I think the "maintenance" term is too vague and somewhat misleading. It suggests that a node is (perhaps temporarily) "conserved", that is, the coordinator doesn't touch it, forever or for some time. I think something like "shuttingDown" or "windingDown" would be better. @egor-ryashin what do you think?

@glasser (Contributor) commented Feb 20, 2019

"draining"?

@leventov (Member)

I don't quite like "shuttingDown" because of false associations with Java's ExecutorService.shutdown/shutdownNow. "draining" sounds good.

@egor-ryashin (Contributor Author)

@leventov
"conserved" is a bit unusual for me as I initially worked on issue specification which was based on "maintenance" term. Besides Mesos also uses "maintenance" term for the same purpose (consequently Marathon, Spark). That term is more popular, I think we should stick to it too.

@leventov (Member)

I think clarity and avoiding developer confusion (I raised this issue after the term confused me when I was working on the code and spent a fair amount of time trying to understand why "maintenance" nodes are not in fact conserved) trumps the fact that the term is already adopted in some other projects for something similar. Are you also sure that it was not a naming mistake in those projects?

I've raised #7148 regarding this.
