maintenance mode for Historical #6349
Conversation
forbidden API fix, config deserialization fix, logging fix, unit tests
LGTM, it's a useful feature. Have you tested this on a largish cluster with, say, 10%, 20%, 30%... of nodes in maintenance?
@himanshug not yet, I did local tests, I'm expecting to test it in production with a new release.
@JsonProperty("maxSegmentsInNodeLoadingQueue") int maxSegmentsInNodeLoadingQueue,
@JsonProperty("maintenanceList") Object maintenanceList,
@JsonProperty("maintenanceModeSegmentsPriority") int maintenanceModeSegmentsPriority
Shouldn't it be nullable for backward compatibility? It's strange to see both the class and the builder with @JsonCreator
If the JSON doesn't contain the new fields, the object will get an empty list and priority = 7; with the empty list the Coordinator will ignore those parameters.
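The backward-compatibility behavior described above can be sketched as follows. This is a hypothetical stand-in, not the real CoordinatorDynamicConfig: Jackson's @JsonCreator/@JsonProperty annotations are elided so the sketch stays stdlib-only, but the null-defaulting idea is the same. Absent JSON fields arrive as null boxed parameters and fall back to defaults.

```java
import java.util.Collections;
import java.util.Set;

public class DefaultingSketch
{
  final Set<String> maintenanceList;
  final int maintenanceModeSegmentsPriority;

  // In the real class this would be the @JsonCreator constructor with
  // @JsonProperty parameters; annotations are omitted to keep the sketch stdlib-only.
  DefaultingSketch(Set<String> maintenanceList, Integer priority)
  {
    // null means the field was absent from the JSON, so old configs keep working
    this.maintenanceList = maintenanceList == null ? Collections.emptySet() : maintenanceList;
    this.maintenanceModeSegmentsPriority = priority == null ? 7 : priority;
  }

  public static void main(String[] args)
  {
    DefaultingSketch legacy = new DefaultingSketch(null, null);
    System.out.println(legacy.maintenanceList.isEmpty());          // true
    System.out.println(legacy.maintenanceModeSegmentsPriority);    // 7
  }
}
```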
@@ -51,6 +51,8 @@
private final boolean emitBalancingStats;
private final boolean killAllDataSources;
private final Set<String> killDataSourceWhitelist;
private final Set<String> maintenanceList;
Confusing that a Set is called "list". Suggested to call it "hostsInMaintenanceMode" or similar. Same with killDataSourceWhitelist and killPendingSegmentsSkipList. Old names could be preserved in the @JsonProperty annotations only. Additionally, with these two fields it's very unclear that data source names are stored in them.
Changed to historicalNodesInMaintenance
@@ -33,14 +33,22 @@
private static final Logger log = new Logger(ServerHolder.class);
private final ImmutableDruidServer server;
private final LoadQueuePeon peon;
private final boolean maintenance;
Suggested "inMaintenanceMode". Also a comment or a reference to the PR/issue is appreciated; it might be very non-obvious for the reader what "maintenance" means here.
✅
if (toMoveTo.size() <= 1) {
log.info("[%s]: One or fewer servers found. Cannot balance.", tier);
if (general.size() == 0) {
Probably should add || (general.size() == 1 && maintenance.isEmpty())
✅
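The suggested guard can be sketched with a small helper (hypothetical method and parameter names; the real check operates on the general and maintenance server lists): balancing is pointless when there are no general servers, or when there is a single general server and nothing to drain from maintenance servers.

```java
public class BalanceGuardSketch
{
  // Sketch of the condition suggested above, under assumed names:
  // with one general server and an empty maintenance list there is
  // nowhere to move segments to and nothing to drain.
  static boolean cannotBalance(int generalServers, int maintenanceServers)
  {
    return generalServers == 0 || (generalServers == 1 && maintenanceServers == 0);
  }

  public static void main(String[] args)
  {
    System.out.println(cannotBalance(1, 0)); // true: one server, nothing to drain
    System.out.println(cannotBalance(1, 2)); // false: maintenance servers can still be drained
  }
}
```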
@@ -95,6 +100,8 @@ public CoordinatorDynamicConfig(
this.killDataSourceWhitelist = parseJsonStringOrArray(killDataSourceWhitelist);
this.killPendingSegmentsSkipList = parseJsonStringOrArray(killPendingSegmentsSkipList);
this.maxSegmentsInNodeLoadingQueue = maxSegmentsInNodeLoadingQueue;
this.maintenanceList = parseJsonStringOrArray(maintenanceList);
this.maintenanceModeSegmentsPriority = Math.min(10, Math.max(0, maintenanceModeSegmentsPriority));
I think it should rather check bounds than saturate, to avoid silently swallowing configuration errors
✅
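The fail-fast bounds check (as ultimately adopted) versus the saturating Math.min/Math.max version can be sketched like this; the message text is illustrative, not the exact one in the patch:

```java
public class PriorityBoundsSketch
{
  // Fail-fast check: a typo like 70 becomes an immediate error instead of
  // being silently clamped to 10 by Math.min(10, Math.max(0, priority)).
  static int checkPriority(int priority)
  {
    if (priority < 0 || priority > 10) {
      throw new IllegalArgumentException(
          "nodesInMaintenancePriority should be in range [0, 10], got " + priority);
    }
    return priority;
  }

  public static void main(String[] args)
  {
    System.out.println(checkPriority(7)); // 7
    try {
      checkPriority(70); // configuration error surfaces immediately
    }
    catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```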
.withDruidCluster(secondCluster)
.withSegmentReplicantLookup(SegmentReplicantLookup.make(secondCluster))
.withBalancerReferenceTimestamp(DateTimes.of("2013-01-01"))
.withAvailableSegments(Lists.newArrayList(
Could add a withAvailableSegments(DataSegment... segments) method for convenience.
I don't think we should extend a production class method list for test purposes only.
This is what should be done. Test writing convenience and succinctness is important. There are a lot of methods marked as @VisibleForTesting in production classes.
✅
.times(2);
EasyMock.replay(throttler, mockPeon, mockBalancerStrategy);
LoadRule rule = createLoadRule(ImmutableMap.of(
Unnecessary breakdown
✅
DruidCoordinatorRuntimeParams params = DruidCoordinatorRuntimeParams.newBuilder()
.withDruidCluster(druidCluster)
.withSegmentReplicantLookup(
Could break .newBuilder() onto the next line to avoid this far-right alignment.
✅
@@ -760,4 +964,36 @@ private static LoadQueuePeon createLoadingPeon(List<DataSegment> segments)
return mockPeon;
}
private static int serverId = 0;
Suggested AtomicInteger just in case
✅
}
@Test
public void testBuilderDefaults()
{
CoordinatorDynamicConfig defaultConfig = CoordinatorDynamicConfig.builder().build();
assertConfig(defaultConfig, 900000, 524288000, 100, 5, 15, 10, 1, false, ImmutableSet.of(), false, 0);
assertConfig(defaultConfig, 900000, 524288000, 100, 5, 15, 10, 1, false, ImmutableSet.of(), false, 0, ImmutableSet.of(),
7);
According to the Druid formatting rules, should break all or none arguments.
✅
Neat idea! I'm just getting started on the review, but was wondering if there is any reason for this to be a part of the balancer (which is already reasonably complicated on its own) instead of introducing a new coordinator runnable dedicated to migrating segments off of historicals which have been added to the list?
@@ -346,6 +348,13 @@ public Builder withDataSources(Collection<ImmutableDruidDataSource> dataSourcesC
return this;
}
@VisibleForTesting
public Builder withAvailableSegments(DataSegment... availableSegmentsCollection)
Suggested just "availableSegments"
✅
predicate = predicate.and(s -> !s.isMaintenance());
return queue.stream().filter(predicate).collect(Collectors.toList());
Predicate<ServerHolder> general = s -> !s.isInMaintenance();
Suggested "isGeneral" or "notInMaintenance"
isGeneral is too abstract; notInMaintenance - having a negation in a name is usually not a good idea.
isGeneral is the same as current general, just more conventional name for a Predicate object
ah, I got it ✅
@@ -105,14 +105,17 @@ private void balanceTier(
}
Map<Boolean, List<ServerHolder>> partitions = servers.stream()
.collect(Collectors.partitioningBy(ServerHolder::isMaintenance));
.collect(Collectors.partitioningBy(ServerHolder::isInMaintenance));
.collect() is not aligned with .stream(). Suggested to break the whole expression:
Map<Boolean, List<ServerHolder>> partitions =
    servers.stream().collect(Collectors.partitioningBy(ServerHolder::isInMaintenance));
✅
private static ImmutableMap<String, DataSegment> segmentsToMap(DataSegment... segments)
{
return ImmutableMap.copyOf(
Could be Maps.uniqueIndex(Arrays.asList(segments), DataSegment::getIdentifier)
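For comparison, a stdlib-only sketch of what Maps.uniqueIndex does here: keys are derived from the values, and a duplicate key is an error rather than a silent overwrite. The Segment class is a hypothetical stand-in for DataSegment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UniqueIndexSketch
{
  // Hypothetical stand-in for DataSegment, reduced to its identifier.
  static class Segment
  {
    final String identifier;

    Segment(String identifier)
    {
      this.identifier = identifier;
    }
  }

  // Like Guava's Maps.uniqueIndex(values, keyFn): build a map keyed by a
  // function of each value, rejecting duplicate keys.
  static Map<String, Segment> segmentsToMap(Segment... segments)
  {
    Map<String, Segment> map = new LinkedHashMap<>();
    for (Segment s : segments) {
      if (map.put(s.identifier, s) != null) {
        throw new IllegalArgumentException("Duplicate identifier: " + s.identifier);
      }
    }
    return map;
  }

  public static void main(String[] args)
  {
    Map<String, Segment> m = segmentsToMap(new Segment("a"), new Segment("b"));
    System.out.println(m.keySet()); // [a, b]
  }
}
```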
✅
@@ -965,10 +935,10 @@ private static LoadQueuePeon createLoadingPeon(List<DataSegment> segments)
return mockPeon;
}
private static int serverId = 0;
private static AtomicInteger serverId = new AtomicInteger();
Please add final
✅
this.maintenanceModeSegmentsPriority = Math.min(10, Math.max(0, maintenanceModeSegmentsPriority));
this.historicalNodesInMaintenance = parseJsonStringOrArray(historicalNodesInMaintenance);
Preconditions.checkArgument(
nodesInMaintenancePriority >= 0 && nodesInMaintenancePriority <= 10,
Does the priority of 0 make sense?
It allows suppressing maintenance for a time without clearing the maintenance list. Segments won't be loaded onto maintenance servers, but won't be moved off of them either, which could help if some general servers get overloaded.
@@ -100,8 +101,12 @@ public CoordinatorDynamicConfig(
this.killDataSourceWhitelist = parseJsonStringOrArray(killDataSourceWhitelist);
Could you please rename the killDataSourceWhitelist and killPendingSegmentsSkipList parameters, fields and getters to not mention "list", and the latter to mention "dataSource" (its actual elements), while the JSON property names could stay the same for compatibility.
I'm going to change killDataSourceWhitelist -> killableDatasources and killPendingSegmentsSkipList -> protectedPendingSegmentDatasources. Makes sense?
"DataSources". Some explanation of what those variables mean would also be nice.
That is actually documented in the docs: http://druid.io/docs/latest/configuration/index.html
I could copy from there, but I think it would look redundant.
Source code should be understandable and self-contained on its own. For somebody to understand it now, it's required to find this doc and then find how the mentioned property is mapped back to source code concepts, which is very difficult and time-consuming when somebody just wants to read and understand the code.
Also see #4861
@@ -192,10 +195,14 @@ public boolean isKillAllDataSources()
return killAllDataSources;
}
/**
 * List of dataSources for which pendingSegments are NOT cleaned up
 * if property druid.coordinator.kill.pendingSegments.on is true.
Could you please reference druid.coordinator.kill.pendingSegments.on in source code terms? Like a boolean field in some class?
There is no specific variable; it conditionally binds the class, as I understand:
ConditionalMultibind.create(
properties,
binder,
DruidCoordinatorHelper.class,
CoordinatorIndexingServiceHelper.class
).addConditionBinding(
"druid.coordinator.merge.on",
Predicates.equalTo("true"),
DruidCoordinatorSegmentMerger.class
).addConditionBinding(
"druid.coordinator.conversion.on",
Predicates.equalTo("true"),
DruidCoordinatorVersionConverter.class
).addConditionBinding(
"druid.coordinator.kill.on",
Predicates.equalTo("true"),
DruidCoordinatorSegmentKiller.class
).addConditionBinding(
"druid.coordinator.kill.pendingSegments.on",
Predicates.equalTo("true"),
Then it could say "list of data sources skipped in {@link DruidCoordinatorCleanupPendingSegments}".
@JsonProperty
public Set<String> getKillDataSourceWhitelist()
/**
 * List of dataSources for which kill tasks are sent if property druid.coordinator.kill.on is true.
Same
server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
 * By leveraging the priority an operator can prevent general nodes from overload or decrease maintenance time
 * instead.
 *
 * @return number in range [0, 10]
why not just 1-100 if it represents a %?
The initial term was a priority, which is usually not a big number (e.g., Java thread priority).
The percentage term emerged while writing that description. Moreover, I don't expect someone would want to define 65%, 75% or even 67%, 23% and so on; usually 10, 20, ..., 80 is good enough, and then adding a 0 every time seems redundant. Meanwhile the priority term allows being more concise.
I'm reviewing this
This is a cool feature.
Can you please add some docs in the code and some in the README for the new configs.
Also, can you please comment in the master comment in this PR on the design tradeoff of having the coordinator be the central point for controlling if a node is under maintenance instead of having the historical nodes themselves store their "I'm in maintenance" state
@@ -92,11 +98,17 @@ public CoordinatorDynamicConfig(
this.balancerComputeThreads = Math.max(balancerComputeThreads, 1);
this.emitBalancingStats = emitBalancingStats;
this.killAllDataSources = killAllDataSources;
this.killDataSourceWhitelist = parseJsonStringOrArray(killDataSourceWhitelist);
this.killPendingSegmentsSkipList = parseJsonStringOrArray(killPendingSegmentsSkipList);
this.killableDatasources = parseJsonStringOrArray(killableDatasources);
Any idea why this is using a function in the constructor instead of a Jackson parser?
I assume so that it can handle "killableDatasources": ["a","b"] as well as "killableDatasources": "a, b".
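A rough stdlib-only sketch of that idea (this is not the actual parseJsonStringOrArray implementation): accept either a comma-separated String or an already-deserialized Collection, and normalize both to a Set.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class ParseSketch
{
  // Hypothetical re-implementation of the idea: the JSON field is declared as
  // Object, so after deserialization it is either a String ("a, b") or a
  // Collection (["a","b"]); both are normalized here.
  static Set<String> parseStringOrArray(Object value)
  {
    if (value == null) {
      return new LinkedHashSet<>();
    }
    if (value instanceof String) {
      return Arrays.stream(((String) value).split(","))
                   .map(String::trim)
                   .filter(s -> !s.isEmpty())
                   .collect(Collectors.toCollection(LinkedHashSet::new));
    }
    if (value instanceof Collection) {
      return ((Collection<?>) value).stream()
                                    .map(Object::toString)
                                    .collect(Collectors.toCollection(LinkedHashSet::new));
    }
    throw new IllegalArgumentException("Unsupported type: " + value.getClass());
  }

  public static void main(String[] args)
  {
    System.out.println(parseStringOrArray("a, b"));                  // [a, b]
    System.out.println(parseStringOrArray(Arrays.asList("a", "b"))); // [a, b]
  }
}
```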
if (this.killAllDataSources && !this.killDataSourceWhitelist.isEmpty()) {
if (this.killAllDataSources && !this.killableDatasources.isEmpty()) {
can you retain the WhiteList descriptor here?
I asked to change this here: #6349 (comment)
@egor-ryashin could you please rename to killableDataSources that is more conventional spelling
✅
server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
 * @return list of host:port entries
 */
@JsonProperty
public Set<String> getHistoricalNodesInMaintenance()
If these are host and port entries, can they use HostAndPort objects?
It's possible, but it works intensively with DruidServerMetadata, which contains a simple String, and HostAndPort doesn't provide an efficient toString():
/** Rebuild the host:port string, including brackets if necessary. */
@Override
public String toString() {
StringBuilder builder = new StringBuilder(host.length() + 7);
if (host.indexOf(':') >= 0) {
builder.append('[').append(host).append(']');
} else {
builder.append(host);
}
if (hasPort()) {
builder.append(':').append(port);
}
return builder.toString();
}
this(server, peon, false);
}
public ServerHolder(ImmutableDruidServer server, LoadQueuePeon peon, boolean inMaintenance)
Looking forward, how is "in maintenance" different from a blacklist?
For example, let's say a historical is able to detect that it is unhealthy, and it wants to shed its load in as safe a way as possible, so it declares its desire to have its load shed to other nodes. Would inMaintenance be the proper kind of thing for that?
Or if another problem arises where a node has fractional problems, like it can only operate at about 50% of where it should. Is it proper to have the node be inMaintenance, or would a "fractional operating level" be appropriate here?
What I'm curious about is whether it makes sense for the initial implementation to have an int here which represents an arbitrary "weight capacity" for the node, where "in maintenance" simply sets that weight capacity to 0.
Such a thing would open up other methodologies of shedding load in the future, or allow coordinator rules to take a "weight" into account for a server, where "weight" is intended to be dynamic and not a static value for the server.
For example, let's say a historical is able to detect that it is unhealthy, and it wants to shed its load in as safe a way as possible, so it declares its desire to have its load shed to other nodes. Would inMaintenance be the proper kind of thing for that?
Or if another problem arises where a node has fractional problems, like it can only operate at about 50% of where it should. Is it proper to have the node be inMaintenance, or would a "fractional operating level" be appropriate here?
Well, I would make 2 separate types like inMaintenance, Unhealthy to make it clear for an operator; otherwise it could be confusing to see a manual list extending itself at random. Also, we would need to define a detection algorithm for the Unhealthy status.
What I'm curious about is whether it makes sense for the initial implementation to have an int here which represents an arbitrary "weight capacity" for the node, where "in maintenance" simply sets that weight capacity to 0.
Yes, inMaintenance, Unhealthy can carry a weight number that way. I assume priority for each type could be different too, so an operator can have manually defined nodes be drained faster, for example.
Map<Boolean, List<ServerHolder>> partitions =
servers.stream().collect(Collectors.partitioningBy(ServerHolder::isInMaintenance));
final List<ServerHolder> maintenance = partitions.get(true);
final List<ServerHolder> general = partitions.get(false);
suggest availableServers or similar
✅
int maintenanceSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
log.info("Processing %d segments from servers in maintenance mode", maintenanceSegmentsToMove);
Pair<Integer, Integer> maintenanceResult = balanceServers(params, maintenance, general, maintenanceSegmentsToMove);
int generalSegmentsToMove = maxSegmentsToMove - maintenanceResult.lhs;
max?
this also posted to a weird line, looks like my comments are off
suggest adding the max prefix here
✅
final int maxSegmentsToMove = Math.min(params.getCoordinatorDynamicConfig().getMaxSegmentsToMove(), numSegments);
int priority = params.getCoordinatorDynamicConfig().getNodesInMaintenancePriority();
int maintenanceSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
max?
I mean, changing the name would help make it clearer
but I think I missed the comment by a line :-P
You mean changing maintenanceSegmentsToMove -> maxMaintenanceSegmentsToMove?
yes
✅
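The quota arithmetic being renamed above can be illustrated with concrete numbers; the helper name follows the rename agreed on in this thread, and the values are illustrative only. The priority scales how much of the per-cycle move budget goes to draining maintenance servers: priority 10 means all of it, 0 means none, and the remainder goes to general balancing.

```java
public class MaintenanceQuotaSketch
{
  // ceil() ensures that any non-zero priority moves at least one segment
  // per cycle, even for small move budgets.
  static int maxMaintenanceSegmentsToMove(int maxSegmentsToMove, int priority)
  {
    return (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
  }

  public static void main(String[] args)
  {
    int max = 100;
    int fromMaintenance = maxMaintenanceSegmentsToMove(max, 7); // default priority 7 -> 70
    int fromGeneral = max - fromMaintenance;                    // remaining 30
    System.out.println(fromMaintenance + " " + fromGeneral);    // 70 30
  }
}
```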
)
{
final NavigableSet<ServerHolder> queue = druidCluster.getHistoricalsByTier(tier);
if (queue == null) {
log.makeAlert("Tier[%s] has no servers! Check your cluster configuration!", tier).emit();
return Collections.emptyList();
}
return queue.stream().filter(predicate).collect(Collectors.toList());
Predicate<ServerHolder> isGeneral = s -> !s.isInMaintenance();
suggest changing name to isNotInMaintenance
Why not isAvailable or something along those lines, since the implication is that these servers are available to assign segments to, and to go along with your other variable rename suggestion in the balancer logic?
IMHO since this is specifically around maintenance then keeping the boolean methods related to the maintenance.
That being said, I'm curious if this predicate is in the wrong place. Should it just be part of the passed in predicate, and leave no "extra" or surprising predicate logic in this method?
getFilteredHolders is always called to find servers which can be a target location for a segment; I think it should never return maintenance servers.
Changed to isNotInMaintenance ✅
return Response.status(Response.Status.BAD_REQUEST)
.entity(ImmutableMap.of("error", setResult.getException()))
.entity(ImmutableMap.<String, Object>of("error", e.getMessage()))
I think there's a Jetty util for cleaning the exceptions floating around somewhere
I suppose it's ServletResourceUtils.jsonize?
ServletResourceUtils.sanitizeException, I believe
✅
@leventov Given this PR still requires quite a bit of review and testing, I think we should slate it for 0.13.1
Overall LGTM 🤘
I agree with @drcrallen on variable rename suggestions to make code a little clearer
I also highly recommend updating the logic in the computeCost method of org.apache.druid.server.coordinator.CostBalancerStrategy to take into account whether a server is in maintenance and return Double.POSITIVE_INFINITY for the cost of that server, as a failsafe in case anything that is calling into the balancer to pick a server omits this check externally.
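The suggested failsafe can be sketched as follows. The real CostBalancerStrategy.computeCost takes a DataSegment and a ServerHolder; this hypothetical stand-in reduces them to the two values that matter for the point: an infinite cost makes a maintenance server lose every "cheapest server" comparison.

```java
public class CostFailsafeSketch
{
  // Stand-in for the proposed failsafe: whatever the underlying cost model
  // says, a server in maintenance is never the cheapest choice.
  static double computeCost(double baseCost, boolean serverInMaintenance)
  {
    return serverInMaintenance ? Double.POSITIVE_INFINITY : baseCost;
  }

  public static void main(String[] args)
  {
    System.out.println(computeCost(1.5, false)); // 1.5
    System.out.println(computeCost(1.5, true));  // Infinity
  }
}
```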
@egor-ryashin will you have time to address @drcrallen's comments in the next few hours?
@dclim it's unclear, I'm confused by some comments, I'm waiting for a reply.
Okay, I'm starting the release process for 0.13.0 now so we can keep the 0.13.1 milestone for this one.
@clintropolis @drcrallen I wonder if I answered all your questions?
Oops, will have a look today
Very sorry for the delay! I've had another look and I'm good with this patch after conflicts are fixed and travis passes 👍
@clintropolis I've resolved merge conflicts.
👍, @drcrallen do you have any more comments on this PR?
This seems like a great feature for us to use for implementing k8s node upgrades (we use local disks on our k8s machines for historicals, but sometimes do need to upgrade node pools). We would set maintenance mode on one machine at a time, wait for its segments to be fully replicated, then replace its machine, and move on to the next one.

In order to do this automatically, I'm curious what the impact on metrics/API is. All it does is cause the coordinator to try to move segments around: it doesn't affect the response to metrics or the sys SQL entries, right? So I could wait until the historical's
Exactly.
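The rolling-upgrade loop described above could be sketched like this; the remaining-segment count is injected as a supplier, because the real implementation would poll a Coordinator API for the draining Historical (the exact endpoint and metric are left as an assumption):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Sketch of a pre-replacement wait: after putting one Historical into
// maintenance, poll until the Coordinator reports it no longer serves
// any segments, then it is safe to replace the machine. All names are
// illustrative; remainingSegments would wrap a Coordinator API call.
class DrainWaiter
{
  static boolean waitForDrain(IntSupplier remainingSegments, int maxPolls, long pollMillis)
      throws InterruptedException
  {
    for (int i = 0; i < maxPolls; i++) {
      if (remainingSegments.getAsInt() == 0) {
        return true; // segments fully moved off; safe to replace the machine
      }
      Thread.sleep(pollMillis); // would be seconds or minutes in practice
    }
    return false; // gave up; the node is still serving segments
  }

  public static void main(String[] args) throws InterruptedException
  {
    // Fake "coordinator" that reports 3, 2, 1, 0 remaining segments.
    AtomicInteger remaining = new AtomicInteger(3);
    boolean drained = waitForDrain(() -> remaining.getAndUpdate(n -> Math.max(0, n - 1)), 10, 1);
    System.out.println(drained ? "drained" : "timed out");
  }
}
```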
@egor-ryashin sorry for the delay getting this merged; could you fix up conflicts again? @drcrallen any more comments on this PR? I think it would be nice to get this in 0.14.
@clintropolis Resolved.
Going to go ahead and merge this for inclusion in the 0.14.0 release, since it already has 2 approvals.
One thing that's slightly awkward about storing this list as an array in the dynamic config: you can't atomically add or remove an element from it. If implementing something that uses this feature dynamically (say, from a k8s historical's pre-stop hook), it would be nice to be able to send a single request that is guaranteed to add a host to the list without accidentally removing something else from it.

Do you think it would be reasonable to add an optional query param to the /druid/coordinator/v1/config POST handler that lets you specify the expected previous contents, so you can do a compare-and-swap write? One caveat is that you'd need to be sure the value you send is exactly the value stored in the DB, not corrupted by being round-tripped through Jackson...

Another option would be to add a specific HTTP endpoint that adds an element to a list in the dynamic config. (There'd still need to be a SQL-level compare-and-swap.)
@glasser I think that would be a useful follow-up. I like the 2nd approach better myself, since it could be done without first fetching the config, and presumably you already know the host you want to add to the list. It would also lend itself to something like adding an HTTP endpoint directly to the historical servers that could be used to have them add themselves to the maintenance list, similar in spirit to what
I have no argument, as the PR is supposed to be an MVP. I plan to improve it after I thoroughly test it in a production environment at least.
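The compare-and-swap semantics being proposed can be sketched with an in-memory stand-in; a real implementation would perform the CAS at the SQL level against the serialized config payload, as noted above:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: AtomicReference models the "update succeeds only if the
// stored value is still what the caller last read" behavior, so two
// concurrent writers cannot silently drop each other's entries.
class ConfigCas
{
  private final AtomicReference<Set<String>> maintenanceList =
      new AtomicReference<>(new TreeSet<>());

  Set<String> read()
  {
    return maintenanceList.get();
  }

  boolean addHost(Set<String> expected, String host)
  {
    Set<String> updated = new TreeSet<>(expected);
    updated.add(host);
    // compareAndSet uses reference equality here; the SQL version would
    // compare the exact serialized bytes previously read from the DB.
    return maintenanceList.compareAndSet(expected, updated);
  }

  public static void main(String[] args)
  {
    ConfigCas cas = new ConfigCas();
    Set<String> snapshot = cas.read();
    System.out.println(cas.addHost(snapshot, "historical-1:8083")); // fresh snapshot
    System.out.println(cas.addHost(snapshot, "historical-2:8083")); // stale snapshot
  }
}
```

A failed CAS (stale snapshot) tells the client to re-read the config and retry, which is exactly the loop a pre-stop hook would run.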
```diff
@@ -46,8 +46,9 @@ public CoordinatorStats run(DruidCoordinator coordinator, DruidCoordinatorRuntim
       } else {
         params.getDruidCluster().getAllServers().forEach(
             eachHolder -> {
-              if (colocatedDataSources.stream()
-                  .anyMatch(source -> eachHolder.getServer().getDataSource(source) != null)) {
+              if (!eachHolder.isInMaintenance()
```
Shouldn't this `!eachHolder.isInMaintenance()` condition be in an outer `if`, rather than in this if-else chain? As it is written, if a node is in maintenance mode, the coordinator will delete segments from it rather than leave the node alone.
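A simplified sketch of the suggested restructuring, reduced to just the branching decision (the real code operates on `ServerHolder` inside a lambda; names here are illustrative):

```java
// Illustrative only: the maintenance check wraps the whole body, so a
// node in maintenance is skipped entirely instead of falling through
// to the branch that drops segments.
class MaintenanceGuard
{
  enum Action { SKIP, KEEP, DROP }

  static Action decide(boolean inMaintenance, boolean hostsColocatedDataSource)
  {
    if (inMaintenance) {
      return Action.SKIP; // leave the node alone entirely
    }
    if (hostsColocatedDataSource) {
      return Action.KEEP; // segments still needed on this server
    }
    return Action.DROP;   // safe to remove segments from this server
  }

  public static void main(String[] args)
  {
    System.out.println(decide(true, false));  // maintenance node is skipped
    System.out.println(decide(false, false)); // only non-maintenance nodes drop
  }
}
```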
In retrospect, I think the "maintenance" term is too vague and somewhat misleading. It suggests that a node is (perhaps temporarily) "conserved", that is, the coordinator doesn't touch it, forever or for some time. I think something like "shuttingDown" or "windingDown" would be better. @egor-ryashin what do you think?
"draining"?
I don't quite like "shuttingDown" because of false associations with Java's `ExecutorService.shutdown`/`shutdownNow`. "draining" sounds good.
@leventov
I think clarity and avoiding developer confusion (I raised this issue after the term confused me when I was working on the code and spent a fair amount of time trying to understand why "maintenance" nodes are not in fact conserved) trumps the fact that the term is already adopted in some other projects for something similar. Are you also sure that it was not a naming mistake in those projects? I've raised #7148 regarding this.
With this feature, a Historical node can be put in maintenance mode by adding it to `historicalNodesInMaintenance` in the Druid Coordinator dynamic config. The Coordinator will move segments off such nodes with a priority specified by `nodesInMaintenancePriority`. That allows moving segments off nodes that will be replaced afterwards, avoiding the availability issues that an abrupt shutdown could cause.

Issue #3247
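For illustration, a dynamic config putting two nodes into maintenance might look like the following. The property names are taken from this description; note that the diff earlier in this PR uses `maintenanceList` and `maintenanceModeSegmentsPriority` (with a default priority of 7), so check the merged code for the final names. The config would be POSTed to `/druid/coordinator/v1/config`:

```json
{
  "maxSegmentsInNodeLoadingQueue": 100,
  "historicalNodesInMaintenance": [
    "historical-host-1:8083",
    "historical-host-2:8083"
  ],
  "nodesInMaintenancePriority": 7
}
```

If the list is empty (or absent, for backward compatibility), the Coordinator ignores both parameters.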
This design makes the Coordinator the container of the maintenance list, which requires minimal changes to the current codebase (DruidCoordinatorBalancer, LoadRule).
An alternative design, where Historical nodes carry a maintenance flag, would require roughly double the amount of code, I expect, as a separate thread would need to fetch that data from the Historical nodes via an HTTP endpoint. Introducing the variable to DruidNode seems awkward, as DruidNode is defined only during service startup, and changing this would require an even greater amount of code and unit-test coverage.
So my design gives the same functionality for less effort. The second approach is actually an extension of my design and can be implemented at the next stage.
The initial design has only one flaw: if a Historical node is restarted, the resource manager can assign a different port to it, which requires the operator to update the maintenance list manually. This is quite rare but possible.