
rename maintenance mode to decommission #7154

Merged: 15 commits, Mar 9, 2019

Conversation

@clintropolis (Member) commented Feb 27, 2019:

Fixes #7148.

New coordinator config properties are:

...
  "decommissioningNodes": ["localhost:8182", "localhost:8282"],
  "decommissioningMaxPercentOfMaxSegmentsToMove": 70
...

Minor adjustments to documentation and javadocs.
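For reference, the percent-based property shown in the config above caps how much of the per-run `maxSegmentsToMove` budget may be spent moving segments off decommissioning servers. A minimal sketch of that arithmetic (the class and method names here are hypothetical, not Druid's actual code):

```java
// Hypothetical sketch of the cap implied by
// "decommissioningMaxPercentOfMaxSegmentsToMove"; not Druid's actual implementation.
public class DecommissioningCap {
    /**
     * Maximum number of segments that may be moved away from decommissioning
     * servers in one Coordinator run, given the overall maxSegmentsToMove
     * budget and a percentage in [0, 100].
     */
    public static int maxSegmentsToMoveOffDecommissioning(int maxSegmentsToMove, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return (int) Math.ceil(maxSegmentsToMove * (percent / 100.0));
    }

    public static void main(String[] args) {
        // With maxSegmentsToMove = 5 and the example value of 70 percent,
        // up to ceil(5 * 0.7) = 4 moves may come off decommissioning servers.
        System.out.println(maxSegmentsToMoveOffDecommissioning(5, 70));
    }
}
```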

@clintropolis clintropolis added this to the 0.14.0 milestone Feb 27, 2019
- "historicalNodesInMaintenance": ["localhost:8182", "localhost:8282"],
- "nodesInMaintenancePriority": 7
+ "decommissionNodes": ["localhost:8182", "localhost:8282"],
+ "decommissionPriority": 7
@leventov (Member) commented Feb 28, 2019:

Perhaps better to call it "decommissionVelocity", as @drcrallen suggested here: #6349 (comment), because "priority" is a super overloaded term.

@clintropolis (Member Author):

Makes sense to me, though I think the properties should be consistently named, so maybe I'll either rename the other decommissioningNodes or call this one decommissionVelocity to match. Anyone have a preference?

@clintropolis (Member Author):

Renamed to decommissioningNodes and decommissioningVelocity

@@ -694,7 +694,7 @@ public CoordinatorHistoricalManagerRunnable(final int startingLeaderCounter)
}

// Find all historical servers, group them by subType and sort by ascending usage
- Set<String> nodesInMaintenance = params.getCoordinatorDynamicConfig().getHistoricalNodesInMaintenance();
+ Set<String> decommissioned = params.getCoordinatorDynamicConfig().getDecommissionNodes();
Member:

Suggested "decommissioningNodes"

@@ -32,18 +32,18 @@
private static final Logger log = new Logger(ServerHolder.class);
private final ImmutableDruidServer server;
private final LoadQueuePeon peon;
- private final boolean inMaintenance;
+ private final boolean isDecommissioned;
Member:

Suggested "isDecommissioning" or "beingDecommissioned"

final List<ServerHolder> availableServers = partitions.get(false);
log.info(
-     "Found %d servers in maintenance, %d available servers servers",
+     "Found %d decomissioned servers, %d available servers servers",
      maintenanceServers.size(),
Member:

"%d decommissioning servers, %d active servers"

  • the word "servers" is duplicated
  • perhaps "active" is a better term in the context of this method, because servers to be decommissioned are also available (in distributed-systems terms).

@clintropolis (Member Author):

Changed to 'active'

availableServers.size()
);

- if (maintenanceServers.isEmpty()) {
+ if (decommssionedServers.isEmpty()) {
if (availableServers.size() <= 1) {
log.info("[%s]: %d available servers servers found. Cannot balance.", tier, availableServers.size());
Member:

"servers" duplicated

if (availableServers.size() <= 1) {
log.info("[%s]: %d available servers servers found. Cannot balance.", tier, availableServers.size());
}
} else if (availableServers.isEmpty()) {
- log.info("[%s]: no available servers servers found during maintenance. Cannot balance.", tier);
+ log.info("[%s]: no available servers servers found during decommissioning. Cannot balance.", tier);
Member:

Suggested to change the message to something like "no active servers found, segments can't be moved off %d decommissioning servers"

  • Maybe it should be warn(), not info()?
  • Seems that this logging statement and the statement at line 125 duplicate the statement at line 117. Maybe this part of code should be restructured.

Member:

"no active servers found, segments can't be moved off %d decommissioning servers" leads me to the question: should we really give up on decommissioning in this case? Maybe if segments are available on historical nodes from other tiers (or from the same tier, on other decommissioning servers or on full active servers -- see below), or actually always, we should continue decommissioning by just dropping segments from the decommissioning servers and not moving them anywhere?

Also this condition doesn't cover the situation when there are active historical nodes in the tier, but they are either full, or their load queues have maxSegmentsInNodeLoadingQueue elements (loading segments). (Note that in the latter case, the balancing step won't be skipped entirely in the if (!currentlyMovingSegments.get(tier).isEmpty()) { branch above in the code, because historicals' segment loading queues may be filled up by LoadRule; LoadRule uses an independent mechanism, ReplicationThrottler, to implement the "make a balancing burst and then let all segments be loaded before the next balancing burst" pattern.)

@leventov (Member) commented Feb 28, 2019:

I elaborated on the problems related to the independence of DruidCoordinatorBalancer and LoadRule in #7159.

Member:

> "no active servers found, segments can't be moved off %d decommissioning servers" leads me to the question: should we really give up on decommissioning in this case? Maybe if segments are available on historical nodes from other tiers (or from the same tier, on other decommissioning servers or on full active servers -- see below), or actually always, we should continue decommissioning by just dropping segments from the decommissioning servers and not moving them anywhere?

@egor-ryashin, what's your take on this?

Contributor:

@leventov the initial idea was to keep the replication at the same level.

@clintropolis (Member Author):

Consolidated the 'cannot balance' checks, changed the message to a log.warn that there is an insufficient number of active servers to do anything, and added a return.

Member:

> the initial idea was to keep the replication at the same level.

This should be reflected in docs, as well as other corner cases related to decommissioning nodes.

@clintropolis (Member Author):

I'll try to add docs that decommissioning can become stalled if there are no active servers available to move segments to.

@@ -59,8 +59,8 @@
private DataSegment smallSegment;
private DruidCluster secondCluster;
private ServerHolder generalServer;
- private ServerHolder maintenanceServer2;
- private ServerHolder maintenanceServer1;
+ private ServerHolder decommissionedServer2;
Member:

Suggested "decommissioning" prefix throughout the tests

int priority = params.getCoordinatorDynamicConfig().getDecommissionPriority();
int maxDecommissionedSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
log.info("Processing %d segments from decommissioned servers", maxDecommissionedSegmentsToMove);
Pair<Integer, Integer> decommissionedResult =
Member:

Suggested "decommissioningResult"

int maxGeneralSegmentsToMove = maxSegmentsToMove - maintenanceResult.lhs;
log.info("Processing %d segments from servers in general mode", maxGeneralSegmentsToMove);
int priority = params.getCoordinatorDynamicConfig().getDecommissionPriority();
int maxDecommissionedSegmentsToMove = (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
Member:

Suggested "maxSegmentsToMoveOffDecommissioningNodes"

Contributor:

I'd vote for something ending in "segments".
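The snippet under review splits the per-run move budget with ceil(maxSegmentsToMove * priority / 10.0). A standalone sketch of that split (hypothetical class and method names; note this is a simplification: in the PR, the general budget is computed from the number of segments actually moved, not from the reserved budget):

```java
// Standalone sketch of the budget split in the snippet under review:
// segments reserved for decommissioning servers vs. the general balancing budget.
public class BalancerSplit {
    public static int decommissioningBudget(int maxSegmentsToMove, int priority) {
        // priority (later renamed "velocity") is expected to be in [0, 10]
        return (int) Math.ceil(maxSegmentsToMove * priority / 10.0);
    }

    public static int generalBudget(int maxSegmentsToMove, int priority) {
        // remainder goes to ordinary balancing moves
        return maxSegmentsToMove - decommissioningBudget(maxSegmentsToMove, priority);
    }

    public static void main(String[] args) {
        // e.g. maxSegmentsToMove = 5, priority = 7: ceil(3.5) = 4 moves reserved
        // for decommissioning servers, 1 left for general balancing
        System.out.println(decommissioningBudget(5, 7) + " " + generalBudget(5, 7));
    }
}
```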

@@ -251,7 +251,7 @@ public void testMoveMaintenancePriority()
}

@Test
- public void testZeroMaintenancePriority()
+ public void testZeroDecommissionPriority()
{
DruidCoordinatorRuntimeParams params = setupParamsForMaintenancePriority(0);
Member:

Residual "Maintenance". Please search the whole codebase for this.


@clintropolis (Member Author):

Thanks for review @leventov and @egor-ryashin 👍

@@ -257,7 +257,7 @@ private void assertConfig(
Set<String> expectedKillableDatasources,
boolean expectedKillAllDataSources,
int expectedMaxSegmentsInNodeLoadingQueue,
- Set<String> decommissioned,
+ Set<String> decommissioning,
  int decommissionPriority
Member:

Residual 'priority'. Please search the codebase for other residuals

@fjy (Contributor) commented Mar 2, 2019:

@leventov @egor-ryashin any more comments?

@egor-ryashin (Contributor):

Aside from other comments, LGTM.

@fjy (Contributor) commented Mar 4, 2019:

@egor-ryashin can you change your status to approval?

@clintropolis (Member Author):

I think all comments should be addressed as of the last commit. I couldn't find any remaining related uses of 'maintenance' or 'priority', and docs have been updated.

|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments from the servers at the rate specified by `decommissioningVelocity`.|none|
|`decommissioningVelocity`| Decommissioning velocity indicates what proportion of balancer 'move' operations out of `maxSegmentsToMove` total will be spent towards 'decommissioning' servers by moving their segments to active servers, instead of normal 'balancing' moves. Coordinator takes ceil(maxSegmentsToMove * (velocity / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from decommissioning servers will be processed during balancing<br>5 - 50% segments from decommissioning servers<br>10 - 100% segments from decommissioning servers<br>By leveraging the velocity an operator can prevent general servers from overload or decrease decommissioning time instead. Decommissioning can become stalled if there are no available active servers to place the segments.|7|
Member:

I think this description may be confusing. I suggest

Decommissioning velocity determines the maximum number of segments that may be moved away from 'decommissioning' servers non-decommissioning (that is, active) servers during one Coordinator's run, relative to the total maximum segment movements allowed during one Coordinator's run (which, in turn, is determined by the maxSegmentsToMove configuration). Specifically, the maximum is ceil(maxSegmentsToMove * (velocity / 10)). For example, if decommissioningVelocity is 0 no segments will be moved away from 'decommissioning' servers. If decommissioningVelocity is 5 no more than ceil(maxSegmentsToMove * 0.5) segments may be moved away from 'decommissioning' servers. By leveraging the velocity an operator can prevent general servers from overload or decrease decommissioning time instead. Decommissioning can become stalled if there are no available active servers to place the segments. The value should be between 0 and 10.

The difference in language is that instead of saying "will be spent" and "will be processed", we say "maximum that may be moved", because Coordinator doesn't guarantee and doesn't need to fulfill the "quota". Also removed the phrase "spent towards 'decommissioning' servers" because it may be confusing: we don't do anything "towards" decommissioning servers, rather "away from".

@clintropolis (Member Author):

Thanks, changed. I also modified

> By leveraging the velocity an operator can prevent general servers from overload or decrease decommissioning time instead.

to

> By leveraging the velocity an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead.

to switch from 'general' to 'active' terminology and try to clarify that preventing overload means prioritizing balance over decommissioning.

* Decommissioning velocity indicates what proportion of balancer 'move' operations out of
* {@link CoordinatorDynamicConfig#getMaxSegmentsToMove()} total will be spent towards 'decommissioning' servers
* by moving their segments to active servers, instead of normal 'balancing' segments between servers.
* Coordinator takes ceil(maxSegmentsToMove * (velocity / 10)) from servers in maitenance during balancing phase:
Member:

"maitenance", as well as in several other places in the PR.

@clintropolis (Member Author):

oops, missed the typo :(

- log.info("[%s]: no available servers servers found during maintenance. Cannot balance.", tier);
+ if ((decommissioningServers.isEmpty() && activeServers.size() <= 1) || activeServers.isEmpty()) {
+   log.warn("[%s]: insufficient active servers. Cannot balance.", tier);
+   return;
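The consolidated guard above can be read as a small predicate (a sketch with hypothetical names, mirroring the condition in the diff):

```java
// Sketch of the consolidated 'cannot balance' check from the diff above;
// class and method names are hypothetical.
public class BalanceGuard {
    /**
     * Balancing is pointless if there are no active servers at all, or if
     * nothing is decommissioning and fewer than two active servers exist
     * (there is nowhere to move segments to).
     */
    public static boolean cannotBalance(int decommissioningCount, int activeCount) {
        return (decommissioningCount == 0 && activeCount <= 1) || activeCount == 0;
    }
}
```

Note the asymmetry: a single active server is enough when decommissioning servers exist, because their segments can still be moved onto it.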
Member:

Is it planned that we don't emit stats in this case? If yes, a comment should be left explicitly stating this.

Member:

I think it makes sense to include the numbers of active and decommissioning servers in this logging statement, so that the cases are distinguishable in logs.

@clintropolis (Member Author):

> Is it planned that we don't emit stats in this case? If yes, a comment should be left explicitly stating this.

I think it is pretty consistent with the behavior of other 'cannot balance' conditions, such as still having segments moving or not having any segments, and it was changed to return here based on this comment #7154 (comment), which seemed reasonable to me.

> I think it makes sense to include the numbers of active and decommissioning servers in this logging statement, so that the cases are distinguishable in logs.

I removed the count here because the log statement immediately before this one lists the count of 'active' and 'decommissioning' servers. Should these log messages be consolidated, or just print the count info twice?

Member:

At the very minimum, a comment like "not emitting moved and unmoved counts on purpose here" is needed, so that readers of this code don't have to think about whether counts are not emitted on purpose or this is an unintentional omission. However, #7154 (comment) doesn't really explain anything to me.

Member:

> I removed the count here because the log statement immediately before this one lists the count of 'active' and 'decommissioning' servers. Should these log messages be consolidated, or just print the count info twice?

Right, I missed that preceding logging statement.

@clintropolis (Member Author):

> At very minimum, a comment like "not emitting moved and unmoved counts on purpose here" is needed.

Hmm, currently all of the return points in balanceTier effectively suppress emitting the stats when it can't find anything to do, but I'm unsure whether the return points are meant to explicitly not emit stats, or are just an optimization to exit fast that overlooked emitting 0 values for the stats. Is this the correct behavior? In other words, should balanceTier always emit these stats, or is it sensible as it is?

@clintropolis (Member Author):

Added comments at all places where we exit without emitting stats, explicitly noting this for now.

* maintenance2 | large segment
* name | segments
* -----------------+--------------
* general | large segment
Member:

Probably should be "active" to match the terminology in production code.

@clintropolis (Member Author):

good point, fixed

* general | large & small segments
* maintenance1 |
* maintenance2 | large segment
* general | large & small segments
Member:

Same

@@ -1019,12 +1019,12 @@ private static LoadQueuePeon createOneCallPeonMock()
return mockPeon2;
}

- private static ServerHolder createServerHolder(String tier, LoadQueuePeon mockPeon1, boolean maintenance)
+ private static ServerHolder createServerHolder(String tier, LoadQueuePeon mockPeon1, boolean decommission)
Member:

isDecommissioning

@clintropolis (Member Author):

fixed

|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments from the servers at the rate specified by `decommissioningVelocity`.|none|
Member:

Suggested "... and moves segments away from the 'decommissioning' servers at the maximum rate specified by decommissioningVelocity"

@clintropolis (Member Author):

sgtm, changed

@fjy (Contributor) commented Mar 5, 2019:

@leventov any more comments?

|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|
|`decommissioningVelocity`| Decommissioning velocity determines the maximum number of segments that may be moved away from 'decommissioning' servers non-decommissioning (that is, active) servers during one Coordinator's run, relative to the total maximum segment movements allowed during one Coordinator's run (which, in turn, is determined by the maxSegmentsToMove configuration). Specifically, the maximum is ceil(maxSegmentsToMove * (velocity / 10)). For example, if decommissioningVelocity is 0 no segments will be moved away from 'decommissioning' servers. If decommissioningVelocity is 5 no more than ceil(maxSegmentsToMove * 0.5) segments may be moved away from 'decommissioning' servers. By leveraging the velocity an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. Decommissioning can become stalled if there are no available active servers to place the segments. The value should be between 0 and 10.|7|
@leventov (Member) commented Mar 5, 2019:

> moved away from 'decommissioning' servers to non-decommissioning

Missed preposition ("to").

Also, backticks may be used in several places in this description.

@clintropolis (Member Author):

Oops, thanks, will fix. I'm also splitting that first sentence into two since it got a bit long.

|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|
Member:

I think it's also worth mentioning, either in this configuration description or in the description of decommissioningVelocity, that if decommissioningVelocity is 0 then the Coordinator not only doesn't move segments away from 'decommissioning' servers per se but also abstains from making any balancing movements involving 'decommissioning' servers. In this case, 'decommissioning' nodes indeed are in a sort of "maintenance" mode, as per the former config naming.

I think the current descriptions don't make this clear enough.

Member:

Or, to put it more generally, the Coordinator always abstains from movement decisions involving 'decommissioning' servers (other than moving segments away from 'decommissioning' to non-decommissioning servers; specifically, the Coordinator abstains from making movement decisions between decommissioning servers and from active to decommissioning servers) and tries to move segments away from decommissioning servers up to the limit imposed by decommissioningVelocity.

@clintropolis (Member Author):

Ah, that is a good point and definitely worth documenting, thanks for the catch

Member:

> The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by decommissioningVelocity.

It sounds odd that the servers are referred to as "these servers" and then "the 'decommissioning' servers" in the same sentence. The opposite should be better, I think.

@@ -632,7 +632,7 @@ private DruidCoordinatorRuntimeParams setupParamsForMaintenancePriority(int prio

mockCoordinator(coordinator);

- // either maintenance servers list or general ones (ie servers list is [2] or [1, 3])
+ // either decommissioning servers list or general ones (ie servers list is [2] or [1, 3])
Member:

Residual "general".

@@ -59,8 +59,8 @@
private DataSegment smallSegment;
private DruidCluster secondCluster;
private ServerHolder generalServer;
Member:

"generalServer" -> "activeServer". Clint, please search code.

|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|
Member
The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by decommissioningVelocity.

It sounds odd that the servers are referred to as "these servers" and then "the 'decommissioning' servers" in the same sentence. The opposite should be better, I think.

|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of 'decommissioning' historical servers. The Coordinator doesn't assign new segments to these servers and moves segments away from the 'decommissioning' servers at the maximum rate specified by `decommissioningVelocity`.|none|
|`decommissioningVelocity`| Decommissioning velocity determines the maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning (that is, active) servers during one Coordinator's run. This value is relative to the total maximum segment movements allowed during one run which is determined by the `maxSegmentsToMove` configuration. Specifically, the maximum is `ceil(maxSegmentsToMove * (velocity / 10))`. For example, if `decommissioningVelocity` is 5, no more than `ceil(maxSegmentsToMove * 0.5)` segments may be moved away from 'decommissioning' servers. If `decommissioningVelocity` is 0, segments will neither be moved from _or to_ 'decommissioning' servers, effectively putting them in a sort of 'maintenance' mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By leveraging the velocity an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 10.|7|
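The `ceil(maxSegmentsToMove * (velocity / 10))` arithmetic described in the row above can be sketched as follows. This is a minimal illustration only, not the actual Druid source; the class and method names are hypothetical:

```java
// Hypothetical sketch of the 'velocity' cap described above: the maximum
// number of segments that may be moved away from 'decommissioning' servers
// per Coordinator run is ceil(maxSegmentsToMove * (velocity / 10)).
public class DecommissioningVelocitySketch
{
  static int maxDecommissioningMoves(int maxSegmentsToMove, int velocity)
  {
    if (velocity < 0 || velocity > 10) {
      throw new IllegalArgumentException("velocity must be in [0, 10]");
    }
    return (int) Math.ceil(maxSegmentsToMove * (velocity / 10.0));
  }

  public static void main(String[] args)
  {
    // velocity 5 caps decommissioning moves at 50% of maxSegmentsToMove:
    // with maxSegmentsToMove = 5, that is ceil(2.5) = 3 segments per run.
    System.out.println(maxDecommissioningMoves(5, 5));
  }
}
```

Note the 10% step granularity this implies: each unit of velocity corresponds to 10% of `maxSegmentsToMove`, which is one of the reasons a percent-based unit is discussed below.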
Member
@leventov leventov Mar 6, 2019
Sorry for asking you doing and re-doing renames and docs, but I think we should better use "percent" than "velocity".

The reason is that after fixing #7159, there should be a single cap, maxSegmentsToMove. There should also be a configuration parameter that specifies what percent of that movement cap may be spent (at maximum) on segment loading and dropping (this is currently specified by a separate config replicationThrottleLimit). For that (future) configuration, I want to use "percent", (for example, "minGuaranteedBalancingMovesPercent"), because I think that 10% step (as in "velocity") might not be precise enough, and because "velocity" is simply not the right term for this situation.

Now, there is an observation that moving segments away from decommissioning nodes looks very much like a temporary "drop" rule "for servers". For this reason, I want the configurations that specify min guaranteed balancing quota and max 'decommissioning' movement quota to use the same units.

The other reason why percent may be preferable is that we don't need to explain what that are with ceil, / 10 etc. Everybody knows what percent are. So it's less likely that users specify a wrong number because they misinterpret the units (e. g. they specify decommissioningVelocity=10 because they think that the velocity is actually expressed in percent. I. e. they wanted decommissioningVelocity=1).

Specifically, I suggest this configuration be called "maxPercentOfDecommissioningMoves". It doesn't follow the prefix principle. "decommissioningMaxPercentOfMoves" is probably also acceptable, but because of the strange word order, it's less understandable.

Member Author
I fully agree, I was considering this as well and think it is a lot more intuitive.

Contributor
I'll suggest decommissioningMaxSegmentsToMovePercent:

  • keeps the decommissioning prefix
  • has MaxSegmentsToMove in the name, showing that the parameters are related
  • placement of Percent follows MaxSegmentsToMove to show what the percent applies to

Member Author
Ah, decommissioningMaxSegmentsToMovePercent sounds good to me, will go with that.

Member
I suggest "decommissioningMaxPercentInMaxSegmentsToMove", because there are two different maximums in play here:

  • The maximum percent: the specified percent is "maximum" because it doesn't need to be satisfied. E. g. if there are no decommissioning nodes the actual percentage will be zero.
  • The maximum segments to move.

Member Author
would 'of' instead of 'in' be better, decommissioningMaxPercentOfMaxSegmentsToMove? Also, this seems a bit long, but then again it's not so much longer than decommissioningMaxSegmentsToMovePercent...

Contributor
decommissioningMaxPercentOfMaxSegmentsToMove sounds fine to me, I'm not sure that there would be a good short name for the property

@fjy
Contributor

fjy commented Mar 7, 2019

@leventov any more comments?

@leventov
Member

leventov commented Mar 7, 2019

@fjy why do you repeatedly ping me in this issue less than 24 hours since my last response, don't you see that I respond in this issue every day anyway?

Although the discussion in the dev mailing list about the minimum poking period didn't arrive at any conclusion, nobody even proposed the poking period to be less than 2-3 working days.

@fjy
Contributor

fjy commented Mar 7, 2019

@leventov this is the last issue remaining before we can do a release so we would appreciate that you help move the release along

"historicalNodesInMaintenance": ["localhost:8182", "localhost:8282"],
"nodesInMaintenancePriority": 7
"decommissioningNodes": ["localhost:8182", "localhost:8282"],
"decommissioningMaxPercentOfMaxSegmentsToMove": 7
Member
70%

|`historicalNodesInMaintenance`| List of Historical nodes in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves segments from the nodes according to a specified priority.|none|
|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers, and segments will be moved away from them to be placed on 'active' servers at the maximum rate specified by `decommissioningMaxPercentOfMaxSegmentsToMove`.|none|
Member
@leventov leventov Mar 7, 2019
Two nits:

  • "servers, and" - double space
  • "active" is not a term in the same category as 'decommissioning', so I would not put it into quotes. Perhaps better to say non-decommissioning here.

Member
Also typo: decommissioning

|`nodesInMaintenancePriority`| Priority of segments from servers in maintenance. Coordinator takes ceil(maxSegmentsToMove * (priority / 10)) from servers in maitenance during balancing phase, i.e.:<br>0 - no segments from servers in maintenance will be processed during balancing<br>5 - 50% segments from servers in maintenance<br>10 - 100% segments from servers in maintenance<br>By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time instead.|7|
|`maxSegmentsInNodeLoadingQueue`|The maximum number of segments that could be queued for loading to any given server. This parameter could be used to speed up segments loading process, especially if there are "slow" nodes in the cluster (with low loading speed) or if too much segments scheduled to be replicated to some particular node (faster loading could be preferred to better segments distribution). Desired value depends on segments loading speed, acceptable replication time and number of nodes. Value 1000 could be a start point for a rather big cluster. Default value is 0 (loading queue is unbounded) |0|
|`decommissioningNodes`| List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers, and segments will be moved away from them to be placed on 'active' servers at the maximum rate specified by `decommissioningMaxPercentOfMaxSegmentsToMove`.|none|
|`decommissioningMaxPercentOfMaxSegmentsToMove`| The maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning (that is, active) servers during one Coordinator's run. This value is relative to the total maximum segment movements allowed during one run which is determined by `maxSegmentsToMove`. If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, segments will neither be moved from _or to_ 'decommissioning' servers, effectively putting them in a sort of 'maintenance' mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By leveraging decommissioning percent, an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.|70|
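As a worked illustration of the percent-based cap in the row above (a hedged sketch under assumed names, not the actual Druid implementation):

```java
// Hypothetical sketch: the decommissioning movement cap as a percent of
// maxSegmentsToMove, i.e. ceil(maxSegmentsToMove * (percent / 100)).
public class DecommissioningPercentSketch
{
  static int maxDecommissioningMoves(int maxSegmentsToMove, int percent)
  {
    if (percent < 0 || percent > 100) {
      throw new IllegalArgumentException("percent must be in [0, 100]");
    }
    return (int) Math.ceil(maxSegmentsToMove * (percent / 100.0));
  }

  public static void main(String[] args)
  {
    // With the default decommissioningMaxPercentOfMaxSegmentsToMove = 70 and
    // maxSegmentsToMove = 5, at most ceil(5 * 0.7) = ceil(3.5) = 4 segments
    // per run may be moved away from 'decommissioning' servers.
    System.out.println(maxDecommissioningMoves(5, 70));
  }
}
```

A value of 0 yields a cap of 0 moves, which matches the "maintenance"-like behavior described above, while 100 lets decommissioning moves consume all of `maxSegmentsToMove`.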
Member
I suggest the following (a few fixes highlighted in italics):

Determines the maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning (that is, active) servers during one Coordinator's run. The maximum number of segment movements away from 'decommissioning' servers is relative to the total maximum segment movements allowed during one run which is determined by maxSegmentsToMove. If decommissioningMaxPercentOfMaxSegmentsToMove is 0, segments will neither be moved from or to 'decommissioning' servers, effectively putting them in a sort of "maintenance" mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By leveraging the maximum percent of decommissioning segment movements, an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.

Member Author
I think starting with The maximum number is pretty consistent with the other descriptions:

|`mergeBytesLimit`|The maximum total uncompressed size in bytes ...
|`mergeSegmentsLimit`|The maximum number of segments that can be ...
|`maxSegmentsToMove`|The maximum number of segments that can be ...
|`replicantLifetime`|The maximum number of Coordinator runs for ...
|`replicationThrottleLimit`|The maximum number of segments that ...

I also think that repeating the near same phrase in the 2nd sentence isn't really necessary, but agree with the other changes 👍

Member
@leventov leventov Mar 7, 2019
I don't see why this kind of consistency at the beginnings of the doc descriptions is a good thing.

You can reword it to start with "The maximum percent", but then the whole sentence should be changed.

The problem with "This value" is that the percent value is not relative to maxSegmentsToMove. The result of applying the percent to maxSegmentsToMove is relative to maxSegmentsToMove (quite obviously).

Member Author
Hmm, yeah I see the problem, I think some things can be reworded. How about this, where I sort of combine the ideas of the first 2 sentences, and also throw in a reference to decommissioningNodes, maybe:

The percent of maxSegmentsToMove that determines the maximum number of segments that may be moved away from 'decommissioning' servers (specified by decommissioningNodes) to non-decommissioning servers during one Coordinator balancer run. If decommissioningMaxPercentOfMaxSegmentsToMove is 0, segments will neither be moved from or to 'decommissioning' servers, effectively putting them in a sort of "maintenance" mode that will not participate in balancing or assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place the segments. By adjusting decommissioningMaxPercentOfMaxSegmentsToMove, an operator can prevent active servers from overload by prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.

Contributor
@clintropolis's proposed wording looks good to me.

Member
👍 good wording.

* Historical nodes list in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves
* segments from those nodes according to a specified priority.
* List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers,
* and segments will be moved away from them to be placed on 'active' servers at the maximum rate specified by
Member
Nit: I think we don't need quotes on active.

* 10 - 100% segments from servers in maintenance
* By leveraging the priority an operator can prevent general nodes from overload or decrease maitenance time
* instead.
* The maximum number of segments that may be moved away from 'decommissioning' servers to non-decommissioning
Member
Suggested "Determines the maximum number ..."

* If `decommissioningMaxPercentOfMaxSegmentsToMove` is 0, segments will neither be moved from _or to_ 'decommissioning'
* servers, effectively putting them in a sort of 'maintenance' mode that will not participate in balancing or
* assignment by load rules. Decommissioning can also become stalled if there are no available active servers to place
* the segments. By leveraging decommissioning percent, an operator can prevent active servers from overload by
Member
Suggested "By leveraging the maximum percent of decommissioning segment movements, ..."

* the segments. By leveraging decommissioning percent, an operator can prevent active servers from overload by
* prioritizing balancing, or decrease decommissioning time instead. The value should be between 0 and 100.
*
* @return number in range [0, 100]
Member
I see we also use @Min() and @Max() annotations in the codebase, perhaps should use it here too.

@@ -231,32 +231,35 @@ public int getMaxSegmentsInNodeLoadingQueue()
}

/**
* Historical nodes list in maintenance mode. Coordinator doesn't assign new segments on those nodes and moves
* segments from those nodes according to a specified priority.
* List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decomissioning' servers,
Member
typo: decommissioning

*/
@Test
public void testLoadReplicaDuringMaitenance()
public void testLoadReplicaDuringDecomissioning()
Member
typo: decommissioning

params = new DruidCoordinatorBalancerTester(coordinator).run(params);
Assert.assertEquals(1L, params.getCoordinatorStats().getTieredStat("movedCount", "normal"));
Assert.assertThat(peon3.getSegmentsToLoad(), is(equalTo(ImmutableSet.of(segment2))));
}

/**
* Should balance segments as usual (ignoring priority) with empty maintenanceList.
* Should balance segments as usual (ignoring percent) with empty decommissioningList.
Member
Perhaps meant decommissioningNodes, not decommissioningList

@leventov
Member

leventov commented Mar 7, 2019

@clintropolis could you please also add "meitenance" and "decomission" as prohibited regex patterns in Checkstyle?

@leventov leventov mentioned this pull request Mar 7, 2019
@gianm
Contributor

gianm commented Mar 8, 2019

@clintropolis could you please also add "meitenance" and "decomission" as prohibited regex patterns in Checkstyle?

This doesn't need to be done in this PR, IMO. Especially given that we want to backport this PR to 0.14.0, it should be as small as possible.

@gianm
Contributor

gianm commented Mar 8, 2019

This doesn't need to be done in this PR, IMO. Especially given that we want to backport this PR to 0.14.0, it should be as small as possible.

Actually, I don't think it should be done at all in this PR, even if we weren't going to backport it. The exact approach to take to detect misspellings (or whether it's worth doing at all) may be controversial and should be hashed out on #7209.

@fjy
Contributor

fjy commented Mar 8, 2019

👍 This is ready after tests pass

Contributor
@gianm gianm left a comment
LGTM; looks like all comments have been addressed, will merge.

@gianm gianm merged commit a44df65 into apache:master Mar 9, 2019
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Mar 9, 2019
* rename maintenance mode to decommission

* review changes

* missed one

* fix straggler, add doc about decommissioning stalling if no active servers

* fix missed typo, docs

* refine docs

* doc changes, replace generals

* add explicit comment to mention suppressed stats for balanceTier

* rename decommissioningVelocity to decommissioningMaxSegmentsToMovePercent and update docs

* fix precondition check

* decommissioningMaxPercentOfMaxSegmentsToMove

* fix test

* fix test

* fixes
@clintropolis clintropolis deleted the maintenance-mode-to-decommission branch March 10, 2019 01:07
fjy pushed a commit that referenced this pull request Mar 10, 2019
clintropolis added a commit to implydata/druid-public that referenced this pull request Apr 3, 2019