Scaling down should remove instances from the container with the most imbalance #1560
Comments
@billonahill This is an interesting idea. I think the algorithm can be simplified to work as follows:
+1 I was thinking the same thing, which is that step 1 isn't really necessary since what matters is the relative allocation factor, not whether it's above or below the ideal. It's conceptually helpful for describing the ideal we're striving for, but not required when implementing.
+1
I see one issue with the proposed approach. As scale up (mostly) results in an increase in the number of containers, I think scale down should prioritize reducing the number of containers. The above approach will result in the following if 1 each of A and B is added and removed:

T0: deploy 4 A to B
T1: scale up 1 A and 1 B
T2: scale down A by 1
T3: scale down B by 1
@ashvina if T3 came after T2 you'd end up with (A B) (A B) (A B) right? Either way, yes, this algorithm will be optimizing for balance over resource utilization. I think that's acceptable for now until we see how this feature is used. It's more tailored for the case where a single component is being scaled up and down by large amounts, and the balanced approach would free the previously added containers. When scaling down from T1 to T2, what you show seems better to me than going to (A A B) (A A B) (B) IMO. I'm inclined to try to achieve balance, and if we need to at some point we could implement a repack that isn't biased towards least disruption and just repacks everything as efficiently as possible, to prune wasted space. Kind of like a full GC that could happen if too many scale up/downs have happened and we're fragmented.
@billonahill - In the example, event T3 is scaling down component B, so it will be (A A B) (A) (A B). An increase in the number of containers can increase the total resource cost of a topology, especially for Aurora, where all containers will be of the same size (the largest one). Also, should we design the default scale down approach for a specific use case? Is scaling up a single component and then scaling it down expected to be the most common operation? IMO, no. A user might scale up all the components. I think scale down should try to come as close to the packing algo as possible.
Thinking aloud... A scale-down event will result in some disruption as processes are killed, some in-memory state will be lost etc. How about performing rebalancing and defragmentation (GC) when scale down is invoked?
My reason for suspecting that a single component will be scaled is that when we see issues in production it's typically a single component that appears to be causing back pressure. As a result I suspect it will be fairly common that a) the initial topology will be fairly well packed and b) users will adjust one component at a time. When that happens, it's very likely that new, fairly unbalanced containers will be added, which should be removed upon a scale down. Repacking the entire topology is something we should experiment with and possibly iterate towards, but until we have stateful processing, we (i.e. Twitter) want to optimize for least disruption in terms of plan changes, speed of execution and state loss.
I agree. IMO, least disruption in terms of plan changes is important, even if the underlying engine supports elastic operations, e.g. operator scaling, task migration. That's because plan changes always come with cost.
I thought more about this based on @ashvina's comments and I think that essentially @billonahill also wants to remove containers that consist of the same components whenever possible. However, the solution he suggests is through the imbalance metric. The problem with this solution is that it might lose opportunities for removing containers, because it attempts to remove components from the most loaded instances. I think we can still achieve what Bill wants (removing from newly added containers) without using the imbalance metric.

The overall idea is that during scale down, we want to remove as many containers as possible (essentially, of all the sets of possible solutions we want the one that minimizes the minimum container load). I think we can write an integer program that does that, not only for single component scale downs but also for scaling down multiple components in one command. Another approach is to use a heuristic that operates on a single component at a time.

Let's take an example: We have 9 containers with 2 As and 2 Bs (A A B B), 1 container (A B), 1 container (A A A A A) and 1 container (A A A), each as a result of a scale up operation. The user wants to scale down A 10 times. The heuristic works as follows: First pick the containers that contain As only (2 containers in this example) and sort them based on increasing number of instances. Start removing from the first container in the list (A A A), moving to the next one (A A A A A), until both containers are removed or we do not need to remove more instances. Note that in this example, we still need to remove two instances of A. Now we look at the containers that contain As and other components as well. We sort them in increasing order of load (total number of instances per container) and break ties by sorting in increasing number of As. So the first container in the list will be (A B).
We remove A from this container (result is container (B)) and then remove A from the next container (A A B B) to get container (A B B). I think this heuristic will have the expected behavior that Bill mentioned in his original example (in cases where users want to scale up/down one component at a time) and at the same time will attempt to remove as many containers as possible. In Ashvin's example above, the container (A B) will be removed, resulting in the original packing plan. Any thoughts? We can also create an integer program and solve it through some optimization framework if we want to be more accurate/fancy.
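The heuristic above can be sketched as follows. This is illustrative Python only, not Heron's actual code: containers are modeled as plain lists of component names rather than a `PackingPlan`, and the `scale_down` helper name is an assumption.

```python
# Sketch of the proposed scale-down heuristic (illustrative, not Heron code).
def scale_down(containers, component, count):
    for _ in range(count):
        # Step 1: prefer containers holding only the target component,
        # smallest first, so whole containers can be freed.
        homogeneous = [c for c in containers if c and set(c) == {component}]
        if homogeneous:
            target = min(homogeneous, key=len)
        else:
            # Step 2: otherwise pick the least-loaded mixed container,
            # breaking ties by the fewest instances of the component.
            candidates = [c for c in containers if component in c]
            target = min(candidates, key=lambda c: (len(c), c.count(component)))
        target.remove(component)
        if not target:
            containers.remove(target)  # drop containers that became empty
    return containers

# The example from the comment: nine (A A B B) containers, one (A B),
# one (A A A A A) and one (A A A); the user scales down A by 10.
plan = [["A", "A", "B", "B"] for _ in range(9)]
plan += [["A", "B"], ["A"] * 5, ["A"] * 3]
result = scale_down(plan, "A", 10)
```

Running this drains the two homogeneous containers first (8 removals), then takes one A from (A B) and one from a (A A B B), matching the walkthrough above.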
@avflor what you're describing sounds similar to the scoring algo that I described with the addition of a secondary sort in the event of a scoring tie, by total number of instances ascending. This way you favor draining homogeneous containers first. Sounds reasonable to me. +1
@billonahill Yes, the basic differences are the following:
Using this heuristic we remove from the least loaded containers, thus increasing the chances of removing containers overall, while at the same time achieving the behavior we want for the scenarios you described, via container categorization.
@avflor - +1 looks good to me. @billonahill @wangli1426 - Could you please help me understand the impact of the redistribution process? Restarting an instance would require rebuilding in-memory state from an external state store. Is there any other heron system or user level cost? Could you please share some more details?
@avflor no need to specially categorize in 1) as you describe, since the homogeneous containers will have a score of 1.0, so they naturally sort to the top, above heterogeneous ones. @ashvina restarting processes has startup costs in terms of time taken to shutdown/deregister plus time taken to reinitialize, as well as load on other components during service discovery and re-registration. As a result, moving and restarting 1000s of processes when adding or removing only a few, for example, would incur unnecessary cost.
@billonahill Note that I don't use the ratio as the score. The scoring metric now is load (total number of instances in a container). The ratio score will result in removing from overloaded containers, which is not correct since you will lose opportunities for container removal. By container categorization + a different scoring function you avoid these problems while still getting the behavior you want in the scenario you described. Let me give an example: Take the case where we have the containers (A A B B) (A A B B) (A A B B) and the user wants to remove 2 As and 2 Bs. Your heuristic (ratio as scoring function) after removal of 2 As will result in (A B B) (A B B) (A A B B), and after removal of 2 Bs will result in (A B) (A B) (A A B B). We lost an opportunity here to remove one container. The heuristic I propose will produce (A A B B) (A A B B).
@ashvina Heron currently does not support restoring operator state (what you referred to as in-memory state) after restarting a heron instance. But I believe it will be achieved in the near future. There are two ways to scale an operator. The first way is to kill all the instances of the operator and then restart a desirable number of instances. To guarantee operator state consistency, before an instance is killed, the associated state should be written to persistent storage. When new instances start, the state is restored from the persistent storage. Additional effort is needed to decide how to re-partition the state across the instances. The second way is to redistribute the state from the instances to delete among the existing instances, or to migrate partial state from existing instances to the instances to create. By controlling the routing rules in the stream manager and the process of state migration carefully, live migration can be achieved without restarting instances. Compared with the first method, this significantly reduces the interruption to data processing, but comes with high implementation complexity. So I believe we can implement the first method first.
Is there any plan to support HeronInstance migration among different containers? With this feature, we can easily design an algorithm to remove the under-utilized containers. For instance, suppose we have two containers: (A A B B) (B B A). After B is scaled down to only one instance, the containers look like: (A A B) (A). If HeronInstance migration is supported, we can move A from the second container to the first one and release the second container.
@wangli1426 - I am not aware of anyone working on HeronInstance migration. It is a good idea and was discussed informally a couple of times in the past. @billonahill - you mentioned this in one of the comments above: "moving and restarting 1000s of process when adding or removing only a few, for example, would incur unnecessary cost". I am not sure I understand this. If scale down is removing only a few instances, then very few container plans will change. Which means defragmentation will impact few processes only.
@ashvina only the container plans of the removed instances would be affected if we just removed a few of them. If we instead reshuffled all instances across all containers, then every container plan would be affected. @avflor I see what you're saying now. I misread the first time. Yes, we should take into account all component changes requested when selecting a container. I'll follow up on #1584
Actually, @avflor the algorithm you propose only works in examples like the ones you've shown, where you have a balanced plan to start with, which we've found is often not the case after scaling up. Take for example this plan: [AABB, AABB, AABB, AABB, AAAA, BBBB]. If a request is made to remove 4 As and 4 Bs the user would expect to remove the last two containers, but instead we would get this: [AABB, AABB, AAAA, BBBB]. If they did that again they would get this: [AAAA, BBBB]. If instead they scaled down 4 As and 4 Bs in two different scale down requests they would get back to: [AABB, AABB, AABB, AABB]. I think we'd need one more set of tie breakers (see 2):
I believe this would work for both scenarios that you and I presented.
@billonahill I think we need a call to clarify all this :). My algorithm works on one component at a time, as yours does (not multiple ones). If a user has submitted a single request to remove 4 As and 4 Bs, then the algorithm will work on the first A, second A, third A, fourth A, first B, second B, third B and fourth B, in that order. Let's see the algo now. 1st A removal: We categorize the containers into 2 groups. We start with the first group (homogeneous containers) and remove 1 A. The result would be [AABB, AABB, AABB, AABB, AAA, BBBB]. 2nd A removal: creating again two groups, we remove another A from the same homogeneous container. It is easy to see that the 3rd and 4th removals will then result in removing that container entirely. Then we get to the 1st B. We again have two groups, with (BBBB) as the homogeneous one, and remove B from it. As previously, after the fourth B removal we will have [AABB, AABB, AABB, AABB]. Note that if you have multiple containers to pick from in a group, you pick the one with the least number of instances. In this example we had only one container in the homogeneous group every time, so that didn't come into play.
For some reason I thought you were previously proposing an initial scoring taking into account all components being requested, which I see wasn't your intent. The challenge with the primary scoring being based on total number of instances ascending is that it biases away from removing imbalanced containers. For example, if someone makes a request to remove 3 As and a B from this: [ABC, ABC, ABC, AAAB] they would get this: [C, BC, BC, AAAB]. Imbalance appears as a result of scale up, which is why, if we want scale down to effectively back out those changes, we should focus on removing imbalance first IMO.
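The disagreement between the two scoring functions can be made concrete with a small sketch (illustrative Python with hypothetical helper names, not Heron's implementation): on the [ABC, ABC, ABC, AAAB] plan above, imbalance scoring targets the container scale up created, while load scoring targets a small, already-balanced container.

```python
# Two candidate scoring functions for choosing which container to drain.
# Containers are modeled as lists of component names (illustrative only).
def pick_by_imbalance(containers, comp):
    # Imbalance scoring: target the container where comp is most
    # over-represented (highest fraction of comp instances).
    return max((c for c in containers if comp in c),
               key=lambda c: c.count(comp) / len(c))

def pick_by_load(containers, comp):
    # Load scoring: target the container with the fewest total instances,
    # breaking ties by the fewest comp instances.
    return min((c for c in containers if comp in c),
               key=lambda c: (len(c), c.count(comp)))

plan = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"],
        ["A", "A", "A", "B"]]
by_imbalance = pick_by_imbalance(plan, "A")  # the unbalanced (A A A B)
by_load = pick_by_load(plan, "A")            # a balanced (A B C)
```

The two heuristics pick different containers on the same plan, which is exactly the trade-off being debated: backing out scale-up imbalance versus maximizing container removal.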
@billonahill The result of scale up (which is homogeneous containers) will be undone by both your algorithm and mine, since they both remove the homogeneous containers (yours because of the imbalance metric and mine because of categorization). In this scenario both algos will produce the old state (the state before the scale up operation). The problem comes when you operate on mixed containers. If your goal for mixed containers is to remove imbalance, then that's ok, but if your goal is to remove as many containers as possible during scale down, then the imbalance metric will not do that. IMO the latter provides a better user experience than the former, because users ultimately scale down to release resources.
The example you give is a use case where your heuristic will remove a container and mine will not. However, note that these are heuristics. There are cases where they will not work. The observation here is that optimizing for removing imbalance doesn't necessarily mean that you remove containers. In fact it can prevent container removal. If you want to optimize for removing containers, then a heuristic that removes from the least loaded container will, in the general case, result in more removals than a heuristic that optimizes for imbalance. If we agree on these observations and you still think that removing imbalance is the first goal here, then that's ok. But I want to make sure you understand my point that these are two different and conflicting objectives, and that removing imbalance results in lost opportunities for removing containers in the general case.
Folks, would it be worth creating a list of "popular" use cases and comparing the algorithms against that list? Also, what do you think about adding a policy configuration to choose between container optimization and balanced scale down?
@ashvina We need to agree on the objective function first, which means that we need to agree that optimizing for imbalance and optimizing for container removal are different things. When we agree on the objective function, we can determine the best algo for our use cases.
At some point we should consider adopting scale up/down strategies, but not until we see how people actually use the feature. Until then we can provide what we think is a reasonable default. @avflor I've updated the algorithm in #1584 per your suggestion. After reading it repeatedly I think I finally grokked all the scoring stages and wrote something that handles the examples we've discussed.
I agree with @ashvina. It seems that we have mixed different objectives, namely load balance and minimization of the number of containers. IMO, if we consider the problem we're discussing as a bin packing problem, minimizing the number of containers used implicitly covers the load balance objective, because you can remove more containers only if the containers are well balanced. The difference is that in our scenario there is already a packing plan, so we also want to minimize the instance movement cost when doing the bin packing (removing an instance has no cost, but moving an instance does). By proposing an algorithm that solves the bin packing problem while also minimizing instance movement overhead, we can achieve load balance, container removal and operator scale up/down simultaneously.
The repacking algos currently remove instances in a round-robin fashion from container 1, which can result in the imbalanced packing described below. Instead instances should be removed from the containers that have the most imbalance.
For example, with ExclamationTopology:

1. `~/bin/heron submit local ~/.heron/examples/heron-examples.jar com.twitter.heron.examples.ExclamationTopology ExclamationTopology --deploy-deactivated --verbose`
2. `~/bin/heron update local ExclamationTopology --component-parallelism=exclaim1:6`
3. `~/bin/heron update local ExclamationTopology --component-parallelism=exclaim1:4`
Instead, the packing after step 3 should look like the packing after step 1. The proposed algorithm is as follows:
First, compute the idealized allocation (representation?) factor for component A on a given container as the ratio of component A's parallelism to the sum of all component parallelisms. For example, after step 2 above the idealized allocation factor for `exclaim1` would be 6/8 or 0.75, since the topology has 8 total instances, of which 6 are As.

Second, compute the actual allocation factor on a given container for component A as the ratio of A's instance count to the count of all instances on the container. Following the example above, after step 2 the allocation factors for A on each of the containers are [2/3, 2/3, 2/2] or [0.67, 0.67, 1.0].

Finally, the offset from the ideal can be computed as [-0.08, -0.08, +0.25], which shows that container 3 is the most over-allocated and should be targeted for instance A removal.
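The computation above can be sketched numerically as follows (illustrative Python, not Heron code; `A` stands in for `exclaim1` and `B` for the topology's other component, and `allocation_offsets` is a hypothetical helper name):

```python
# Sketch of the proposed imbalance scoring. Containers are modeled as
# lists of component names (illustrative only).
def allocation_offsets(containers, component):
    total = sum(len(c) for c in containers)
    comp_total = sum(c.count(component) for c in containers)
    ideal = comp_total / total  # idealized allocation factor (6/8 = 0.75 here)
    # Per-container actual allocation factor minus the ideal.
    return [c.count(component) / len(c) - ideal for c in containers]

# The plan after step 2: two (A A B) containers and one (A A).
plan = [["A", "A", "B"], ["A", "A", "B"], ["A", "A"]]
offsets = allocation_offsets(plan, "A")  # approximately [-0.08, -0.08, +0.25]
```

The container with the largest positive offset (container 3 here) is the most over-allocated and would be targeted first for instance removal.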