Skip to content
This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Validate resource constraint (RAM and CPU) in RoundRobinPacking #3142

Merged

Conversation

xiaoyao1991
Copy link
Member

@xiaoyao1991 xiaoyao1991 commented Jan 2, 2019

This PR might break some mis-configured topologies in production

This PR addresses some issues raised previously in the slack chat that the packing algorithm did not honor container-level and instance-level constraints on CPU resource. Additionally, we added resource constraint validation in RoundRobinPacking. Specifically:

  • topology.container.cpu and topology.container.ram are honored and considered as the hard cap constraint for the corresponding resource in one container.
  • topology.component.cpumap and topology.component.rammap are honored and considered as the instance resource mapping in one container.
  • The sum of instance resources in one container, plus padding, should not exceed the container-level constraint, otherwise the scheduling would fail.


T increaseBy(int percentage);

boolean greaterThan(T other);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Comparable interface already provides the compareTo method. Do we really need these 4 comparison methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The ByteAmount had these methods and I preserved these to guarantee minimum changes. I guess they were there just for readability?


T plus(T other);

T multiply(int factor);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the factor and percentage are int?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantics for factor is percentage in ByteAmount and don't allow decimal spaces. The original method signatures in ByteAmount use int too.

import java.util.HashMap;
import java.util.Map;

public final class CPUShare implements ResourceMeasure<CPUShare> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to create a separate CPUShare class to handle the CPU resource calculation? Is that possible to reuse what we already have in the ram resource calculation? Or will there be RamShare and DiskShare classes in following patches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem necessary to separate RamShare and DiskShare as RAM and disk are both measured in ByteAmount. CPU is measured in share of time used.

Previously, we didn't take CPU resource constraints into consideration when composing packing plan. Now that we do -- namely we need to collect instance cpu resource mapping, and the logic of that is very similar to that of instance RAM resource mapping (previously as getInstancesRAMMapInContainer()). But due to the difference in the type of measurement(ByteAmount vs double), we might end up with a lot of code duplicates. Hence, I abstract out CPUShare so that it looks very similar to what we have in ByteAmount, and thus we can have one calculateInstancesResourceMapInContainer() that works for both CPU and RAM (potentially disk if needed) instance resource mapping collection.

@xiaoyao1991 xiaoyao1991 force-pushed the validate_resource_constraint_in_RRPacking branch from dc78dbb to 5bd71c0 Compare January 3, 2019 22:13
* Test the scenario CPU map config is completely set
*/
@Test
public void testCompleteCpuMapRequested() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add these cases:

  • container cpu is more than needed by instances.
  • container cpu is less than needed by instances.

also, other processes in container could need a fixed amount of CPU as well. So if container cpu == needed by instances, it might be better to return false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Added more tests to cover the mentioned scenarios


ByteAmount containerDiskInBytes = getContainerDiskHint(roundRobinAllocation);
double containerCpu = getContainerCpuHint(roundRobinAllocation);
double containerCpuHint = getContainerCpuHint(roundRobinAllocation);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

containerCpuHint (configured container cpu, or max (padding + 1 per instance)) is not used. As the result, I think user specify container cpu is ignored. Maybe need a max(containerCpuHint, containerCpu) I feel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actually used down in calculateInstancesResourceMapInContainer as the containerResHint. It is used to validate the all instances + padding in roundRobinAllocation does not exceed the containerResHint. It's just not used anymore in packingInternal, similar to how we treated containerRamHint before the change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true. Assuming user configures:

  • container cpu to be 5
  • 1 cpu per instance
  • 3 instances in container

Then:
cpu hint is 5,
instance cpu is valid (enough for padding and instances)

In the old calculation, container cpu is 5 because user configured it.
In the new calculation, container cpu is 4 (1 * 3 + 1 padding).

With "max(containerCpuHint, containerCpu)", container cpu is the same as before.

I think it is important to make the result consistent. Some users might rely on setting container cpu to allocate more processing power to their instances instead of configuring per instance cpu (although it is technically correct).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. I see. Then I think we don't even need to take the max out of 2. ContainerCpuHint is always gonna be greater no matter what.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if container cpu is not configured, the getContainerCpuHint() would return padding + 1 * instance number, which is smaller than your new value, unless you update the getContainerCpuHint() function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh you are right, I just figured that situation :) Updated

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original logic is messy and makes it hard to update. :(

Copy link
Member

@nlu90 nlu90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huijunwu huijunwu merged commit a1b761f into apache:master Jan 10, 2019
@xiaoyao1991 xiaoyao1991 deleted the validate_resource_constraint_in_RRPacking branch February 15, 2019 01:34
sreev pushed a commit to sreev/incubator-heron that referenced this pull request Apr 9, 2020
…he#3142)

* init

* general resource constraint validation

* pass existing unit tests

* add more tests

* rename

* rename

* generic ResourceMeasure

* fixed wc example

* even more general generics

* address comments

* address comments by putting more tests

* set safe amount of cpu

* meaningful constants in ExampleResource
nicknezis pushed a commit that referenced this pull request Sep 14, 2020
* init

* general resource constraint validation

* pass existing unit tests

* add more tests

* rename

* rename

* generic ResourceMeasure

* fixed wc example

* even more general generics

* address comments

* address comments by putting more tests

* set safe amount of cpu

* meaningful constants in ExampleResource
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants