
Calculate the CPUWeight directly in the Executor #865

Closed
klapkov opened this issue Nov 30, 2023 · 10 comments

klapkov (Contributor) commented Nov 30, 2023

Calculate the CPUWeight directly in the Executor

Summary

It was recently discussed that moving the CPUWeight calculation into Diego would be better than the current setup. Today it is calculated in the cloud_controller here: https://github.com/cloudfoundry/cloud_controller_ng/blob/cf03d73f760c2db565300074e7798a30aeb0da1b/lib/cloud_controller/diego/task_cpu_weight_calculator.rb#L7

After that it is passed through the desiredLRP structure. The min and max memory values are hardcoded, which is not ideal. This PR therefore proposes doing the calculation locally in the executor. The change will probably need to happen in steps: the first step adds the calculation in the executor, along with configurable properties in the rep for the minimum and maximum memory allowed. This way we can start by ignoring the values passed in the desiredLRP and simply assign the CPUWeight computed in the executor.

At a later stage, the cpu_weight will need to be removed from the desiredLRP structure, and the cloud_controller will need to remove its calculator.
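As a rough illustration of the proposed executor-side calculation, the sketch below clamps instance memory to a configurable range and maps it linearly onto a weight in (0, 100]. The function name and the 128 MB / 8192 MB defaults are illustrative assumptions here (chosen to mirror the values hardcoded in cloud_controller_ng's task_cpu_weight_calculator), not the actual identifiers used in the PR.

```go
package main

import "fmt"

// calculateCPUWeight clamps the instance memory to [minMB, maxMB] and maps
// it linearly onto a weight in (0, 100]. Instances at or above maxMB all
// receive the full weight of 100, matching today's capping behavior.
func calculateCPUWeight(memoryMB, minMB, maxMB uint64) uint64 {
	if memoryMB < minMB {
		memoryMB = minMB
	}
	if memoryMB > maxMB {
		memoryMB = maxMB
	}
	return 100 * memoryMB / maxMB
}

func main() {
	// Illustrative bounds: 128 MB and 8192 MB (assumed to mirror the
	// hardcoded cloud_controller_ng values; in the proposal these would
	// come from the rep's configurable properties instead).
	for _, mem := range []uint64{64, 1024, 8192, 16384} {
		fmt.Printf("%5d MB -> cpu_weight %d\n", mem, calculateCPUWeight(mem, 128, 8192))
	}
}
```

With configurable bounds, operators could raise the cap per foundation without any change to the CC/Diego interface, which is the core of the proposal.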

Diego repo: Executor

Describe alternatives you've considered (optional)

PR in Executor: cloudfoundry/executor#90
Here we add min_instance_memory, max_instance_memory and cpu_weight to the ContainerConfig structure. The calculation is added as well, along with test cases that verify cpu_weight is calculated correctly. Once cpu_weight is removed from the desiredLRP, many of the tests will need revision.

PR in Diego-release: #866

Here we add properties for the min and max memory, and ensure via a template test that min_instance_memory cannot be < 0.

@stephanme

Some background info:

Currently, CPU assignment to an app instance is capped at 8G instance memory, as originally reported in cloudfoundry/cloud_controller_ng#1178.
Back then this was more of a theoretical problem, but meanwhile we face user requests for 16G app instances and even more. Often the reason is not memory but the wish for more CPU.

In the last CAPI Open Office Hour (2023-11-08, cc: @Gerg), we discussed three options:

  1. increase the hardcoded MAX_CPU_PROXY value in task_cpu_weight_calculator.rb
    • would just move but not eliminate the hardcoded value
    • changes the documented behavior, needs agreement between WGs and probably an RFC
  2. make the MIN/MAX_CPU_PROXY values of the task_cpu_weight_calculator configurable in CAPI
    • allows a compatible change, i.e. cf-deployment stays as today but we can configure foundations to grant more CPU
  3. consolidate cpu calculation on Diego side
    • reduces interface between CC and Diego: input from CC is just the instance memory
    • min and max memory for CPU share assignment will be configurable in Diego (allows compatible change)

From CAPI side, we prefer option 3: less coupling between CC and Diego, CPU assignment is managed completely by Diego.

emalm (Member) commented Dec 6, 2023

I'm wary of pushing that memory/CPU allocation coupling into Diego (option 3): the CPU constraint behavior and its relationship with memory for CF apps is already difficult to understand, and it becomes more so if it's defined only implicitly in the Diego components. Also, we've had casual discussions for years about exposing CPU as an independent resource request for CF apps (and possibly making it explicitly allocated within Diego, which could make a lot more sense with enforced CPU entitlements instead of just CPU shares). Option 3 moves in the opposite direction from that, in a way that could be tricky to reconcile later.

It seems much more straightforward to do option 2, combined with relaxing/removing the BBS API's validation that CPU weight is capped at 100. There'd be a potential transient incompatibility with a newer CC against an older Diego BBS API, but using the current values as defaults and documenting the configuration order to roll out on upgrades should resolve that in practice.

Also, any of these options is going to change the documented behavior about app CPU allocation, so that's not really a ding against option 1 alone.

ameowlia (Member) commented Dec 8, 2023

I agree with @emalm. There has been chatter about the desire to let apps decide their own CPU needs independent of memory. It would be a shame to move further away from that.

@PlamenDoychev

Hi @emalm, @ameowlia ,

Thanks for the feedback. We are fine with option 2 as well; we were not aware of the ideas about decoupling CPU from memory.
If we choose option 2, then this topic more or less moves into the CAPI area.

@stephanme wdyt? :)

@stephanme

I wasn't aware of the idea to specify CPU independently of memory either. And, of course, I don't want to make it more difficult.

Still, I don't believe that the current CPU weight calculated by CAPI is a good interface parameter between CAPI and Diego. The CPU weight currently depends on the app instance memory, and its real impact can't be understood without knowing Diego details (e.g. what is the guaranteed CPU for 1G of memory?). The capping at an arbitrary value just demonstrates this, in my opinion.
If CPU is to become its own resource, I would prefer a parameter that users can understand, e.g. an absolute value in cores, or a weight relative to the regular fair-share CPU that an app gets assigned in proportion to its memory (the current value would be 100% independent of memory, i.e. an app gets available CPU according to its memory share, which translates into roughly 0.25 cores per GB on typical standard VMs of IaaS providers - but this is Diego-internal knowledge).
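To make the point above concrete, here is a back-of-the-envelope sketch, not Diego's actual algorithm: assuming the fair memory-proportional share works out to roughly 0.25 cores per GB (the IaaS figure mentioned above) and that cpu_weight stops growing at 8 GB of instance memory, the guaranteed CPU floor of a 16 GB instance ends up no higher than that of an 8 GB one.

```go
package main

import "fmt"

// guaranteedCores is a rough illustration under two stated assumptions:
// a fair share of ~0.25 cores per GB, and a cpu_weight cap reached at
// 8 GB of instance memory. Beyond the cap, more memory buys no more CPU.
func guaranteedCores(memoryGB float64) float64 {
	const coresPerGB = 0.25 // assumed typical IaaS ratio
	const capGB = 8.0       // memory beyond which cpu_weight no longer grows
	if memoryGB > capGB {
		memoryGB = capGB
	}
	return coresPerGB * memoryGB
}

func main() {
	for _, gb := range []float64{1, 8, 16} {
		fmt.Printf("%2.0f GB -> ~%.2f cores\n", gb, guaranteedCores(gb))
	}
}
```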

I have no hard objections against option 2 for now, but I don't think it will really help make CPU configurable per app in the future.

@Gerg Your thoughts? Should we discuss it one more time in the next CAPI Open Office Hour?

@PlamenDoychev

Hi @Gerg, @stephanme,

Can you provide a timeline for when this topic could be discussed CAPI-internally?

Gerg (Member) commented Dec 29, 2023

I think it makes sense for CC to maintain control over CPU shares, even if it is a derived property for now. This leaves the door open for making CPU shares user-configurable in the future.

One option is we can start exposing CPU shares as a read-only property on the API, so users can at least get a sense of what the derived value is. This would also be a stepping stone to making them user-configurable.

If/when we introduce independent CPU share scaling, we will probably also need to introduce independent quotas for them.

Gerg (Member) commented Dec 29, 2023

In the near term, making the max memory configurable instead of hard-coded in the source seems like a reasonable step, even outside of the CPU weight discussion (Option 2).

I'm less clear on increasing the max CPU weight value on BBS. Since it's an arbitrary/relative value, I'm not sure that changing the max value buys us much, other than slightly increased precision.

klapkov (Contributor, Author) commented Jan 9, 2024

Closing this issue, since it was agreed to implement this on the cloud_controller side.

emalm (Member) commented Jan 23, 2024

@Gerg @stephanme There are a couple of places where Diego would likely have to relax or drop CPU-related limits for workloads, in coordination with CC removing its own limits. The ones I'm aware of are:

It's worth double-checking why those limits are there, but I'm inclined to think that they both could just be dropped: they may have made sense in cases where Diego's APIs would have been exposed more directly to end users (Lattice?), but that's not the case today.
