
Calculate the CPUWeight directly in the Executor #865

Closed
klapkov opened this issue Nov 30, 2023 · 10 comments

klapkov (Contributor) commented Nov 30, 2023

Calculate the CPUWeight directly in the Executor

Summary

It was recently discussed that moving the CPUWeight calculation into Diego would be better than the current setup. Today it is calculated in the cloud_controller here: https://github.com/cloudfoundry/cloud_controller_ng/blob/cf03d73f760c2db565300074e7798a30aeb0da1b/lib/cloud_controller/diego/task_cpu_weight_calculator.rb#L7

After that it is passed through the desiredLRP structure. The min and max memory values are hardcoded, which is not ideal. This PR therefore proposes doing the calculation locally in the executor. The change will probably need to happen in steps: the first step adds the calculation in the executor, along with configurable properties in the rep for the minimum and maximum memory allowed. This way we can start by ignoring the values passed in the desiredLRP and simply assign the CPUWeight computed in the executor.

At a later stage, the cpu_weight will need to be removed from the desiredLRP structure, and the cloud_controller will need to remove its calculator.
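As a rough illustration of the proposed executor-side calculation, the sketch below clamps instance memory to a configurable range and maps it linearly onto a weight in (0, 100]. The function name and the 128 MB / 8192 MB defaults are illustrative assumptions here (chosen to mirror the values hardcoded in cloud_controller_ng's task_cpu_weight_calculator), not the actual identifiers used in the PR.

```go
package main

import "fmt"

// calculateCPUWeight clamps the instance memory to [minMB, maxMB] and maps
// it linearly onto a weight in (0, 100]. Instances at or above maxMB all
// receive the full weight of 100, matching today's capping behavior.
func calculateCPUWeight(memoryMB, minMB, maxMB uint64) uint64 {
	if memoryMB < minMB {
		memoryMB = minMB
	}
	if memoryMB > maxMB {
		memoryMB = maxMB
	}
	return 100 * memoryMB / maxMB
}

func main() {
	// Illustrative bounds: 128 MB and 8192 MB (assumed to mirror the
	// hardcoded cloud_controller_ng values; in the proposal these would
	// come from the rep's configurable properties instead).
	for _, mem := range []uint64{64, 1024, 8192, 16384} {
		fmt.Printf("%5d MB -> cpu_weight %d\n", mem, calculateCPUWeight(mem, 128, 8192))
	}
}
```

With configurable bounds, operators could raise the cap per foundation without any change to the CC/Diego interface, which is the core of the proposal.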

Diego repo: Executor

Describe alternatives you've considered (optional)

PR in Executor: cloudfoundry/executor#90
Here we add min_instance_memory, max_instance_memory and cpu_weight to the ContainerConfig structure. The calculation is added as well, along with test cases that verify cpu_weight is calculated correctly. Once cpu_weight is removed from the desiredLRP, many of the tests will need revision.

PR in Diego-release: #866

Here we add properties for the min and max memory, and ensure via a template test that min_instance_memory cannot be < 0.

@stephanme

Some background info:

Currently, CPU assignment to an app instance is capped at 8G instance memory, as originally reported in cloudfoundry/cloud_controller_ng#1178.
Back then this was more of a theoretical problem, but meanwhile we face user requests for 16G app instances and even more. Often the reason is not memory but the wish for more CPU.

In the last CAPI Open Office Hour (2023-11-08, cc: @Gerg), we discussed three options:

  1. increase the hardcoded MAX_CPU_PROXY value in task_cpu_weight_calculator.rb
    • would just move but not eliminate the hardcoded value
    • changes the documented behavior, needs agreement between WGs and probably an RFC
  2. make the MIN/MAX_CPU_PROXY values of the task_cpu_weight_calculator configurable in CAPI
    • allows a compatible change, i.e. cf-deployment stays as today but we can configure foundations to grant more CPU
  3. consolidate cpu calculation on Diego side
    • reduces interface between CC and Diego: input from CC is just the instance memory
    • min and max memory for CPU share assignment will be configurable in Diego (allows compatible change)

From CAPI side, we prefer option 3: less coupling between CC and Diego, CPU assignment is managed completely by Diego.

emalm (Member) commented Dec 6, 2023

I'm wary of pushing that memory/CPU allocation coupling into Diego (option 3): the CPU constraint behavior and its relationship with memory for CF apps is already difficult to understand, and it becomes more so if it's defined only implicitly in the Diego components. Also, we've had casual discussions for years about exposing CPU as an independent resource request for CF apps (and possibly making it explicitly allocated within Diego, which could make a lot more sense with enforced CPU entitlements instead of just CPU shares). Option 3 moves in the opposite direction from that, in a way that could be tricky to reconcile later.

It seems much more straightforward to do option 2, combined with relaxing/removing the BBS API's validation that CPU weight is capped at 100. There'd be a potential transient incompatibility with a newer CC against an older Diego BBS API, but using the current values as defaults and documenting the configuration order to roll out on upgrades should resolve that in practice.

Also, any of these options is going to change the documented behavior about app CPU allocation, so that's not really a ding against option 1 alone.

ameowlia (Member) commented Dec 8, 2023

I agree with @emalm. There has been chatter about the desire to let apps decide their own CPU needs independent of memory. It would be a shame to move further away from that.

@PlamenDoychev

Hi @emalm, @ameowlia ,

Thanks for the feedback. We are fine with option 2 as well; we were not aware of the ideas about decoupling CPU from memory.
If we choose option 2, then this topic more or less moves into the CAPI area.

@stephanme wdyt? :)

@stephanme

I wasn't aware of the idea to specify CPU independently of memory either. And, of course, I don't want to make it more difficult.

Still, I don't believe that the current CPU weight calculated by CAPI is a good interface parameter between CAPI and Diego. The CPU weight currently depends on the app instance memory, and its real impact can't be understood without knowing Diego details (e.g. what is the guaranteed CPU for 1G of memory?). The capping at an arbitrary value just demonstrates this, in my opinion.
If CPU is to become its own resource, I would prefer a parameter that users can understand, e.g. an absolute value in cores, or a weight relative to the regular fair-share CPU that an app gets assigned in proportion to its memory (the current value would be 100% independent of memory, i.e. an app gets available CPU according to its memory share, which translates into roughly 0.25 cores per GB on typical standard VMs of IaaS providers - but this is Diego-internal knowledge).
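To make the point above concrete, here is a back-of-the-envelope sketch, not Diego's actual algorithm: assuming the fair memory-proportional share works out to roughly 0.25 cores per GB (the IaaS figure mentioned above) and that cpu_weight stops growing at 8 GB of instance memory, the guaranteed CPU floor of a 16 GB instance ends up no higher than that of an 8 GB one.

```go
package main

import "fmt"

// guaranteedCores is a rough illustration under two stated assumptions:
// a fair share of ~0.25 cores per GB, and a cpu_weight cap reached at
// 8 GB of instance memory. Beyond the cap, more memory buys no more CPU.
func guaranteedCores(memoryGB float64) float64 {
	const coresPerGB = 0.25 // assumed typical IaaS ratio
	const capGB = 8.0       // memory beyond which cpu_weight no longer grows
	if memoryGB > capGB {
		memoryGB = capGB
	}
	return coresPerGB * memoryGB
}

func main() {
	for _, gb := range []float64{1, 8, 16} {
		fmt.Printf("%2.0f GB -> ~%.2f cores\n", gb, guaranteedCores(gb))
	}
}
```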

I have no hard objections against option 2 for now, but I don't think it will really help make CPU configurable per app in the future.

@Gerg Your thoughts? Should we discuss it one more time in the next CAPI Open Office Hour?

@PlamenDoychev

Hi @Gerg, @stephanme,

Can you provide a timeline for when this topic could be discussed CAPI-internally?

Gerg (Member) commented Dec 29, 2023

I think it makes sense for CC to maintain control over CPU shares, even if it is a derived property for now. This leaves the door open for making CPU shares user-configurable in the future.

One option is we can start exposing CPU shares as a read-only property on the API, so users can at least get a sense of what the derived value is. This would also be a stepping stone to making them user-configurable.

If/when we introduce independent CPU share scaling, we will probably also need to introduce independent quotas for them.

Gerg (Member) commented Dec 29, 2023

In the near term, making the max memory configurable instead of hard-coded in the source seems like a reasonable step, even outside of the CPU weight discussion (Option 2).

I'm less clear on increasing the max CPU weight value on BBS. Since it's an arbitrary/relative value, I'm not sure that changing the max value buys us much, other than slightly increased precision.

klapkov (Contributor, Author) commented Jan 9, 2024

Closing this issue, since it was agreed to implement this on the cloud_controller side.

emalm (Member) commented Jan 23, 2024

@Gerg @stephanme There are a couple of places where Diego would likely have to relax or drop CPU-related limits for workloads, in coordination with CC removing its own limits. The ones I'm aware of are:

It's worth double-checking why those limits are there, but I'm inclined to think that they both could just be dropped: they may have made sense in cases where Diego's APIs would have been exposed more directly to end users (Lattice?), but that's not the case today.
