[RM Ch04] Remove oversubscription from basic #1424
@pgoyal01 @karinesevilla @iangardner22 @CsatariGergely @tomkivlin @markshostak @ASawwaf I'm raising this issue because the RM has not completed the action of setting the CPU allocation ratio to 1:1 for Basic, and because RA1 now has a pull request that contains both a Basic flavour with a 1:1 and a Basic flavour with a 4:1 CPU allocation ratio. I would like to ensure we agree on moving Basic to a 1:1 allocation ratio, as was discussed, so that the RM can then provide a clearer perspective for RA1 issue #1387. We can take a simple vote with:
Based on Ch02: "Basic: for VNFCs that can tolerate resource over-subscription and variable latency", and we list NMS and AAA as examples, for which performance does not matter.
I wasn't in Prague when the decision was made, and have not seen any reasonable justification for the decision, and so I will not vote on this. Please remember that in addition to some VNFs there is associated software (for example, services portals) that is perfectly OK with higher CPU Allocation Ratios. May I suggest that the CNTT take the following position -- critique/alternate suggestions welcome:
BTW nowhere do we address Memory/Storage over-allocation ratios. The OpenStack suggested Memory Allocation Ratio is 1.5:1.
PS Basic doesn't replace Compute Intensive. If I remember correctly, Compute Intensive had CPU Pinning, NUMA, huge pages and also some network throughput requirements.
Can we not define both, Basic1 (1:1) and Basic4 (4:1)? I don't recall the rationale for removing one of the key benefits of virtualisation, unless predictable performance is an overarching mandate for CNTT RM for all use cases? I get it for VNFs but not for supporting services that spend a lot of time idle.
This is a good question. I'll try to explain why I think it's a bad idea. I believe that this either strands resources or requires a definition of a new host-group with a different amount of physical memory.
So, I'm arguing that Basic1 profiles and a Basic4 profiles
and thus, that we should not have both a Basic1 and a Basic4.
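One way to read the stranded-resource concern is as simple capacity arithmetic. The sketch below is illustrative only: the host sizes, the flavour size, and the assumption that memory is not oversubscribed (a 1:1 memory ratio) are hypothetical and not taken from the RM or RA1. It counts how many instances of one flavour fit on a host under a given CPU allocation ratio and reports what is left unused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    physical_cores: int
    memory_gib: int


@dataclass(frozen=True)
class Flavour:
    vcpus: int
    memory_gib: int


def placement(host: Host, flavour: Flavour,
              cpu_ratio: float, ram_ratio: float = 1.0) -> dict:
    """Count instances that fit on a host and report leftover capacity.

    cpu_ratio and ram_ratio are vCPU:pCPU and virtual:physical memory
    allocation ratios. ram_ratio defaults to 1.0 (assumption: memory
    is not oversubscribed).
    """
    schedulable_vcpus = int(host.physical_cores * cpu_ratio)
    schedulable_mem = int(host.memory_gib * ram_ratio)
    by_cpu = schedulable_vcpus // flavour.vcpus
    by_mem = schedulable_mem // flavour.memory_gib
    instances = min(by_cpu, by_mem)
    return {
        "instances": instances,
        "limited_by": "cpu" if by_cpu <= by_mem else "memory",
        "unused_vcpus": schedulable_vcpus - instances * flavour.vcpus,
        "unused_memory_gib": schedulable_mem - instances * flavour.memory_gib,
    }


# Hypothetical host sized so that CPU and memory run out together at 1:1.
host = Host(physical_cores=32, memory_gib=128)
flavour = Flavour(vcpus=2, memory_gib=8)

basic1 = placement(host, flavour, cpu_ratio=1.0)
basic4 = placement(host, flavour, cpu_ratio=4.0)

# A hypothetical separate host-group with four times the memory.
big_host = Host(physical_cores=32, memory_gib=512)
basic4_big = placement(big_host, flavour, cpu_ratio=4.0)

for name, result in [("Basic1", basic1), ("Basic4", basic4),
                     ("Basic4, larger host", basic4_big)]:
    print(name, result)
```

With these assumed numbers, the 4:1 ratio places no more instances on the original host than 1:1 does, because memory runs out first and the extra schedulable vCPUs go unused. The 4:1 ratio only pays off on a host with proportionally more memory, which amounts to a different host-group.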
I don't recall the discussion in Prague so apologies if I'm repeating what was already brought up. My concerns with this PR and associated issue are:
Tom raised a good point and I now think that the CPU allocation ratio should not be part of the RM.
Kelvin, This was discussed ad nauseam between Prague and the RM mtgs, and then again discussed and approved by the TSC. It appears you created an issue to correct a defect (PR #1064 didn't update the 1:1 ratio), and you're attempting to use it to make a functional change (i.e., add an over-subscription capability). If we want to reintroduce over-subscription, that's fine, but given the amount of effort we just went through to remove it, be sure to open an issue for that purpose, document the rationale, and add it to the agenda. My 2 cents...
Hi Ian,
We can, but... The rationale was along the lines of simplification and focus for CNTT. An Operator could certainly add such a profile if they wanted to, but given the target workloads for CNTT, it didn't sound like the effort to design and more importantly, validate the extra profile could be justified at this time. Later, once the primary workloads are humming, I see no issue with adding over-subscription. Note, the CI was only parked, not deleted.

@tomkivlin Hi Tom,
To cherry pick some of your questions:

WRT #3, parking over-subscription was a means to focus CNTT on infra for the primary use cases, so our limited resources aren't spread too thin. Note, the intent and language was always to "park" Compute Intensive, not to delete it. Once we have finished everything specific to target workloads, the scope could be widened again, and in the interim, there's no reason a given Operator couldn't add an oversubscribed profile for their infra, but to your question, the value is in staying focused and not trying to boil the ocean.

WRT #4, in a word, no, not from now on. See above.

WRT #5, totally agree. We'd love to hear any suggestions on how the Models' attributes could be modified or augmented to better address the needs of Containerized workloads. We're always updating the language to be more generic, but that's just generic "paint" to avoid precluding certain use cases. What would be really valuable, would be enhancements to normalize existing attributes across types of workload, as well as additional attributes, that may or may not be specific to Containers. Again, we're happy to discuss/brainstorm any time.

@kedmison Kelvin, Any comments or clarifications?
Thanks,
Thanks Mark, I understand the background and sorry for coming in at the 11th hour. For example, I understand why we are defining standard abstracted capabilities and interfaces in the RM - my understanding of the goal of CNTT (and then OVP 2.0) is to enable software and infra vendors to know what each other are developing against and to enable interoperability benefits for both parties and the operators. I need to read around more of the performance benchmarking aspects of CNTT and OVP but to me it either feels like a step too far, or that we have got quite a big gap in our documentation and capability today around that part.
@tomkivlin Hi Tom,
I'm not sure why they were originally, but my impression of the intent in Issue 973 was to remove over-subscription ratios. I look at it as: we removed over-subscription. Full stop. As opposed to: we're specifying a 1:1 over-sub ratio. In other words, either you specify a 1:n over-sub ratio, or you turn the over-sub feature off (1:1). Granted, it's a fine line... As for the appropriateness of the attribute being intrinsic to an IT, as opposed to being part of an extension or a "knob", I'll defer to Kelvin @kedmison to comment on, as I suspect the best answer/solution will be rooted largely in the OpenStack API, with the exception of the bare-metal use case.
Probably a combination of the two. :-) As described above, the characterization is a compromise solution. As far as the doc gap, if you're referring to documenting the characterization methodology or instrumentation, the RM team is about to write, or hopefully has already written, the first draft of the guidelines. @kedmison how is that coming along? :-) Thanks,
The intent was not to 'remove oversubscription' from the RM (thus leaving it undefined and open to interpretation/customization) but, in effect, to ensure that oversubscription was not taking place. From a Kubernetes/RA2 perspective, compliance should not be an issue here. From an RA1/OpenStack perspective, I think there end up being bin-packing problems, which I elaborated on earlier, that reduce or eliminate altogether the benefits of over-subscription.
Thanks @kedmison. My concern is that as things are worded/presented, it seems to me that an operator who chooses to over-subscribe, which is a valid design choice, would be non-conformant with CNTT? Is that the intent? If this is solely being stated with a view to allowing the Reference Implementation to be used as a baseline for later performance testing, then that's fine, but I'd rather see a note stating that Vendor Implementations can allow over-subscription and individual operators can use over-subscription if they choose to do so.
@tomkivlin The primary argument is that some workloads will require an expectation of performance from Basic, and cannot tolerate overbooking. So, from an RM perspective, yes, I think that the operators must make available a non-over-subscribed Basic, and if they want to have an additional, overbooked variant of Basic, I believe it should be outside of CNTT but an operator is not prevented from doing so.
@pgoyal01 @tomkivlin I see in RA1 that Table 4.2 has already defined a B1 with no overbooking and a B4 with overbooking. Can we align on the RM specifying no overbooking (to be consistent across RA1 and RA2) and then the RA1 allowing the use of overbooking by specifying an additional Basic4 flavour there that is not specified at the RM level?
@kedmison - I agree there needs to be consistency. For me, I would be happy with your proposed approach if we could add a note saying the purpose of this is performance benchmarking, meaning operators can choose different ratios if they accept the risk of an impact on performance. I'll suggest a change in the PR. Once that PR is merged, I'll do one to update RA2.
@kedmison RM specified a 4:1 allocation ratio (and it is also in RI-1) and when you stated that there should be a no-over-allocation Basic, we decided to include both while a decision was made on which it is to be. In RA-1 we can add a note about performance implications -- BTW not necessarily the case if the application really doesn't need the capacity. Any adds will now be in the next release.
* Update basic to not be over-subscribed. Fixes #1424.
* Add note regarding of 1:1 for basic profile
  Indicate operators may choose a different allocation ratio for Basic profile if operators also assume the risk of performance impact.
Issue #973 documented the intent both to park the Compute Intensive flavour and to remove oversubscription from the Basic flavour.
The parking of Compute Intensive was done, but the removal of oversubscription was not. This issue addresses the removal of over-subscription by setting the CPU allocation ratio to 1:1 for the Basic flavour.