
[Tech] technologies.md - update PCI-PassThrough notes re. containerisation #1336

Open · tomkivlin opened this issue Mar 19, 2020 · 16 comments

@tomkivlin (Collaborator)

Section 8.1 is very focused on the pros/cons as they relate to VMs, in particular with the implicit assumption that VMs will move, or will have the ability to move (Live Migration, vMotion, etc.), between hypervisor hosts.

For containers, the implicit assumptions are subtly different, which changes the pros/cons. For example, containers are immutable and are never "migrated" between worker nodes. In addition, with a multi-AZ deployment architecture, there may be no need for the worker nodes to be able to "migrate" between hypervisor hosts either. Perhaps a number of scenarios for containerised workloads are required.

Can we add some notes to this section regarding the way in which SR-IOV and other IO virtualisation is achieved with containers (e.g. using a Kubernetes Device Plugin), and the pros/cons for such workloads?
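For illustration, here is a minimal sketch of what that looks like from the workload's point of view, assuming an SR-IOV device plugin that advertises virtual functions (VFs) as an extended resource — the resource name and image below are illustrative only:

```yaml
# Sketch: a pod requesting one SR-IOV VF via a device-plugin-advertised
# extended resource. The resource and image names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: sriov-workload
spec:
  containers:
  - name: app
    image: example.com/app:latest
    resources:
      requests:
        intel.com/intel_sriov_netdevice: "1"
      limits:
        intel.com/intel_sriov_netdevice: "1"   # extended resources must appear in limits
```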

@tomkivlin tomkivlin added this to the Backlog milestone Mar 19, 2020
@tomkivlin tomkivlin added this to To do in Technical Steering via automation Mar 19, 2020
@markshostak (Collaborator)

@tomkivlin Hi Tom, Can you provide a link to the chapter you're referring to pls? Thx, -M

@tomkivlin (Collaborator, Author)

> @tomkivlin Hi Tom, Can you provide a link to the chapter you're referring to pls? Thx, -M

https://github.com/cntt-n/CNTT/blob/master/doc/tech/technologies.md#8.1

@SaadUSheikh

@tomkivlin Based on the latest industry trends, especially those coming from disaggregated networks (regional and towards-edge DCs), we believe the two items below should also be included in this list:

  1. P4
  2. ASICs embedded on x86 (a direction which Intel has been supporting recently)

@pgoyal01 (Collaborator)

@tomkivlin Totally agree with your statement, "with a multi-AZ deployment architecture, there may be no need for the worker nodes to be able to 'migrate' between hypervisor hosts either", but that applies equally to VMs, and some operators do not support "live migration".

@markshostak (Collaborator)

@tomkivlin Thanks, Tom.

> Section 8.1 is very focused on the pros/cons as they relate to VMs...

I believe our intent was more about use cases that employ cascaded kernels, and hence incur a double penalty on data transfers that would, without optimisation, be processed twice (once in each kernel), and about options to recover or avoid that additional tax, some of which do, and some of which don't, violate CNTT principles. Containers, by contrast, typically perch atop a single kernel, so this section may not be applicable to them.

> ...with the implicit assumption that VMs will move, or will have the ability to move (Live Migration, vMotion, etc.), between hypervisor hosts... For containers, the implicit assumptions are subtly different, which changes the pros/cons. For example, containers are immutable and are never "migrated" between worker nodes.

My understanding is that the underlying tenet of the section relates to portability, rather than migratability. That is, a VNF workload that uses some form of h/w "acceleration" technology, such as SR-IOV, should be able to be scheduled "anywhere", or at least should not be constrained to Server Pool A because Server Pool B, identical in all other respects, has a different NIC and therefore requires a different variant of the VNF containing the drivers for the NICs in B. Having portability can then help enable live migration, but in practice I've seen environments where portability is very important, yet migration is never used (nor even supported). Hence, I don't think the fact that the technologies discussed in the section lend themselves to migration has much bearing on the differences between VMs and containers. If it doesn't apply, it doesn't apply.
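In Kubernetes terms, the constraint I'm describing would look something like the sketch below, where a NIC-specific VNF build is pinned to one pool via a node label — the label, its value, and the image name are all hypothetical:

```yaml
# Anti-pattern sketch: a VNF variant carrying pool-B NIC drivers,
# pinned to pool B by a hypothetical node label.
apiVersion: v1
kind: Pod
metadata:
  name: vnf-variant-b
spec:
  nodeSelector:
    nic-model: vendor-x-25g    # schedulable only where this label matches
  containers:
  - name: vnf
    image: example.com/vnf-b:latest
```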

> Can we add some notes to this section regarding the way in which SR-IOV and other IO virtualisation is achieved with containers (e.g. using a Kubernetes Device Plugin), and the pros/cons for such workloads?

I agree we should definitely add something, likely much more than a few notes, to describe how we prescribe incorporation of acceleration technologies for Containers. However, for the reasons above I think it's a separate section from the existing text, as the challenges and mechanisms are both different.

Kubernetes Device Plugins seem like a great place to start, and given they're normalised by host-based kernel drivers, there's no need that I see for PCI-PT. The host kernel has enough visibility into the container to DMA data directly into the container process's user-space buffers. The h/w routing part of SR-IOV could certainly be leveraged if we want to specify it, but I don't see an application for the PT portion in the container use case.
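For concreteness, one common way that routing part gets consumed today is via a secondary-network CNI (e.g. the SR-IOV CNI with Multus), tied back to the extended resource the device plugin advertises. A minimal sketch, with all names illustrative:

```yaml
# Sketch: a NetworkAttachmentDefinition linking a secondary network to the
# extended resource advertised by an SR-IOV device plugin. Names illustrative.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net-a
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "sriov",
      "ipam": { "type": "host-local", "subnet": "10.56.217.0/24" }
    }'
```

A pod would then select it with a `k8s.v1.cni.cncf.io/networks: sriov-net-a` annotation, alongside the corresponding resource request.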

Bottom Line:
After going through the thought exercise above, your proposed material may be a more natural fit for RA-2. The hardware-specific s/w (i.e. drivers) supporting the K8s device plugins gets loaded in the host, and that's consistent w/ CNTT strategy/principles, so there may not be a need for us to call it out specially in Sect 8, nor do I see any non-conforming technology that would require a mitigation policy. It may be as simple as just codifying the API references (e.g. gRPC over sockets) in the RA and the mechanism we want to use (e.g. drivers in the host), then going for a pint! :-) Thoughts?

@markshostak (Collaborator)

@tomkivlin Hi Tom, Just wanted to bump this one, so it doesn't go idle. Thx, -M

@tomkivlin (Collaborator, Author)

> My understanding is that the underlying tenet of the section relates to portability, rather than migratability. That is, a VNF workload that uses some form of h/w "acceleration" technology, such as SR-IOV, should be able to be scheduled "anywhere", or at least should not be constrained to Server Pool A because Server Pool B, identical in all other respects, has a different NIC and therefore requires a different variant of the VNF containing the drivers for the NICs in B.

This makes sense and it would be great if this was brought out more. As you say, that might be sufficient, along with wording in RA2.

@markshostak (Collaborator)

@tomkivlin Hi Tom, While I have no objection to using examples in the docs, I try to avoid it unless it's in an informative section. If you think any of the language above would help clarify the doc, please feel free to use it in a PR. There's a lot of material above, and you have a knack for identifying the most succinct and relevant portions. Thx, -Mark

@SaadUSheikh

@tomkivlin Portability may not be a necessary requirement for VMs, while migration can be. I suggest this section reflect technologies at a high level only, and reference details from the community where appropriate. For example, all the diagrams here show a hypervisor view.

Similarly, I want to ask one fundamental question: does this list include all the relevant technologies? I find EVS, P4, ASIC on COTS, etc. not listed in the document.

@markshostak (Collaborator)

@SaadUllahSheikh We should give it a title more reflective of what it is. "Relevant Technologies" is overly broad, and could include anything and everything. @rabi-abdel Would you be in agreement with opening an issue to find a name with a tighter scope? Thx, -Mark

P.S. P4? I'm not sure I understand the relevance to CNTT on that one.

@tomkivlin (Collaborator, Author)

> @tomkivlin Hi Tom, While I have no objection to using examples in the docs, I try to avoid it unless it's in an informative section. If you think any of the language above would help clarify the doc, please feel free to use it in a PR. There's a lot of material above, and you have a knack for identifying the most succinct and relevant portions. Thx, -Mark

@markshostak I'm happy to craft a PR to attempt this, sure. I can change the title at the same time if we want.

@markshostak (Collaborator)

@tomkivlin @rabi-abdel I think we should revise the title. I need some insight from Rabi on what the intent of the section is intrinsically supposed to be. Its genesis was in the technology-exception development, but it was forked off from Appendix A, so it may not be focused just on exceptions. Need some insight on the vision for the section... Rabi?

@markshostak (Collaborator)

@tomkivlin WRT your other point, please do, and I'm happy to add any insight or provide any feedback, but I think you've got it. Thx, -Mark

@rabi-abdel (Collaborator)

@markshostak The section was not intended to focus only on exceptions; the aim was basically to clarify the CNTT position on various technologies, so changing the title to reflect that better is OK with me. On the container question: yes, I think adding more details about IO virtualisation and containers makes sense, though it is important to clarify the CNTT position on any of those approaches/technologies. I think it would be great if @tomkivlin could draft a PR, and @markshostak and all can help contribute to it (including changing the title).

@rabi-abdel rabi-abdel removed this from the Backlog milestone May 15, 2020
@project-bot project-bot bot moved this from To do to Backlog in Technical Steering May 15, 2020
@rabi-abdel rabi-abdel moved this from Backlog to To do in Technical Steering Jun 17, 2020
@scottsteinbrueck (Collaborator)

@tomkivlin Are you planning to create a PR, or shall we close this issue?

@tomkivlin (Collaborator, Author)

> @tomkivlin Are you planning to create a PR, or shall we close this issue?

Good question. This dropped off my list but I will do a PR.
