feat(api): add Flat VPC virtualization type for zero-DPU hosts by chet · Pull Request #1775 · NVIDIA/infra-controller

chet · 2026-05-18T18:34:15Z

Description

This closes #1522.

Adds Flat to VpcVirtualizationType for VPCs hosted on zero-DPU machines. ETV and FNN both presume a Carbide-managed DPU data plane, so using them for zero-DPU hosts meant allocating overlay machinery that nothing consumed. Flat just records the VPC and lets the network operator's switch fabric own reachability.

Also, as I had mentioned to @Matthias247 and @bcavnvidia on the side:

I ended up doing a bit more refactoring while I was in there. While I was working on it, I was like, "you know, it'd be a lot nicer if this wasn't just a bunch of additional matching + conditional branching" -- so I tried to break it out by defining a new approach, VPC capabilities (kind of like machine capabilities and rack capabilities), and using that modeling to simplify some of the decision making.

Per-variant policy lives in a new VpcCapabilities profile in model::vpc::capability: which host fabric interface the type attaches to (Dpu or Nic), which segment types it accepts, whether it supports IPv6 / routing profiles / stretched-L2 SVI, and which other types it peers with. Each variant maps to one profile constant; handlers consult capability methods that just read from the profile. Adding a future VPC type is a six-field profile plus one match arm, no handler edits.
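For readers skimming the thread, here is a minimal sketch of what a profile like this could look like. It is not the code in the PR: the field names, the `Tenant` segment type, the `static` profile names, and every value other than "Flat accepts only `HostInband`" are assumptions for illustration (Flat attaching to `Nic` is inferred from it targeting zero-DPU hosts).

```rust
/// Host-side interface a VPC type attaches to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HostFabricInterface {
    Dpu,
    Nic,
}

/// Segment types; `Tenant` is a hypothetical stand-in for the non-HostInband types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SegmentType {
    Tenant,
    HostInband,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    Etv,
    Fnn,
    Flat,
}

/// Six-field policy profile; field names are assumed, not taken from the PR.
#[derive(Debug)]
pub struct VpcCapabilities {
    pub host_fabric_interface: HostFabricInterface,
    pub accepted_segment_types: &'static [SegmentType],
    pub supports_ipv6: bool,
    pub supports_routing_profiles: bool,
    pub supports_stretched_l2_svi: bool,
    pub peers_with: &'static [VpcVirtualizationType],
}

// Illustrative values only; the real profiles may differ.
static ETV_PROFILE: VpcCapabilities = VpcCapabilities {
    host_fabric_interface: HostFabricInterface::Dpu,
    accepted_segment_types: &[SegmentType::Tenant],
    supports_ipv6: false,
    supports_routing_profiles: false,
    supports_stretched_l2_svi: true,
    peers_with: &[VpcVirtualizationType::Etv],
};

static FNN_PROFILE: VpcCapabilities = VpcCapabilities {
    host_fabric_interface: HostFabricInterface::Dpu,
    accepted_segment_types: &[SegmentType::Tenant],
    supports_ipv6: true,
    supports_routing_profiles: true,
    supports_stretched_l2_svi: false,
    peers_with: &[VpcVirtualizationType::Fnn],
};

static FLAT_PROFILE: VpcCapabilities = VpcCapabilities {
    host_fabric_interface: HostFabricInterface::Nic,
    accepted_segment_types: &[SegmentType::HostInband],
    supports_ipv6: false,
    supports_routing_profiles: false,
    supports_stretched_l2_svi: false,
    peers_with: &[],
};

impl VpcVirtualizationType {
    /// The one match arm per variant; everything else reads from the profile.
    pub fn capabilities(self) -> &'static VpcCapabilities {
        match self {
            VpcVirtualizationType::Etv => &ETV_PROFILE,
            VpcVirtualizationType::Fnn => &FNN_PROFILE,
            VpcVirtualizationType::Flat => &FLAT_PROFILE,
        }
    }

    pub fn supports_segment_type(self, segment_type: SegmentType) -> bool {
        self.capabilities().accepted_segment_types.contains(&segment_type)
    }

    pub fn can_peer_with(self, other: VpcVirtualizationType) -> bool {
        self.capabilities().peers_with.contains(&other)
    }
}
```

With this shape, a handler would ask `vpc_type.supports_segment_type(segment_type)` instead of matching on the variant, which is what keeps a new VPC type down to a profile plus one match arm.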

Flat VPCs and HostInband segments are mutually bound -- a Flat VPC can only hold HostInband segments, and HostInband segments can only live in Flat VPCs. Tenants pick FLAT through the same VPC create flow as any other type.
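The mutual binding is an "if and only if" rule, which can be sketched as a single check. This is an illustration under assumptions, not the PR's implementation: the `UnsupportedSegment` error type and the `Tenant` segment type are hypothetical, and the real `ensure_supports_segment` takes a segment rather than a bare segment type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    Etv,
    Fnn,
    Flat,
}

/// `Tenant` is a hypothetical stand-in for any non-HostInband segment type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SegmentType {
    Tenant,
    HostInband,
}

/// Hypothetical error describing a rejected VPC/segment pairing.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedSegment {
    pub vpc_type: VpcVirtualizationType,
    pub segment_type: SegmentType,
}

/// Accept the pairing only when "VPC is Flat" and "segment is HostInband"
/// are both true or both false.
pub fn ensure_supports_segment(
    vpc_type: VpcVirtualizationType,
    segment_type: SegmentType,
) -> Result<(), UnsupportedSegment> {
    let vpc_is_flat = vpc_type == VpcVirtualizationType::Flat;
    let segment_is_host_inband = segment_type == SegmentType::HostInband;
    if vpc_is_flat == segment_is_host_inband {
        Ok(())
    } else {
        Err(UnsupportedSegment { vpc_type, segment_type })
    }
}
```

Both directions of the rule fall out of the same comparison: a Flat VPC rejects a `Tenant` segment, and an FNN or ETV VPC rejects a `HostInband` segment.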

Docs in a separate PR. Tests added!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

Type of Change

Add - New feature or capability
Change - Changes in existing functionality
Fix - Bug fixes
Remove - Removed features or deprecated functionality
Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

This PR contains breaking changes

Testing

Unit tests added/updated
Integration tests added/updated
Manual testing performed
No testing required (docs, internal refactor, etc.)

Additional Notes

github-actions · 2026-05-18T19:45:28Z

🌿 Preview your docs: https://nvidia-preview-pull-request-1775.docs.buildwithfern.com/infra-controller

kensimon · 2026-05-19T14:41:05Z

+            // virtualization types. Their capabilities will determine
+            // if they are allowed or not.
+            vpc1.network_virtualization_type
+                .ensure_can_peer_with(vpc2.network_virtualization_type)


So does this mean FNN-to-Flat peering is allowed? If so, I think we need to update the logic in crate::ethernet_virtualization::tenant_network() to allow for this, since it currently only allows identical virtualization types.

Yeah also good call! @bcavnvidia was asking me about this too -- like if this replaces VpcPeeringPolicy or otherwise.

My thought is the capabilities define what peering we CAN do, and that VpcPeeringPolicy will remain, allowing us to define at a site-level what we WILL do.

Maybe we don't need to, but it seems like nice layering.

And yeahhh, if we leverage VpcPeeringPolicy alongside the capabilities, we can enhance capabilities with a new exchanges_overlay_vni_for_peering parameter, and that allows us to generalize the tenant network flow (without needing any specific reference to ::Fnn) to something like:

    let vpc_peer_ids: Vec<VpcId> = match policy {
        VpcPeeringPolicy::Exclusive => {
            let allowed = virtualization_type.capabilities().peers_with.to_vec();
            db::vpc_peering::get_vpc_peer_vnis(txn, vpc_id, allowed)
                .await?
                .into_iter()
                .map(|(id, _)| id)
                .collect()
        }
        VpcPeeringPolicy::Mixed => {
            db::vpc_peering::get_vpc_peer_ids(txn, vpc_id).await?
        }
        VpcPeeringPolicy::None => vec![],
    };
    vpc_peer_prefixes = get_prefixes_by_vpcs(txn, &vpc_peer_ids).await?;
    if virtualization_type.exchanges_overlay_vni_for_peering() {
        let vni_peer_types: Vec<_> = ALL_VARIANTS
            .iter()
            .copied()
            .filter(|t| t.exchanges_overlay_vni_for_peering())
            .collect();
        vpc_peer_vnis = db::vpc_peering::get_vpc_peer_vnis(txn, vpc_id, vni_peer_types)
            .await?
            .iter()
            .map(|(_, vni)| *vni as u32)
            .collect();
    }
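To make the "capabilities say what we CAN do, policy says what we WILL do" layering easier to follow, here is a rough in-memory sketch of the peer-selection step from the snippet above, with the database calls replaced by a slice of peers. Everything about it is an assumption for illustration: `VpcId` as a plain integer, the `peers_with` function with its "each type peers only with itself" table (mirroring the current identical-type behavior mentioned in the review), and the `select_peer_ids` helper itself.

```rust
/// Hypothetical stand-in; the real VpcId is presumably not a bare integer.
pub type VpcId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcVirtualizationType {
    Etv,
    Fnn,
    Flat,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VpcPeeringPolicy {
    Exclusive,
    Mixed,
    None,
}

/// Capability layer: which types a given type is able to peer with.
/// Assumed table: every type peers only with itself.
pub fn peers_with(local: VpcVirtualizationType) -> &'static [VpcVirtualizationType] {
    match local {
        VpcVirtualizationType::Etv => &[VpcVirtualizationType::Etv],
        VpcVirtualizationType::Fnn => &[VpcVirtualizationType::Fnn],
        VpcVirtualizationType::Flat => &[VpcVirtualizationType::Flat],
    }
}

/// Policy layer: the site-level policy decides how the capability table is applied.
pub fn select_peer_ids(
    policy: VpcPeeringPolicy,
    local: VpcVirtualizationType,
    peers: &[(VpcId, VpcVirtualizationType)],
) -> Vec<VpcId> {
    match policy {
        // Only peers whose type appears in the local type's capability profile.
        VpcPeeringPolicy::Exclusive => {
            let allowed = peers_with(local);
            peers
                .iter()
                .filter(|(_, peer_type)| allowed.contains(peer_type))
                .map(|(id, _)| *id)
                .collect()
        }
        // Every configured peer, regardless of type (as in the snippet above).
        VpcPeeringPolicy::Mixed => peers.iter().map(|(id, _)| *id).collect(),
        // Peering disabled at the site level.
        VpcPeeringPolicy::None => Vec::new(),
    }
}
```

Note that in this sketch, as in the snippet it mirrors, `Mixed` does not consult the capability table at all; whether that is the intended layering is one of the open questions in this thread.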

kensimon · 2026-05-19T15:13:28Z

+            .ensure_supports_segment(&new_network_segment)
+            .map_err(CarbideError::from)?;
+        virtualization_type.allocates_svi_for(&new_network_segment)
    } else {


Should we fail if there's no VPC and new_network_segment.segment_type is HostInband? Your new code in api/src/handlers/instance/mod.rs will fail allocations onto HostInband segments if there's no vpc_id for the segment, so maybe it's better to validate that here?

Oh yeah good call out -- so, I think a HostInband segment should be able to exist without being bound to a VPC.

For example, maybe we want a HostInband segment for playing around with zero DPU provisioning, but maybe we'd never bind it to a VPC, and that should be ok; if we required a VPC, we'd have this weird inter-dependency thing where we'd need to create a VPC just for getting zero DPU hosts provisioned. This is kind of similar to Admin maybe?

All that to say, you're right in calling this out. I think the actual adjustment is to update the comments in api/src/handlers/instance/mod.rs to explain that better, and improve the error handling a bit to return an error specific to the segment not being bound to a VPC at allocation time.

I guess TLDR is it should be fine to have a HostInband segment not within a VPC, BUT, once it comes time to allocate an instance from the host into a VPC, the segment the host is in needs to be bound to a VPC?
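A small sketch of that split, assuming hypothetical names throughout (`NetworkSegment` fields, `AllocationError`, `vpc_for_allocation`): the segment type permits `vpc_id` to be absent, and only the allocation path turns the missing binding into a specific error.

```rust
use std::fmt;

/// Hypothetical stand-in for the real VPC identifier type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VpcId(pub u32);

/// Simplified segment: a HostInband segment may exist with `vpc_id: None`.
#[derive(Debug, Clone)]
pub struct NetworkSegment {
    pub name: String,
    pub vpc_id: Option<VpcId>,
}

/// Hypothetical error specific to the "not bound at allocation time" case.
#[derive(Debug, PartialEq, Eq)]
pub enum AllocationError {
    SegmentNotBoundToVpc { segment: String },
}

impl fmt::Display for AllocationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AllocationError::SegmentNotBoundToVpc { segment } => write!(
                f,
                "segment {segment} is not bound to a VPC; bind it before allocating an instance"
            ),
        }
    }
}

impl std::error::Error for AllocationError {}

/// Called only on the instance-allocation path, never at segment creation.
pub fn vpc_for_allocation(segment: &NetworkSegment) -> Result<VpcId, AllocationError> {
    segment.vpc_id.ok_or_else(|| AllocationError::SegmentNotBoundToVpc {
        segment: segment.name.clone(),
    })
}
```

Creating the unbound segment stays valid; the caller only sees `SegmentNotBoundToVpc` when it tries to allocate an instance through it.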

chet requested a review from a team as a code owner May 18, 2026 18:34

chet force-pushed the vpc-flat branch from 6227fc6 to bcf22eb Compare May 18, 2026 19:44

kensimon reviewed May 19, 2026

chet force-pushed the vpc-flat branch from bcf22eb to ff3938b Compare May 19, 2026 22:13

chet requested a review from Coco-Ben as a code owner May 19, 2026 22:13

chet wants to merge 1 commit into NVIDIA:main from chet:vpc-flat
