feat(api): plumb new InstanceNetworkConfig.auto field through the system #1576

Merged
chet merged 1 commit into NVIDIA:main from chet:instance-network-config-auto
May 12, 2026

Conversation

@chet chet commented May 11, 2026

Description

This is the first part of #1533 (there will be 1-2 subsequent PRs for integration), but I thought it was important to send a tracer of sorts to show where it fed through.

This adds a new auto field to the InstanceNetworkConfig proto and api-model structs with a corresponding TryFrom implementation. No callers use this field yet; just getting it dropped into place and defaulting it to false.

This does include the simple exclusivity check, but all of the subsequent wiring will come in a follow-up PR.

This will be used for "configuring" instance interfaces on zero-DPU hosts (or hosts whose DPUs are in NIC mode). Since no configuration is needed (the interfaces are already configured), tenants don't really need to set anything. However, it was decided there should be an explicit signal. That signal is going to be auto.

When auto is set, it means:

  • interfaces must be empty, because the tenant is asking NICo to auto-configure interfaces.
  • NICo will resolve the interfaces from the HostInband segments for the host.
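
The exclusivity rule above can be sketched as a `TryFrom` conversion from the wire type to the model type. This is a minimal illustration, not the code from this PR: the struct layouts, the `ProtoInstanceNetworkConfig` name, the `ConfigError` type, and the use of plain strings for interfaces are all assumptions made for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the wire (proto) message.
#[derive(Debug, Clone, Default)]
pub struct ProtoInstanceNetworkConfig {
    /// Simplified: real interface configs are richer than a string.
    pub interfaces: Vec<String>,
    /// Defaults to `false` when the caller does not set it.
    pub auto: bool,
}

/// Hypothetical stand-in for the api-model struct.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceNetworkConfig {
    pub interfaces: Vec<String>,
    pub auto: bool,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    InvalidArgument(String),
}

impl TryFrom<ProtoInstanceNetworkConfig> for InstanceNetworkConfig {
    type Error = ConfigError;

    fn try_from(proto: ProtoInstanceNetworkConfig) -> Result<Self, Self::Error> {
        // Exclusivity: `auto` and an explicit interface list cannot be combined.
        if proto.auto && !proto.interfaces.is_empty() {
            return Err(ConfigError::InvalidArgument(
                "interfaces must be empty when auto is set".to_string(),
            ));
        }
        Ok(InstanceNetworkConfig {
            interfaces: proto.interfaces,
            auto: proto.auto,
        })
    }
}
```

Putting the check in the conversion means every caller that builds the model type from a request goes through the same validation.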

auto could mean something in the future with regard to some type of pluggable SDN component, where we're saying, "let the plugin deal with configuring the interface(s)". It just happens that in this case it's NICo doing it for zero-DPU hosts.

Tests added.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@chet chet requested a review from a team as a code owner May 11, 2026 22:02

@Matthias247 Matthias247 left a comment

let's at least add a check soon that auto isn't set for hosts with DPUs

chet commented May 11, 2026

@Matthias247 Yeah I'm actually adding the check right now -- I'll ping you back when it's up.

This is the first part of NVIDIA#1533 (there will be 1-2 subsequent PRs for integration), but I thought it was important to send a tracer of sorts to show where it fed through.

This adds a new `auto` field to the `InstanceNetworkConfig` proto and `api-model` structs with a corresponding `TryFrom` implementation. No callers use this field yet; just getting it dropped into place and defaulting it to `false`. All of the subsequent wiring will come in a follow-up PR.

This will be used for "configuring" instance interfaces on zero-DPU hosts (or hosts whose DPUs are in NIC mode). Since no configuration is needed (the interfaces are already configured), tenants don't really need to set anything. However, it was decided there should be an explicit signal. That signal is going to be `auto`.

When `auto` is set, it means:
- `interfaces` must be empty, because the tenant is asking NICo to `auto`-configure interfaces.
- NICo will resolve the interfaces from the `HostInband` segments for the host.

`auto` could mean something in the future with regards to some type of pluggable SDN component, where we're saying, "let the plugin deal with configuring the interface(s)". It just happens that in this case it's NICo doing it for zero DPU hosts.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@chet chet force-pushed the instance-network-config-auto branch from 7feb84d to dcae057 Compare May 11, 2026 23:56

chet commented May 11, 2026

@Matthias247 Done!

@Matthias247

You mean the `if auto && !config.interfaces.is_empty()` part? What I had in mind was something like

if auto && machine.attached_dpu_machine_ids.len() > 0 { // or look at machine_capabilities
    return CarbideError::InvalidArgument(...).into();
}

but follow ups are fine

chet commented May 12, 2026

Oh yeah! Yeah that's going to be in the next PR where I start actually wiring things up.
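
For reference, the host-level check discussed in this thread could look roughly like the following. It is a sketch under assumptions: `Machine` is reduced to the one field named in the review comment, `CarbideError` here is a local stand-in rather than the project's real error type, and `validate_auto_for_host` is a hypothetical function name. Hosts whose DPUs are in NIC mode would presumably need a capability lookup instead of a plain ID-list check, which this sketch does not attempt.

```rust
/// Hypothetical, simplified view of a host machine.
pub struct Machine {
    /// IDs of DPUs attached to this host; empty for zero-DPU hosts.
    pub attached_dpu_machine_ids: Vec<String>,
}

/// Local stand-in for the project's error type.
#[derive(Debug, PartialEq)]
pub enum CarbideError {
    InvalidArgument(String),
}

/// Rejects `auto` on hosts that have DPUs attached.
pub fn validate_auto_for_host(auto: bool, machine: &Machine) -> Result<(), CarbideError> {
    if auto && !machine.attached_dpu_machine_ids.is_empty() {
        return Err(CarbideError::InvalidArgument(
            "auto is only valid on hosts without DPUs".to_string(),
        ));
    }
    Ok(())
}
```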

@chet chet merged commit 99ef4f6 into NVIDIA:main May 12, 2026
84 of 85 checks passed
chet added a commit to chet/bare-metal-manager-core that referenced this pull request May 14, 2026
…tion

This is the second part of NVIDIA#1533, wiring up the auto field that was plumbed through in NVIDIA#1576.

When a tenant sets `auto: true`, NICo resolves the instance's interfaces from the host's `HostInband` segments and stores the resolved config internally. On the wire, callers still see `{ auto: true, interfaces: [] }`, just like they originally sent. The resolved details still only surface in `instance.status.network.interfaces`. We do this via a new `InstanceNetworkConfig::into_external_view()` helper.
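
One way to picture the split between the stored config and what callers see is the sketch below. Only the `into_external_view` name comes from the commit message; its signature, the struct fields, and the `InstanceInterfaceConfig` shape are assumptions for illustration.

```rust
/// Hypothetical, simplified interface entry.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceInterfaceConfig {
    pub segment_id: String,
}

/// Simplified stand-in for the stored network config.
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceNetworkConfig {
    pub auto: bool,
    /// For `auto` configs this holds the interfaces resolved internally.
    pub interfaces: Vec<InstanceInterfaceConfig>,
}

impl InstanceNetworkConfig {
    /// Returns the config as the caller originally sent it: for `auto`
    /// configs the internally resolved interfaces are hidden again.
    pub fn into_external_view(mut self) -> Self {
        if self.auto {
            self.interfaces.clear();
        }
        self
    }
}
```

Explicit (non-`auto`) configs pass through unchanged, so the helper can be applied uniformly at the API boundary.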

Additional tweaks include:
 - `add_inband_interfaces_to_config` (and `with_inband_interfaces_from_machine`) are gated on `network_config.auto`, and no longer auto-fill silently.
 - Instance allocate gates `auto`-ness on host class: zero DPU hosts are expected to be `auto`, DPU hosts are not.
 - Instance update path re-resolves on `auto: true` against the host's current `HostInband` segments. No-op if nothing changed, but a real update if the operator added/removed segments since allocation.

Tests added/updated!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
chet added a commit to chet/bare-metal-manager-core that referenced this pull request May 14, 2026
…tion

This is the second part of NVIDIA#1533, wiring up the auto field that was plumbed through in NVIDIA#1576.

When a tenant sets `auto: true`, NICo now resolves the instance's interfaces from the host's `HostInband` segments and stores the resolved config internally, rather than silently auto-filling them. A tenant can set `auto: true` without `interfaces`, or leave `auto: false` and set `interfaces`, but it cannot do both.

On the wire, `auto` callers will still see `{ auto: true, interfaces: [] }` as their "stored" config, just like they originally sent (via a new `::into_external_view()` helper), even though internally we are storing the resolved interfaces. The resolved details will surface in `instance.status.network.interfaces`, just like they do today for hosts with DPUs.

Additional tweaks include:
 - `add_inband_interfaces_to_config` is gated on `network_config.auto`, and no longer auto-fills silently.
 - Instance allocate gates `auto`-ness on host class: zero DPU hosts are expected to be `auto`, DPU hosts are not.
 - Instance update path re-resolves on `auto: true` against the host's current `HostInband` segments. No-op if nothing changed, but a real update if the operator added/removed segments since allocation.
 - `with_inband_interfaces_from_machine` was dead code, so it's going away.
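
The update-path behaviour in the list above (re-resolve, and treat an unchanged result as a no-op) can be sketched as follows. Everything here is hypothetical: the `StoredNetworkConfig` type, the `reresolve_on_update` function, and the simplification that one interface maps to one segment ID string.

```rust
/// Hypothetical stored config; interfaces are reduced to segment IDs.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredNetworkConfig {
    pub auto: bool,
    pub interfaces: Vec<String>,
}

/// Re-resolves an `auto` config against the host's current HostInband
/// segments. Returns `None` when there is nothing to update, and
/// `Some(new_config)` when the segments changed since allocation.
pub fn reresolve_on_update(
    stored: &StoredNetworkConfig,
    current_segments: &[String],
) -> Option<StoredNetworkConfig> {
    if !stored.auto {
        // Explicit configs are never re-resolved.
        return None;
    }
    let resolved: Vec<String> = current_segments.to_vec();
    if resolved == stored.interfaces {
        None // same segments as before: no-op
    } else {
        Some(StoredNetworkConfig {
            auto: true,
            interfaces: resolved,
        })
    }
}
```

The comparison is order-sensitive in this sketch; a real implementation might compare sets of segments instead.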

Tests added/updated!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
chet added a commit that referenced this pull request May 15, 2026
…tion (#1674)

This is the second part of #1533, wiring up the auto field that was
plumbed through in #1576.

When a tenant sets `auto: true`, NICo now resolves the instance's
interfaces from the host's `HostInband` segments and stores the resolved
config internally, rather than silently auto-filling them. A tenant can
set `auto: true` without `interfaces`, or leave `auto: false` and set
`interfaces`, but it cannot do both.

On the wire, `auto` callers will still see
`{ auto: true, interfaces: [] }` as their "stored" config, just like
they originally sent (via a new `::into_external_view()` helper), even
though internally we are storing the resolved interfaces. The resolved
details will surface in `instance.status.network.interfaces`, just like
they do today for hosts with DPUs.

Additional tweaks include:
- `add_inband_interfaces_to_config` is gated on `network_config.auto`,
and no longer auto-fills silently.
- Instance allocate gates `auto`-ness on host class: zero DPU hosts are
expected to be `auto`, DPU hosts are not.
- Instance update path re-resolves on `auto: true` against the host's
current `HostInband` segments. No-op if nothing changed, but a real
update if the operator added/removed segments since allocation.
- `with_inband_interfaces_from_machine` was dead code, so it's going away.

I've got some `machine-a-tron` stuff I'll do after this PR.

Tests added/updated!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

## Description
<!-- Describe what this PR does -->

## Type of Change
<!-- Check one that best describes this PR -->
- [x] **Add** - New feature or capability
- [x] **Change** - Changes in existing functionality  
- [ ] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

## Related Issues (Optional)
<!-- If applicable, provide GitHub Issue. -->

## Breaking Changes
- [ ] This PR contains breaking changes

<!-- If checked above, describe the breaking changes and migration steps
-->

## Testing
<!-- How was this tested? Check all that apply -->
- [x] Unit tests added/updated
- [x] Integration tests added/updated  
- [ ] Manual testing performed
- [ ] No testing required (docs, internal refactor, etc.)

## Additional Notes
<!-- Any additional context, deployment notes, or reviewer guidance -->



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added automatic network interface resolution for instances on
    zero-DPU hosts using `auto: true` configuration.

* **Bug Fixes**
  * Enforced immutability of automatic network configuration flag for
    existing instances.
  * Stricter validation: `auto` mode now only valid on zero-DPU hosts
    and rejects explicit interface specifications.

* **Documentation**
  * Clarified network configuration behavior and interface mappings for
    automatic and manual modes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
chet added a commit to chet/bare-metal-manager-rest that referenced this pull request May 23, 2026
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode).

The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane.

How it's done:
- Brought the two new fields into `nico_nico.proto`.
- `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers.
- `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it.
- Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request.

Tests added!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
chet added a commit to chet/bare-metal-manager-rest that referenced this pull request May 23, 2026
This closes NVIDIA#351, and is the REST-side counterpart to the Flat VPC and `auto` interface work that landed in Core (NVIDIA/infra-controller#1775, NVIDIA/infra-controller#1576, and NVIDIA/infra-controller#1674), which together let tenants put instances on zero-DPU hosts (or hosts with their DPU in NIC mode).

The discussion on NVIDIA#351 was originally about whether to model this as `vpcId: null` or a special "unmanaged VPC" type. What landed in Core is closer to the latter -- Flat VPCs are real VPCs with VNIs, NSGs, and peering, but NICo doesn't drive their data plane.

How it's done:
- Brought the two new fields into `nico_nico.proto`.
- `Flat` is a new value in the `network_virtualization_type` enum across the API, DB, OpenAPI, and proto layers.
- `auto` is a new boolean on instance create / update that flips the interface configuration model from "explicit list" to "NICo resolves from HostInband segments." Mutually exclusive with `interfaces`, persisted on the instance row so partial updates can re-issue the signal without the caller re-supplying it.
- Defense-in-depth check at the REST layer that `auto: true` requires a Flat VPC. Core enforces the same rule, but we fail-fast here too to avoid round-tripping the site for an obviously bad request.

This also adds some "VPC capability-like" mappings (similar to what we have in Core) to drive logic decisions at the REST API layer.
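
As a rough picture of the fail-fast validation and the capability-style mapping described above, consider the sketch below. It is written in Rust for consistency with the Core snippets in this thread and is not taken from the REST codebase: `VpcCapabilities`, `capabilities_for`, `validate_interface_request`, and the `DpuManaged` variant are hypothetical, and only `Flat` is a value named in the commit message.

```rust
/// Simplified stand-in for the `network_virtualization_type` enum.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NetworkVirtualizationType {
    Flat,
    /// Hypothetical placeholder for the existing non-Flat types.
    DpuManaged,
}

/// Hypothetical capability record derived from the VPC type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct VpcCapabilities {
    pub allows_auto_interfaces: bool,
}

/// Maps a VPC type to what it supports, so request checks read the
/// capability instead of matching on the type everywhere.
pub fn capabilities_for(kind: NetworkVirtualizationType) -> VpcCapabilities {
    match kind {
        NetworkVirtualizationType::Flat => VpcCapabilities { allows_auto_interfaces: true },
        NetworkVirtualizationType::DpuManaged => VpcCapabilities { allows_auto_interfaces: false },
    }
}

/// Fail-fast validation before the request is forwarded to the site.
pub fn validate_interface_request(
    auto: bool,
    interface_count: usize,
    vpc_type: NetworkVirtualizationType,
) -> Result<(), String> {
    if auto && interface_count > 0 {
        return Err("auto and interfaces are mutually exclusive".to_string());
    }
    if auto && !capabilities_for(vpc_type).allows_auto_interfaces {
        return Err("auto requires a Flat VPC".to_string());
    }
    Ok(())
}
```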

Tests added!

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
3 participants