
bug(health): NVUE Rest -> sdn partition fails to parse num-gpus #1961

@mkoci

Description


Version

main

Describe the bug.

The Switch host `nvue.rest` collector deserializes `num-gpus` as `Option<u32>`, but the response payload can carry this value as a JSON string (for example `"8"`), so deserialization fails.

Expected behavior: the NVUE REST collector accepts `num-gpus` both as a JSON number and as a numeric string, and records the partition data normally.

Actual behavior: SDN partition parsing fails for valid NVOS responses where num-gpus is string-encoded.

Minimum reproducible example

Configure Health to monitor an NVLink Switch and enable `nvue.rest` in the health config.

A minimal response payload like the following currently fails to deserialize:

  {
    "name": "Default Partition",
    "num-gpus": "8",
    "health": "unhealthy",
    "resiliency-mode": "adaptive_bandwidth",
    "mcast-limit": 1024,
    "partition-type": "gpuuid_based"
  }

Relevant log output

`invalid type: string "8", expected u32`


Code of Conduct

  • I agree to follow NCX Infra Controller's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report

Metadata



Labels

bug: A defect in existing software (deprecated; use issue type, but it's needed for reporting now)


Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
