fix(api): support vault -> expected credential fallback for scraping Chassis#842

Merged: chet merged 1 commit into NVIDIA:main from chet:fix_power_shelf_initial_exploration on Apr 7, 2026
chet (Contributor) commented Apr 7, 2026

Description

Tried a power shelf ingestion, and ran into an interesting situation!

Right now the general flow for all components is:

  • Scrape the Redfish root to detect vendor information (`probe_redfish_endpoint(bmc_ip_address)`).
  • Do a `match vendor` to determine how to update/rotate the password, and store it in Vault.

Power shelves (LITE-ON, specifically) don't expose any usable vendor details in the Redfish service root (which we leverage for all OTHER components), so we need to make a subsequent "fallback" call to get `Chassis` details to parse the vendor. There's some fallback code in place for this already. Nice.
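The two-step vendor detection described above could be sketched roughly as follows. This is only an illustration: `ServiceRoot` and `detect_vendor` are hypothetical names, not the project's actual types or functions, and the Chassis probe is passed in as a closure so the sketch stays self-contained.

```rust
/// Hypothetical stand-in for the part of a Redfish service root we care
/// about. The real types are not shown in this PR.
#[derive(Debug, Default)]
struct ServiceRoot {
    vendor: Option<String>,
}

/// Prefer the vendor from the service root. Only when it is missing or
/// blank do we run the (more expensive, authenticated) Chassis probe.
fn detect_vendor<F>(root: &ServiceRoot, probe_chassis: F) -> Option<String>
where
    F: FnOnce() -> Option<String>,
{
    root.vendor
        .as_deref()
        .map(str::trim)
        .filter(|v| !v.is_empty())
        .map(str::to_owned)
        // Assumption: an empty or whitespace-only vendor counts as "not usable".
        .or_else(probe_chassis)
}
```

Because the probe is a closure, it is never invoked for components that already report a vendor in the service root, which matches the "fallback only" intent described above.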

HOWEVER, the "fallback" code (which makes a `probe_vendor_name_from_chassis(...)` call) is *authenticated*, so it queries Vault for the component credentials. The idea behind this is that OTHER components always give us vendor details in the service root, so if that call fails, it's because we've set non-default credentials on them (in Vault) and now need to make an authenticated call.

The PROBLEM is, for power shelves, we haven't set credentials yet! This is the first run.

So, I've tried to make a tweak that is as generic as possible (and left code comments).

The idea is:

> Try to get credentials from Vault, and if we don't find any, use the expected credentials.

...and then the flow will continue, thus allowing power shelves to then `set_sitewide_bmc_root_password(...)` as expected (which is what the default/expected BMC creds are used for). This keeps the default fallback behavior we've had, while allowing for a subsequent "fallback" to just use expected credentials if needed.
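A minimal sketch of that credential selection, assuming the Vault lookup has already been reduced to an `Option` ("found" vs. "not found"). `Credentials`, `CredentialSource`, and `credentials_for_chassis_probe` are hypothetical names for illustration only, not what the PR actually adds.

```rust
/// Hypothetical credential pair used for the authenticated Chassis probe.
#[derive(Debug, Clone, PartialEq)]
struct Credentials {
    username: String,
    password: String,
}

/// Records where the chosen credentials came from, which is handy for logging.
#[derive(Debug, PartialEq)]
enum CredentialSource {
    Vault,
    Expected,
}

/// Use the Vault credentials when present; otherwise fall back to the
/// expected (default) credentials, e.g. on a first run before any
/// password has been rotated.
fn credentials_for_chassis_probe(
    vault_lookup: Option<Credentials>,
    expected: Credentials,
) -> (Credentials, CredentialSource) {
    match vault_lookup {
        Some(creds) => (creds, CredentialSource::Vault),
        None => (expected, CredentialSource::Expected),
    }
}
```

The sketch deliberately treats only "nothing stored in Vault" as a reason to fall back. How a Vault error (as opposed to a missing entry) should be handled is not covered by the description above, so it is left out here.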

Confirmed that I can indeed update the power shelf BMC passwords, so I'm getting rid of the `skip_password_change` variable while I'm in here. That was the only thing using it.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@chet chet requested a review from a team as a code owner April 7, 2026 21:20

github-actions Bot commented Apr 7, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🕐 Last updated: 2026-04-07 21:23:24 UTC | Commit: bf6707f

@chet chet merged commit 9e791f4 into NVIDIA:main Apr 7, 2026
44 checks passed
chet added a commit to chet/bare-metal-manager-core that referenced this pull request Apr 8, 2026
I recently did NVIDIA#842 to fix an issue with expected power shelf ingestion, and wanted to put together an integration test for it. While I was doing that, I realized it might be nice to have equivalent tests for switches and power shelves, and figured I'd do it as a separate PR.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
chet added a commit that referenced this pull request Apr 8, 2026
## Description

It turns out that Lite-On power shelves support `/Chassis/chassis` AND
`/Chassis/powershelf`. In older firmware, only `/Chassis/chassis` is
exposed in the Chassis Collection registry, meaning the code we
have (which checks for `"powershelf"` in the registry) fails.

I'm updating "*is power shelf*" checks to [continue to] look for
`"powershelf"`, and if that's not found, then to look for `"chassis"`
where the manufacturer contains `"lite-on"`. The real fixes are:
- Making sure the power shelf vendors give us vendor information in the
service root.
- Enumerating the `/Chassis/powershelf` in the Chassis Collection (which
is already fixed in newer FW).

Confirmed on actual hardware, and also added some unit tests.
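
As a rough illustration of the updated "*is power shelf*" check described above (a sketch, not the actual change): `ChassisEntry` and `is_power_shelf` are hypothetical names, and the case-insensitive matching is an assumption made here because the vendor is written both "LITE-ON" and "Lite-On" in this thread.

```rust
/// Hypothetical, flattened view of one member of the Chassis Collection.
struct ChassisEntry {
    id: String,
    manufacturer: Option<String>,
}

/// Returns true if the chassis list looks like a power shelf:
/// 1. any member named "powershelf" (newer firmware), or
/// 2. a member named "chassis" whose manufacturer contains "lite-on"
///    (older firmware that only enumerates /Chassis/chassis).
fn is_power_shelf(chassis: &[ChassisEntry]) -> bool {
    if chassis
        .iter()
        .any(|c| c.id.eq_ignore_ascii_case("powershelf"))
    {
        return true;
    }
    chassis.iter().any(|c| {
        c.id.eq_ignore_ascii_case("chassis")
            && c.manufacturer
                .as_deref()
                // Assumption: compare case-insensitively.
                .map_or(false, |m| m.to_ascii_lowercase().contains("lite-on"))
    })
}
```

Checking `"powershelf"` first keeps the existing behavior unchanged; the manufacturer test only matters for the older-firmware case.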

On the plus side, our credentials fallback logic from #842 is working.
This is just the next bit: we did the fallback, collected the vendor
details, and the vendor check failed because we were looking for
`"powershelf"`.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

chet added a commit to chet/bare-metal-manager-core that referenced this pull request May 27, 2026
…overy

These land as two new subtests of `test_integration`, running alongside the existing machine tests against the shared `carbide-api` + `site-explorer`. Each one registers an expected entity, simulates the BMC showing up via DHCP, stands up a mock BMC at the assigned IP, then waits for `site-explorer` to link it to a real managed `PowerShelf` / `Switch` (and confirms we can fetch it back by ID). The power shelf test includes exercising the work from NVIDIA#842, ensuring "*service root vendor not populated*" happens (and is logged) right before it falls back and links successfully.

Notable changes in here:
- Flipped on `create_power_shelves` / `create_switches` in the test `site-explorer` config -- they default off, so the explorer wasn't creating either of them in tests (took me a sec to be like uhhh...).
- New `test_support::host_bmc_router` in `bmc-mock` so a test can hand a mock BMC router to the shared registry, without making the internal `NoopCallbacks` public.
- New `power_shelf`, `switch`, and `dhcp` helpers in `api-test-helper`, which are built on the existing `grpcurl` helper just like `tenant`/`vpc`/`subnet`

Signed-off-by: Chet Nichols III <chetn@nvidia.com>