fix(api): support vault -> expected credential fallback for scraping Chassis #842
Merged
…Chassis

Tried a power shelf ingestion, and ran into an interesting situation!

Right now the general flow for all components is:

- Scrape the Redfish root to detect vendor information (`probe_redfish_endpoint(bmc_ip_address)`).
- Do a `match vendor` to determine how to update/rotate the password, and store in Vault.

Power shelves (LITE-ON, specifically) don't expose any usable vendor details in the Redfish service root (which we leverage for all OTHER components), so we need to make a subsequent "fallback" call to get `Chassis` details to parse the vendor. There's some fallback code in place for this already. Nice.

HOWEVER, the "fallback" code (which makes a `probe_vendor_name_from_chassis(...)` call) is *authenticated*, so it queries Vault for the component credentials. The idea behind this is that OTHER components always give us vendor details in the service root, so if the call fails, it's because we've set non-default credentials on them (in Vault), and now need to make an authenticated call.

The PROBLEM is, for power shelves, we haven't set credentials yet! This is the first run.

So, I've tried to make a tweak that is as generic as possible (and left code comments). The idea is:

> Try to get credentials from Vault, and if we don't find any, use the expected credentials.

...and then the flow will continue, thus allowing power shelves to then `set_sitewide_bmc_root_password(...)` as expected (which is what the default/expected BMC creds are used for). This keeps the default fallback behavior we've had, while allowing for a subsequent "fallback" to just use expected credentials if needed.

Confirmed that I can indeed update the power shelf BMC passwords.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
🔐 TruffleHog Secret Scan
✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉
🕐 Last updated: 2026-04-07 21:23:24 UTC | Commit: bf6707f
spydaNVIDIA
approved these changes
Apr 7, 2026
chet
added a commit
to chet/bare-metal-manager-core
that referenced
this pull request
Apr 8, 2026
I recently did NVIDIA#842 to fix an issue with expected power shelf ingestion, and wanted to put together an integration test for it. While I was doing that, I realized it might be nice to have equivalent tests for switches and power shelves, and figured I'd do it as a separate PR. Signed-off-by: Chet Nichols III <chetn@nvidia.com>
chet
added a commit
that referenced
this pull request
Apr 8, 2026
## Description

It turns out that Lite-On power shelves support `/Chassis/chassis` AND `/Chassis/powershelf`. In older firmware, only `/Chassis/chassis` is exposed in the Chassis Collection registry, meaning the code we have (which checks for `"powershelf"` in the registry) fails.

I'm updating "*is power shelf*" checks to [continue to] look for `"powershelf"`, and if that's not found, then to look for `"chassis"` where the manufacturer contains `"lite-on"`.

The real fixes are:

- Making sure the power shelf vendors give us vendor information in the service root.
- Enumerating `/Chassis/powershelf` in the Chassis Collection (which is already fixed in newer FW).

Confirmed on actual hardware, and also added some unit tests.

On the plus side, our credentials fallback logic from #842 is working. This is just the next bit (we did the fallback, collected the vendor details, and the vendor check failed, because we were looking for `"powershelf"`).

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

## Type of Change

<!-- Check one that best describes this PR -->

- [ ] **Add** - New feature or capability
- [ ] **Change** - Changes in existing functionality
- [ ] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

## Related Issues (Optional)

<!-- If applicable, provide GitHub Issue. -->

## Breaking Changes

- [ ] This PR contains breaking changes

<!-- If checked above, describe the breaking changes and migration steps -->

## Testing

<!-- How was this tested? Check all that apply -->

- [ ] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed
- [ ] No testing required (docs, internal refactor, etc.)

## Additional Notes

<!-- Any additional context, deployment notes, or reviewer guidance -->

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
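For reviewers, the updated "*is power shelf*" check described above could look roughly like the sketch below. This is only an illustration, not the code in this repository: `ChassisEntry` and `is_power_shelf` are hypothetical names, and the case-insensitive comparison is an assumption (the vendor string shows up as both `LITE-ON` and `Lite-On` in this thread).

```rust
/// Minimal view of one Chassis Collection member (hypothetical type for this sketch).
pub struct ChassisEntry {
    /// Last path segment of the member, e.g. "powershelf" or "chassis".
    pub id: String,
    /// Manufacturer string reported by that Chassis resource, if any.
    pub manufacturer: Option<String>,
}

/// Returns true if the collection looks like it belongs to a power shelf.
pub fn is_power_shelf(entries: &[ChassisEntry]) -> bool {
    // Preferred signal: newer firmware enumerates a "powershelf" member.
    if entries
        .iter()
        .any(|e| e.id.eq_ignore_ascii_case("powershelf"))
    {
        return true;
    }

    // Fallback for older firmware: a "chassis" member whose manufacturer
    // contains "lite-on" (compared case-insensitively, by assumption).
    entries.iter().any(|e| {
        e.id.eq_ignore_ascii_case("chassis")
            && e.manufacturer
                .as_deref()
                .map(|m| m.to_ascii_lowercase().contains("lite-on"))
                .unwrap_or(false)
    })
}
```

Checking `"powershelf"` first keeps the existing behavior for newer firmware, and the manufacturer condition keeps a generic `"chassis"` member from other vendors from being misclassified.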
chet
added a commit
to chet/bare-metal-manager-core
that referenced
this pull request
May 27, 2026
…overy

These land as two new subtests of `test_integration`, running alongside the existing machine tests against the shared `carbide-api` + `site-explorer`. Each one registers an expected entity, simulates the BMC showing up via DHCP, stands up a mock BMC at the assigned IP, then waits for `site-explorer` to link it to a real managed `PowerShelf` / `Switch` (and confirms we can fetch it back by ID).

The power shelf test includes exercising the work from NVIDIA#842, ensuring "*service root vendor not populated*" happens (and is logged) right before it falls back and links successfully.

Notable changes in here:

- Flipped on `create_power_shelves` / `create_switches` in the test `site-explorer` config -- they default off, so the explorer wasn't creating either of them in tests (took me a sec to be like uhhh...).
- New `test_support::host_bmc_router` in `bmc-mock` so a test can hand a mock BMC router to the shared registry, without making the internal `NoopCallbacks` public.
- New `power_shelf`, `switch`, and `dhcp` helpers in `api-test-helper`, which are built on the existing `grpcurl` helper just like `tenant`/`vpc`/`subnet`.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
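The "waits for `site-explorer` to link it" step is essentially a poll-until-deadline loop. A minimal, synchronous sketch of that pattern using only the standard library is below. `wait_for` is a hypothetical helper, not something from `api-test-helper`, and the real tests may well be async and structured differently.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Calls `check` repeatedly until it yields a value or `timeout` elapses.
/// Hypothetical helper for illustration only.
pub fn wait_for<T>(
    timeout: Duration,
    interval: Duration,
    mut check: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Always check at least once, even with a zero timeout.
        if let Some(value) = check() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}
```

In a test, `check` would be a closure that asks the API whether the expected entity has been linked yet and returns its ID once it has.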
Description
Tried a power shelf ingestion, and ran into an interesting situation!

Right now the general flow for all components is:

- Scrape the Redfish root to detect vendor information (`probe_redfish_endpoint(bmc_ip_address)`).
- Do a `match vendor` to determine how to update/rotate the password, and store in Vault.

Power shelves (LITE-ON, specifically) don't expose any usable vendor details in the Redfish service root (which we leverage for all OTHER components), so we need to make a subsequent "fallback" call to get `Chassis` details to parse the vendor. There's some fallback code in place for this already. Nice.

HOWEVER, the "fallback" code (which makes a `probe_vendor_name_from_chassis(...)` call) is *authenticated*, so it queries Vault for the component credentials. The idea behind this is that OTHER components always give us vendor details in the service root, so if the call fails, it's because we've set non-default credentials on them (in Vault), and now need to make an authenticated call.

The PROBLEM is, for power shelves, we haven't set credentials yet! This is the first run.

So, I've tried to make a tweak that is as generic as possible (and left code comments).

The idea is:

> Try to get credentials from Vault, and if we don't find any, use the expected credentials.

...and then the flow will continue, thus allowing power shelves to then `set_sitewide_bmc_root_password(...)` as expected (which is what the default/expected BMC creds are used for). This keeps the default fallback behavior we've had, while allowing for a subsequent "fallback" to just use expected credentials if needed.

Confirmed that I can indeed update the power shelf BMC passwords, so I'm getting rid of the `skip_password_change` variable while I'm in here. That was the only thing using it.

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
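To make the fallback order concrete, here is a minimal sketch of "Vault first, then expected credentials". It is not the code from this PR: `Credentials`, `CredentialSource`, `CredentialStore`, `InMemoryStore`, and `credentials_for_chassis_probe` are hypothetical names, and the in-memory map only stands in for a Vault lookup.

```rust
use std::collections::HashMap;

/// Username/password pair for a BMC (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Credentials {
    pub username: String,
    pub password: String,
}

/// Where the credentials used for the Chassis probe came from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum CredentialSource {
    Vault,
    Expected,
}

/// Hypothetical stand-in for a Vault lookup keyed by BMC IP address.
pub trait CredentialStore {
    fn get(&self, bmc_ip: &str) -> Option<Credentials>;
}

/// In-memory store, used only to exercise the sketch.
pub struct InMemoryStore(pub HashMap<String, Credentials>);

impl CredentialStore for InMemoryStore {
    fn get(&self, bmc_ip: &str) -> Option<Credentials> {
        self.0.get(bmc_ip).cloned()
    }
}

/// Prefer credentials already stored in Vault. If none exist yet (first
/// run, as with power shelves), fall back to the expected credentials.
pub fn credentials_for_chassis_probe(
    store: &dyn CredentialStore,
    bmc_ip: &str,
    expected: &Credentials,
) -> (Credentials, CredentialSource) {
    match store.get(bmc_ip) {
        Some(creds) => (creds, CredentialSource::Vault),
        None => (expected.clone(), CredentialSource::Expected),
    }
}
```

Returning the source alongside the credentials is a choice made for this sketch so a caller could log which path was taken; components that already have rotated credentials in Vault never reach the second arm.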
Type of Change
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes