fix: health should remove prometheus metrics after collectors removal #1302

Merged
yoks merged 3 commits into NVIDIA:main from yoks:health-prometheus-cleanup
May 1, 2026

@yoks yoks commented Apr 30, 2026

Description

After the Hardware Health service was refactored, Prometheus metrics were moved into a separate Sink. This introduced a regression: stale metrics were not removed after a Collector stopped. This PR fixes the regression by adding a new message that tells all Sinks which collector was removed so they can handle it. Right now, only the Prometheus Sink reacts to this message.
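
The mechanism described above can be sketched roughly as follows. This is an illustrative, std-only sketch and not the code from this PR: `SinkMessage`, `Sink`, `PrometheusLikeSink`, `CountingSink`, and `broadcast` are all hypothetical names, and a plain `HashMap` stands in for a real Prometheus registry. It assumes every exported series can be traced back to the collector that produced it.

```rust
use std::collections::HashMap;

/// Messages delivered to every sink (hypothetical shape, not the PR's types).
#[derive(Debug, Clone)]
enum SinkMessage {
    /// A collector reported a new sample for one of its metrics.
    Sample {
        collector_id: String,
        metric: String,
        value: f64,
    },
    /// A collector was stopped and removed; sinks may drop state tied to it.
    CollectorRemoved { collector_id: String },
}

/// Anything that consumes collector output (hypothetical trait).
trait Sink {
    fn handle(&mut self, msg: &SinkMessage);
}

/// Stand-in for a Prometheus sink: the map holds the series that would be
/// exported, keyed by (collector id, metric name).
#[derive(Default)]
struct PrometheusLikeSink {
    series: HashMap<(String, String), f64>,
}

impl Sink for PrometheusLikeSink {
    fn handle(&mut self, msg: &SinkMessage) {
        match msg {
            SinkMessage::Sample {
                collector_id,
                metric,
                value,
            } => {
                self.series
                    .insert((collector_id.clone(), metric.clone()), *value);
            }
            SinkMessage::CollectorRemoved { collector_id } => {
                // Drop every series owned by the removed collector so that
                // stale values are no longer exported.
                self.series.retain(|(owner, _), _| owner != collector_id);
            }
        }
    }
}

/// A sink that does not care about removals and simply ignores them.
#[derive(Default)]
struct CountingSink {
    samples_seen: usize,
}

impl Sink for CountingSink {
    fn handle(&mut self, msg: &SinkMessage) {
        if let SinkMessage::Sample { .. } = msg {
            self.samples_seen += 1;
        }
    }
}

/// Deliver one message to all sinks; each sink decides how to react.
fn broadcast(sinks: &mut [Box<dyn Sink>], msg: &SinkMessage) {
    for sink in sinks.iter_mut() {
        sink.handle(msg);
    }
}
```

In this sketch the removal notice is broadcast to every sink, and a sink that keeps no per-collector state (like `CountingSink`) simply ignores it, which mirrors the description's point that only one sink currently reacts.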

Type of Change

  • [ ] Add - New feature or capability
  • [ ] Change - Changes in existing functionality
  • [x] Fix - Bug fixes
  • [ ] Remove - Removed features or deprecated functionality
  • [ ] Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

#989

Breaking Changes

  • [ ] This PR contains breaking changes

Testing

  • [x] Unit tests added/updated
  • [ ] Integration tests added/updated
  • [ ] Manual testing performed
  • [ ] No testing required (docs, internal refactor, etc.)

Additional Notes

Signed-off-by: ianisimov <ianisimov@nvidia.com>
github-actions Bot commented May 1, 2026

@yoks yoks merged commit ffa04a7 into NVIDIA:main May 1, 2026
44 checks passed
rpowers-nv pushed a commit to rpowers-nv/ncx-infra-controller-core that referenced this pull request May 5, 2026
…NVIDIA#1302)

## Description
After the Hardware Health service was refactored, Prometheus metrics
were moved into a separate Sink. This introduced a regression: stale
metrics were not removed after a Collector stopped. This PR fixes the
regression by adding a new message that tells all Sinks which collector
was removed so they can handle it. Right now, only the Prometheus Sink
reacts to this message.

## Type of Change
<!-- Check one that best describes this PR -->
- [ ] **Add** - New feature or capability
- [ ] **Change** - Changes in existing functionality
- [x] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

## Related Issues (Optional)
NVIDIA#989

## Breaking Changes
- [ ] This PR contains breaking changes

<!-- If checked above, describe the breaking changes and migration steps
-->

## Testing
<!-- How was this tested? Check all that apply -->
- [x] Unit tests added/updated
- [ ] Integration tests added/updated
- [ ] Manual testing performed
- [ ] No testing required (docs, internal refactor, etc.)

## Additional Notes
<!-- Any additional context, deployment notes, or reviewer guidance -->

---------

Signed-off-by: ianisimov <ianisimov@nvidia.com>
Signed-off-by: rpowers <rpowers@nvidia.com>
mkoci added a commit to mkoci/infra-controller that referenced this pull request May 28, 2026
2 participants