@samoz83
Contributor

@samoz83 samoz83 commented Jan 15, 2026

Fixes #455

Description

Some hwmon devices can report the exact same name.

Previously, the collector relied solely on this name for labelling. In my case, this caused Prometheus to treat metrics from different sockets as duplicates, resulting in silent data loss (one socket overwriting the other).

The Fix:
This PR adds a unique index label (derived from the directory number, e.g., 6 or 7) to the generated metrics. This ensures every socket produces a unique time series, even if they share the same driver name.
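For illustration, deriving the index from the directory name could look roughly like the Go sketch below. This is not the code from this PR: the helper name hwmonIndex and the /sys/class/hwmon/hwmonN paths are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// hwmonIndex is a hypothetical helper (not taken from the CEEMS source).
// It returns the numeric suffix of a hwmon directory name, for example
// "/sys/class/hwmon/hwmon6" gives "6". It returns "" when the last path
// element does not have the form "hwmon<digits>".
func hwmonIndex(dir string) string {
	base := filepath.Base(dir)
	if !strings.HasPrefix(base, "hwmon") {
		return ""
	}
	idx := strings.TrimPrefix(base, "hwmon")
	if idx == "" {
		return ""
	}
	// Accept only decimal digits so that unexpected names are rejected.
	for _, r := range idx {
		if r < '0' || r > '9' {
			return ""
		}
	}
	return idx
}

func main() {
	// Example paths are assumptions; the real device paths may differ.
	for _, dir := range []string{"/sys/class/hwmon/hwmon6", "/sys/class/hwmon/hwmon7"} {
		fmt.Printf("%s -> index=%q\n", dir, hwmonIndex(dir))
	}
}
```

Keeping the index as a string avoids a conversion step, since label values are strings anyway.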

How Has This Been Tested?

Tested on a dual-socket AMD EPYC server running Rocky Linux 9.

Before:
Output showed only one metric series for amd_hsmp_hwmon despite two sockets being present.

ceems_hwmon_power_current_watts{chip="platform_amd_hsmp",chip_name="amd_hsmp_hwmon",hostname="node02",sensor="power1"} 138.312

After:
Output now correctly shows two distinct series with unique indices:

ceems_hwmon_power_current_watts{chip="platform_amd_hsmp",chip_name="amd_hsmp_hwmon",hostname="node02",index="6",sensor="power1"} 139.646
ceems_hwmon_power_current_watts{chip="platform_amd_hsmp",chip_name="amd_hsmp_hwmon",hostname="node02",index="7",sensor="power1"} 109.366
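
The collision can be reproduced in isolation. In Prometheus, a time series is identified by its metric name plus its full label set, so two samples with identical labels are the same series. The Go sketch below is only an illustration of that rule, not collector code: seriesKey and distinctSeries are hypothetical helpers, and the label values are copied from the output above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesKey builds a canonical identity string from a metric name and its
// labels. Label names are sorted so that map iteration order does not matter.
func seriesKey(name string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(parts, ",") + "}"
}

// distinctSeries counts how many unique series two sockets produce,
// with or without the index label.
func distinctSeries(withIndex bool) int {
	seen := map[string]struct{}{}
	for _, idx := range []string{"6", "7"} {
		labels := map[string]string{
			"chip":      "platform_amd_hsmp",
			"chip_name": "amd_hsmp_hwmon",
			"hostname":  "node02",
			"sensor":    "power1",
		}
		if withIndex {
			labels["index"] = idx
		}
		seen[seriesKey("ceems_hwmon_power_current_watts", labels)] = struct{}{}
	}
	return len(seen)
}

func main() {
	fmt.Println("without index:", distinctSeries(false))
	fmt.Println("with index:   ", distinctSeries(true))
}
```

Without the index label both sockets map to one key, and with it they map to two, which matches the before and after output shown here.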

On dual socket systems multiple hwmon devices can share the same chip_name. This adds an index label derived from the directory name to ensure uniqueness.
@samoz83 samoz83 closed this Jan 15, 2026
@samoz83 samoz83 reopened this Jan 15, 2026
@mahendrapaipuri
Collaborator

Hello @samoz83

Thanks a lot for your time and effort in putting up this PR. Really appreciate it!!

Is it possible for you to share the files inside the /sys/devices/platform/amd_hsmp folder on your server? I can add them to the mock resources to test this behaviour in the unit and e2e tests.

Cheers!!

@samoz83
Contributor Author

samoz83 commented Jan 16, 2026

Hi @mahendrapaipuri

I did try to ttar the directory, but it had issues. Hopefully this tar has what you need; please say if you need me to do anything else.

amd_hsmp.tar.gz

Thanks

Signed-off-by: Mahendra Paipuri <mahendra.paipuri@gmail.com>
@mahendrapaipuri
Collaborator

Hello @samoz83

Thanks a lot for the files.

I pushed a commit to your branch that updates the e2e test fixtures. Strangely, the commit exists on your branch but is not showing up on the PR. Could you look into it, please? If that does not work, please close this PR and open a new one.

Thanks in advance!! And thanks again for your time and effort!

@samoz83
Contributor Author

samoz83 commented Jan 19, 2026

Not sure what happened there, but I've managed to push your changes; hopefully it should be okay now.

@mahendrapaipuri
Collaborator

Awesome!! Cheers @samoz83

@mahendrapaipuri mahendrapaipuri merged commit 574a6f4 into ceems-dev:main Jan 19, 2026
16 of 18 checks passed
@mahendrapaipuri
Collaborator

@samoz83 I will make a release with this patch by the end of the day! Thanks again!

@samoz83 samoz83 deleted the fix/hwmon-indexing branch January 19, 2026 10:16
Development

Successfully merging this pull request may close these issues.

[Bug] hwmon label collision causes data loss on multi socket systems when name is the same
