[TEEP-4505] [datadog_checks_base] Add use_per_instance_collection option for PDH counters#22298

Open
ArmentaRoberto wants to merge 2 commits into master from fix/pdh-per-instance-collection
@ArmentaRoberto ArmentaRoberto commented Jan 11, 2026

Some Windows PDH counters return incorrect values (zeros) when using the bulk PdhGetFormattedCounterArrayW API, but work correctly with per-instance GetFormattedCounterValue calls.

This adds a new object-level configuration option that enumerates instances via EnumObjectItems and creates individual counter handles, then collects each using the per-instance API.

Known affected counters:

  • SQLServer:Workload Group Stats\CPU usage %
  • MSSQL$<instance>:Workload Group Stats\CPU usage %

What does this PR do?

Adds a new use_per_instance_collection configuration option for Windows PDH performance counter objects. When enabled, instead of using wildcard paths with bulk PdhGetFormattedCounterArrayW retrieval, instances are enumerated via EnumObjectItems and collected individually using GetFormattedCounterValue.

This provides a workaround for counters where the bulk API returns incorrect values (typically zeros) while per-instance retrieval works correctly.
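
The per-instance flow described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `instance_counter_path`, `collect_per_instance`, and the two injected callables are hypothetical names, and the sample values are made up. On a real Windows host the callables would be backed by the PDH enumeration and per-instance read calls; here they are plain Python stand-ins so the sketch runs anywhere.

```python
from typing import Callable, Dict, Iterable


def instance_counter_path(object_name: str, instance: str, counter_name: str) -> str:
    # PDH counter paths for a single instance have the form \Object(Instance)\Counter.
    return "\\{}({})\\{}".format(object_name, instance, counter_name)


def collect_per_instance(
    object_name: str,
    counter_name: str,
    enumerate_instances: Callable[[str], Iterable[str]],
    read_value: Callable[[str], float],
) -> Dict[str, float]:
    """Read one value per instance instead of one wildcard bulk read.

    Both callables are hypothetical stand-ins for the PDH calls.
    """
    values = {}
    for instance in enumerate_instances(object_name):
        path = instance_counter_path(object_name, instance, counter_name)
        values[instance] = read_value(path)
    return values


# Stand-in backend with made-up values, keyed by full counter path.
fake_values = {
    "\\SQLServer:Workload Group Stats(default)\\CPU usage %": 12.5,
    "\\SQLServer:Workload Group Stats(internal)\\CPU usage %": 0.5,
}

result = collect_per_instance(
    "SQLServer:Workload Group Stats",
    "CPU usage %",
    enumerate_instances=lambda obj: ["default", "internal"],
    read_value=fake_values.__getitem__,
)
```

Injecting the enumeration and read functions keeps the path-building and looping logic testable without a Windows host.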

Motivation

Certain Windows PDH counters, notably SQLServer:Workload Group Stats\CPU usage %, return zero values when collected via the bulk PdhGetFormattedCounterArrayW API. This is a known limitation of some PDH counter providers. The per-instance GetFormattedCounterValue API returns correct values for the same counters.

https://datadoghq.atlassian.net/browse/AGENT-15204

Customer impact: SQL Server Resource Governor CPU metrics were reporting 0% despite actual workload activity, making capacity planning and performance monitoring impossible.

How to test the change?

  1. Configure a Windows host with SQL Server and Resource Governor workload groups
  2. Add the following check configuration:
metrics:
  SQLServer:Workload Group Stats:
    name: sqlserver.workload_group_stats
    tag_name: workload_group
    use_per_instance_collection: true
    counters:
      - CPU usage %:
          name: cpu_usage_percent
          type: gauge
  3. Verify that the metrics are non-zero and match typeperf output

Possible Drawbacks / Trade-offs

  • Performance: Per-instance collection requires one API call per instance (N calls) instead of a single bulk call. Impact is minimal for typical instance counts (<100).
  • Dynamic instances: New instances appearing after Agent startup won't be detected until the next refresh cycle.
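
To illustrate the dynamic-instance trade-off, here is a rough sketch of what a refresh step might do: compare the currently enumerated instances with the handles already held, close the stale ones, and open the new ones. The function `refresh_handles` and its `open_handle`/`close_handle` callables are hypothetical and are not taken from this PR; the string "handles" are placeholders for real counter handles.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def refresh_handles(
    handles: Dict[str, object],
    current_instances: Iterable[str],
    open_handle: Callable[[str], object],
    close_handle: Callable[[object], None],
) -> Tuple[List[str], List[str]]:
    """Update `handles` in place; return (added, removed) instance names."""
    current = set(current_instances)
    added = sorted(current - set(handles))
    removed = sorted(set(handles) - current)
    for name in removed:
        # Instance disappeared since the last refresh: release its handle.
        close_handle(handles.pop(name))
    for name in added:
        # Instance appeared since the last refresh: start tracking it.
        handles[name] = open_handle(name)
    return added, removed


# Placeholder handles; "reports" and "etl" are made-up workload group names.
handles = {"default": "h-default", "reports": "h-reports"}
closed = []
added, removed = refresh_handles(
    handles,
    ["default", "etl"],
    open_handle=lambda name: "h-" + name,
    close_handle=closed.append,
)
```

Until a step like this runs, a newly created instance has no handle and therefore reports no value, which is the gap the bullet above describes.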

Additional Notes

  • Option defaults to false for backward compatibility
  • Only affects MultiCounter objects; single-instance counters log a warning if option is set
  • Unit tests included
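
The option semantics in the notes above could be expressed roughly like this. It is a sketch under assumptions, not the PR's implementation: the helper name `use_per_instance` and the `is_multi_instance` flag are hypothetical, and the warning text is illustrative.

```python
import logging

logger = logging.getLogger(__name__)


def use_per_instance(object_config: dict, is_multi_instance: bool) -> bool:
    """Decide whether per-instance collection applies to one counter object."""
    # Defaults to False so existing configurations keep the bulk behavior.
    enabled = bool(object_config.get("use_per_instance_collection", False))
    if enabled and not is_multi_instance:
        # The option only makes sense when there are instances to enumerate.
        logger.warning(
            "use_per_instance_collection is set on a single-instance object; ignoring"
        )
        return False
    return enabled
```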

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

Some Windows PDH counters return incorrect values (zeros) when using the
bulk PdhGetFormattedCounterArrayW API, but work correctly with per-instance
GetFormattedCounterValue calls.

This adds a new object-level configuration option that enumerates instances
via EnumObjectItems and creates individual counter handles, then collects
each using the per-instance API.

Known affected counters:
- SQLServer:Workload Group Stats\CPU usage %
- MSSQL$<instance>:Workload Group Stats\CPU usage %

codecov bot commented Jan 11, 2026

Codecov Report

❌ Patch coverage is 53.77358% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.97%. Comparing base (59c7ab6) to head (3bfabb4).
⚠️ Report is 99 commits behind head on master.

@ArmentaRoberto
Author

CI failures in snmp and mongo are pre-existing flaky tests unrelated to this PR. The datadog_checks_base tests pass.

@ArmentaRoberto ArmentaRoberto marked this pull request as ready for review January 11, 2026 07:45
@ArmentaRoberto ArmentaRoberto requested review from a team as code owners January 11, 2026 07:45
@ArmentaRoberto ArmentaRoberto added the qa/skip-qa label (Automatically skip this PR for the next QA) Jan 11, 2026
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3bfabb471f

@ArmentaRoberto ArmentaRoberto changed the title [datadog_checks_base] Add use_per_instance_collection option for PDH counters [TEEP-4505] [datadog_checks_base] Add use_per_instance_collection option for PDH counters Jan 28, 2026