
feat(agent): expose Apple Silicon power gauges via powermetrics #334

Merged
Defilan merged 3 commits into main from feat/apple-power-gauges on Apr 26, 2026

@Defilan Defilan commented Apr 26, 2026

Summary

Adds a darwin-only powermetrics sampler that publishes four new Prometheus gauges on the Metal agent's existing AgentRegistry:

  • llmkube_metal_agent_apple_power_combined_watts
  • llmkube_metal_agent_apple_power_gpu_watts
  • llmkube_metal_agent_apple_power_cpu_watts
  • llmkube_metal_agent_apple_power_ane_watts

Designed to be scraped by InferCost's upcoming Metal collector for per-token cost attribution on Apple Silicon, where DCGM (NVIDIA-only) doesn't exist. See InferCost #46 for the cross-repo design.

Why

Today the Apple sample CostProfile in InferCost says: "Apple Silicon has no DCGM exporter; InferCost falls back to TDP-based power estimation. Real-time power metrics will require a Metal-aware exporter (tracked on the roadmap)." This PR is the LLMKube half of closing that gap.

This weekend's M5 Max bench session measured 2.06 tok/s/W during inference, and a marginal cost of $0.015/M tokens was computed by hand from powermetrics output, but InferCost itself couldn't see any of it. After this PR and the matching InferCost PR, the same flow works end-to-end without manual math.
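
For reference, the manual math mentioned above can be sketched as follows. Since tok/s/W is the same as tokens per joule, the energy for a million tokens follows directly from the efficiency figure. The electricity price is not stated in this PR; the $0.111/kWh used here is an assumption chosen only because it lands near the $0.015/M figure, and the function name is hypothetical.

```go
package main

import "fmt"

// marginalCostPerMillionTokens converts an efficiency figure (tokens per
// second per watt, i.e. tokens per joule) and an electricity price into
// dollars per million tokens.
func marginalCostPerMillionTokens(tokPerSecPerWatt, usdPerKWh float64) float64 {
	const joulesPerKWh = 3.6e6
	joules := 1e6 / tokPerSecPerWatt // energy needed for one million tokens
	return joules / joulesPerKWh * usdPerKWh
}

func main() {
	// 2.06 tok/s/W comes from the bench session; the $0.111/kWh rate is an
	// assumption for illustration, not a number from the PR.
	cost := marginalCostPerMillionTokens(2.06, 0.111)
	fmt.Printf("$%.4f per million tokens\n", cost)
}
```

This covers only the marginal electricity cost; hardware amortization is a separate term in a cost profile.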

What changed

| File | Change |
| --- | --- |
| pkg/agent/agentmetrics.go | Add 4 apple_power_*_watts gauges; register in init() |
| pkg/agent/power_darwin.go (new) | ApplePowerSampler: spawns sudo powermetrics --samplers cpu_power,gpu_power -i <interval>, parses output, updates gauges |
| pkg/agent/power_other.go (new) | No-op stub for non-Darwin builds (CI on Linux still compiles) |
| pkg/agent/power_darwin_test.go (new) | Parser tests against captured fixture, duplicate-GPU-line handling, partial-sample guard, stderr capture |
| pkg/agent/testdata/powermetrics_sample.txt (new) | 7-sample captured fixture from real M5 Max output |
| pkg/agent/agentmetrics_test.go | Register the 4 new gauges in the existing init-coverage test |
| pkg/agent/agent.go | Add ApplePowerEnabled/ApplePowerInterval/PowermetricsBin to MetalAgentConfig; conditionally launch sampler goroutine in Start() |
| cmd/metal-agent/main.go | Add --apple-power-enabled, --apple-power-interval, --powermetrics-bin flags |
| deployment/macos/sudoers.d/llmkube-powermetrics (new) | Sample NOPASSWD sudoers fragment |
| deployment/macos/README.md | Document the enable steps and the 4 new gauges |
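
As a rough illustration of the three flags listed above, here is a minimal sketch using the standard `flag` package. The struct and function names are hypothetical, the interval type (`time.Duration`) and its 1s default are assumptions, and the help strings are invented; the PR only confirms that the feature is off by default and that /usr/bin/powermetrics is the expected binary.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// powerFlags holds the three power-sampling settings (hypothetical type).
type powerFlags struct {
	Enabled  bool
	Interval time.Duration
	Bin      string
}

// parsePowerFlags parses args into powerFlags. Defaults other than
// "disabled" are assumptions made for this sketch.
func parsePowerFlags(args []string) (powerFlags, error) {
	var pf powerFlags
	fs := flag.NewFlagSet("metal-agent", flag.ContinueOnError)
	fs.BoolVar(&pf.Enabled, "apple-power-enabled", false,
		"sample power via powermetrics (needs the sudoers entry)")
	fs.DurationVar(&pf.Interval, "apple-power-interval", time.Second,
		"sampling interval (default is an assumption)")
	fs.StringVar(&pf.Bin, "powermetrics-bin", "/usr/bin/powermetrics",
		"path to the powermetrics binary")
	err := fs.Parse(args)
	return pf, err
}

func main() {
	pf, err := parsePowerFlags([]string{"--apple-power-enabled", "--apple-power-interval=5s"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("%+v\n", pf)
}
```

With no arguments the sketch leaves `Enabled` false, which matches the opt-in behavior described under "Privilege handling".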

Privilege handling

powermetrics requires root, but the agent runs as a regular user under launchd. Three options were considered: (1) NOPASSWD sudoers entry for /usr/bin/powermetrics only, (2) run the entire metal-agent as root, (3) external launchd job + tail a shared file.

Picked (1) because it's the smallest privilege grant — powermetrics-only, not full root for the agent — and is well-understood by macOS admins. The flag defaults to disabled so users opt into the sudoers requirement explicitly. Without --apple-power-enabled, behavior is unchanged from today.

Parser quirk handled

powermetrics emits two GPU Power: <n> mW lines per sample — one in the cpu_power section, one in the gpu_power section. Only the first occurrence per sample is published, so the gauge value matches what was actually summed into Combined Power. Test TestParsePowermetricsStream_DuplicateGPULineIgnored guards this.

A second guard (TestParsePowermetricsStream_PartialSampleNotEmitted) ensures samples missing their Combined Power line never emit, so InferCost's per-token math doesn't see stale-zero gauges if powermetrics is killed mid-write.
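
The two guards described above can be sketched roughly as follows. This is not the code from power_darwin.go: the function and type names are hypothetical, and the line formats ("*** Sampled system activity" as the per-sample header, "CPU Power: <n> mW", "Combined Power (CPU + GPU + ANE): <n> mW") are assumptions about powermetrics text output.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// powerSample is one complete powermetrics sample, in watts.
type powerSample struct {
	Combined, GPU, CPU, ANE float64
}

// parseMilliwatts extracts n from a line such as "GPU Power: 123 mW"
// and returns it converted to watts.
func parseMilliwatts(line string) (float64, bool) {
	_, rest, ok := strings.Cut(line, ":")
	if !ok {
		return 0, false
	}
	fields := strings.Fields(rest)
	if len(fields) != 2 || fields[1] != "mW" {
		return 0, false
	}
	mw, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false
	}
	return mw / 1000, true
}

// parseStream calls emit once per sample, and only when that sample's
// Combined Power line has been seen.
func parseStream(r io.Reader, emit func(powerSample)) error {
	var cur powerSample
	seenGPU := false
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		w, ok := parseMilliwatts(line)
		switch {
		case strings.HasPrefix(line, "*** Sampled system activity"):
			// New sample: drop anything accumulated, including partial data.
			cur, seenGPU = powerSample{}, false
		case !ok:
			// Not a "<label>: <n> mW" line; ignore it.
		case strings.HasPrefix(line, "CPU Power:"):
			cur.CPU = w
		case strings.HasPrefix(line, "GPU Power:"):
			if !seenGPU { // keep only the first GPU line of the sample
				cur.GPU, seenGPU = w, true
			}
		case strings.HasPrefix(line, "ANE Power:"):
			cur.ANE = w
		case strings.HasPrefix(line, "Combined Power"):
			cur.Combined = w
			emit(cur)
		}
	}
	return sc.Err()
}

// collectSamples is a small convenience wrapper for demos and tests.
func collectSamples(text string) []powerSample {
	var out []powerSample
	_ = parseStream(strings.NewReader(text), func(s powerSample) { out = append(out, s) })
	return out
}

func main() {
	text := "*** Sampled system activity (demo) ***\n" +
		"CPU Power: 1500 mW\nGPU Power: 2500 mW\nANE Power: 0 mW\n" +
		"Combined Power (CPU + GPU + ANE): 4000 mW\n" +
		"GPU Power: 2600 mW\n" + // duplicate from the gpu_power section
		"*** Sampled system activity (demo) ***\n" +
		"CPU Power: 900 mW\n" // stream cut off before Combined Power
	fmt.Printf("%+v\n", collectSamples(text))
}
```

Emitting only on the Combined Power line means a truncated final sample never reaches the gauges, so they keep the last complete reading.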

Test plan

  • go test ./... — all packages green
  • golangci-lint run ./pkg/agent/... ./cmd/metal-agent/... — only the preexisting health.go G118 finding (unrelated)
  • go vet ./... clean
  • gofmt -w clean
  • Parser tests pass against captured fixture from real M5 Max powermetrics output
  • Manual on M5 Max: install sudoers fragment, restart agent with --apple-power-enabled, curl /metrics | grep apple_power, run inference, watch gauge rise; stop, watch fall to ~1 W idle baseline

Follow-up

Phase 2 (separate PR in defilantech/infercost) will add a metal collector type to InferCost that scrapes these gauges. See plan link above.

Closes #335

Adds a darwin-only powermetrics sampler that publishes four new
Prometheus gauges on the existing AgentRegistry:

  llmkube_metal_agent_apple_power_combined_watts
  llmkube_metal_agent_apple_power_gpu_watts
  llmkube_metal_agent_apple_power_cpu_watts
  llmkube_metal_agent_apple_power_ane_watts

Designed to be scraped by InferCost's upcoming Metal collector for
per-token cost attribution on Apple Silicon, where DCGM (NVIDIA-only)
doesn't exist. See InferCost issue #46 for the cross-repo design.

Disabled by default (--apple-power-enabled=false) because powermetrics
requires root. The agent reaches root via a NOPASSWD sudoers entry the
operator installs explicitly — see deployment/macos/sudoers.d/
llmkube-powermetrics. This is the smallest possible privilege grant
(powermetrics-only, not full root for the agent).

The parser handles a powermetrics quirk: each sample emits two
"GPU Power: <n> mW" lines (one in the cpu_power section, one in the
gpu_power section). Only the first occurrence per sample is published,
so the gauge value matches what was actually summed into Combined Power.

Signed-off-by: Christopher Maher <chris@mahercode.io>

codecov Bot commented Apr 26, 2026

Codecov Report

❌ Patch coverage is 53.12500% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cmd/metal-agent/main.go | 0.00% | 12 Missing ⚠️ |
| pkg/agent/agent.go | 76.92% | 3 Missing ⚠️ |

Defilan added 2 commits April 26, 2026 01:54
Security audit on PR #334 flagged three must-fix-before-merge items in
the powermetrics privilege model. This commit closes all three:

H-1: Pin the argv in the sudoers fragment.
The original spec — `NOPASSWD: /usr/bin/powermetrics` — granted root
for *any* powermetrics flags. powermetrics ships with --output-file,
which combined with root is a write-anywhere primitive (`sudo
powermetrics --output-file=/etc/passwd ...`). The pinned form now
requires the exact `--samplers cpu_power,gpu_power -i <numeric>`
argv the agent uses and nothing else.

H-2: Reject non-default --powermetrics-bin.
The override flag let an operator point the agent at a binary the
sudoers spec doesn't match. NewApplePowerSampler now refuses to
construct a real sampler when bin diverges from /usr/bin/powermetrics
and returns a no-op instead — the agent keeps running without power
data rather than failing closed.
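
A minimal sketch of the H-2 behavior, under stated assumptions: the interface, type, and constructor names here are hypothetical stand-ins for NewApplePowerSampler, and the sketch treats any value other than the exact default path (including the empty string) as divergent.

```go
package main

import (
	"context"
	"fmt"
)

// trustedPowermetricsBin is the only path the sudoers spec matches.
const trustedPowermetricsBin = "/usr/bin/powermetrics"

// powerRunner is a hypothetical interface for anything that samples power.
type powerRunner interface {
	Run(ctx context.Context) error
}

// noopRunner publishes nothing and returns immediately.
type noopRunner struct{}

func (noopRunner) Run(context.Context) error { return nil }

// realRunner stands in for the sampler that would spawn powermetrics.
type realRunner struct{ bin string }

func (r realRunner) Run(ctx context.Context) error {
	// Subprocess handling is omitted in this sketch; just wait for cancel.
	<-ctx.Done()
	return ctx.Err()
}

// newPowerRunner refuses to build a real sampler for any other binary path
// and hands back a no-op so the caller can keep running without power data.
func newPowerRunner(bin string) powerRunner {
	if bin != trustedPowermetricsBin {
		return noopRunner{}
	}
	return realRunner{bin: bin}
}

// isNoop reports whether the constructor fell back to the no-op.
func isNoop(r powerRunner) bool {
	_, ok := r.(noopRunner)
	return ok
}

func main() {
	fmt.Println(isNoop(newPowerRunner("/tmp/powermetrics")))
	fmt.Println(isNoop(newPowerRunner(trustedPowermetricsBin)))
}
```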

H-3: Use absolute /usr/bin/sudo.
defaultPowermetricsCommand now invokes /usr/bin/sudo by absolute path
rather than relying on $PATH lookup, so a $PATH attacker can't
substitute a sudo wrapper to capture our (root) NOPASSWD invocation.

Plus regression test that asserts the agent's actual argv satisfies
the shipped sudoers spec — if a future change desyncs them, the
test fails before release rather than the gauges silently going zero
on upgrade.
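
The idea behind that regression test can be sketched like this. The function names are hypothetical, the interval is assumed to be passed to `-i` as a plain integer, and the element-by-element check is a stand-in for the shipped sudoers spec rather than a parser for real sudoers syntax.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// numericArg accepts only a run of decimal digits for the interval.
var numericArg = regexp.MustCompile(`^[0-9]+$`)

// powermetricsArgv builds the full command line, with sudo and
// powermetrics both given by absolute path (H-3).
func powermetricsArgv(intervalMs int) []string {
	return []string{
		"/usr/bin/sudo",
		"/usr/bin/powermetrics",
		"--samplers", "cpu_power,gpu_power",
		"-i", strconv.Itoa(intervalMs),
	}
}

// argvMatchesPinnedSpec mimics the pinned sudoers rule (H-1): the exact
// fixed arguments followed by one numeric interval, and nothing else.
func argvMatchesPinnedSpec(argv []string) bool {
	want := []string{
		"/usr/bin/sudo", "/usr/bin/powermetrics",
		"--samplers", "cpu_power,gpu_power", "-i",
	}
	if len(argv) != len(want)+1 {
		return false
	}
	for i, w := range want {
		if argv[i] != w {
			return false
		}
	}
	return numericArg.MatchString(argv[len(want)])
}

func main() {
	good := powermetricsArgv(1000)
	bad := append(powermetricsArgv(1000), "--output-file=/etc/passwd")
	fmt.Println(argvMatchesPinnedSpec(good), argvMatchesPinnedSpec(bad))
}
```

A test built this way fails as soon as the argv builder and the pinned spec drift apart, which is the desync the commit message describes.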

README install one-liner now uses the safe pattern: render to a
tempfile, `visudo -cf` for syntax validation, then `install -m 0440
-o root -g wheel`. Placeholder renamed from USERNAME to
__LLMKUBE_USER__ so a copy-paste with the placeholder still in place
fails sudo loudly instead of silently granting to a real user named
"USERNAME". Adds an Uninstall section.

Signed-off-by: Christopher Maher <chris@mahercode.io>

The codecov report on PR #334 flagged 22 lines of patch as uncovered.
Most of those are flag declarations in cmd/metal-agent/main.go (which
the rest of main() also doesn't cover — declarative wiring nobody
tests in this repo) and the powermetrics subprocess code in
power_darwin.go (which is darwin-only and exercised by the parser
tests + the new argv-vs-sudoers regression test).

The honest gaps were the conditional sampler launch in agent.Start()
and the no-op stub in power_other.go that ships in the Linux CI
binary. Both are now covered:

- Extract maybeStartApplePowerSampler from Start() following the same
  reportHealthServerExit / runWatcherLoop pattern that was already in
  agent.go. The factory is a package var so tests can swap in a fake
  whose Run() is deterministic.
- Define an applePowerRunner interface so tests don't have to construct
  the darwin-only ApplePowerSampler struct from a Linux test binary.
- Add TestMaybeStartApplePowerSampler_DisabledByDefault and
  TestMaybeStartApplePowerSampler_Enabled_LaunchesViaFactory. The
  enabled-path test asserts both that the helper plumbed
  PowermetricsBin/ApplePowerInterval through to the factory and that
  Run was actually invoked by the goroutine.
- Add power_other_test.go (build-tagged !darwin) so CI on Linux
  exercises the no-op stub. Without it the build-tag gate could
  silently rot.
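
The test seam described in the list above might look roughly like the sketch below. The helper and interface names follow the commit message, but every signature, the config struct, and the `exercise` helper are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// applePowerRunner lets tests avoid the darwin-only sampler struct.
type applePowerRunner interface {
	Run(ctx context.Context) error
}

// config carries the three settings (field names from the PR description).
type config struct {
	ApplePowerEnabled  bool
	ApplePowerInterval time.Duration
	PowermetricsBin    string
}

// idleRunner is a placeholder for the real sampler in this sketch.
type idleRunner struct{}

func (idleRunner) Run(ctx context.Context) error { <-ctx.Done(); return nil }

// newApplePowerRunner is a package variable so tests can swap in a fake.
var newApplePowerRunner = func(bin string, interval time.Duration) applePowerRunner {
	return idleRunner{}
}

// maybeStartApplePowerSampler launches the sampler goroutine only when the
// feature is enabled, and reports whether it did.
func maybeStartApplePowerSampler(ctx context.Context, cfg config) bool {
	if !cfg.ApplePowerEnabled {
		return false
	}
	r := newApplePowerRunner(cfg.PowermetricsBin, cfg.ApplePowerInterval)
	go func() { _ = r.Run(ctx) }()
	return true
}

// fakeRunner signals on a channel when Run is invoked.
type fakeRunner struct{ ran chan struct{} }

func (f fakeRunner) Run(context.Context) error { close(f.ran); return nil }

// exercise swaps in the fake factory and reports which binary path the
// factory received and whether Run was invoked by the goroutine.
func exercise(enabled bool) (gotBin string, ran bool) {
	fake := fakeRunner{ran: make(chan struct{})}
	orig := newApplePowerRunner
	defer func() { newApplePowerRunner = orig }()
	newApplePowerRunner = func(bin string, _ time.Duration) applePowerRunner {
		gotBin = bin
		return fake
	}
	maybeStartApplePowerSampler(context.Background(), config{
		ApplePowerEnabled:  enabled,
		ApplePowerInterval: time.Second,
		PowermetricsBin:    "/usr/bin/powermetrics",
	})
	select {
	case <-fake.ran:
		ran = true
	case <-time.After(200 * time.Millisecond):
	}
	return gotBin, ran
}

func main() {
	bin, ran := exercise(true)
	fmt.Println(bin, ran)
	bin, ran = exercise(false)
	fmt.Printf("%q %v\n", bin, ran)
}
```

Because the fake only closes a channel, the enabled-path check stays deterministic and never needs root or a macOS host.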

go test on darwin and GOOS=linux go vet both pass.
maybeStartApplePowerSampler hits 100% in the local cover profile.

Signed-off-by: Christopher Maher <chris@mahercode.io>
@Defilan Defilan merged commit 58a94a7 into main Apr 26, 2026
18 checks passed
@Defilan Defilan deleted the feat/apple-power-gauges branch April 26, 2026 19:20
@github-actions github-actions Bot mentioned this pull request Apr 26, 2026
Defilan added a commit to defilantech/infercost that referenced this pull request Apr 26, 2026
…s Macs (#47)

Closes #46.

## Summary

InferCost's existing power-data path reads NVIDIA GPU power from the
DCGM exporter. macOS has no DCGM equivalent, so every Apple Silicon
CostProfile silently fell back to TDP estimation — fine for
order-of-magnitude numbers, useless for tracking dynamic load.

This adds a parallel **Metal collector** that scrapes the LLMKube Metal
Agent's `apple_power_*_watts` gauges (defilantech/LLMKube#335 / [PR
#334](defilantech/LLMKube#334)). The agent ships
those gauges from a sudo'd `powermetrics` subprocess. The combined CPU +
GPU + ANE watts is the right scope to feed into the existing cost
calculation — exactly mirroring what DCGM reports for an NVIDIA card.

## Architecture

- **`internal/scraper/metal.go`**: `ScrapeApplePower` mirrors
`ScrapeDCGM`. Returns `ApplePowerReading{Combined, GPU, CPU, ANE}` —
single-row return because the gauges are unlabeled (no per-GPU/per-pod
cardinality on a Mac).
- **`internal/controller/costprofile_controller.go`**: `readMetalPower`
follows the same every-path-emits-a-condition contract as
`readDCGMPower`. New `readPower` dispatcher (10 lines) keys off
`MetalEndpoint != "" && looksApple(gpuModel)` to pick Metal vs DCGM.
- **New `ConditionMetalReachable`** with reasons `MetalHealthy /
MetalNotConfigured / MetalScrapeError / MetalSamplerOff` — operators on
a Mac don't see a confusing "DCGM unreachable" message when DCGM was
never the right tool.
- **The `MetalSamplerOff` path is the important one**: agent reachable
but `apple_power_combined_watts` is zero → operator forgot
`--apple-power-enabled`, the sudoers entry is missing, or they overrode
`--powermetrics-bin` (which the agent's Phase 1 security hardening now
refuses). The condition message names all three.
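
As a rough sketch of the pieces listed above (not the actual InferCost code), the snippet below parses the four unlabeled gauges out of a Prometheus text body, maps the result to one of the four condition reasons, and dispatches between collectors. All function names except `looksApple` are hypothetical, and the body of `looksApple` is a guess since the PR names the function but not its logic.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// applePowerReading is a single row: the gauges carry no labels.
type applePowerReading struct {
	Combined, GPU, CPU, ANE float64
}

const gaugePrefix = "llmkube_metal_agent_apple_power_"

// parseApplePower picks the four gauges out of a /metrics response body.
// Missing gauges stay at zero.
func parseApplePower(body string) applePowerReading {
	var r applePowerReading
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || strings.HasPrefix(fields[0], "#") {
			continue // blank line, HELP/TYPE comment, or malformed line
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		switch fields[0] {
		case gaugePrefix + "combined_watts":
			r.Combined = v
		case gaugePrefix + "gpu_watts":
			r.GPU = v
		case gaugePrefix + "cpu_watts":
			r.CPU = v
		case gaugePrefix + "ane_watts":
			r.ANE = v
		}
	}
	return r
}

// metalReason maps a scrape outcome to a MetalReachable condition reason.
func metalReason(endpoint string, r applePowerReading, scrapeErr error) string {
	switch {
	case endpoint == "":
		return "MetalNotConfigured"
	case scrapeErr != nil:
		return "MetalScrapeError"
	case r.Combined == 0:
		return "MetalSamplerOff" // agent reachable, but sampler not running
	default:
		return "MetalHealthy"
	}
}

// looksApple is a guessed heuristic for this sketch only.
func looksApple(gpuModel string) bool {
	return strings.Contains(strings.ToLower(gpuModel), "apple")
}

// pickCollector mirrors the dispatcher condition quoted in the list.
func pickCollector(metalEndpoint, gpuModel string) string {
	if metalEndpoint != "" && looksApple(gpuModel) {
		return "metal"
	}
	return "dcgm"
}

func main() {
	body := "# TYPE " + gaugePrefix + "combined_watts gauge\n" +
		gaugePrefix + "combined_watts 42.5\n" +
		gaugePrefix + "gpu_watts 30\n"
	r := parseApplePower(body)
	fmt.Println(pickCollector("http://x:9090/metrics", "Apple M5 Max"),
		metalReason("http://x:9090/metrics", r, nil))
	fmt.Println(metalReason("http://x:9090/metrics", applePowerReading{}, nil))
	fmt.Println(metalReason("http://x:9090/metrics", r, errors.New("HTTP 503")))
}
```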

## Plumbing

- `cmd/main.go`: new `--metal-endpoint` flag, plumbed to both
reconcilers.
- `internal/controller/usagereport_controller.go`: `MetalEndpoint` field
added for symmetry. Currently unused at runtime (UsageReport reads from
profile status + the shared `Sampler`), but keeps a future direct-scrape
variant a one-liner.
- `charts/infercost/values.yaml` + `templates/deployment.yaml`: parallel
`metal:` block next to `dcgm:`.
- `config/samples/costprofiles/apple-m5-max.yaml` (new): real M5 Max
numbers: $4500 / 3yr amortization, 90W TDP fallback,
`idleWattsThreshold: 12` so the M-series ~6W idle floor doesn't register
as active.
- `config/samples/costprofiles/apple-m2-ultra.yaml`: replace the
"roadmap" comment with actual setup steps + a callout warning
operators not to override the agent's `--powermetrics-bin`.

## Tests

- **`internal/scraper/metal_test.go`** (5 cases): happy path, sampler
disabled (zeros), empty body, HTTP 503, only-combined-present. Captured
fixture matches what a real M5 Max emits.
- **`internal/controller/costprofile_metal_test.go`** (8 cases): 4
`readMetalPower` paths × condition assertions + 3 `readPower` dispatcher
cases + `TestLooksApple` table. Dispatcher tests use a sentinel
`should-not-be-called.invalid` endpoint on the wrong path so a routing
regression fails loudly.
- All existing DCGM tests untouched and still passing — the dispatcher
refactor is additive.

## Verification

```
go test ./...                                            # all green
helm template charts/infercost --set metal.endpoint=http://x:9090/metrics \
  | grep metal-endpoint
# → - --metal-endpoint=http://x:9090/metrics  (rendered correctly)
```

## Test plan

- [x] Unit tests for `ScrapeApplePower` (5 cases)
- [x] Unit tests for `readMetalPower` + `readPower` dispatcher (8 cases)
- [x] Existing `readDCGMPower` tests still pass (regression guard)
- [x] `helm template` emits `--metal-endpoint` arg when set
- [x] `go build ./...` and `go vet ./...` clean
- [ ] **Manual E2E on M5 Max**: install LLMKube Metal Agent v0.7.x with
`--apple-power-enabled`, install sudoers entry, deploy InferCost via
Helm with `metal.endpoint=http://localhost:9090/metrics`, apply
`apple-m5-max.yaml`, verify `kubectl get costprofile apple-m5-max -o
yaml` shows `currentPowerDrawWatts: ~42` and
`MetalReachable=True/MetalHealthy`. Then start inference traffic and
watch `infercost_cost_per_million_tokens_usd` populate.

## Follow-ups (separate PRs)

- Auto-discovery of LLMKube Metal Agents via label selector (drop the
explicit `--metal-endpoint` flag for fleet deployments)
- A `dashboards/apple-silicon.json` Grafana board fed from the new
gauges
- `docs/apple-silicon.md` one-pager linking from the sample comments

---------

Signed-off-by: Christopher Maher <chris@mahercode.io>

Development

Successfully merging this pull request may close these issues.

Metal agent: expose Apple Silicon power gauges via powermetrics for InferCost
