feat(agent): expose Apple Silicon power gauges via powermetrics #334
Merged
Adds a darwin-only powermetrics sampler that publishes four new Prometheus gauges on the existing AgentRegistry:

- llmkube_metal_agent_apple_power_combined_watts
- llmkube_metal_agent_apple_power_gpu_watts
- llmkube_metal_agent_apple_power_cpu_watts
- llmkube_metal_agent_apple_power_ane_watts

Designed to be scraped by InferCost's upcoming Metal collector for per-token cost attribution on Apple Silicon, where DCGM (NVIDIA-only) doesn't exist. See InferCost issue #46 for the cross-repo design.

Disabled by default (--apple-power-enabled=false) because powermetrics requires root. The agent reaches root via a NOPASSWD sudoers entry the operator installs explicitly; see deployment/macos/sudoers.d/llmkube-powermetrics. This is the smallest possible privilege grant (powermetrics-only, not full root for the agent).

The parser handles a powermetrics quirk: each sample emits two "GPU Power: <n> mW" lines (one in the cpu_power section, one in the gpu_power section). Only the first occurrence per sample is published, so the gauge value matches what was actually summed into Combined Power.

Signed-off-by: Christopher Maher <chris@mahercode.io>
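As a rough illustration of the opt-in wiring described above, here is a minimal Go sketch of the three flags using only the standard `flag` package. The flag names and the `/usr/bin/powermetrics` default come from this PR; the `powerFlags` type, the `parsePowerFlags` helper, the one-second default interval, and the choice of a duration-typed interval are assumptions made for the example, not the agent's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"time"
)

// powerFlags groups the three settings named in the commit message.
// The struct itself is hypothetical.
type powerFlags struct {
	Enabled  bool
	Interval time.Duration
	Bin      string
}

// parsePowerFlags uses a private FlagSet so it can be called repeatedly
// (for example from tests) without touching global flag state.
func parsePowerFlags(args []string) (powerFlags, error) {
	var pf powerFlags
	fs := flag.NewFlagSet("metal-agent", flag.ContinueOnError)
	// Off by default: enabling it implies the operator installed the sudoers entry.
	fs.BoolVar(&pf.Enabled, "apple-power-enabled", false, "sample power via powermetrics")
	// The 1s default is an assumption; the PR does not state the real default.
	fs.DurationVar(&pf.Interval, "apple-power-interval", time.Second, "sampling interval")
	fs.StringVar(&pf.Bin, "powermetrics-bin", "/usr/bin/powermetrics", "path to powermetrics")
	err := fs.Parse(args)
	return pf, err
}

func main() {
	off, _ := parsePowerFlags(nil)
	on, _ := parsePowerFlags([]string{"--apple-power-enabled", "--apple-power-interval=5s"})
	fmt.Println("default enabled:", off.Enabled)
	fmt.Println("opted in:", on.Enabled, "interval:", on.Interval)
}
```

With no arguments the sampler stays off, which matches the "behavior is unchanged unless you opt in" intent of the flag default.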
Security audit on PR #334 flagged three must-fix-before-merge items in the powermetrics privilege model. This commit closes all three:

H-1: Pin the argv in the sudoers fragment. The original spec, `NOPASSWD: /usr/bin/powermetrics`, granted root for *any* powermetrics flags. powermetrics ships with --output-file, which combined with root is a write-anywhere primitive (`sudo powermetrics --output-file=/etc/passwd ...`). The pinned form now requires the exact `--samplers cpu_power,gpu_power -i <numeric>` argv the agent uses and nothing else.

H-2: Reject non-default --powermetrics-bin. The override flag let an operator point the agent at a binary the sudoers spec doesn't match. NewApplePowerSampler now refuses to construct a real sampler when bin diverges from /usr/bin/powermetrics and returns a no-op instead; the agent keeps running without power data rather than failing closed.

H-3: Use absolute /usr/bin/sudo. defaultPowermetricsCommand now invokes /usr/bin/sudo by absolute path rather than relying on $PATH lookup, so a $PATH attacker can't substitute a sudo wrapper to capture our (root) NOPASSWD invocation.

Plus regression test that asserts the agent's actual argv satisfies the shipped sudoers spec. If a future change desyncs them, the test fails before release rather than the gauges silently going zero on upgrade.

README install one-liner now uses the safe pattern: render to a tempfile, `visudo -cf` for syntax validation, then `install -m 0440 -o root -g wheel`. Placeholder renamed from USERNAME to __LLMKUBE_USER__ so a copy-paste with the placeholder still in place fails sudo loudly instead of silently granting to a real user named "USERNAME". Adds an Uninstall section.

Signed-off-by: Christopher Maher <chris@mahercode.io>
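The three hardening items and the argv-vs-sudoers regression check can be sketched together in Go. This is a minimal illustration, not the agent's code: `samplerArgv` and `satisfiesPin` are hypothetical names, the element-wise comparison stands in for the real sudoers command spec (whose exact syntax is not reproduced here), and passing the interval as integer milliseconds is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

const (
	sudoBin         = "/usr/bin/sudo"         // absolute path, so $PATH is never consulted (H-3)
	powermetricsBin = "/usr/bin/powermetrics" // the only binary the sudoers entry covers (H-2)
)

var numeric = regexp.MustCompile(`^[0-9]+$`)

// samplerArgv builds the command line the sampler would execute.
// It reports false, and the caller falls back to a no-op sampler,
// when bin is not the stock path.
func samplerArgv(bin string, intervalMs int) ([]string, bool) {
	if bin != powermetricsBin || intervalMs <= 0 {
		return nil, false
	}
	return []string{sudoBin, bin, "--samplers", "cpu_power,gpu_power", "-i", strconv.Itoa(intervalMs)}, true
}

// satisfiesPin is a stand-in for the pinned sudoers spec (H-1): the exact
// sampler list, a numeric interval, and no additional arguments.
func satisfiesPin(argv []string) bool {
	want := []string{sudoBin, powermetricsBin, "--samplers", "cpu_power,gpu_power", "-i"}
	if len(argv) != len(want)+1 {
		return false
	}
	for i, w := range want {
		if argv[i] != w {
			return false
		}
	}
	return numeric.MatchString(argv[len(want)])
}

func main() {
	argv, ok := samplerArgv(powermetricsBin, 1000)
	fmt.Println("built:", ok, "matches pin:", satisfiesPin(argv))

	// An extra flag such as --output-file must not satisfy the pinned form.
	evil := append(append([]string{}, argv...), "--output-file=/etc/passwd")
	fmt.Println("extra flag matches pin:", satisfiesPin(evil))

	_, ok = samplerArgv("/tmp/powermetrics", 1000)
	fmt.Println("non-default bin accepted:", ok)
}
```

Keeping the builder and the matcher as separate functions is what makes the regression test possible: the test feeds the builder's output to the matcher, so the two cannot drift apart unnoticed.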
The codecov report on PR #334 flagged 22 lines of patch as uncovered. Most of those are flag declarations in cmd/metal-agent/main.go (which the rest of main() also doesn't cover: declarative wiring nobody tests in this repo) and the powermetrics subprocess code in power_darwin.go (which is darwin-only and exercised by the parser tests + the new argv-vs-sudoers regression test).

The honest gaps were the conditional sampler launch in agent.Start() and the no-op stub in power_other.go that ships in the Linux CI binary. Both are now covered:

- Extract maybeStartApplePowerSampler from Start() following the same reportHealthServerExit / runWatcherLoop pattern that was already in agent.go. The factory is a package var so tests can swap in a fake whose Run() is deterministic.
- Define an applePowerRunner interface so tests don't have to construct the darwin-only ApplePowerSampler struct from a Linux test binary.
- Add TestMaybeStartApplePowerSampler_DisabledByDefault and TestMaybeStartApplePowerSampler_Enabled_LaunchesViaFactory. The enabled-path test asserts both that the helper plumbed PowermetricsBin/ApplePowerInterval through to the factory and that Run was actually invoked by the goroutine.
- Add power_other_test.go (build-tagged !darwin) so CI on Linux exercises the no-op stub. Without it the build-tag gate could silently rot.

go test on darwin and GOOS=linux go vet both pass. maybeStartApplePowerSampler hits 100% in the local cover profile.

Signed-off-by: Christopher Maher <chris@mahercode.io>
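The test seam described in this commit (a small runner interface plus a factory held in a package variable) might look roughly like the Go sketch below. The identifiers `applePowerRunner` and `maybeStartApplePowerSampler` are taken from the commit message; the `config` struct, `noopRunner`, `fakeRunner`, the boolean return value, and the `demo` helper are assumptions added so the example is self-contained.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// applePowerRunner is the narrow interface the launch helper depends on.
type applePowerRunner interface {
	Run(ctx context.Context)
}

// config is a hypothetical subset of the agent configuration.
type config struct {
	ApplePowerEnabled  bool
	PowermetricsBin    string
	ApplePowerInterval time.Duration
}

type noopRunner struct{}

func (noopRunner) Run(context.Context) {}

// newApplePowerRunner is a package variable so tests can swap in a fake.
var newApplePowerRunner = func(bin string, interval time.Duration) applePowerRunner {
	return noopRunner{}
}

// maybeStartApplePowerSampler launches the sampler only when enabled.
// Returning a bool is an assumption made to keep the sketch observable.
func maybeStartApplePowerSampler(ctx context.Context, cfg config) bool {
	if !cfg.ApplePowerEnabled {
		return false
	}
	r := newApplePowerRunner(cfg.PowermetricsBin, cfg.ApplePowerInterval)
	go r.Run(ctx)
	return true
}

// fakeRunner signals on a channel so a test can observe that Run was called.
type fakeRunner struct{ ran chan struct{} }

func (f *fakeRunner) Run(context.Context) { close(f.ran) }

// demo swaps in the fake factory, calls the helper, and reports what happened.
func demo(enabled bool) (launched bool, gotBin string, ran bool) {
	fake := &fakeRunner{ran: make(chan struct{})}
	orig := newApplePowerRunner
	defer func() { newApplePowerRunner = orig }()
	newApplePowerRunner = func(bin string, _ time.Duration) applePowerRunner {
		gotBin = bin
		return fake
	}
	launched = maybeStartApplePowerSampler(context.Background(), config{
		ApplePowerEnabled:  enabled,
		PowermetricsBin:    "/usr/bin/powermetrics",
		ApplePowerInterval: time.Second,
	})
	select {
	case <-fake.ran:
		ran = true
	case <-time.After(200 * time.Millisecond):
	}
	return launched, gotBin, ran
}

func main() {
	fmt.Println(demo(false))
	fmt.Println(demo(true))
}
```

Because the helper only sees the interface, the same test compiles on Linux, where the darwin-only sampler type does not exist.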
Defilan added a commit to defilantech/infercost that referenced this pull request on Apr 26, 2026
…s Macs (#47)

Closes #46.

## Summary

InferCost's existing power-data path reads NVIDIA GPU power from the DCGM exporter. macOS has no DCGM equivalent, so every Apple Silicon CostProfile silently fell back to TDP estimation: fine for order-of-magnitude numbers, useless for tracking dynamic load.

This adds a parallel **Metal collector** that scrapes the LLMKube Metal Agent's `apple_power_*_watts` gauges (defilantech/LLMKube#335 / [PR #334](defilantech/LLMKube#334)). The agent ships those gauges from a sudo'd `powermetrics` subprocess. The combined CPU + GPU + ANE watts is the right scope to feed into the existing cost calculation, exactly mirroring what DCGM reports for an NVIDIA card.

## Architecture

- **`internal/scraper/metal.go`**: `ScrapeApplePower` mirrors `ScrapeDCGM`. Returns `ApplePowerReading{Combined, GPU, CPU, ANE}`, a single-row return because the gauges are unlabeled (no per-GPU/per-pod cardinality on a Mac).
- **`internal/controller/costprofile_controller.go`**: `readMetalPower` follows the same every-path-emits-a-condition contract as `readDCGMPower`. New `readPower` dispatcher (10 lines) keys off `MetalEndpoint != "" && looksApple(gpuModel)` to pick Metal vs DCGM.
- **New `ConditionMetalReachable`** with reasons `MetalHealthy / MetalNotConfigured / MetalScrapeError / MetalSamplerOff`, so operators on a Mac don't see a confusing "DCGM unreachable" message when DCGM was never the right tool.
- **The `MetalSamplerOff` path is the important one**: agent reachable but `apple_power_combined_watts` is zero → operator forgot `--apple-power-enabled`, the sudoers entry is missing, or they overrode `--powermetrics-bin` (which the agent's Phase 1 security hardening now refuses). The condition message names all three.

## Plumbing

- `cmd/main.go`: new `--metal-endpoint` flag, plumbed to both reconcilers.
- `internal/controller/usagereport_controller.go`: `MetalEndpoint` field added for symmetry. Currently unused at runtime (UsageReport reads from profile status + the shared `Sampler`), but keeps a future direct-scrape variant a one-liner.
- `charts/infercost/values.yaml` + `templates/deployment.yaml`: parallel `metal:` block next to `dcgm:`.
- `config/samples/costprofiles/apple-m5-max.yaml` (new): real M5 Max numbers: $4500 / 3yr amortization, 90W TDP fallback, `idleWattsThreshold: 12` so the M-series ~6W idle floor doesn't register as active.
- `config/samples/costprofiles/apple-m2-ultra.yaml`: replace the "roadmap" comment with actual setup steps + a callout warning operators not to override the agent's `--powermetrics-bin`.

## Tests

- **`internal/scraper/metal_test.go`** (5 cases): happy path, sampler disabled (zeros), empty body, HTTP 503, only-combined-present. Captured fixture matches what a real M5 Max emits.
- **`internal/controller/costprofile_metal_test.go`** (8 cases): 4 `readMetalPower` paths × condition assertions + 3 `readPower` dispatcher cases + `TestLooksApple` table. Dispatcher tests use a sentinel `should-not-be-called.invalid` endpoint on the wrong path so a routing regression fails loudly.
- All existing DCGM tests untouched and still passing; the dispatcher refactor is additive.

## Verification

```
go test ./...
# all green
helm template charts/infercost --set metal.endpoint=http://x:9090/metrics | grep metal-endpoint
→ - --metal-endpoint=http://x:9090/metrics   # rendered correctly
```

## Test plan

- [x] Unit tests for `ScrapeApplePower` (5 cases)
- [x] Unit tests for `readMetalPower` + `readPower` dispatcher (8 cases)
- [x] Existing `readDCGMPower` tests still pass (regression guard)
- [x] `helm template` emits `--metal-endpoint` arg when set
- [x] `go build ./...` and `go vet ./...` clean
- [ ] **Manual E2E on M5 Max**: install LLMKube Metal Agent v0.7.x with `--apple-power-enabled`, install sudoers entry, deploy InferCost via Helm with `metal.endpoint=http://localhost:9090/metrics`, apply `apple-m5-max.yaml`, verify `kubectl get costprofile apple-m5-max -o yaml` shows `currentPowerDrawWatts: ~42` and `MetalReachable=True/MetalHealthy`. Then start inference traffic and watch `infercost_cost_per_million_tokens_usd` populate.

## Follow-ups (separate PRs)

- Auto-discovery of LLMKube Metal Agents via label selector (drop the explicit `--metal-endpoint` flag for fleet deployments)
- A `dashboards/apple-silicon.json` Grafana board fed from the new gauges
- `docs/apple-silicon.md` one-pager linking from the sample comments

---------

Signed-off-by: Christopher Maher <chris@mahercode.io>
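To make the scrape path and the `MetalSamplerOff` case in the commit above concrete, here is a minimal Go sketch that parses the four unlabeled gauges out of a Prometheus text body and maps the result to a condition reason. It uses only the standard library. `parseApplePower` and `metalReason` are hypothetical helpers, not InferCost's `ScrapeApplePower` or `readMetalPower`, and treating a missing combined gauge as `MetalScrapeError` is an assumption, since the commit message only specifies the zero-value case.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// ApplePowerReading mirrors the field names given in the commit message.
type ApplePowerReading struct {
	Combined, GPU, CPU, ANE float64
}

const gaugePrefix = "llmkube_metal_agent_apple_power_"

// parseApplePower extracts the gauges from a text-exposition body.
// The bool reports whether the combined gauge was present at all.
func parseApplePower(body string) (ApplePowerReading, bool) {
	var r ApplePowerReading
	sawCombined := false
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skips blank lines, "# HELP"/"# TYPE" comments, and unrelated metrics.
		if len(fields) < 2 || !strings.HasPrefix(fields[0], gaugePrefix) {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		switch strings.TrimPrefix(fields[0], gaugePrefix) {
		case "combined_watts":
			r.Combined, sawCombined = v, true
		case "gpu_watts":
			r.GPU = v
		case "cpu_watts":
			r.CPU = v
		case "ane_watts":
			r.ANE = v
		}
	}
	return r, sawCombined
}

// metalReason maps a reading to one of the reasons named in the commit.
func metalReason(r ApplePowerReading, found bool) string {
	switch {
	case !found:
		return "MetalScrapeError" // assumption: absent gauge is reported as a scrape problem
	case r.Combined == 0:
		return "MetalSamplerOff" // agent reachable, sampler not producing data
	default:
		return "MetalHealthy"
	}
}

func main() {
	body := "# TYPE llmkube_metal_agent_apple_power_combined_watts gauge\n" +
		"llmkube_metal_agent_apple_power_combined_watts 42.5\n" +
		"llmkube_metal_agent_apple_power_gpu_watts 30.25\n"
	r, ok := parseApplePower(body)
	fmt.Println(r, metalReason(r, ok))
}
```

A production scraper would more likely use a Prometheus parsing library; the hand-rolled loop is only here to keep the example dependency-free.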
Summary
Adds a darwin-only powermetrics sampler that publishes four new Prometheus gauges on the Metal agent's existing `AgentRegistry`:

- `llmkube_metal_agent_apple_power_combined_watts`
- `llmkube_metal_agent_apple_power_gpu_watts`
- `llmkube_metal_agent_apple_power_cpu_watts`
- `llmkube_metal_agent_apple_power_ane_watts`

Designed to be scraped by InferCost's upcoming Metal collector for per-token cost attribution on Apple Silicon, where DCGM (NVIDIA-only) doesn't exist. See InferCost #46 for the cross-repo design.
Why
Today the Apple sample CostProfile in InferCost says: "Apple Silicon has no DCGM exporter; InferCost falls back to TDP-based power estimation. Real-time power metrics will require a Metal-aware exporter (tracked on the roadmap)." This PR is the LLMKube half of closing that gap.
This weekend's M5 Max bench session measured 2.06 tok/s/W of inference and computed $0.015/M tokens marginal cost manually from `powermetrics`, but InferCost itself couldn't see any of it. After this PR + the matching InferCost PR, the same flow works end-to-end without manual math.

What changed

| File | Change |
| --- | --- |
| `pkg/agent/agentmetrics.go` | `apple_power_*_watts` gauges; register in `init()` |
| `pkg/agent/power_darwin.go` (new) | `ApplePowerSampler`: spawns `sudo powermetrics --samplers cpu_power,gpu_power -i <interval>`, parses output, updates gauges |
| `pkg/agent/power_other.go` (new) | |
| `pkg/agent/power_darwin_test.go` (new) | |
| `pkg/agent/testdata/powermetrics_sample.txt` (new) | |
| `pkg/agent/agentmetrics_test.go` | |
| `pkg/agent/agent.go` | `ApplePowerEnabled` / `ApplePowerInterval` / `PowermetricsBin` to `MetalAgentConfig`; conditionally launch sampler goroutine in `Start()` |
| `cmd/metal-agent/main.go` | `--apple-power-enabled`, `--apple-power-interval`, `--powermetrics-bin` flags |
| `deployment/macos/sudoers.d/llmkube-powermetrics` (new) | |
| `deployment/macos/README.md` | |

Privilege handling

`powermetrics` requires root, but the agent runs as a regular user under launchd. Three options were considered: (1) NOPASSWD sudoers entry for `/usr/bin/powermetrics` only, (2) run the entire metal-agent as root, (3) external launchd job + tail a shared file.

Picked (1) because it's the smallest privilege grant (`powermetrics`-only, not full root for the agent) and is well-understood by macOS admins. The flag defaults to disabled so users opt into the sudoers requirement explicitly. Without `--apple-power-enabled`, behavior is unchanged from today.

Parser quirk handled

`powermetrics` emits two `GPU Power: <n> mW` lines per sample, one in the cpu_power section and one in the gpu_power section. Only the first occurrence per sample is published, so the gauge value matches what was actually summed into `Combined Power`. Test `TestParsePowermetricsStream_DuplicateGPULineIgnored` guards this.

A second guard (`TestParsePowermetricsStream_PartialSampleNotEmitted`) ensures samples missing their `Combined Power` line never emit, so InferCost's per-token math doesn't see stale-zero gauges if powermetrics is killed mid-write.

Test plan

- `go test ./...`: all packages green
- `golangci-lint run ./pkg/agent/... ./cmd/metal-agent/...`: only the preexisting `health.go` G118 finding (unrelated)
- `go vet ./...` clean
- `gofmt -w` clean
- `--apple-power-enabled`, `curl /metrics | grep apple_power`, run inference, watch gauge rise; stop, watch fall to ~1 W idle baseline

Follow-up

Phase 2 (separate PR in defilantech/infercost) will add a `metal` collector type to InferCost that scrapes these gauges. See plan link above.

Closes #335
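The two parser guards from the "Parser quirk handled" section can be illustrated with a small Go sketch. It is a simplified stand-in for the real parser in `power_darwin.go`: the function names, the `powerSample` type, the use of a `*** Sampled system activity` header as the sample boundary, and the fixture text are all assumptions made for the example rather than captured `powermetrics` output.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// powerSample holds one complete sample, converted from mW to W.
type powerSample struct{ CombinedW, GPUW, CPUW, ANEW float64 }

// fixture is illustrative: the first sample repeats "GPU Power" after the
// combined line, and the second sample is cut off before "Combined Power".
const fixture = `*** Sampled system activity (illustrative header) ***
CPU Power: 1500 mW
GPU Power: 5000 mW
ANE Power: 0 mW
Combined Power (CPU + GPU + ANE): 6500 mW
GPU Power: 5100 mW
*** Sampled system activity (illustrative header) ***
CPU Power: 900 mW
GPU Power: 700 mW
`

// milliwatts parses lines of the form "<name>: <n> mW".
func milliwatts(line string) (string, float64, bool) {
	name, rest, found := strings.Cut(line, ":")
	if !found {
		return "", 0, false
	}
	fields := strings.Fields(rest)
	if len(fields) != 2 || fields[1] != "mW" {
		return "", 0, false
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return "", 0, false
	}
	return strings.TrimSpace(name), v, true
}

// parseStream calls emit once per sample that contains a Combined Power line.
func parseStream(r io.Reader, emit func(powerSample)) {
	var cur powerSample
	seenGPU := false
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "*** Sampled system activity") {
			cur, seenGPU = powerSample{}, false // new sample: drop any partial state
			continue
		}
		name, mw, ok := milliwatts(line)
		if !ok {
			continue
		}
		switch {
		case name == "CPU Power":
			cur.CPUW = mw / 1000
		case name == "GPU Power":
			if !seenGPU { // only the first GPU line per sample counts
				cur.GPUW, seenGPU = mw/1000, true
			}
		case name == "ANE Power":
			cur.ANEW = mw / 1000
		case strings.HasPrefix(name, "Combined Power"):
			cur.CombinedW = mw / 1000
			emit(cur) // a sample without this line is never emitted
		}
	}
}

// parseString is a convenience wrapper that collects emitted samples.
func parseString(s string) []powerSample {
	var out []powerSample
	parseStream(strings.NewReader(s), func(p powerSample) { out = append(out, p) })
	return out
}

func main() {
	for _, s := range parseString(fixture) {
		fmt.Printf("%+v\n", s)
	}
}
```

Emitting only on the combined line is what keeps the gauges from being overwritten with zeros when the subprocess dies partway through a sample.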