fix(operator): remove per-session monitorPod goroutines to fix memory leak#1004
Conversation
… leak The legacy monitorPod function spawned a long-lived goroutine per session (5s ticker polling API server for pod/session status). With 214 sessions this meant 214 goroutines duplicating work already handled by the controller-runtime reconciler, causing memory that never released after sessions stopped. Remove monitorPod entirely and move its only unique functionality (inactivity auto-stop) into reconcileRunning. The 10-worker controller-runtime pool already handles pod status, phase transitions, and token refresh. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughController reconcile for running sessions now performs an inactivity check after refreshing runner tokens and can trigger an automatic stop; legacy in-process pod-monitoring goroutines were removed, delegating steady-state monitoring to controller-runtime reconcilers. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Controller as Controller (reconciler)
participant Handlers as Handlers (ShouldAutoStop/TriggerInactivityStop)
participant K8s as Kubernetes API / Pod
Controller->>K8s: refresh runner token / inspect session & pod
Controller->>Handlers: ShouldAutoStop(session)
alt session idle
Controller->>Handlers: TriggerInactivityStop(namespace,name)
Handlers->>K8s: update session/request stop
Handlers-->>Controller: success / error
alt success
Controller-->>Controller: return Requeue: true
else error
Controller-->>Controller: return RequeueAfter: 30s
end
else not idle
Controller-->>Controller: continue normal reconcile (requeue after interval)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
components/operator/internal/controller/reconcile_phases.go (1)
1-17:⚠️ Potential issue | 🟡 MinorFix gofmt formatting to pass CI.
Run
gofmt -w .to fix the formatting issues flagged by CI.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/operator/internal/controller/reconcile_phases.go` around lines 1 - 17, The file in package controller (reconcile_phases.go) has formatting issues; run gofmt -w . (or gofmt on this file) to reformat the imports and code so it conforms to gofmt rules and passes CI, then re-stage the changed file (ensure package name "controller" and the import block with symbols like corev1, errors, unstructured, types, ctrl, log, and handlers remain unchanged other than formatting).components/operator/internal/handlers/sessions.go (1)
1-29:⚠️ Potential issue | 🟡 MinorFix gofmt formatting to pass CI.
The pipeline is failing due to unformatted Go files. Run
gofmt -w .from the repository root.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/operator/internal/handlers/sessions.go` around lines 1 - 29, The file has formatting issues causing CI failure; run the Go formatter on the repository (e.g., execute `gofmt -w .` from the repo root) and commit the updated formatting for files including this package (package handlers in components/operator/internal/handlers/sessions.go) so imports and code lines (e.g., the import block and any functions in sessions.go) are properly gofmt'd; ensure you re-run `gofmt` or `go fmt` until `gofmt` reports no changes and push the formatted files.components/operator/internal/handlers/inactivity_test.go (1)
1-16:⚠️ Potential issue | 🟡 MinorFix gofmt formatting to pass CI.
Same formatting issue as the main source file. Run
gofmt -w .to fix.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@components/operator/internal/handlers/inactivity_test.go` around lines 1 - 16, The test file in package "handlers" has gofmt/style issues; run the Go formatter (e.g., execute "gofmt -w ." at the repo root or run gofmt on the file) to reformat inactivity_test.go so imports and spacing conform to gofmt, then re-run CI; this will fix the formatting errors in the handlers package test file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@components/operator/internal/handlers/inactivity.go`:
- Around line 1-5: Run gofmt to fix formatting in the handlers file that
contains the package comment and functions referenced (ShouldAutoStop,
TriggerInactivityStop). From the repo root run `gofmt -w .` (or `gofmt -w
components/operator/internal/handlers/inactivity.go`) and commit the changes so
the file formatting passes CI; ensure the package comment and any surrounding
declarations remain unchanged except for formatting.
---
Outside diff comments:
In `@components/operator/internal/controller/reconcile_phases.go`:
- Around line 1-17: The file in package controller (reconcile_phases.go) has
formatting issues; run gofmt -w . (or gofmt on this file) to reformat the
imports and code so it conforms to gofmt rules and passes CI, then re-stage the
changed file (ensure package name "controller" and the import block with symbols
like corev1, errors, unstructured, types, ctrl, log, and handlers remain
unchanged other than formatting).
In `@components/operator/internal/handlers/inactivity_test.go`:
- Around line 1-16: The test file in package "handlers" has gofmt/style issues;
run the Go formatter (e.g., execute "gofmt -w ." at the repo root or run gofmt
on the file) to reformat inactivity_test.go so imports and spacing conform to
gofmt, then re-run CI; this will fix the formatting errors in the handlers
package test file.
In `@components/operator/internal/handlers/sessions.go`:
- Around line 1-29: The file has formatting issues causing CI failure; run the
Go formatter on the repository (e.g., execute `gofmt -w .` from the repo root)
and commit the updated formatting for files including this package (package
handlers in components/operator/internal/handlers/sessions.go) so imports and
code lines (e.g., the import block and any functions in sessions.go) are
properly gofmt'd; ensure you re-run `gofmt` or `go fmt` until `gofmt` reports no
changes and push the formatted files.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ead46c31-4429-433b-8f2f-f2b571472ea4
📒 Files selected for processing (4)
components/operator/internal/controller/reconcile_phases.gocomponents/operator/internal/handlers/inactivity.gocomponents/operator/internal/handlers/inactivity_test.gocomponents/operator/internal/handlers/sessions.go
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… leak (ambient-code#1004) ## Summary - Remove legacy `monitorPod` goroutine-per-session pattern that caused memory to grow linearly with session count and never release - Move inactivity auto-stop check into the existing controller-runtime `reconcileRunning` reconciler - Net **-275 lines** — the 10-worker controller-runtime pool already handled all pod status monitoring, phase transitions, and token refresh ## Context On UAT with 214 running sessions, operator memory was 161Mi. After stopping 145 sessions (down to 69), memory stayed at 162Mi. Root cause: 214 `monitorPod` goroutines (each polling every 5s) duplicated work already done by the controller-runtime reconciler. Go's runtime retained the heap high-water mark even after goroutines exited. ## Test plan - [x] `go build ./...` passes - [x] `go test ./internal/handlers/ -run "TestShouldAutoStop|TestTriggerInactivityStop"` — all pass - [ ] Deploy to UAT, verify memory drops proportionally when sessions are stopped - [ ] Verify inactivity auto-stop still triggers (check logs for `idle beyond inactivity timeout`) - [ ] Verify session lifecycle (create → run → stop) works end-to-end 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Ambient Code Bot <bot@ambient-code.local> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
monitorPodgoroutine-per-session pattern that caused memory to grow linearly with session count and never releasereconcileRunningreconcilerContext
On UAT with 214 running sessions, operator memory was 161Mi. After stopping 145 sessions (down to 69), memory stayed at 162Mi. Root cause: 214
monitorPodgoroutines (each polling every 5s) duplicated work already done by the controller-runtime reconciler. Go's runtime retained the heap high-water mark even after goroutines exited.Test plan
go build ./...passesgo test ./internal/handlers/ -run "TestShouldAutoStop|TestTriggerInactivityStop"— all passidle beyond inactivity timeout)🤖 Generated with Claude Code