Add multicluster support for the event recorder #619

SoWieMarkus wants to merge 2 commits into main from
Conversation
📝 Walkthrough

The pull request switches event recorder sourcing from the controller-manager to a multicluster client across six scheduling controllers, and introduces a new multi-cluster-aware event recorder.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Controller as Scheduling<br/>Controller
    participant MCR as MultiCluster<br/>Recorder
    participant Client as Multicluster<br/>Client
    participant Scheme as Home<br/>Scheme
    participant Router as Cluster<br/>Router
    participant ClusterRecorder as Per-Cluster<br/>Recorder
    Controller->>MCR: Eventf(regarding, ...)
    MCR->>Scheme: GVK resolution<br/>(regarding object)
    alt GVK resolved successfully
        Scheme-->>MCR: GVK
        MCR->>Router: clusterForWrite(GVK)
        Router-->>MCR: target cluster
        MCR->>MCR: lookup pre-cached<br/>recorder for cluster
        alt recorder exists for cluster
            MCR->>ClusterRecorder: Eventf(...)
            ClusterRecorder-->>MCR: event recorded
        else fallback to home
            MCR->>ClusterRecorder: Eventf(...)<br/>[home recorder]
            ClusterRecorder-->>MCR: event recorded
        end
    else GVK resolution fails or object nil
        MCR->>ClusterRecorder: Eventf(...)<br/>[fallback to home]
        ClusterRecorder-->>MCR: event recorded
    end
    MCR-->>Controller: complete
```
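The routing and fallback chain in the diagram can be sketched without the controller-runtime types. Everything below (`cluster`, `recorder`, and the `route` function standing in for the router's `clusterForWrite`) is an illustrative stand-in, not the project's actual API:

```go
package main

import "fmt"

type cluster string

// recorder is a stand-in for a per-cluster events.EventRecorder.
type recorder struct {
	name   string
	events []string
}

func (r *recorder) Eventf(reason string) { r.events = append(r.events, reason) }

type multiRecorder struct {
	home      *recorder
	recorders map[cluster]*recorder
	route     func(gvk string) (cluster, bool) // stand-in for router.clusterForWrite
}

// recorderFor mirrors the fallback chain from the diagram:
// failed GVK resolution, a router miss, or a missing cached recorder
// all fall back to the home recorder.
func (m *multiRecorder) recorderFor(gvk string, resolved bool) *recorder {
	if !resolved { // GVK resolution failed or object was nil
		return m.home
	}
	cl, ok := m.route(gvk)
	if !ok {
		return m.home
	}
	if rec, cached := m.recorders[cl]; cached {
		return rec
	}
	return m.home // no pre-created recorder for this cluster
}

func main() {
	home := &recorder{name: "home"}
	remote := &recorder{name: "remote-b"}
	m := &multiRecorder{
		home:      home,
		recorders: map[cluster]*recorder{"remote-b": remote},
		route: func(gvk string) (cluster, bool) {
			if gvk == "Pod" {
				return "remote-b", true
			}
			return "", false
		},
	}
	m.recorderFor("Pod", true).Eventf("SchedulingFailed") // routed to remote-b
	m.recorderFor("Node", true).Eventf("RouterMiss")      // router miss → home
	m.recorderFor("", false).Eventf("NilObject")          // resolution failure → home
	fmt.Println(len(remote.events), len(home.events))
}
```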
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Test Coverage Report 📊: 68.5%
🧹 Nitpick comments (3)
pkg/multicluster/recorder.go (1)
22-49: Recorder cache is static; new remote clusters will silently fall back to home events
`GetEventRecorder()` pre-creates a fixed recorder map, and `recorderFor()` falls back to home when a cluster key is missing. If remotes are added after initialization, events for those objects are emitted in the wrong cluster.
Details
♻️ Suggested refactor (lazy recorder creation on cache miss)
```diff
 import (
+	"sync"

 	"k8s.io/apimachinery/pkg/runtime"
 	"k8s.io/client-go/tools/events"
 	ctrl "sigs.k8s.io/controller-runtime"
 	"sigs.k8s.io/controller-runtime/pkg/cluster"
 )

 type MultiClusterRecorder struct {
 	client       *Client
+	name         string
 	homeRecorder events.EventRecorder
 	recorders    map[cluster.Cluster]events.EventRecorder
+	mu           sync.RWMutex
 }

 func (c *Client) GetEventRecorder(name string) events.EventRecorder {
 	homeRecorder := c.HomeCluster.GetEventRecorder(name)
 	recorders := make(map[cluster.Cluster]events.EventRecorder)
 	recorders[c.HomeCluster] = homeRecorder
@@
 	return &MultiClusterRecorder{
 		client:       c,
+		name:         name,
 		homeRecorder: homeRecorder,
 		recorders:    recorders,
 	}
 }
@@
-	recorder, ok := r.recorders[cl]
-	if !ok {
-		ctrl.Log.V(1).Info("multi-cluster recorder: no pre-created recorder for cluster, using home recorder", "gvk", gvk)
-		return r.homeRecorder
-	}
-
-	return recorder
+	r.mu.RLock()
+	recorder, ok := r.recorders[cl]
+	r.mu.RUnlock()
+	if ok {
+		return recorder
+	}
+
+	r.mu.Lock()
+	defer r.mu.Unlock()
+	if recorder, ok = r.recorders[cl]; ok {
+		return recorder
+	}
+	recorder = cl.GetEventRecorder(r.name)
+	r.recorders[cl] = recorder
+	return recorder
 }
```

Also applies to: 76-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/multicluster/recorder.go` around lines 22 - 49, GetEventRecorder currently builds a static map of recorders so later-added remote clusters silently fall back to the home recorder in recorderFor; change to lazily create and cache per-cluster recorders on demand instead: update MultiClusterRecorder.recorderFor (and any lookup path) to, under remoteClustersMu (or a dedicated mutex on MultiClusterRecorder), check if a recorder exists for the requested cluster and if not call cluster.GetEventRecorder(name) to create it and store it in the recorders map before returning it, ensuring concurrent access is synchronized and avoiding fallback to homeRecorder for newly added clusters; keep GetEventRecorder only creating the homeRecorder and initial entries if desired but rely on recorderFor for misses.

pkg/multicluster/client_test.go (1)
79-97: Prefer interface-typed test recorder field to reduce test coupling
`fakeRecorder` can be typed as `events.EventRecorder` instead of the concrete `*fakeEventRecorder`; this keeps the fake cluster reusable and decoupled from one specific test helper type.

♻️ Minimal refactor
```diff
 type fakeCluster struct {
 	cluster.Cluster
 	fakeClient   client.Client
 	fakeCache    *fakeCache
-	fakeRecorder *fakeEventRecorder
+	fakeRecorder events.EventRecorder
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/multicluster/client_test.go` around lines 79 - 97, Change the fakeCluster test field fakeRecorder from concrete *fakeEventRecorder to the interface type events.EventRecorder and update the GetEventRecorder method to return that interface; specifically, make the fakeCluster struct declare fakeRecorder events.EventRecorder, have GetEventRecorder(_ string) return f.fakeRecorder when non-nil, and otherwise return a new &fakeEventRecorder{} (which still implements events.EventRecorder). This reduces coupling while preserving current behavior.

pkg/multicluster/recorder_test.go (1)
174-181: Use a single snapshot per assertion block.

These assertions call `getCalls()` multiple times, which repeatedly locks and copies slices. Taking one snapshot per recorder improves readability and avoids redundant work.

♻️ Suggested refactor
```diff
-	if len(remoteARecorder.getCalls()) != 0 {
-		t.Errorf("expected 0 calls to remote-a, got %d", len(remoteARecorder.getCalls()))
-	}
-	if len(remoteBRecorder.getCalls()) != 1 {
-		t.Fatalf("expected 1 call to remote-b, got %d", len(remoteBRecorder.getCalls()))
-	}
-	if remoteBRecorder.getCalls()[0].reason != "SchedulingFailed" {
-		t.Errorf("expected reason %q, got %q", "SchedulingFailed", remoteBRecorder.getCalls()[0].reason)
-	}
+	remoteACalls := remoteARecorder.getCalls()
+	if len(remoteACalls) != 0 {
+		t.Errorf("expected 0 calls to remote-a, got %d", len(remoteACalls))
+	}
+	remoteBCalls := remoteBRecorder.getCalls()
+	if len(remoteBCalls) != 1 {
+		t.Fatalf("expected 1 call to remote-b, got %d", len(remoteBCalls))
+	}
+	if remoteBCalls[0].reason != "SchedulingFailed" {
+		t.Errorf("expected reason %q, got %q", "SchedulingFailed", remoteBCalls[0].reason)
+	}
```

```diff
-	if len(homeRecorder.getCalls()) != goroutines {
-		t.Errorf("expected %d calls, got %d", goroutines, len(homeRecorder.getCalls()))
-	}
+	homeCalls := homeRecorder.getCalls()
+	if len(homeCalls) != goroutines {
+		t.Errorf("expected %d calls, got %d", goroutines, len(homeCalls))
+	}
```

Also applies to: 253-255
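The reason `getCalls()` locks and copies in the first place can be shown with a tiny stand-alone recorder; this `callRecorder` is a stand-in for the test's `fakeEventRecorder`, not the actual helper:

```go
package main

import (
	"fmt"
	"sync"
)

// callRecorder records reasons under a mutex; getCalls returns a
// defensive copy so callers can index and len() a stable snapshot
// without re-locking on every assertion.
type callRecorder struct {
	mu    sync.Mutex
	calls []string
}

func (r *callRecorder) Eventf(reason string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.calls = append(r.calls, reason)
}

func (r *callRecorder) getCalls() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.calls...) // defensive copy
}

func main() {
	r := &callRecorder{}
	r.Eventf("SchedulingFailed")

	calls := r.getCalls() // one snapshot, reused for every assertion
	fmt.Println(len(calls), calls[0])
}
```

Each `getCalls()` call pays for a lock and a slice copy, which is exactly why the refactor above takes one snapshot per recorder and asserts against that.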
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/multicluster/recorder_test.go` around lines 174 - 181, The test repeatedly calls remoteARecorder.getCalls() and remoteBRecorder.getCalls(), causing redundant locking/copying; fix by taking a single snapshot for each recorder (e.g., callsA := remoteARecorder.getCalls() and callsB := remoteBRecorder.getCalls()) and use those snapshots for all subsequent length and content assertions (replace repeated getCalls() calls and use callsB[0].reason for the reason check); apply same change for the similar block at lines 253-255.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3a3aff50-8a0c-40d3-90df-f471a387434c
📒 Files selected for processing (9)
- internal/scheduling/cinder/filter_weigher_pipeline_controller.go
- internal/scheduling/machines/filter_weigher_pipeline_controller.go
- internal/scheduling/manila/filter_weigher_pipeline_controller.go
- internal/scheduling/nova/filter_weigher_pipeline_controller.go
- internal/scheduling/pods/filter_weigher_pipeline_controller.go
- internal/scheduling/reservations/failover/controller.go
- pkg/multicluster/client_test.go
- pkg/multicluster/recorder.go
- pkg/multicluster/recorder_test.go
Pull request overview
Adds a multi-cluster-aware Kubernetes EventRecorder that routes events to the correct cluster (home vs. matching remote) using the same routing logic as the multicluster client write path.
Changes:
- Introduce `MultiClusterRecorder` and `Client.GetEventRecorder(...)` to route events by the "regarding" object's GVK and router match.
- Update scheduling controllers to use `mcl.GetEventRecorder(...)` instead of `mgr.GetEventRecorder(...)`.
- Add unit tests covering home routing, remote routing, fallback behavior, and concurrency.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| pkg/multicluster/recorder.go | New multi-cluster event recorder + Client.GetEventRecorder implementation. |
| pkg/multicluster/recorder_test.go | New tests validating routing/fallback/concurrency of the recorder. |
| pkg/multicluster/client_test.go | Extend fake cluster.Cluster to provide event recorders for tests. |
| internal/scheduling/reservations/failover/controller.go | Switch controller recorder to multicluster-aware recorder. |
| internal/scheduling/pods/filter_weigher_pipeline_controller.go | Use multicluster-aware recorder for history events. |
| internal/scheduling/nova/filter_weigher_pipeline_controller.go | Use multicluster-aware recorder for history events. |
| internal/scheduling/manila/filter_weigher_pipeline_controller.go | Use multicluster-aware recorder for history events. |
| internal/scheduling/machines/filter_weigher_pipeline_controller.go | Use multicluster-aware recorder for history events. |
| internal/scheduling/cinder/filter_weigher_pipeline_controller.go | Use multicluster-aware recorder for history events. |