-
Notifications
You must be signed in to change notification settings - Fork 105
feat(fodc): support multiple ports for polling metrics in watchdog #945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…into feat/watchdog
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #945 +/- ##
==========================================
+ Coverage 45.97% 47.08% +1.10%
==========================================
Files 328 383 +55
Lines 55505 59470 +3965
==========================================
+ Hits 25520 28002 +2482
- Misses 27909 28854 +945
- Partials 2076 2614 +538
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for polling metrics from multiple ports in the watchdog component and enhances agent identification with pod and container metadata. The changes enable agents to poll metrics from multiple endpoints (e.g., different containers) and automatically enrich metrics with agent identity labels (node_role, pod_name, container_name) for better traceability in distributed environments.
Changes:
- Added
pod-name,container-name, andpoll-metrics-portsflags to the agent for configuring multiple metrics endpoints and identity metadata - Modified watchdog to poll from multiple endpoints and enrich metrics with agent identity labels
- Removed redundant
NodeRoleandagent_idfields from aggregated metrics in favor of using labels - Updated all tests to include the new required identity fields
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| fodc/agent/internal/cmd/root.go | Added new CLI flags and validation for poll-metrics-ports, pod-name, and container-name |
| fodc/agent/internal/watchdog/watchdog.go | Refactored to support polling from multiple endpoints with per-endpoint container names |
| fodc/agent/internal/metrics/parse.go | Added ParseWithAgentLabels function to enrich metrics with agent identity labels |
| fodc/agent/internal/proxy/client.go | Updated to include pod_name and container_names in agent registration |
| fodc/proxy/internal/registry/registry.go | Added PodName and ContainerNames fields to AgentIdentity |
| fodc/proxy/internal/metrics/aggregator.go | Removed redundant NodeRole field from AggregatedMetric struct |
| fodc/proxy/internal/api/server.go | Updated to use labels["node_role"] instead of NodeRole field |
| api/proto/banyandb/fodc/v1/rpc.proto | Added pod_name and container_names fields to RegisterAgentRequest |
| All test files | Updated test cases to provide required pod_name and container_names values |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
| if len(containerNames) > 0 && len(containerNames) != len(pollMetricsPorts) { | ||
| return fmt.Errorf("container-name count (%d) must match poll-metrics-ports count (%d)", len(containerNames), len(pollMetricsPorts)) | ||
| } |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation allows container-name to be optional (length 0), but the watchdog.pollMetrics function at line 206-207 will use an empty string for containerName when the slice is shorter than urls. This could lead to inconsistent behavior where some metrics have container_name labels and others don't. Consider either requiring container-name to always match the count of poll-metrics-ports, or explicitly document that omitting container-name is valid and will result in metrics without container_name labels.
| } | |
| } | |
| if len(containerNames) == 0 { | |
| // When no container names are provided, initialize a slice of the same length | |
| // as pollMetricsPorts so that downstream usage always has matching lengths. | |
| // Each entry will be an empty string, indicating no explicit container name. | |
| containerNames = make([]string, len(pollMetricsPorts)) | |
| } |
| containerName := "" | ||
| if idx < len(w.containerNames) { | ||
| containerName = w.containerNames[idx] | ||
| } |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic silently uses an empty string for containerName when containerNames is shorter than urls. This could lead to confusion when debugging why some metrics lack container_name labels. Consider logging a warning when containerNames length doesn't match urls length, or document this fallback behavior clearly.
| Labels map[string]string | ||
| RegisteredAt time.Time | ||
| LastHeartbeat time.Time | ||
| AgentID string | ||
| NodeRole string | ||
| Status AgentStatus | ||
| AgentIdentity AgentIdentity | ||
| RegisteredAt time.Time | ||
| LastHeartbeat time.Time | ||
| PrimaryAddress Address | ||
| AgentIdentity AgentIdentity |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field ordering in AgentInfo struct has been changed inconsistently - some fields are reordered while others aren't. The reordering doesn't follow any clear pattern (not alphabetical, not by type, not by importance). Consider keeping the original field order or applying a consistent ordering principle to avoid unnecessary diff noise.
| // Verify first metric has agent identity labels | ||
| cpuMetric := metrics[0] | ||
| assert.Equal(t, "cpu_usage", cpuMetric.Name) | ||
| require.Len(t, cpuMetric.Labels, 4) // instance + 3 agent labels |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test verifies label count but uses magic number 4 in multiple places (lines 1086, 1119). If the agent labels change in the future, these tests could become brittle. Consider calculating the expected count dynamically based on the number of provided agent labels plus original labels, or using a constant.
| Timestamp time.Time | ||
| Name string | ||
| AgentID string | ||
| NodeRole string | ||
| Description string | ||
| Value float64 |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AggregatedMetric struct field ordering has been changed, which creates unnecessary diff noise. The fields NodeRole was removed and the remaining fields were reordered. Unless there's a specific reason for reordering (e.g., memory alignment), consider keeping the original field order for fields that weren't removed.
| logger *logger.Logger | ||
| heartbeatTicker *time.Ticker | ||
| flightRecorder *flightrecorder.FlightRecorder | ||
| logger *logger.Logger | ||
| connManager *connManager | ||
| stopCh chan struct{} | ||
| reconnectCh chan struct{} | ||
| labels map[string]string | ||
| client fodcv1.FODCServiceClient | ||
| registrationStream fodcv1.FODCService_RegisterAgentClient | ||
| metricsStream fodcv1.FODCService_StreamMetricsClient | ||
| registrationStream fodcv1.FODCService_RegisterAgentClient | ||
| client fodcv1.FODCServiceClient | ||
|
|
||
| proxyAddr string | ||
| nodeIP string | ||
| nodeRole string | ||
| agentID string | ||
| proxyAddr string | ||
| nodeIP string | ||
| nodeRole string | ||
| podName string | ||
| agentID string | ||
| containerNames []string | ||
|
|
||
| nodePort int | ||
| heartbeatInterval time.Duration | ||
| reconnectInterval time.Duration | ||
| disconnected bool | ||
| streamsMu sync.RWMutex // Protects streams only | ||
| heartbeatWg sync.WaitGroup // Tracks heartbeat goroutine | ||
| nodePort int | ||
| disconnected bool |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Client struct fields have been extensively reordered, creating significant diff noise. The reordering doesn't follow a consistent pattern and makes the changes harder to review. Consider adding new fields (podName, containerNames) without reordering existing fields, unless there's a specific technical reason (e.g., memory alignment optimization) that should be documented.
| nodeRole string | ||
| podName string | ||
| urls []string | ||
| containerNames []string | ||
|
|
||
| interval time.Duration | ||
| retryBackoff time.Duration | ||
| mu sync.RWMutex | ||
| wg sync.WaitGroup | ||
| isRunning bool |
Copilot
AI
Jan 16, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Watchdog struct fields have been reordered, creating unnecessary diff noise. The reordering groups related fields together (which is good) but changes the original structure significantly. Consider adding new fields at the end or in logical groups without extensively reordering existing fields, unless there's a documented reason for the reorganization.
| nodeRole string | |
| podName string | |
| urls []string | |
| containerNames []string | |
| interval time.Duration | |
| retryBackoff time.Duration | |
| mu sync.RWMutex | |
| wg sync.WaitGroup | |
| isRunning bool | |
| interval time.Duration | |
| retryBackoff time.Duration | |
| mu sync.RWMutex | |
| wg sync.WaitGroup | |
| isRunning bool | |
| nodeRole string | |
| podName string | |
| urls []string | |
| containerNames []string |
Add
pod-name,container-name,poll-metrics-portsflags to agent for metric labels.Support multiple ports for polling metrics in watchdog.
Signed-off-by: Qiuxia Fan qiuxiafan@apache.org