Skip to content

Conversation

@Fine0830
Copy link
Member

Add pod-name, container-name, poll-metrics-ports flags to agent for metric labels.
Support multiple ports for polling metrics in watchdog.

Signed-off-by: Qiuxia Fan qiuxiafan@apache.org

@codecov-commenter
Copy link

codecov-commenter commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 77.01863% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 47.08%. Comparing base (3530dd9) to head (a1c7515).
⚠️ Report is 113 commits behind head on main.

Files with missing lines Patch % Lines
fodc/agent/internal/cmd/root.go 20.68% 23 Missing ⚠️
fodc/agent/internal/watchdog/watchdog.go 85.10% 5 Missing and 2 partials ⚠️
fodc/proxy/internal/api/server.go 57.14% 6 Missing ⚠️
fodc/proxy/internal/registry/registry.go 96.87% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #945      +/-   ##
==========================================
+ Coverage   45.97%   47.08%   +1.10%     
==========================================
  Files         328      383      +55     
  Lines       55505    59470    +3965     
==========================================
+ Hits        25520    28002    +2482     
- Misses      27909    28854     +945     
- Partials     2076     2614     +538     
Flag Coverage Δ
banyand 49.53% <ø> (?)
bydbctl 81.91% <ø> (?)
fodc 78.68% <77.01%> (?)
integration-distributed 80.00% <ø> (?)
pkg 29.20% <ø> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@wu-sheng wu-sheng requested review from Copilot and hanahmily January 16, 2026 07:16
@wu-sheng wu-sheng added this to the 0.10.0 milestone Jan 16, 2026
@wu-sheng wu-sheng added the enhancement New feature or request label Jan 16, 2026
@wu-sheng wu-sheng added the fodc First Occurance Data Collection label Jan 16, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for polling metrics from multiple ports in the watchdog component and enhances agent identification with pod and container metadata. The changes enable agents to poll metrics from multiple endpoints (e.g., different containers) and automatically enrich metrics with agent identity labels (node_role, pod_name, container_name) for better traceability in distributed environments.

Changes:

  • Added pod-name, container-name, and poll-metrics-ports flags to the agent for configuring multiple metrics endpoints and identity metadata
  • Modified watchdog to poll from multiple endpoints and enrich metrics with agent identity labels
  • Removed redundant NodeRole and agent_id fields from aggregated metrics in favor of using labels
  • Updated all tests to include the new required identity fields

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
fodc/agent/internal/cmd/root.go Added new CLI flags and validation for poll-metrics-ports, pod-name, and container-name
fodc/agent/internal/watchdog/watchdog.go Refactored to support polling from multiple endpoints with per-endpoint container names
fodc/agent/internal/metrics/parse.go Added ParseWithAgentLabels function to enrich metrics with agent identity labels
fodc/agent/internal/proxy/client.go Updated to include pod_name and container_names in agent registration
fodc/proxy/internal/registry/registry.go Added PodName and ContainerNames fields to AgentIdentity
fodc/proxy/internal/metrics/aggregator.go Removed redundant NodeRole field from AggregatedMetric struct
fodc/proxy/internal/api/server.go Updated to use labels["node_role"] instead of NodeRole field
api/proto/banyandb/fodc/v1/rpc.proto Added pod_name and container_names fields to RegisterAgentRequest
All test files Updated test cases to provide required pod_name and container_names values

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}
if len(containerNames) > 0 && len(containerNames) != len(pollMetricsPorts) {
return fmt.Errorf("container-name count (%d) must match poll-metrics-ports count (%d)", len(containerNames), len(pollMetricsPorts))
}
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation allows container-name to be optional (length 0), but the watchdog.pollMetrics function at line 206-207 will use an empty string for containerName when the slice is shorter than urls. This could lead to inconsistent behavior where some metrics have container_name labels and others don't. Consider either requiring container-name to always match the count of poll-metrics-ports, or explicitly document that omitting container-name is valid and will result in metrics without container_name labels.

Suggested change
}
}
if len(containerNames) == 0 {
// When no container names are provided, initialize a slice of the same length
// as pollMetricsPorts so that downstream usage always has matching lengths.
// Each entry will be an empty string, indicating no explicit container name.
containerNames = make([]string, len(pollMetricsPorts))
}

Copilot uses AI. Check for mistakes.
Comment on lines +205 to +208
containerName := ""
if idx < len(w.containerNames) {
containerName = w.containerNames[idx]
}
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic silently uses an empty string for containerName when containerNames is shorter than urls. This could lead to confusion when debugging why some metrics lack container_name labels. Consider logging a warning when containerNames length doesn't match urls length, or document this fallback behavior clearly.

Copilot uses AI. Check for mistakes.
Comment on lines 58 to 67
Labels map[string]string
RegisteredAt time.Time
LastHeartbeat time.Time
AgentID string
NodeRole string
Status AgentStatus
AgentIdentity AgentIdentity
RegisteredAt time.Time
LastHeartbeat time.Time
PrimaryAddress Address
AgentIdentity AgentIdentity
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field ordering in AgentInfo struct has been changed inconsistently - some fields are reordered while others aren't. The reordering doesn't follow any clear pattern (not alphabetical, not by type, not by importance). Consider keeping the original field order or applying a consistent ordering principle to avoid unnecessary diff noise.

Copilot uses AI. Check for mistakes.
// Verify first metric has agent identity labels
cpuMetric := metrics[0]
assert.Equal(t, "cpu_usage", cpuMetric.Name)
require.Len(t, cpuMetric.Labels, 4) // instance + 3 agent labels
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test verifies label count but uses magic number 4 in multiple places (lines 1086, 1119). If the agent labels change in the future, these tests could become brittle. Consider calculating the expected count dynamically based on the number of provided agent labels plus original labels, or using a constant.

Copilot uses AI. Check for mistakes.
Comment on lines 43 to 47
Timestamp time.Time
Name string
AgentID string
NodeRole string
Description string
Value float64
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AggregatedMetric struct field ordering has been changed, which creates unnecessary diff noise. The fields NodeRole was removed and the remaining fields were reordered. Unless there's a specific reason for reordering (e.g., memory alignment), consider keeping the original field order for fields that weren't removed.

Copilot uses AI. Check for mistakes.
Comment on lines 49 to 72
logger *logger.Logger
heartbeatTicker *time.Ticker
flightRecorder *flightrecorder.FlightRecorder
logger *logger.Logger
connManager *connManager
stopCh chan struct{}
reconnectCh chan struct{}
labels map[string]string
client fodcv1.FODCServiceClient
registrationStream fodcv1.FODCService_RegisterAgentClient
metricsStream fodcv1.FODCService_StreamMetricsClient
registrationStream fodcv1.FODCService_RegisterAgentClient
client fodcv1.FODCServiceClient

proxyAddr string
nodeIP string
nodeRole string
agentID string
proxyAddr string
nodeIP string
nodeRole string
podName string
agentID string
containerNames []string

nodePort int
heartbeatInterval time.Duration
reconnectInterval time.Duration
disconnected bool
streamsMu sync.RWMutex // Protects streams only
heartbeatWg sync.WaitGroup // Tracks heartbeat goroutine
nodePort int
disconnected bool
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Client struct fields have been extensively reordered, creating significant diff noise. The reordering doesn't follow a consistent pattern and makes the changes harder to review. Consider adding new fields (podName, containerNames) without reordering existing fields, unless there's a specific technical reason (e.g., memory alignment optimization) that should be documented.

Copilot uses AI. Check for mistakes.
Comment on lines +54 to 63
nodeRole string
podName string
urls []string
containerNames []string

interval time.Duration
retryBackoff time.Duration
mu sync.RWMutex
wg sync.WaitGroup
isRunning bool
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Watchdog struct fields have been reordered, creating unnecessary diff noise. The reordering groups related fields together (which is good) but changes the original structure significantly. Consider adding new fields at the end or in logical groups without extensively reordering existing fields, unless there's a documented reason for the reorganization.

Suggested change
nodeRole string
podName string
urls []string
containerNames []string
interval time.Duration
retryBackoff time.Duration
mu sync.RWMutex
wg sync.WaitGroup
isRunning bool
interval time.Duration
retryBackoff time.Duration
mu sync.RWMutex
wg sync.WaitGroup
isRunning bool
nodeRole string
podName string
urls []string
containerNames []string

Copilot uses AI. Check for mistakes.
hanahmily
hanahmily previously approved these changes Jan 19, 2026
@Fine0830 Fine0830 requested a review from hanahmily January 19, 2026 03:03
@Fine0830 Fine0830 merged commit 0f30073 into apache:main Jan 19, 2026
22 checks passed
@Fine0830 Fine0830 deleted the feat/watchdog branch January 19, 2026 03:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request fodc First Occurance Data Collection

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants