Fix Minion task metrics collection gaps #17128
This commit fixes two scenarios where Minion subtask metrics were not being
reported accurately:
1. Previously in-progress tasks that completed between collection cycles:
- Track tasks that were in-progress in the previous execution cycle
- Include completed tasks in current cycle's metrics even if they're
no longer in getTasksInProgress()
- Ensures final metrics are captured for tasks that complete quickly
2. Short-lived tasks that started and completed between collection cycles:
- Use getTasksStartedAfter() to detect tasks that started after the
previous execution timestamp
- Filter out tasks already tracked to avoid duplicates
- Ensures metrics for very short-lived tasks are captured
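The two scenarios above can be sketched as a single merge step per collection cycle. This is a minimal illustration, not the PR's actual code: the class name `CycleTrackerSketch` and the method `collect` are hypothetical, task names are plain strings, and only one task type is tracked (the PR keeps a map keyed by task type).

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch for a single task type; not the real TaskMetricsEmitter.
final class CycleTrackerSketch {
  // Tasks that were in progress during the previous collection cycle.
  private Set<String> _previousInProgressTasks = new HashSet<>();

  /**
   * Returns every task whose metrics should be reported in this cycle:
   * tasks in progress now, tasks that were in progress last cycle (they may
   * have completed since), and tasks that started after the last cycle
   * (they may have started and finished in between).
   */
  Set<String> collect(Set<String> inProgressNow, Set<String> startedAfterPreviousRun) {
    Set<String> toReport = new LinkedHashSet<>(inProgressNow);
    // Scenario 1: include tasks that dropped out of the in-progress set.
    toReport.addAll(_previousInProgressTasks);
    // Scenario 2: include short-lived tasks, skipping ones already tracked.
    for (String task : startedAfterPreviousRun) {
      if (!toReport.contains(task)) {
        toReport.add(task);
      }
    }
    // Remember the current in-progress set for the next cycle.
    _previousInProgressTasks = new HashSet<>(inProgressNow);
    return toReport;
  }
}
```

In this sketch a task that was in progress in one cycle is reported exactly once more after it leaves the in-progress set, which is what lets its final state (including errors) reach the metrics.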
Key changes:
- Added _previousInProgressTasks map to track tasks across cycles
- Added getTasksStartedAfter() method in PinotHelixTaskResourceManager
- Implemented per-task-type timestamp tracking for independent lifecycle
- Changed initialization timestamp to use dynamic calculation (currentTime - frequencyMs)
instead of object construction time, handling leader changes correctly
- Updated test cases with contextual naming and comprehensive comments
All tests pass (11/11), including new tests for both scenarios.
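The per-task-type timestamp tracking with dynamic initialization might look roughly like the following. This is a sketch under assumptions: the class `PreviousTimestampSketch` and the method `previousTimestampAndAdvance` are hypothetical names, and the current time is passed in explicitly so the behavior is easy to check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; field names mirror the PR description, the rest is assumed.
final class PreviousTimestampSketch {
  // One timestamp per task type, so each type has an independent lifecycle.
  private final Map<String, Long> _previousExecutionTimestamps = new HashMap<>();
  private final long _frequencyMs;

  PreviousTimestampSketch(long frequencyMs) {
    _frequencyMs = frequencyMs;
  }

  /**
   * Returns the lower bound to use when looking for tasks started since the
   * previous cycle, and records nowMs as the timestamp for the next cycle.
   * On the first execution for a task type (for example after a leader
   * change) the bound falls back to nowMs - frequencyMs instead of the
   * object construction time.
   */
  long previousTimestampAndAdvance(String taskType, long nowMs) {
    Long previous = _previousExecutionTimestamps.put(taskType, nowMs);
    return previous != null ? previous : nowMs - _frequencyMs;
  }
}
```

With a 15-minute frequency (900,000 ms), a first call at time 1,000,000 yields 100,000 as the lower bound, and the next call for the same task type yields 1,000,000.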
Pull Request Overview
This PR fixes gaps in Minion task metrics collection by tracking tasks across collection cycles and detecting short-lived tasks. The implementation ensures no metrics are missed for tasks that complete quickly or between collection cycles, with particular focus on capturing error states.
Key Changes:
- Added tracking of previously in-progress tasks to capture final metrics for tasks that completed between cycles
- Implemented detection of short-lived tasks using job start timestamps from WorkflowContext
- Changed from global to per-task-type timestamp tracking with dynamic initialization
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| TaskMetricsEmitter.java | Added state tracking maps (_previousInProgressTasks, _previousExecutionTimestamps), dynamic timestamp initialization, and logic to include completed and short-lived tasks in metrics |
| PinotHelixTaskResourceManager.java | Added getTasksStartedAfter() method using WorkflowContext.getJobStartTimes() for efficient batch retrieval of tasks by start time |
| TaskMetricsEmitterTest.java | Added comprehensive tests for both previously in-progress tasks that completed and short-lived tasks scenarios |
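The batch retrieval by start time described for getTasksStartedAfter() can be illustrated with a plain filter over a name-to-start-time map. This is a hedged sketch only: `StartedAfterSketch` and `startedAfter` are hypothetical names, the input map stands in for the result of WorkflowContext.getJobStartTimes(), and the strict "greater than" comparison is an assumption about how "started after" is interpreted.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; the map stands in for the job start times from Helix.
final class StartedAfterSketch {
  private StartedAfterSketch() {
  }

  /**
   * Returns the names of all jobs whose start time is strictly greater than
   * afterTimestampMs. A single pass over the map avoids one lookup per job.
   */
  static Set<String> startedAfter(Map<String, Long> jobStartTimes, long afterTimestampMs) {
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Long> entry : jobStartTimes.entrySet()) {
      // Assumption: a job that started exactly at the timestamp is excluded.
      if (entry.getValue() > afterTimestampMs) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```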
Codecov Report
❌ Patch coverage is
Coverage diff, master vs. #17128:
| Metric | master | #17128 | +/- |
|---|---|---|---|
| Coverage | 63.17% | 63.17% | -0.01% |
| Complexity | 1428 | 1429 | +1 |
| Files | 3114 | 3114 | |
| Lines | 183908 | 183945 | +37 |
| Branches | 28167 | 28173 | +6 |
| Hits | 116187 | 116210 | +23 |
| Misses | 58738 | 58760 | +22 |
| Partials | 8983 | 8975 | -8 |
…avoiding duplicate Helix calls and simplifying logic
Fix Minion Task Metrics Collection Gaps
Problem Statement
The Minion subtask metrics collection had two scenarios where metrics were not being reported accurately:
1. Previously in-progress tasks that completed between collection cycles: Tasks that were in-progress in one collection run but completed before the next run were not included in metrics, causing their final states (especially errors) to be missed.
2. Short-lived tasks that started and completed between collection cycles: Tasks that started and completed entirely between two 15-minute collection cycles were never captured in metrics.
Solution
1. Track Previously In-Progress Tasks
- Added _previousInProgressTasks map to maintain state of tasks that were in-progress in the previous execution cycle
2. Detect Short-Lived Tasks
- Added getTasksStartedAfter() method in PinotHelixTaskResourceManager that uses WorkflowContext.getJobStartTimes() to efficiently find tasks that started after a timestamp
3. Dynamic Timestamp Initialization
- Replaced _initializationTimestamp (set at object construction) with dynamic calculation using currentTime - frequencyMs on first execution
Key Changes
Files Modified
TaskMetricsEmitter.java
- Added _previousInProgressTasks map to track tasks across cycles
- Changed _previousExecutionTimestamp to per-task-type _previousExecutionTimestamps map
- _taskMetricsEmitterFrequencyMs
PinotHelixTaskResourceManager.java
- Added getTasksStartedAfter(taskType, afterTimestampMs) method
- Uses WorkflowContext.getJobStartTimes() for efficient batch retrieval
TaskMetricsEmitterTest.java
- Added tests for both scenarios (taskCompletedBetweenRuns, taskShortLived)
Testing
Benefits
Backward Compatibility
✅ Fully backward compatible - existing functionality is preserved, only extended to cover the missing scenarios.