[FLINK-Web][Web Dashboard] Add Top N Metrics Dashboard feature #27771

Closed
featzhang wants to merge 1 commit into apache:master from featzhang:feature/FLINK-top-n-metrics-dashboard

@featzhang
Member

Purpose

This PR adds a Top N Metrics Dashboard to the Flink Web UI, giving operators better visibility into the resource-intensive components of their Flink jobs. The dashboard displays three metric categories in real time:

  1. Top N CPU Consumers - Tasks consuming the most CPU resources
  2. Top N Backpressure Operators - Operators experiencing the highest backpressure ratios
  3. Top N GC Intensive Tasks - Tasks with the highest garbage collection overhead
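
Each of the three categories above amounts to ranking entries by one numeric metric and keeping the first N. The sketch below shows that selection step only; it is not taken from the PR. The `MetricEntry` shape, its field names, and the `topN` helper are hypothetical, since the description does not list the actual fields.

```typescript
// Hypothetical record shape; the PR description does not list the real fields.
interface MetricEntry {
  name: string;  // assumed: a task or operator name
  value: number; // assumed: CPU usage, backpressure ratio, or GC overhead
}

// Return the n entries with the largest value, highest first.
function topN(entries: MetricEntry[], n: number): MetricEntry[] {
  if (n <= 0) return [];
  // Copy before sorting so the caller's array is left untouched.
  return [...entries].sort((a, b) => b.value - a.value).slice(0, n);
}

// Illustrative data only, not measurements from a real job.
const sample: MetricEntry[] = [
  { name: "map", value: 0.42 },
  { name: "sink", value: 0.91 },
  { name: "source", value: 0.07 },
];
const top2 = topN(sample, 2);
```

A full sort is simple and adequate for the small entry counts a dashboard is likely to show; a bounded heap would be the usual alternative if the candidate list were large.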

This feature enables faster identification of performance bottlenecks and helps operators optimize job execution by quickly locating problematic components.

Change log

Backend Changes:

  • Added TopNMetricsHandler.java - REST API handler for Top N metrics aggregation
  • Added TopNMetricsHeaders.java - REST endpoint definition and response type specification
  • Added TopNMetricsMessageParameters.java - Request parameter definitions
  • Added TopNMetricsResponseBody.java - Response body structure with three inner classes

Frontend Changes:

  • Added TopNMetricsService.ts - Service to fetch Top N metrics from backend
  • Added topn-metrics.ts - TypeScript interface definitions
  • Added TopNMetricsComponent - Angular component with HTML template and Less styles
  • Added OverviewDemoComponent - Demo page showcasing the feature

New REST API:

  • GET /jobs/:jobid/metrics/top-n - Returns Top N metrics for a specified job
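
As a rough illustration of how a client might call this endpoint, the sketch below builds the request path and fetches the JSON body. Only the path comes from this PR. The helper names, the `baseUrl` parameter, and the injected `fetchFn` are assumptions made for the example, and the response is left as `unknown` because its fields are not listed here.

```typescript
// Minimal response surface the sketch relies on (assumed, fetch-like).
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Build the request path for the endpoint named above.
function topNMetricsPath(jobId: string): string {
  // encodeURIComponent escapes characters that are unsafe in a path segment.
  return `/jobs/${encodeURIComponent(jobId)}/metrics/top-n`;
}

// Fetch the Top N metrics for one job. fetchFn is injected so the sketch
// does not depend on a particular HTTP client.
async function fetchTopNMetrics(
  baseUrl: string,
  jobId: string,
  fetchFn: (url: string) => Promise<JsonResponse>
): Promise<unknown> {
  // Drop trailing slashes from baseUrl to avoid "//" in the final URL.
  const url = baseUrl.replace(/\/+$/, "") + topNMetricsPath(jobId);
  const response = await fetchFn(url);
  if (!response.ok) {
    throw new Error(`Top N request failed with status ${response.status}`);
  }
  return response.json();
}
```

In a browser or a recent Node.js runtime, the global `fetch` could be passed as `fetchFn`; inside the Angular dashboard, the PR's own TopNMetricsService would take this role.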

Verifying

  1. Build the Flink project: mvn clean install -DskipTests
  2. Build the web dashboard: cd flink-runtime-web/web-dashboard && npm install && npm run build
  3. Start a local Flink cluster with a sample job
  4. Access the demo page at /overview-demo to see the Top N Metrics Dashboard
  5. For integration testing, request GET /jobs/:jobid/metrics/top-n for a running job and verify that the API returns valid data
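
The check in step 5 could be partly automated by confirming that the response carries one list per metric category. The sketch below assumes the response is a JSON object with three array-valued keys. The key names in `ASSUMED_KEYS` are hypothetical placeholders and should be replaced with whatever TopNMetricsResponseBody actually serializes.

```typescript
// Hypothetical key names; the PR description does not state the real ones.
const ASSUMED_KEYS: string[] = [
  "topCpuConsumers",
  "topBackpressureOperators",
  "topGcIntensiveTasks",
];

// Return the expected keys that are absent or not arrays in the parsed body.
// An empty result means every expected category list is present.
function missingCategories(body: unknown, keys: string[] = ASSUMED_KEYS): string[] {
  if (typeof body !== "object" || body === null) return [...keys];
  const record = body as Record<string, unknown>;
  return keys.filter((key) => !Array.isArray(record[key]));
}

// Illustrative body only, not output captured from a real cluster.
const exampleBody: unknown = {
  topCpuConsumers: [],
  topBackpressureOperators: [],
  topGcIntensiveTasks: [],
};
const missing = missingCategories(exampleBody);
```

This only checks structure; whether the ranked values are correct still has to be compared against the job's individual metrics.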

Impact

Scope:

  • New feature addition to the Flink Web UI
  • Does not modify existing API behavior
  • Only adds a new REST endpoint and frontend components

Performance:

  • Minimal overhead as metrics are already being collected and cached
  • No impact on job execution performance

Compatibility:

  • Fully backward compatible
  • Optional feature

Documentation

Documentation updates needed for:

  • Web UI documentation
  • REST API reference
  • Monitoring and debugging guides

This commit introduces a Top N Metrics Dashboard to the Flink Web UI,
providing visibility into resource-intensive components:

- Top N CPU Consumers: Identify tasks with highest CPU usage
- Top N Backpressure Operators: Highlight operators experiencing backpressure
- Top N GC Intensive Tasks: Show tasks with highest GC overhead

The implementation includes:
- REST API endpoint: /jobs/:jobid/metrics/top-n
- Response body with three metric categories
- Angular components for displaying metrics
- Demo page showcasing the feature

This feature helps operators quickly identify performance bottlenecks
and optimize job execution.
@flinkbot
Collaborator

flinkbot commented Mar 15, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot run azure: re-run the last Azure build
