Add TABLE_TENANT_INFO controller gauge for table-to-tenant mapping #18823

Merged
KKcorps merged 2 commits into apache:master from arunkumarucet:feature/table-tenant-info-metric on Jun 26, 2026

@arunkumarucet arunkumarucet commented Jun 22, 2026

Contributor

Summary

  • Adds a new TABLE_TENANT_INFO controller gauge emitted by SegmentStatusChecker that encodes both the tenant type and tenant name as key segments in the JMX metric name: pinot.controller.tableTenantInfo.<tableNameWithType>.<tenantType>.<tenantName> = 1
  • Covers all three tenant types per table: server (server tenant), broker (broker tenant), and tier (tier server tenant, when tier configs exist)
  • Adds a dedicated JMX exporter rule in controller.yml that extracts table, tableType, tenantType, tenant, and database as Prometheus labels
  • Enables tenant-scoped aggregation of any existing table-level metric in Prometheus via a group_left(tenant) join — no changes to broker/server metric pipelines required

Motivation

Previously there was no way to aggregate table-scoped metrics (e.g. numDocsScanned, segment counts) by tenant in Prometheus/Grafana without scattered, disruptive changes to add a tenant tag throughout the metrics pipeline. This approach exposes the table→tenant mapping as a standalone info metric that Prometheus can join against.

Aggregate across all tenants:

sum by (tenant) (
  sum by (table) (pinot_server_numDocsScanned_OneMinuteRate{...})
  * on(table) group_left(tenant)
  pinot_controller_tableTenantInfo
)

Filter to a specific tenant (e.g. DefaultTenant):

sum by (tenant) (
  sum by (table) (pinot_server_numDocsScanned_OneMinuteRate{...})
  * on(table) group_left(tenant)
  pinot_controller_tableTenantInfo{tenant="DefaultTenant"}
)

Filter by tenant type (e.g. only server tenants):

sum by (tenant) (
  sum by (table) (pinot_server_numDocsScanned_OneMinuteRate{...})
  * on(table) group_left(tenant)
  pinot_controller_tableTenantInfo{tenantType="server"}
)

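The tenant and tenantType labels can be used in any label matcher (=, !=, =~, !~) wherever PromQL label selectors are supported: in dashboards, alerts, and recording rules. Because each table carries one info series per tenant type, an on(table) group_left(tenant) join needs the info metric narrowed so that each table matches exactly one series, for example with tenantType="server" as in the last query.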
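To make the join semantics concrete, here is a minimal Java sketch that emulates what the queries above compute: each table's value is multiplied by the info metric's value of 1 (a no-op) and then summed under the tenant that the info metric attaches to that table. The class and method names are hypothetical and are not part of Pinot or Prometheus. The sketch assumes each table maps to exactly one tenant, which corresponds to narrowing the info metric to a single tenantType.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of "sum by (tenant) (value * on(table) group_left(tenant) info)".
class TenantJoinSketch {

  /**
   * @param valueByTable  per-table metric value (the left-hand side of the join)
   * @param tenantByTable table-to-tenant mapping (the info metric, whose value is always 1)
   * @return the per-tenant sum of the table values
   */
  static Map<String, Double> sumByTenant(Map<String, Double> valueByTable,
      Map<String, String> tenantByTable) {
    Map<String, Double> sums = new TreeMap<>();
    for (Map.Entry<String, Double> entry : valueByTable.entrySet()) {
      String tenant = tenantByTable.get(entry.getKey());
      if (tenant == null) {
        // A table with no matching info series drops out of the join result.
        continue;
      }
      // Multiplying by the info value (1) leaves the sample unchanged, so only the sum remains.
      sums.merge(tenant, entry.getValue(), Double::sum);
    }
    return sums;
  }
}
```

Tables without an entry in the mapping are skipped, which mirrors how unmatched samples disappear from a PromQL vector match.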

Implementation

JMX metric name pattern:

pinot.controller.tableTenantInfo.<tableNameWithType>.<tenantType>.<tenantName>

Prometheus output (via JMX exporter):

pinot_controller_tableTenantInfo_Value{table="airlineStats", tableType="OFFLINE", tenantType="server", tenant="DefaultTenant"} 1
pinot_controller_tableTenantInfo_Value{table="airlineStats", tableType="OFFLINE", tenantType="broker", tenant="DefaultTenant"} 1
pinot_controller_tableTenantInfo_Value{table="airlineStats", tableType="OFFLINE", tenantType="tier",   tenant="tierTenant"}    1
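The mapping from the JMX metric name to the labels shown above can be illustrated with a small Java regex sketch. This is not the rule shipped in controller.yml: the pattern and the class name are hypothetical, and the sketch assumes no database prefix and no dots inside the table or tenant name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; the real extraction is done by the JMX exporter rule.
class TableTenantInfoNameParser {

  // Assumed layout: pinot.controller.tableTenantInfo.<table>_<tableType>.<tenantType>.<tenantName>
  private static final Pattern NAME = Pattern.compile(
      "^pinot\\.controller\\.tableTenantInfo\\."
          + "(?<table>[^.]+)_(?<tableType>OFFLINE|REALTIME)\\."
          + "(?<tenantType>server|broker|tier)\\."
          + "(?<tenant>[^.]+)$");

  /** Splits a metric name into the label set a scrape would expose. */
  static Map<String, String> parse(String metricName) {
    Matcher matcher = NAME.matcher(metricName);
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Not a tableTenantInfo metric name: " + metricName);
    }
    Map<String, String> labels = new LinkedHashMap<>();
    labels.put("table", matcher.group("table"));
    labels.put("tableType", matcher.group("tableType"));
    labels.put("tenantType", matcher.group("tenantType"));
    labels.put("tenant", matcher.group("tenant"));
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(
        parse("pinot.controller.tableTenantInfo.airlineStats_OFFLINE.tier.tierTenant"));
  }
}
```

Because the table group is greedy, a table name that itself contains underscores still splits at the final _OFFLINE or _REALTIME suffix.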

Emission strategy:

  • Gauges are written only once per (table, tenantType, tenantName) tuple: on first registration or when the tenant assignment changes. They are not re-emitted on every 5-minute SegmentStatusChecker cycle; the checker returns early when the key set is unchanged.
  • _tableTenantMap tracks the current set of compound keys per table so stale gauges are removed on: tenant change, null table config, and table removal (nonLeaderCleanup).
  • New gauges are registered before removing stale ones on an assignment change, to avoid a scrape-window gap.
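The emission strategy above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the code in SegmentStatusChecker: GaugeSink is a hypothetical stand-in for the controller metrics registry, and the tracker only models the early return, the register-before-remove ordering, and cleanup on table removal.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker mirroring the role of _tableTenantMap.
class TenantGaugeTracker {

  /** Hypothetical stand-in for the metrics registry (not a Pinot interface). */
  interface GaugeSink {
    void set(String table, String key);

    void remove(String table, String key);
  }

  // Current set of "<tenantType>.<tenantName>" keys per table.
  private final Map<String, Set<String>> tableTenantMap = new HashMap<>();
  private final GaugeSink sink;

  TenantGaugeTracker(GaugeSink sink) {
    this.sink = sink;
  }

  /** Called once per checker cycle with the keys derived from the table config. */
  void update(String table, Set<String> currentKeys) {
    Set<String> previous = tableTenantMap.getOrDefault(table, Collections.emptySet());
    if (previous.equals(currentKeys)) {
      return; // Key set unchanged: nothing is re-emitted.
    }
    // Register new gauges first so a scrape never sees the table without a mapping.
    for (String key : currentKeys) {
      if (!previous.contains(key)) {
        sink.set(table, key);
      }
    }
    // Then drop the gauges that no longer apply.
    for (String key : previous) {
      if (!currentKeys.contains(key)) {
        sink.remove(table, key);
      }
    }
    tableTenantMap.put(table, new HashSet<>(currentKeys));
  }

  /** Called when the table disappears or its config is null. */
  void removeTable(String table) {
    Set<String> previous = tableTenantMap.remove(table);
    if (previous != null) {
      for (String key : previous) {
        sink.remove(table, key);
      }
    }
  }
}
```

Storing a defensive copy of the key set keeps later mutations by the caller from silently defeating the unchanged-set check.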

Test plan

  • tableTenantInfoGaugeNamedTenantTest — named server and broker tenants are both registered
  • tableTenantInfoGaugeDefaultTenantFallbackTest — server and broker fall back to DefaultTenant when unconfigured
  • tableTenantInfoGaugeTierTenantTest — tier server tenant is extracted from the tier's server tag
  • tableTenantInfoGaugeTenantChangeCleansStaleGaugeTest — stale gauge removed when server tenant changes
  • tableTenantInfoGaugeTableRemovedCleansUpTest — all gauges cleaned up via nonLeaderCleanup
  • tableTenantInfoGaugeRealtimeTableTest — REALTIME table type covered
  • Verified locally via batch quickstart: 7 MBeans registered across 3 tables (server + broker + tier where applicable), all value=1

…ing via JMX

Emit a per-table `tableTenantInfo` gauge from `SegmentStatusChecker` with the
server tenant name embedded as an extra key segment in the metric name:

  pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1

This lets Prometheus scrape the metric via the JMX exporter and use a
`group_left(tenant)` join to attach the tenant label to any existing
table-scoped metric without modifying the core metrics pipeline.

Implementation details:
- The gauge is registered only on first encounter or when the tenant changes,
  avoiding redundant writes on every 5-minute SegmentStatusChecker cycle.
- Stale gauges are cleaned up on tenant change, null config, and table removal,
  tracked via an internal `_tableTenantMap`.
- A dedicated JMX exporter rule in `controller.yml` extracts `table`,
  `tableType`, `tenant`, and `database` labels. The rule is placed before the
  generic tableNameWithType rules to ensure the tenant segment is captured.

codecov-commenter commented Jun 22, 2026


Codecov Report

❌ Patch coverage is 77.50000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.78%. Comparing base (14bc147) to head (1a7a283).
⚠️ Report is 22 commits behind head on master.

Files with missing lines                                 Patch %   Lines
...e/pinot/controller/helix/SegmentStatusChecker.java    76.92%    4 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18823      +/-   ##
============================================
+ Coverage     64.76%   64.78%   +0.02%     
- Complexity     1319     1322       +3     
============================================
  Files          3392     3393       +1     
  Lines        210949   211275     +326     
  Branches      33119    33220     +101     
============================================
+ Hits         136611   136884     +273     
- Misses        63323    63332       +9     
- Partials      11015    11059      +44     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2          0.00% <ø> (ø)
java-21               64.78% <77.50%> (+0.02%) ⬆️
temurin               64.78% <77.50%> (+0.02%) ⬆️
unittests             64.78% <77.50%> (+0.02%) ⬆️
unittests1            57.00% <100.00%> (+0.04%) ⬆️
unittests2            37.15% <77.50%> (-0.03%) ⬇️


if (serverTenant.equals(previousTenant)) {
  return;
}
_controllerMetrics.setOrUpdateTableGauge(tableNameWithType, serverTenant, ControllerGauge.TABLE_TENANT_INFO, 1L);

Collaborator

Are we not tracking the broker tenant? We may also need to consider the tiered tenants. Check out TableConfigUtils.isRelevantToTenant, which pulls all the relevant tenants for a table. We can build on this util to expose a tenantType label (server, broker, tier, etc.) on the metric. Wdyt?

Contributor Author

Good catch, thanks! I hadn't considered the broker and tier tenants. Updated the gauge to emit one series per tenant type using a compound key <tenantType>.<tenantName> embedded in the JMX metric name — so the JMX exporter now produces three Prometheus label combinations per table: tenantType=server, tenantType=broker, and tenantType=tier (the last only when tier configs exist). Verified locally — 7 MBeans across the batch quickstart tables, all value=1.

Per reviewer feedback, extend the tableTenantInfo gauge to track all
relevant tenants for a table, not just the server tenant.

The compound key "<tenantType>.<tenantName>" is now embedded in the JMX
metric name, giving three gauge series per table:
  - server.<serverTenant> — server tenant from TenantConfig
  - broker.<brokerTenant> — broker tenant from TenantConfig
  - tier.<tierTenant>     — per tier's server tenant (when tier configs exist)

The JMX exporter rule in controller.yml now extracts both `tenantType`
and `tenant` as Prometheus labels. Existing PromQL group_left(tenant)
queries continue to work; `tenantType` is available as an additional
filter dimension.

Verified locally: 7 MBeans registered across airlineStats_OFFLINE,
baseballStats_OFFLINE, clickstreamFunnel_OFFLINE — all value=1.
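As a rough illustration of the compound keys described in this commit, the Java sketch below builds the "<tenantType>.<tenantName>" set for one table. It is not the code from the PR: the class and helper names are hypothetical, the DefaultTenant fallback follows the test plan, and stripping an _OFFLINE or _REALTIME suffix from a tier's server tag is an assumption about the tag format.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names and tag format are assumptions for illustration.
class TenantKeys {
  static final String DEFAULT_TENANT = "DefaultTenant";

  /** Falls back to DefaultTenant when the tenant is not configured. */
  static String orDefault(String tenant) {
    return (tenant == null || tenant.isEmpty()) ? DEFAULT_TENANT : tenant;
  }

  /** Assumed tag format: "<tenant>_OFFLINE" or "<tenant>_REALTIME". */
  static String tenantFromTag(String serverTag) {
    for (String suffix : new String[] {"_OFFLINE", "_REALTIME"}) {
      if (serverTag.endsWith(suffix)) {
        return serverTag.substring(0, serverTag.length() - suffix.length());
      }
    }
    return serverTag;
  }

  /** Builds one key per tenant type: server, broker, and one per tier server tag. */
  static Set<String> build(String serverTenant, String brokerTenant,
      List<String> tierServerTags) {
    Set<String> keys = new LinkedHashSet<>();
    keys.add("server." + orDefault(serverTenant));
    keys.add("broker." + orDefault(brokerTenant));
    for (String tag : tierServerTags) {
      keys.add("tier." + tenantFromTag(tag));
    }
    return keys;
  }
}
```

Using a set means two tiers that share a server tenant collapse into a single tier key for that table.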

@shounakmk219 shounakmk219 left a comment

Collaborator

Thanks for working on this!
Please update the PR description with the new changes before merging.

@KKcorps KKcorps merged commit 3be6236 into apache:master Jun 26, 2026
13 checks passed
xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Jun 26, 2026
## What changed for readers
- documents the new controller `TABLE_TENANT_INFO` gauge used for
table-to-tenant attribution
- explains which tenant mappings Pinot emits and which Prometheus labels
the bundled controller exporter exposes

## Structural changes
- updates `reference/configuration-reference/controller.md`
- updates `reference/configuration-reference/monitoring-metrics.md`

## Source cross-check
- verified against merged apache/pinot source for
`SegmentStatusChecker`, `ControllerGauge`, and the bundled controller
JMX exporter config in PR #18823

## Validation
- `git diff --check`
- targeted text checks for the new metric references
@xiangfu0

Contributor

Docs follow-up merged in pinot-contrib/pinot-docs#890.


5 participants