Add TABLE_TENANT_INFO controller gauge for table-to-tenant mapping #18823
…ing via JMX

Emit a per-table `tableTenantInfo` gauge from `SegmentStatusChecker` with the server tenant name embedded as an extra key segment in the metric name:

pinot.controller.tableTenantInfo.<tableNameWithType>.<serverTenant> = 1

This lets Prometheus scrape the metric via the JMX exporter and use a `group_left(tenant)` join to attach the tenant label to any existing table-scoped metric without modifying the core metrics pipeline.

Implementation details:
- The gauge is registered only on first encounter or when the tenant changes, avoiding redundant writes on every 5-minute SegmentStatusChecker cycle.
- Stale gauges are cleaned up on tenant change, null config, and table removal, tracked via an internal `_tableTenantMap`.
- A dedicated JMX exporter rule in `controller.yml` extracts `table`, `tableType`, `tenant`, and `database` labels. The rule is placed before the generic tableNameWithType rules to ensure the tenant segment is captured.
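To make the intended join concrete, here is a minimal in-memory sketch of what a `group_left(tenant)` join followed by a sum per tenant computes. It is not Pinot or Prometheus code: the class `TenantJoinSketch`, the method `sumByTenant`, the table names, and the sample values are all hypothetical and exist only for illustration. As with PromQL vector matching, a table-scoped sample with no matching info series is dropped.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the join semantics; not part of Pinot.
final class TenantJoinSketch {

  // valueByTable: a table-scoped metric (table -> sample value).
  // tenantByTable: the info metric, reduced to table -> tenant (its value is always 1).
  static Map<String, Double> sumByTenant(Map<String, Double> valueByTable,
      Map<String, String> tenantByTable) {
    Map<String, Double> totals = new TreeMap<>();
    for (Map.Entry<String, Double> sample : valueByTable.entrySet()) {
      String tenant = tenantByTable.get(sample.getKey());
      if (tenant == null) {
        continue; // no info series for this table, so the sample is dropped by the join
      }
      // Multiplying by the info value (1) leaves the sample unchanged; then sum by tenant.
      totals.merge(tenant, sample.getValue(), Double::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    // Invented sample values for illustration only.
    Map<String, Double> segmentCounts =
        Map.of("tableA_OFFLINE", 31.0, "tableB_OFFLINE", 1.0, "tableC_OFFLINE", 5.0);
    Map<String, String> tenantByTable =
        Map.of("tableA_OFFLINE", "DefaultTenant", "tableB_OFFLINE", "DefaultTenant");
    System.out.println(sumByTenant(segmentCounts, tenantByTable)); // {DefaultTenant=32.0}
  }
}
```

Because the info gauge always has the value 1, the join only transfers the `tenant` label and does not change the joined metric's values.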
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18823 +/- ##
============================================
+ Coverage 64.76% 64.78% +0.02%
- Complexity 1319 1322 +3
============================================
Files 3392 3393 +1
Lines 210949 211275 +326
Branches 33119 33220 +101
============================================
+ Hits 136611 136884 +273
- Misses 63323 63332 +9
- Partials 11015 11059 +44
if (serverTenant.equals(previousTenant)) {
  return;
}
_controllerMetrics.setOrUpdateTableGauge(tableNameWithType, serverTenant, ControllerGauge.TABLE_TENANT_INFO, 1L);
Are we not tracking the broker tenant? We may also need to consider tiered tenants. Check out `TableConfigUtils.isRelevantToTenant`, which pulls all the relevant tenants for a table. We can build on this util to expose a `tenantType` label (server, broker, tier, etc.) on the metric. Wdyt?
Good catch, thanks! I hadn't considered the broker and tier tenants. Updated the gauge to emit one series per tenant type using a compound key <tenantType>.<tenantName> embedded in the JMX metric name — so the JMX exporter now produces three Prometheus label combinations per table: tenantType=server, tenantType=broker, and tenantType=tier (the last only when tier configs exist). Verified locally — 7 MBeans across the batch quickstart tables, all value=1.
Per reviewer feedback, extend the tableTenantInfo gauge to track all relevant tenants for a table, not just the server tenant.

The compound key "<tenantType>.<tenantName>" is now embedded in the JMX metric name, giving three gauge series per table:
- server.<serverTenant>: server tenant from TenantConfig
- broker.<brokerTenant>: broker tenant from TenantConfig
- tier.<tierTenant>: per tier's server tenant (when tier configs exist)

The JMX exporter rule in controller.yml now extracts both `tenantType` and `tenant` as Prometheus labels. Existing PromQL group_left(tenant) queries continue to work; `tenantType` is available as an additional filter dimension.

Verified locally: 7 MBeans registered across airlineStats_OFFLINE, baseballStats_OFFLINE, clickstreamFunnel_OFFLINE, all value=1.
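A minimal sketch of how the compound keys could be assembled, assuming the tenant names have already been read from the table config. The class `TenantKeys`, the method `compoundKeys`, and its parameters are hypothetical stand-ins rather than Pinot APIs, and the `DefaultTenant` fallback for an unconfigured server or broker tenant is modeled as a plain null/empty check.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; only illustrates the "<tenantType>.<tenantName>" key shape.
final class TenantKeys {
  // Assumed fallback name used when a server or broker tenant is not configured.
  static final String DEFAULT_TENANT = "DefaultTenant";

  static Set<String> compoundKeys(String serverTenant, String brokerTenant,
      List<String> tierTenants) {
    Set<String> keys = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
    keys.add("server." + orDefault(serverTenant));
    keys.add("broker." + orDefault(brokerTenant));
    for (String tierTenant : tierTenants) {
      keys.add("tier." + tierTenant); // one key per tier's server tenant
    }
    return keys;
  }

  private static String orDefault(String tenant) {
    return (tenant == null || tenant.isEmpty()) ? DEFAULT_TENANT : tenant;
  }

  public static void main(String[] args) {
    // Prints [server.tenantA, broker.DefaultTenant, tier.coldTenant]
    System.out.println(compoundKeys("tenantA", null, List.of("coldTenant")));
  }
}
```

A table with no tier configs would pass an empty list and produce only the `server` and `broker` keys.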
shounakmk219
left a comment
Thanks for working on this!
Please update the PR description with the new changes before merging
## What changed for readers
- documents the new controller `TABLE_TENANT_INFO` gauge used for table-to-tenant attribution
- explains which tenant mappings Pinot emits and which Prometheus labels the bundled controller exporter exposes

## Structural changes
- updates `reference/configuration-reference/controller.md`
- updates `reference/configuration-reference/monitoring-metrics.md`

## Source cross-check
- verified against merged apache/pinot source for `SegmentStatusChecker`, `ControllerGauge`, and the bundled controller JMX exporter config in PR #18823

## Validation
- `git diff --check`
- targeted text checks for the new metric references
Docs follow-up merged in pinot-contrib/pinot-docs#890.
Summary
- `TABLE_TENANT_INFO` controller gauge emitted by `SegmentStatusChecker` that encodes both the tenant type and tenant name as key segments in the JMX metric name: `pinot.controller.tableTenantInfo.<tableNameWithType>.<tenantType>.<tenantName> = 1`
- One gauge series per tenant type: `server` (server tenant), `broker` (broker tenant), and `tier` (tier server tenant, when tier configs exist)
- Dedicated JMX exporter rule in `controller.yml` that extracts `table`, `tableType`, `tenantType`, `tenant`, and `database` as Prometheus labels
- Tenant attribution for existing table-scoped metrics via a `group_left(tenant)` join; no changes to broker/server metric pipelines required

Motivation
Previously there was no way to aggregate table-scoped metrics (e.g. `numDocsScanned`, segment counts) by tenant in Prometheus/Grafana without scattered, disruptive changes to add a `tenant` tag throughout the metrics pipeline. This approach exposes the table→tenant mapping as a standalone info metric that Prometheus can join against.

Aggregate across all tenants:
Filter to a specific tenant (e.g. `DefaultTenant`):

Filter by tenant type (e.g. only server tenants):
The `tenant` and `tenantType` labels can be used in any label matcher (`=`, `!=`, `=~`, `!~`) wherever PromQL label selectors are supported: in dashboards, alerts, and recording rules.

Implementation
JMX metric name pattern:
Prometheus output (via JMX exporter):
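A rough Java sketch of the mapping from the JMX metric name to the Prometheus labels listed in the summary. The regular expression below is an assumption written for illustration and is not the rule shipped in `controller.yml`; in particular, deriving `database` from an optional `<database>.` prefix on the table name and splitting `tableType` off the `_OFFLINE` / `_REALTIME` suffix are guesses about how those labels are produced. The class `TenantInfoNameParser` and the method `extractLabels` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser; mimics the kind of label extraction a JMX exporter rule performs.
final class TenantInfoNameParser {
  private static final Pattern NAME = Pattern.compile(
      "^pinot\\.controller\\.tableTenantInfo\\."
          + "(?:(?<database>[^.]+)\\.)?"                      // assumed optional database prefix
          + "(?<table>[^.]+)_(?<tableType>OFFLINE|REALTIME)\\."
          + "(?<tenantType>server|broker|tier)\\."
          + "(?<tenant>.+)$");

  // Returns the extracted labels, or an empty map when the name does not match.
  static Map<String, String> extractLabels(String metricName) {
    Map<String, String> labels = new LinkedHashMap<>();
    Matcher m = NAME.matcher(metricName);
    if (!m.matches()) {
      return labels;
    }
    if (m.group("database") != null) {
      labels.put("database", m.group("database"));
    }
    labels.put("table", m.group("table"));
    labels.put("tableType", m.group("tableType"));
    labels.put("tenantType", m.group("tenantType"));
    labels.put("tenant", m.group("tenant"));
    return labels;
  }

  public static void main(String[] args) {
    System.out.println(extractLabels(
        "pinot.controller.tableTenantInfo.airlineStats_OFFLINE.server.DefaultTenant"));
  }
}
```

The tenant-specific pattern has to be tried before any more generic table-name pattern, which is the same ordering concern the PR raises for the exporter rule.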
Emission strategy:
- Registered once per `(table, tenantType, tenantName)` tuple: on first registration or when the tenant assignment changes. Not re-emitted on every 5-minute `SegmentStatusChecker` cycle (early return when the key set is unchanged).
- `_tableTenantMap` tracks the current set of compound keys per table so stale gauges are removed on tenant change, null table config, and table removal (`nonLeaderCleanup`).

Test plan
- `tableTenantInfoGaugeNamedTenantTest`: named server and broker tenants are both registered
- `tableTenantInfoGaugeDefaultTenantFallbackTest`: server and broker fall back to `DefaultTenant` when unconfigured
- `tableTenantInfoGaugeTierTenantTest`: tier server tenant is extracted from the tier's server tag
- `tableTenantInfoGaugeTenantChangeCleansStaleGaugeTest`: stale gauge removed when server tenant changes
- `tableTenantInfoGaugeTableRemovedCleansUpTest`: all gauges cleaned up via `nonLeaderCleanup`
- `tableTenantInfoGaugeRealtimeTableTest`: REALTIME table type covered
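The behaviors these tests exercise (register once, skip unchanged key sets, remove stale gauges on tenant change and on table removal) can be sketched as follows. This is a simplified model under stated assumptions, not the `SegmentStatusChecker` code: `TenantGaugeTracker`, `update`, `remove`, and the `registeredGauges` set standing in for the metrics registry are all hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the emission strategy; not Pinot's implementation.
final class TenantGaugeTracker {
  // Stand-in for the metrics registry: holds "<table>.<compoundKey>" for every live gauge.
  final Set<String> registeredGauges = new TreeSet<>();
  // Plays the role described for _tableTenantMap: current compound keys per table.
  private final Map<String, Set<String>> tableTenantMap = new HashMap<>();

  // Returns true when gauges were added or removed, false on the early-return path.
  boolean update(String table, Set<String> currentKeys) {
    Set<String> previousKeys = tableTenantMap.getOrDefault(table, Set.of());
    if (previousKeys.equals(currentKeys)) {
      return false; // key set unchanged, nothing to write this cycle
    }
    for (String stale : previousKeys) {
      if (!currentKeys.contains(stale)) {
        registeredGauges.remove(table + "." + stale); // drop gauges for old tenants
      }
    }
    for (String key : currentKeys) {
      registeredGauges.add(table + "." + key);
    }
    if (currentKeys.isEmpty()) {
      tableTenantMap.remove(table);
    } else {
      tableTenantMap.put(table, new HashSet<>(currentKeys));
    }
    return true;
  }

  // Table removed or table config missing: clear every gauge for the table.
  void remove(String table) {
    update(table, Set.of());
  }

  public static void main(String[] args) {
    TenantGaugeTracker tracker = new TenantGaugeTracker();
    tracker.update("tableA_OFFLINE", Set.of("server.tenantA", "broker.tenantA"));
    tracker.update("tableA_OFFLINE", Set.of("server.tenantB", "broker.tenantA"));
    // Prints [tableA_OFFLINE.broker.tenantA, tableA_OFFLINE.server.tenantB]
    System.out.println(tracker.registeredGauges);
  }
}
```

Comparing whole key sets keeps the steady-state cycle to a single map lookup per table, at the cost of holding one small set per table in memory.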