Skip to content

Initialize ServerMetrics in PredownloadScheduler#18608

Open
pradeeee wants to merge 3 commits into
apache:masterfrom
pradeeee:predownload-server-metrics
Open

Initialize ServerMetrics in PredownloadScheduler#18608
pradeeee wants to merge 3 commits into
apache:masterfrom
pradeeee:predownload-server-metrics

Conversation

@pradeeee
Copy link
Copy Markdown

@pradeeee pradeeee commented May 28, 2026

Problem

The PredownloadScheduler runs in a separate predownload container that starts before the main Pinot server process. Its purpose is to pre-download segments from deep storage so the server can start faster.

Previously, the initializeMetricsReporter() method in PredownloadScheduler only created PredownloadMetrics but never initialized the ServerMetrics registry. As a result:

  1. PredownloadMetrics internally depends on ServerMetrics.get() for reporting server-level meters and timers.
  2. Since ServerMetrics was never initialized in the predownload container lifecycle, ServerMetrics.get() always returned the NOOP instance.
  3. This meant no server-level metrics (download latency, failure counts, etc.) were emitted from the predownload container — metrics were silently dropped.

Call chain showing the gap

PredownloadScheduler.start()
  → initializeMetricsReporter()
      → new PredownloadMetrics()       // created
      → ServerMetrics was never set up // missing — falls back to NOOP

In contrast, the normal server startup path initializes ServerMetrics via PinotMetricUtils.getPinotMetricsRegistry() before any metrics are used.

Changes

This PR adds ServerMetrics initialization to PredownloadScheduler.initializeMetricsReporter():

  1. Creates a ServerConf from the existing _pinotConfig
  2. Initializes PinotMetricsRegistry via PinotMetricUtils.getPinotMetricsRegistry()
  3. Creates and registers a ServerMetrics instance with the proper prefix, table-level metrics config, and allowed tables

Graceful fallback

The initialization is wrapped in a try-catch(Throwable) so that if the configured PinotMetricsFactory cannot be created (e.g., because a vendor-specific metrics client is not yet available during the predownload lifecycle), the predownload process continues with the default NOOP ServerMetrics rather than crashing. This ensures the predownload container remains resilient regardless of the metrics backend availability.

Before (no ServerMetrics)

void initializeMetricsReporter() {
    _predownloadMetrics = new PredownloadMetrics();
    PredownloadStatusRecorder.registerMetrics(_predownloadMetrics);
}

After (ServerMetrics initialized with fallback)

void initializeMetricsReporter() {
    try {
        ServerConf serverConf = new ServerConf(_pinotConfig);
        PinotMetricsRegistry metricsRegistry =
            PinotMetricUtils.getPinotMetricsRegistry(serverConf.getMetricsConfig());
        ServerMetrics serverMetrics = new ServerMetrics(...);
        serverMetrics.initializeGlobalMeters();
        ServerMetrics.register(serverMetrics);
    } catch (Throwable t) {
        LOGGER.error("Failed to initialize ServerMetrics; "
            + "continuing with NOOP instance", t);
    }

    _predownloadMetrics = new PredownloadMetrics();
    PredownloadStatusRecorder.registerMetrics(_predownloadMetrics);
}

Test Plan

  • Verified that the predownload container starts successfully with and without a metrics factory on the classpath
  • When the metrics factory is available, ServerMetrics is properly initialized and metrics are emitted
  • When the metrics factory is unavailable, the catch block logs the error and the container continues with NOOP metrics without crashing
image

psinghnegi and others added 2 commits May 28, 2026 07:49
…ainer

The PredownloadScheduler runs in a separate predownload container that
starts before the main Pinot server. Previously, it only initialized
PredownloadMetrics but did not set up the ServerMetrics registry. This
caused PredownloadMetrics (which internally depends on ServerMetrics)
to silently fall back to the NOOP metrics instance, resulting in no
server-level metrics being emitted from the predownload container.

This change initializes ServerMetrics via PinotMetricUtils in the
initializeMetricsReporter() method. The initialization is wrapped in a
try-catch so that if the underlying metrics factory cannot be created
(e.g., because a vendor-specific metrics client is not yet available
in the predownload lifecycle), the predownload process continues with
the default NOOP ServerMetrics rather than crashing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three test cases covering the initializeMetricsReporter() change:

1. testInitializeMetricsReporterRegistersServerMetrics - verifies that
   ServerMetrics is properly initialized and registered when the
   metrics factory is available

2. testInitializeMetricsReporterFallsBackOnFailure - verifies that
   when PinotMetricUtils throws (e.g., vendor metrics client not
   ready), the scheduler continues with the NOOP ServerMetrics
   instance instead of crashing

3. testInitializeMetricsReporterAlwaysCreatesPredownloadMetrics -
   verifies that PredownloadMetrics is always created and registered
   regardless of whether ServerMetrics initialization succeeds or fails

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Changed catch(Throwable) to catch(Exception) to avoid swallowing
  JVM-level Errors like OutOfMemoryError
- Check ServerMetrics.register() return value and log success or
  error if an instance was already registered
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants