
Core: Add table-name filter for MetricsReporter #16574

Open
moomindani wants to merge 1 commit into apache:main from moomindani:moomindani/metrics-reporter-table-filter


@moomindani
Contributor

Closes #16573.

Adds an optional filtering layer above any MetricsReporter implementation that drops ScanReport and CommitReport instances whose tableName() does not pass the configured include / exclude regex. The filter applies uniformly to LoggingMetricsReporter, RESTMetricsReporter, and custom user-supplied reporters. The proposal surfaced in the dev@ DISCUSS thread for #16250 (per-table cardinality of the OTel reporter) and is intentionally scoped as cross-reporter, not OTel-specific.

Design

CatalogUtil.loadMetricsReporter wraps the resolved reporter in a FilteringMetricsReporter when either of the new properties is set. When neither is set, the resolved reporter is returned unchanged — no wrapper instantiated, no runtime overhead on the default path. MetricsReport subtypes that do not expose a table name (anything other than ScanReport / CommitReport) are forwarded without filtering.
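The wrapping behavior described above can be sketched as a small decorator. This is an illustration only, not the code in this PR: `SketchReport`, `SketchTableReport`, `SketchReporter`, and `SketchFilteringReporter` are hypothetical stand-ins for Iceberg's `MetricsReport`, `ScanReport` / `CommitReport`, `MetricsReporter`, and `FilteringMetricsReporter`, and the filter is passed in as a plain `Predicate<String>` (with `null` meaning "no filter configured") so the example compiles on its own.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for a metrics report with no table name.
interface SketchReport {}

// Hypothetical stand-in for reports that expose a table name (ScanReport / CommitReport in the PR).
interface SketchTableReport extends SketchReport {
  String tableName();
}

// Hypothetical stand-in for the reporter interface.
interface SketchReporter {
  void report(SketchReport report);
}

final class SketchFilteringReporter implements SketchReporter {
  private final SketchReporter delegate;
  private final Predicate<String> tableFilter;

  private SketchFilteringReporter(SketchReporter delegate, Predicate<String> tableFilter) {
    this.delegate = delegate;
    this.tableFilter = tableFilter;
  }

  // Returns the delegate itself when no filter is configured,
  // so the default path allocates no wrapper.
  static SketchReporter wrapIfConfigured(SketchReporter delegate, Predicate<String> tableFilter) {
    return tableFilter == null ? delegate : new SketchFilteringReporter(delegate, tableFilter);
  }

  @Override
  public void report(SketchReport report) {
    if (report instanceof SketchTableReport) {
      String name = ((SketchTableReport) report).tableName();
      if (!tableFilter.test(name)) {
        return; // table name rejected by the filter: drop the report
      }
    }
    // Reports without a table name, and reports that pass the filter, are forwarded.
    delegate.report(report);
  }
}
```

Because `wrapIfConfigured` returns the original instance when the filter is absent, callers can compare references to confirm that the unfiltered path is untouched.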

Configuration

Two new catalog properties:

metrics-reporter.table-name.include=prod_db\..*
metrics-reporter.table-name.exclude=.*\.tmp_.*

Values are Java regex patterns matched against the table name. When both are set, exclude wins over include (an explicit deny overrides an include). Empty values are treated as not set to avoid accidentally silencing all metrics on misconfiguration. Invalid regex values fail fast at catalog initialization with a clear error pointing at the offending property.
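As a rough sketch of the validation rules above (empty means unset, invalid regex fails fast and names the property), the property handling could look like the following. The class `TableNamePatterns`, the method `compileOrNull`, and the exact error message are hypothetical; only the two property keys come from this PR. Whether whitespace-only values also count as empty is not stated here, so the sketch checks `isEmpty()` only.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class TableNamePatterns {
  static final String INCLUDE = "metrics-reporter.table-name.include";
  static final String EXCLUDE = "metrics-reporter.table-name.exclude";

  private TableNamePatterns() {}

  // Returns null when the property is missing or empty, so an empty value
  // cannot accidentally become a pattern that silences every report.
  static Pattern compileOrNull(Map<String, String> properties, String key) {
    String value = properties.get(key);
    if (value == null || value.isEmpty()) {
      return null;
    }
    try {
      return Pattern.compile(value);
    } catch (PatternSyntaxException e) {
      // Fail fast and point at the offending property (message wording is an assumption).
      throw new IllegalArgumentException(
          String.format("Invalid regex for catalog property %s: %s", key, value), e);
    }
  }
}
```

Calling `compileOrNull` for both keys during catalog initialization surfaces a bad pattern before any report is produced, rather than at the first scan or commit.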

Behavior:

  • include only: forward reports whose table name matches; drop others.
  • exclude only: drop reports whose table name matches; forward others.
  • Both set: drop if exclude matches; otherwise forward only if include matches.
  • Neither set: forward everything (current behavior).
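The four cases above reduce to a short decision function. The sketch below is illustrative: `TableNameDecision` and `shouldForward` are hypothetical names, `null` stands for an unset property, and full-string matching via `Matcher.matches()` is an assumption (the description says only that patterns are "matched against the table name").

```java
import java.util.regex.Pattern;

final class TableNameDecision {
  private TableNameDecision() {}

  // include / exclude are null when the corresponding property is not set.
  static boolean shouldForward(Pattern include, Pattern exclude, String tableName) {
    // Exclude is checked first, so it wins when both patterns match.
    if (exclude != null && exclude.matcher(tableName).matches()) {
      return false;
    }
    // With no include pattern everything not excluded is forwarded;
    // otherwise the name must match the include pattern.
    return include == null || include.matcher(tableName).matches();
  }
}
```

With the example values `prod_db\..*` and `.*\.tmp_.*`, this forwards `prod_db.orders`, drops `prod_db.tmp_stage` (exclude wins), and drops `dev_db.orders` (no include match).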

This mirrors the existing route-regex pattern used in iceberg-kafka-connect (IcebergSinkConfig), where a user-supplied regex from configuration is compiled via Pattern.compile() and matched against incoming data. The trust model is the same: catalog properties are admin-controlled.

Disclosure

Per the project's AI-assisted contribution guidelines, I used Claude Code to help draft this work. I reviewed every change by hand and ran the full test/lint loop locally before opening this PR. The design and motivation discussion is in #16573.

cc @ebyhr @jbonofre — happy to address any feedback.

Add an optional filtering layer above any MetricsReporter implementation
that drops ScanReports and CommitReports whose tableName() does not pass
the configured include / exclude regex. Two new catalog properties
control the filter: metrics-reporter.table-name.include and
metrics-reporter.table-name.exclude. Both are Java regex patterns
matched against the table name; when both are set, exclude wins over
include.

When neither property is set, CatalogUtil.loadMetricsReporter returns
the underlying reporter unchanged, so the default code path incurs no
runtime overhead. Empty values are treated as not set to avoid
accidentally silencing all metrics on misconfiguration. Invalid regex
values fail fast at catalog initialization with a clear error pointing
at the offending property.

The filter applies uniformly across all reporter implementations
(LoggingMetricsReporter, RESTMetricsReporter, and custom user-supplied
ones). Reports whose subtype does not expose a table name are forwarded
without filtering.

Closes apache#16573

Development

Successfully merging this pull request may close these issues.

Core: Add table-level filtering for MetricsReporter implementations
