
@obelix74 obelix74 commented Jan 2, 2026

## IMPORTANT - DO NOT MERGE

This draft PR currently contains commits from #3327 as well. When that PR is merged, this PR will be rebased against main.

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Summary

This PR implements a flexible, configurable metrics persistence system for Apache Polaris that captures Iceberg ScanReport and CommitReport data. The implementation provides multiple storage options to accommodate different use cases, from simple audit logging to advanced analytics.

Motivation

Compute engines (Spark, Trino, Flink) send metrics reports to Polaris after query execution, including:

  • ScanReport: Files scanned, bytes read, planning duration, filter information
  • CommitReport: Records added/deleted, files modified, operation type, duration

Previously, these metrics were logged but not persisted, making it impossible to:

  • Analyze query patterns over time
  • Attribute data access costs to teams or users
  • Correlate metrics with OpenTelemetry traces
  • Build dashboards for operational visibility

Implementation

Metrics Storage Options

This PR introduces four configurable reporter types:

| Reporter Type | Storage Location | Use Case |
|---|---|---|
| `default` | Console logs only | Development, no persistence needed |
| `events` | Events table (JSON) | Simple audit trail, unified event storage |
| `persistence` | Dedicated tables | Analytics, efficient queries, typed columns |
| `composite` | Multiple targets | Audit + analytics, migration scenarios |

New Components

1. EventsMetricsReporter

Persists metrics to the existing events table as JSON, providing a unified audit trail.

  • Stores `ScanReport` as event_type `ScanReport`
  • Stores `CommitReport` as event_type `CommitReport`
  • Full metrics data serialized as JSON in `additional_properties`
  • Leverages existing events infrastructure
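The events-table path can be pictured as a small adapter from a metrics report to a generic event row. The sketch below uses hypothetical stand-in types (`Event`, `ScanReport`, `toEvent`) rather than the actual Polaris or Iceberg classes, which this PR does not show in full:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch (hypothetical names): serialize a metrics report into a
// generic "event" record whose event_type mirrors the report class name,
// as the events-table approach described above does.
public class EventsMetricsReporterSketch {

    // Stand-in for a persisted event row: event_type plus JSON-like properties.
    record Event(String eventType, Map<String, String> additionalProperties) {}

    // Stand-in for an Iceberg scan report.
    record ScanReport(String tableName, long totalFileSizeBytes) {}

    static Event toEvent(ScanReport report) {
        Map<String, String> props = new HashMap<>();
        props.put("tableName", report.tableName());
        props.put("totalFileSizeBytes", Long.toString(report.totalFileSizeBytes()));
        // event_type "ScanReport", with the metrics in additional_properties
        return new Event("ScanReport", props);
    }

    public static void main(String[] args) {
        Event e = toEvent(new ScanReport("db.orders", 1_048_576L));
        System.out.println(e.eventType() + " " + e.additionalProperties().get("tableName"));
        // prints "ScanReport db.orders"
    }
}
```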

2. PersistingMetricsReporter

Persists metrics to dedicated tables with typed columns for efficient querying.

  • Writes to scan_metrics_reports and commit_metrics_reports tables
  • Includes OpenTelemetry trace/span IDs for correlation
  • Supports principal attribution for cost analysis
  • Requires relational-jdbc persistence backend

3. CompositeMetricsReporter

Delegates to multiple reporters simultaneously, enabling flexible deployment patterns.

  • Calls each configured target reporter in order
  • Continues with remaining reporters if one fails
  • Enables audit + analytics use cases
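The delegation-with-error-isolation behavior described above can be sketched as follows; `Reporter` and `composite` are illustrative stand-ins, not the PR's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the composite pattern: each target reporter is invoked
// in order, and a failure in one does not prevent the remaining reporters
// from receiving the report.
public class CompositeReporterSketch {

    interface Reporter { void report(String metricsReport); }

    static Reporter composite(List<Reporter> targets) {
        return report -> {
            for (Reporter target : targets) {
                try {
                    target.report(report);
                } catch (RuntimeException e) {
                    // log and continue with the remaining reporters,
                    // as the PR describes
                    System.err.println("reporter failed: " + e.getMessage());
                }
            }
        };
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        Reporter failing = r -> { throw new RuntimeException("db down"); };
        Reporter recording = received::add;
        composite(List.of(failing, recording)).report("ScanReport{...}");
        System.out.println(received); // the second reporter still ran
    }
}
```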

4. MetricsReportCleanupService

Scheduled service for automatic cleanup of old metrics data.

  • Configurable retention period (default: 30 days)
  • Configurable cleanup interval (default: 6 hours)
  • Batch size limit to prevent long transactions
  • Only operates with relational-jdbc backend
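The batch-size limit above implies a loop that deletes expired rows in fixed-size chunks so no single transaction runs long. A rough sketch, where `deleteBatch` is a hypothetical stand-in for a `DELETE ... WHERE timestamp_ms < cutoff LIMIT n` statement returning the number of rows removed:

```java
import java.util.function.IntUnaryOperator;

// Sketch of batched retention cleanup (hypothetical API, not the PR's code):
// keep deleting batches until a batch comes back short, which means no
// expired rows remain.
public class CleanupSketch {

    static long cleanup(IntUnaryOperator deleteBatch, int batchSize) {
        long total = 0;
        int deleted;
        do {
            deleted = deleteBatch.applyAsInt(batchSize);
            total += deleted;
        } while (deleted == batchSize); // a short batch means we're done
        return total;
    }

    public static void main(String[] args) {
        // simulate 25,000 expired rows cleaned in batches of 10,000
        int[] remaining = {25_000};
        long total = cleanup(limit -> {
            int n = Math.min(limit, remaining[0]);
            remaining[0] -= n;
            return n;
        }, 10_000);
        System.out.println(total); // 25000
    }
}
```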

Configuration Examples

Option 1: Logging Only (Default)

```yaml
polaris:
  iceberg-metrics:
    reporting:
      type: default
```

Option 2: Events Table

```yaml
polaris:
  iceberg-metrics:
    reporting:
      type: events
```

Option 3: Dedicated Tables

```yaml
polaris:
  iceberg-metrics:
    reporting:
      type: persistence
```

Option 4: Composite (Multiple Targets)

```yaml
polaris:
  iceberg-metrics:
    reporting:
      type: composite
      targets:
        - events
        - persistence
```

With Retention Policy

```yaml
polaris:
  iceberg-metrics:
    reporting:
      type: persistence
      retention:
        enabled: true
        retention-period: P30D
        cleanup-interval: PT6H
        batch-size: 10000
```
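The retention values are standard ISO-8601 durations, which `java.time` parses directly; this just illustrates what `P30D` and `PT6H` denote:

```java
import java.time.Duration;

// ISO-8601 duration strings as used in the retention config above.
public class RetentionValues {
    public static void main(String[] args) {
        System.out.println(Duration.parse("P30D").toDays());  // 30
        System.out.println(Duration.parse("PT6H").toHours()); // 6
    }
}
```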

Benefits by Storage Option

Events Table (type: events)

  • Simple: Single table for all event types
  • Unified: Metrics alongside catalog operations
  • Audit-friendly: Complete chronological record
  • No schema changes: Uses existing events infrastructure

Dedicated Tables (type: persistence)

  • Efficient queries: Typed columns, proper indexes
  • Analytics-ready: Easy aggregation and filtering
  • OpenTelemetry correlation: otel_trace_id and otel_span_id columns
  • Cost attribution: Query by principal, catalog, namespace

Composite (type: composite)

  • Best of both worlds: Audit trail + fast analytics
  • Migration support: Run old and new systems in parallel
  • Redundancy: Multiple storage locations for critical data

Testing

  • EventsMetricsReporterTest: Verifies events are correctly created and persisted
  • CompositeMetricsReporterTest: Verifies delegation to multiple reporters
  • MetricsReportPersistenceTest: End-to-end persistence with H2 database

Example Queries

Data scanned by user:

```sql
SELECT principal_name, SUM(total_file_size_bytes) / 1e9 AS gb_scanned
FROM scan_metrics_reports
WHERE timestamp_ms > EXTRACT(EPOCH FROM NOW() - INTERVAL '7 days') * 1000
GROUP BY principal_name;
```

Correlate with OpenTelemetry:

```sql
SELECT * FROM scan_metrics_reports
WHERE otel_trace_id = '0af7651916cd43dd8448eb211c80319c';
```

Migration Notes

  • Existing deployments default to type: default (logging only)
  • No breaking changes to existing behavior
  • Events table storage requires no schema migration
  • Dedicated tables require schema v4 migration

Related PRs

Configuration

No new configuration required. The feature uses existing infrastructure:

```properties
# Enable event persistence (already required for audit)
polaris.event-listener.type=persistence-in-memory-buffer

# Enable session tags for CloudTrail correlation (optional)
polaris.features."INCLUDE_SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL"=true
```

@github-project-automation github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Jan 2, 2026
@obelix74 obelix74 marked this pull request as draft January 2, 2026 22:24

### Metrics Event Data

The `AfterReportMetricsEvent` captures the following data in `additional_properties`:

I am not sure persisting the metrics is technically an event. My understanding of an event is something that happened during processing, for example:

  • log request
  • processing in the middle
  • log response

Here there is no processing except persisting the metrics, and the response is nothing but 200 OK. Have we considered just persisting the metrics reports in a separate table and then using a join of the two tables?


@obelix74 obelix74 Jan 3, 2026


I think having this as an event opens up the possibility of firing it off to Kafka or any other external sink if we want to gather the metrics there. Likewise, we can plug in another listener to send it to CloudWatch Logs. Also, storing it in the single events table allows a single query to capture the full audit trail.


The big negative of a separate metrics table is trace correlation complexity. Correlating metrics with other events (e.g., "show me everything that happened for this commit") requires explicit JOINs on `otel_trace_id` across tables. There is also schema migration overhead, plus new Java classes that duplicate persistence logic.

