(feat) #3337 REST catalog metrics table #3348
### Metrics Event Data

The `AfterReportMetricsEvent` captures the following data in `additional_properties`:
I am not sure that persisting the metrics is technically an event. My understanding of an event is something that happens during processing, for example:
- log request
- processing in the middle
- log response
Here there is no processing except writing the metrics to persistence, and the response is nothing but a 200 OK. Have we considered just persisting the metrics reports in a separate table and then joining the two tables?
I think having this as an event opens up the possibility of firing it off to Kafka or any other external sink if we want to gather the metrics there. Likewise, we can plug in another listener to send it to CloudWatch Logs. Also, storing it in the single events table allows one query to capture the whole audit trail.
The big negative of a separate metrics table is trace correlation complexity. Correlating metrics with other events (e.g., "show me everything that happened for this commit") requires explicit JOINs on otel_trace_id across tables. There is also schema migration overhead, plus the duplicated effort of introducing new Java classes for persistence.
## IMPORTANT - DO NOT MERGE
This draft PR currently contains commits from #3327 as well. When that PR is merged, this PR will be rebased against main.
Checklist
- CHANGELOG.md (if needed)
- site/content/in-dev/unreleased (if needed)

Summary
This PR implements a flexible, configurable metrics persistence system for Apache Polaris that captures Iceberg ScanReport and CommitReport data. The implementation provides multiple storage options to accommodate different use cases, from simple audit logging to advanced analytics.
Motivation
Compute engines (Spark, Trino, Flink) send metrics reports to Polaris after query execution, including:
Previously, these metrics were logged but not persisted, making it impossible to:
Implementation
Metrics Storage Options
This PR introduces four configurable reporter types:
- default
- events
- persistence
- composite

New Components
1. EventsMetricsReporter
Persists metrics to the existing events table as JSON, providing a unified audit trail.
- ScanReport
- CommitReport
- additional_properties

2. PersistingMetricsReporter
Persists metrics to dedicated tables with typed columns for efficient querying.
- scan_metrics_reports and commit_metrics_reports tables

3. CompositeMetricsReporter
Delegates to multiple reporters simultaneously, enabling flexible deployment patterns.
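
The delegation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `SketchMetricsReporter`, `RecordingReporter`, and `CompositeReporterSketch` are hypothetical names, the report is simplified to a JSON string, and isolating failures per delegate is an assumption about the desired behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reporter interface; not the actual Polaris API.
interface SketchMetricsReporter {
  void report(String tableName, String reportJson);
}

// Records what it receives so the delegation can be observed.
final class RecordingReporter implements SketchMetricsReporter {
  final List<String> received = new ArrayList<>();

  @Override
  public void report(String tableName, String reportJson) {
    received.add(tableName + ":" + reportJson);
  }
}

// Forwards every report to each delegate in order.
final class CompositeReporterSketch implements SketchMetricsReporter {
  private final List<SketchMetricsReporter> delegates;

  CompositeReporterSketch(List<SketchMetricsReporter> delegates) {
    this.delegates = List.copyOf(delegates);
  }

  @Override
  public void report(String tableName, String reportJson) {
    for (SketchMetricsReporter delegate : delegates) {
      try {
        delegate.report(tableName, reportJson);
      } catch (RuntimeException e) {
        // Assumption: one failing target should not block the others.
        System.err.println("reporter failed: " + e.getMessage());
      }
    }
  }
}
```

With this shape, an events-table reporter and a dedicated-table reporter could both sit behind one composite, and the caller would still make a single `report` call.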
4. MetricsReportCleanupService
Scheduled service for automatic cleanup of old metrics data.
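
A retention-based cleanup of this kind might look like the sketch below. It is only an illustration under stated assumptions: `MetricsStoreSketch` is a hypothetical in-memory stand-in for the metrics tables, `CleanupServiceSketch` is not the PR's `MetricsReportCleanupService`, and the retention and interval values are chosen by the caller.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for the persisted metrics reports.
final class MetricsStoreSketch {
  private final List<Instant> reportTimestamps = new ArrayList<>();

  synchronized void add(Instant timestamp) {
    reportTimestamps.add(timestamp);
  }

  // Deletes reports strictly older than the cutoff; returns the number removed.
  synchronized int deleteOlderThan(Instant cutoff) {
    int before = reportTimestamps.size();
    reportTimestamps.removeIf(ts -> ts.isBefore(cutoff));
    return before - reportTimestamps.size();
  }

  synchronized int size() {
    return reportTimestamps.size();
  }
}

// Hypothetical cleanup service: deletes reports older than the retention period.
final class CleanupServiceSketch {
  private final MetricsStoreSketch store;
  private final Duration retention;

  CleanupServiceSketch(MetricsStoreSketch store, Duration retention) {
    this.store = store;
    this.retention = retention;
  }

  // One cleanup pass; takes "now" as a parameter so it can be tested deterministically.
  int runOnce(Instant now) {
    return store.deleteOlderThan(now.minus(retention));
  }

  // Runs a cleanup pass at a fixed interval; the caller shuts the scheduler down.
  ScheduledExecutorService start(Duration interval) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    scheduler.scheduleAtFixedRate(
        () -> runOnce(Instant.now()),
        interval.toMillis(),
        interval.toMillis(),
        TimeUnit.MILLISECONDS);
    return scheduler;
  }
}
```

Keeping the single pass (`runOnce`) separate from the scheduling makes the retention logic testable without waiting on a timer.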
Configuration Examples
Option 1: Logging Only (Default)
Option 2: Events Table
Option 3: Dedicated Tables
Option 4: Composite (Multiple Targets)
With Retention Policy
Benefits by Storage Option
Events Table (type: events)

Dedicated Tables (type: persistence)
- otel_trace_id and otel_span_id columns

Composite (type: composite)

Testing
- EventsMetricsReporterTest: Verifies events are correctly created and persisted
- CompositeMetricsReporterTest: Verifies delegation to multiple reporters
- MetricsReportPersistenceTest: End-to-end persistence with H2 database

Example Queries
Data scanned by user:
Correlate with OpenTelemetry:
Migration Notes
- type: default (logging only)

Related PRs
Configuration
No new configuration required. The feature uses existing infrastructure: