(Proposal) for REST API to access events and metrics#3924
(Proposal) for REST API to access events and metrics#3924obelix74 wants to merge 12 commits intoapache:mainfrom
Conversation
| **Response:** | ||
| ```json | ||
| { | ||
| "nextPageToken": "eyJ0cyI6MTcwOTMzNzYxMjM0NSwiaWQiOiI1NTBlODQwMCJ9", |
There was a problem hiding this comment.
Let's make sure that any potential Events APIs are in line with the Iceberg Events API documentations. The Iceberg Events API is close to consensus and we should ensure that we do not merge anything that would not be close-to-future spec incompatible.
There was a problem hiding this comment.
There was a problem hiding this comment.
Pushed a commit to bring this in sync with the PR you linked above.
| |--------|------|-------------| | ||
| | GET | `/api/management/v1/catalogs/{catalogName}/events` | List events for a catalog | | ||
| | GET | `/api/management/v1/catalogs/{catalogName}/events/{eventId}` | Get a specific event | | ||
| | GET | `/api/management/v1/catalogs/{catalogName}/namespaces/{namespace}/tables/{table}/scan-metrics` | List scan metrics for a table | |
There was a problem hiding this comment.
/api/management appears to be the prefix for the API that "changes" the Polaris Catalog and RBAC data at user's requests.
Scan reports appear to be about accessing pre-populated records without "managing" anything.
In other words a server that does not support end users to manage catalogs, may still expose scan metrics (etc.) for access.
WDYT about /api/metrics-reports/v1/... and /api/events/v1/... for events?
There was a problem hiding this comment.
+1 My initial though too was that /management API is the right place for these endpoints, but moving it to /events and /metrics-reports instead is more convincing indeed.
There was a problem hiding this comment.
@dimas-b and @nandorKollar Please see https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8/edit?tab=t.0 and apache/iceberg#12584.
Looks like the Iceberg events API is /v1/{prefix}/events:. I have updated the markdown to be in sync. Is this ok?
There was a problem hiding this comment.
Thanks, I wasn't aware of the aforementioned doc. Since events will hopefully be part of Iceberg REST spec, /api/catalog/v1/{prefix}/events sounds right for me. Nonetheless, the other two endpoints are still questionable for me, I'd rather take @dimas-b 's recommendation, and use /metrics-reports instead.
There was a problem hiding this comment.
@nandorKollar ah yes, missed that. Pushed a commit with that change.
There was a problem hiding this comment.
It's totally fine for Polaris to support the events endpoint from the IRC spec when it is approved.
However, we should not assume a particular IRC spec ahead of time.
If Polaris is to expose events before any similar endpoint is defined in IRC, I believe Polaris should use a spec of its own.
With that in mind /api/catalog is the prefix for the IRC API in Polaris. I do not think it would be wise to reuse it for Polaris' own endpoints.
There was a problem hiding this comment.
@dimas-b so you prefer /api/events/v1 correct? @adnanhemani are you ok with this?
There was a problem hiding this comment.
yes (/api/events/v1), I believe it is best to avoid URI path clashes with IRC events API until that spec is approved.
|
|
||
| | Endpoint | Required Privilege | Scope | | ||
| |----------|-------------------|-------| | ||
| | List/Get Events | `CATALOG_MANAGE_METADATA` | Catalog | |
There was a problem hiding this comment.
"Manage" access may be too broad a grant for someone who only needs to "read" events. Why not add a new permission?
... same for reading scan/commit metrics?
There was a problem hiding this comment.
I've updated the proposal to introduce dedicated read-only privileges:
- CATALOG_READ_EVENTS - for read-only access to catalog events
- TABLE_READ_METRICS - for read-only access to scan/commit metrics
|
|
||
| ### 5.2 Rationale | ||
|
|
||
| - **Events** contain catalog-wide audit information and should require catalog-level administrative access |
There was a problem hiding this comment.
What about separation of duties? Do all people who are authorized to "see" events have to be authorized to make changes to (manage) the catalog?
There was a problem hiding this comment.
I've updated the proposal to address this:
- A new CATALOG_READ_EVENTS privilege is introduced specifically for read-only event/audit access
- CATALOG_MANAGE_METADATA will imply CATALOG_READ_EVENTS (so admins can still see events)
- But CATALOG_READ_EVENTS does not imply management access
This allows security auditors or compliance tools to read events without having any catalog management capabilities.
| ### 5.2 Rationale | ||
|
|
||
| - **Events** contain catalog-wide audit information and should require catalog-level administrative access | ||
| - **Metrics** are table-specific and align with read access since they describe query patterns and commit history |
There was a problem hiding this comment.
I agree that whoever can read data, should be able to read scan/commit reports. However, is the reverse true? I suppose separation of duties applies here too. For example, a tool that reads those reports does not necessarily have to be authorized to read tables. WDYT?
There was a problem hiding this comment.
I've updated the proposal:
- A new TABLE_READ_METRICS privilege is introduced for metrics-only access
- TABLE_READ_DATA will imply TABLE_READ_METRICS (users who can read data can see metrics about their queries)
- But TABLE_READ_METRICS does not imply data access
site/content/in-dev/unreleased/proposals/observability-rest-api.md
Outdated
Show resolved
Hide resolved
I sent an email last evening (or so I thought), it was sitting in my Outbox (MS Exchange!) till now. |
Updates the Observability REST API proposal to align with the emerging Iceberg Events API specification (apache/iceberg#12584) per review feedback. Key changes: - Changed Events endpoint from GET to POST with request body - Moved Events API to Iceberg REST Catalog path (/api/catalog/v1/{prefix}/events) - Adopted Iceberg event structure (event-id, request-id, timestamp-ms, operation) - Added standard operation types (create-table, update-table, drop-table, etc.) - Added Polaris custom operation types with x-polaris-* prefix convention - Updated OpenAPI schemas for Iceberg compatibility - Added Section 8 documenting Iceberg alignment rationale - Added mapping table from Polaris internal events to Iceberg operations References: - Iceberg Events API PR: apache/iceberg#12584 - Iceberg Events API Doc: https://docs.google.com/document/d/1WtIsNGVX75-_MsQIOJhXLAWg6IbplV4-DkLllQEiFT8
Updates the Authorization section to address separation of duties concerns: - Replace CATALOG_MANAGE_METADATA with new CATALOG_READ_EVENTS privilege - Replace TABLE_READ_DATA with new TABLE_READ_METRICS privilege - Add detailed rationale for separation of duties - Document privilege hierarchy (higher privileges imply lower ones) - Add implementation notes for new privileges This ensures: - Read-only audit access does not require management permissions - Monitoring tools can access metrics without requiring data read access - Fine-grained access control is possible for different operational roles
Address reviewer feedback to clarify that OpenAPI YAML should be in separate files upon approval for ease of processing.
Address reviewer feedback that metrics reports are read-only access to pre-populated data, not catalog management operations: - Change base path from /api/management/v1/ to /api/metrics-reports/v1/ - Update all example requests with new path - Specify new spec file: spec/metrics-reports-service.yml - Update Files to Modify section with new service locations - Add rationale explaining separation from management API This allows servers that don't support catalog management to still expose metrics reports for monitoring and observability.
| discriminator: | ||
| propertyName: operation-type | ||
| mapping: | ||
| create-table: "#/components/schemas/CreateTableOperation" |
There was a problem hiding this comment.
We should think about how to map Polaris event types (PolarisEventType enum values) to these operation-types. In Polaris we separate events to before and after operation (BEFORE_CREATE_CATALOG and AFTER_CREATE_CATALOG) how can we map these to an operation-type? Or they're independent, should we use completely separate events here? Probably everything fits in CustomOperation, but if possible, we should use these standardised event types, though in many cases it isn't possible, role and principal related events can only be mapped to CustomOperation.
There was a problem hiding this comment.
@nandorKollar I took a pass on this in a separate commit. I am most likely wrong, but let me know what you think.
Address review comment from nandorKollar regarding how Polaris internal event types (BEFORE_*/AFTER_*) map to Iceberg Events API operation-types. New section 7.5 covers: - Design decision: Only AFTER_* events exposed (completed operations) - Complete mapping table from PolarisEventType to operation-type - Standard Iceberg ops map directly, Polaris-specific use x-polaris-* prefix - Read-only operations (GET/LIST) are excluded - Implementation guidance with event filtering code sample
|
Couple of more higher-level thoughts:
|
@dimas-b would you be ok with a single @adnanhemani this is the 3rd proposal I am writing and I find markdown and github better for a collaboration perspective. I can reason about my changes as can the person who asks the changes. In proposals that grow quite big, I could barely keep up. The cognitive effort to make sure all comments were addressed was very high. |
I do not think any objections to that "in principle". Let's continue this discussion on the specific OpenAPI spec text / diff ... I have not looked thought it end-to-end yet :) |
Address feedback from adnanhemani to use a single metrics endpoint with
metricType as a query parameter instead of separate /scan-metrics and
/commit-metrics endpoints.
Changes:
- Single endpoint: /catalogs/{catalogName}/namespaces/{namespace}/tables/{table}/metrics
- Required query parameter: metricType (enum: scan, commit)
- Unified response schema: ListMetricsResponse with metricType discriminator
- Future-extensible: new metric types can be added without new endpoints
- operation parameter only applicable when metricType=commit
The base path /api/metrics-reports/v1/ already establishes the context,
so the trailing /metrics is redundant.
Before: /api/metrics-reports/v1/catalogs/{catalog}/namespaces/{ns}/tables/{table}/metrics
After: /api/metrics-reports/v1/catalogs/{catalog}/namespaces/{ns}/tables/{table}
Pushed a commit with this change. |
dimas-b
left a comment
There was a problem hiding this comment.
LGTM with just a few minor comments
|
|
||
| | Custom Operation Type | Description | | ||
| |----------------------|-------------| | ||
| | `x-polaris-create-catalog-role` | Catalog role created | |
There was a problem hiding this comment.
I believe the x- prefix is mostly obsolete these days... HTTP headers no longer promote its use, AFAIK.
The polaris- prefix alone is probably sufficient to disambiguate.
| - **Operation-centric model**: Events are structured around operations (create-table, update-table, etc.) | ||
| - **Custom extensions**: Support for `x-` prefixed custom operation types for Polaris-specific events | ||
|
|
||
| #### Request Body (`QueryEventsRequest`) |
There was a problem hiding this comment.
just to confirm: this matches the Iceberg events API proposal, right?
There was a problem hiding this comment.
The request body properties (continuation-token, page-size, after-timestamp-ms, operation-types, catalog-objects-by-name, catalog-objects-by-id, object-types, custom-filters) already match the Iceberg Events API proposal from the Google Doc.
| - For metricType=scan: ScanMetricsReport objects | ||
| - For metricType=commit: CommitMetricsReport objects | ||
| items: | ||
| oneOf: |
There was a problem hiding this comment.
I believe this definition allows each array element to be either a scan report or a commit report and the type is independent of other elements.... or am I mistaken?
There was a problem hiding this comment.
WDYT about making ListMetricsResponse polymorphic at the top level?.. or at least in the property the contains report data?
There was a problem hiding this comment.
Thank you! When I replaced the two endpoints with one, this broke. I followed your suggestion for ListMetricsResponse - that is an elegant solution. Thank you.
|
|
||
| --- | ||
|
|
||
| ## 3. Design Principles |
There was a problem hiding this comment.
nit: maybe briefly mention that the new API will process Polaris realms the same way as the existing APIs
Changes based on reviewer comments:
1. Use /api/events/v1/{prefix} instead of /api/catalog/v1/{prefix}/events
to avoid URI path clashes with Iceberg REST Catalog events API until
that spec is approved (dimas-b)
2. Change x-polaris-* to polaris-* for custom operation types since the
x- prefix is obsolete (dimas-b)
3. Add note confirming QueryEventsRequest matches Iceberg Events API
specification for ecosystem compatibility (dimas-b)
4. Update spec file references from rest-catalog-open-api.yaml to new
events-service.yml
Address dimas-b feedback: the previous oneOf at item level incorrectly allowed mixing scan and commit reports in a single response. Now using discriminator at response level: - ListMetricsResponse is polymorphic with metricType discriminator - ListScanMetricsResponse: metricType=scan, reports array of ScanMetricsReport - ListCommitMetricsResponse: metricType=commit, reports array of CommitMetricsReport This correctly expresses that all items in a response are of the same type.
Address dimas-b feedback: mention that the new APIs process Polaris realms consistently with existing APIs.
|
I think the persistence layer is a precondition for this API, it seems that we don't persist most of the events? @dimas-b do you happen to know, am I looking the right point where we persist the events? |
Address nandorKollar's observation that PolarisPersistenceEventListener currently only persists AFTER_CREATE_TABLE and AFTER_CREATE_CATALOG events. Added new section 7.1 documenting: - Current state: only 2 event types are persisted - Required changes: list of all AFTER_* mutation events to add - Implementation approach: phased rollout (Iceberg ops, then Polaris-specific) This is a prerequisite for the Events API to be useful.
You are correct. I added a listener that extends InMemoryBufferEventListener and persists many more events: Table Operations: Namespace Operations: View Operations: Catalog Operations: I have not persisted principal / role management, policy events and a few others (I should add them). Adding a section to the proposal highlighting this. |
|
|
||
| #### Implementation Approach | ||
|
|
||
| 1. **Phase 1**: Extend `PolarisPersistenceEventListener` to handle all Iceberg-standard operations (tables, views, namespaces) |
There was a problem hiding this comment.
I am not sure if this should be made a Polaris standard implementation. Users can implement their own listeners to persist the kind of events they are interested in.
There was a problem hiding this comment.
Definitely what to persist is outside the scope of this API proposal, which is about how to retrieve what has been persisted :)
Did you already extend the |
|
Event persistence was added in #1844 , I think... was something missing? |
|
|
||
| | Operation Type | Description | | ||
| |----------------|-------------| | ||
| | `create-table` | Table created and committed | |
There was a problem hiding this comment.
Mapping these type code from PolarisEventType is non-trivial. Polaris has before/after events and exact names do not match. Could you explicitly list how we will deduce API-level type codes from the java enum?
There was a problem hiding this comment.
TBH, this may be a bit of an obstacle to supporting the "expected" Iceberg events API spec.
I would not mind having Polaris-specific type codes for now and dealing with the IRC events API when it gets approved. WDYT?
There was a problem hiding this comment.
@dimas-b I think there's a section in the doc which talks about this mapping: https://github.com/apache/polaris/pull/3924/changes#diff-4067fb5547bab712380d93507445f600e2ab164380e5cbe5a8e54011289f0d65R1211
Before events are not taken into account, which sounds like a reasonable choice for me. The proposal in the Iceberg spec has a Custom event category, I think if we need to list BEFORE* events too, then we can use that.
There was a problem hiding this comment.
ok, I'll leave it up to you then :)
From my personal POV, it would be preferable to have a "solid" set of event types based on current Polaris code than hypothetical types from an in-progress spec... but I do not mind using the proposed types from IRC as long as we're prepared for spec changes here.
A side benefit to using Polaris type codes is that the data in the database, in events SPI, and REST API would use the same type codes. It could be significant in deployments that use several of those sources, but I do not have a concrete use case in mind.
There was a problem hiding this comment.
Event serialization (JSON) would also be easier to handle with Polaris's own data structures and type codes.
There was a problem hiding this comment.
The mapping is: only AFTER_* events are exposed via the API (the BEFORE_* events are internal and not persisted to the EVENTS table). The translation from Java enum to API-level type code strips the AFTER_ prefix and lowercases with hyphens — e.g., AFTER_CREATE_TABLE → create-table, AFTER_DROP_TABLE → drop-table. I'll add an explicit mapping table to the proposal to make this unambiguous.
oh, I did not commit it to Polaris, for our deployment, we depend on runtime jars and we build our own distribution. I added my custom event listener (which was also creating new prometheus metrics). Please take on that subtask. That will help. |
yes, but only two events were persisted. Looking at PolarisPersistenceEventListener.java, the onEvent switch only handles:
|
Yes, but seems that the event lister implementation, which could be the bridge between the event source and the persistence layer handles only a handful events. Or everyone is supposed to write it's own event lister to call the persistence layer and to map the right lifecycle events to event entities? |
|
I think we should also think about catalog federation too. I think we should not federate events, Polaris should be the source of truth for each event (details about federation are documented here: https://docs.google.com/document/d/1Q6eEytxb0btpOPcL8RtkULskOlYUCo_3FLvFRnHkzBY/edit?tab=t.0) both in IRC and HMS federation scenarios. |
dimas-b
left a comment
There was a problem hiding this comment.
The PR LGTM as a proposal. I suppose specific API PRs are still going to be open for further comments and nitpicking when an end-to-end OpenAPI yaml is created :)
This is to say that from my POV the approach outlined in this proposal is good to proceed with.
If other reviews what to take more time on this PR, it's fine too, from my POV.
| |-----------|------|----------|---------|-------------| | ||
| | `metricType` | string | **Yes** | - | Type of metrics to retrieve: `scan` or `commit` | | ||
| | `pageToken` | string | No | - | Cursor for pagination | | ||
| | `pageSize` | integer | No | 100 | Results per page (max: 1000) | |
There was a problem hiding this comment.
IIRC, the IRC API uses hyphenated param names, like page-size. WDYT about using the same pattern here?
nit: I personally just like hyphenated query param names too. I think there are nice on curl CLI 😉
I assume this proposal is limited to events stored in Polaris. In general (future), if Polaris federates catalogs, why would we not want to federate events (which are going to the part of the catalog API in Iceberg)?.. but this is a different topic, probably worth a separate discussion 🤔 |
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)