HIVE-29593: Server-side Metrics Reporting for Iceberg operations #6461
okumin merged 7 commits into apache:master
| * @param report the event | ||
| * @param receivedAt the timestamp when the Iceberg REST Catalog received the event | ||
| */ | ||
| void report(String catalog, TableIdentifier identifier, MetricsReport report, Instant receivedAt); |
FYI: Polaris and Gravitino's SPI
- https://github.com/apache/polaris/blob/apache-polaris-1.4.1/runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
- https://github.com/apache/gravitino/blob/v1.2.0/iceberg/iceberg-rest-server/src/main/java/org/apache/gravitino/iceberg/service/metrics/IcebergMetricsStore.java#L38-L47
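To make the shape of this SPI concrete, here is a minimal sketch of an implementation that counts reports per table. `TableId`, `Report`, and the interface name `Reporter` are hypothetical stand-ins (for Iceberg's `TableIdentifier`, `MetricsReport`, and the interface in the diff) so that the example compiles without Iceberg on the classpath. Only the four-argument `report` signature is taken from the diff above.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class CountingReporterSketch {
  // Hypothetical stand-ins for Iceberg's TableIdentifier and MetricsReport.
  record TableId(String namespace, String name) {}
  interface Report {}

  // Mirrors the four-argument method from the diff; the interface name is assumed.
  interface Reporter {
    void report(String catalog, TableId identifier, Report report, Instant receivedAt);
  }

  // Counts how many reports arrived per "catalog.namespace.table" key.
  static final class CountingReporter implements Reporter {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    @Override
    public void report(String catalog, TableId identifier, Report report, Instant receivedAt) {
      String key = catalog + "." + identifier.namespace() + "." + identifier.name();
      counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long count(String key) {
      LongAdder adder = counts.get(key);
      return adder == null ? 0L : adder.sum();
    }
  }

  public static void main(String[] args) {
    CountingReporter reporter = new CountingReporter();
    reporter.report("hive", new TableId("default", "test"), new Report() {}, Instant.now());
    System.out.println(reporter.count("hive.default.test"));
  }
}
```

Passing `receivedAt` explicitly, as the diff does, lets every configured reporter see the same server-side timestamp for one event.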
| .responseCode(400) | ||
| .withType(e.getClass().getSimpleName()) | ||
| .withMessage(e.getMessage()) | ||
| .build()); |
I happened to find that an invalid body throws a 500 error.
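The idea of turning a malformed request into a 400 instead of a 500 can be sketched as follows. This is only an illustration: `Response` is a hypothetical type, the success code is a placeholder, and catching `IllegalArgumentException` is an assumption about which exception signals a bad body. The real handler builds an Iceberg error response as shown in the diff.

```java
import java.util.function.Function;

public class InvalidBodySketch {
  // Hypothetical response type; the real code builds an Iceberg error response instead.
  record Response(int code, String type, String message) {}

  // Placeholder success code chosen for this sketch only.
  static final int OK = 200;

  static Response handle(String body, Function<String, Integer> parser) {
    try {
      parser.apply(body);
      return new Response(OK, null, null);
    } catch (IllegalArgumentException e) {
      // Treat a parse failure as a client mistake: 400 plus exception type and message.
      return new Response(400, e.getClass().getSimpleName(), e.getMessage());
    }
  }

  public static void main(String[] args) {
    System.out.println(handle("42", Integer::parseInt));
    System.out.println(handle("not-a-number", Integer::parseInt));
  }
}
```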
| warehouseDir = Path.of(MetaStoreTestUtils.getTestWarehouseDir(uniqueTestKey)); | ||
| Files.createDirectories(warehouseDir); | ||
| System.setProperty("derby.stream.error.file", | ||
| warehouseDir.resolve("derby.log").toAbsolutePath().toString()); |
This log was generated in standalone-metastore/metastore-rest-catalog/derby.log originally, which is not git-ignored.
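For readers unfamiliar with the trick, a minimal standalone sketch of redirecting the Derby log is shown below. The class and method names are hypothetical. The `derby.stream.error.file` property is the one used in the diff, and the assumption here is that it is set before Derby boots in the same JVM.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DerbyLogLocationSketch {
  // Points Derby's error log at <warehouseDir>/derby.log and returns that path.
  static Path redirectDerbyLog(Path warehouseDir) {
    try {
      Files.createDirectories(warehouseDir);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Path log = warehouseDir.resolve("derby.log").toAbsolutePath();
    System.setProperty("derby.stream.error.file", log.toString());
    return log;
  }

  public static void main(String[] args) {
    // Hypothetical directory name under the JVM's temp directory.
    Path dir = Path.of(System.getProperty("java.io.tmpdir"), "warehouse-sketch");
    System.out.println(redirectDerbyLog(dir));
  }
}
```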
| final var constructor = clazz.getDeclaredConstructor(Configuration.class); | ||
| return (IcebergMetricsReporter) constructor.newInstance(configuration); | ||
| } catch (NoSuchMethodException | InstantiationException | IllegalAccessException | InvocationTargetException e) { | ||
| throw new IllegalArgumentException("Failed to instantiate IcebergMetricsReporter: {}" + clazz.getName(), e); |
Remove the {}
throw new IllegalArgumentException("Failed to instantiate IcebergMetricsReporter: " + clazz.getName(), e);
Thank you. I addressed it and tested it.
49704bf
$ docker run --rm -p 9001:9001 --env SERVICE_OPTS='-Dmetastore.iceberg.catalog.metrics.reporters=java.lang.String' apache/hive:standalone-metastore-4.3.0-SNAPSHOT
...
Caused by: java.lang.IllegalArgumentException: Failed to instantiate IcebergMetricsReporter: java.lang.String
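The behavior tested above can be reproduced in isolation with a small sketch of the reflective factory. `Config`, `Reporter`, and `NoopReporter` are hypothetical stand-ins for Hadoop's `Configuration`, `IcebergMetricsReporter`, and a real reporter class. Catching `ClassCastException` is an extra choice made for this sketch and is not taken from the diff.

```java
import java.lang.reflect.InvocationTargetException;

public class ReporterFactorySketch {
  // Hypothetical stand-ins for Configuration and IcebergMetricsReporter.
  static final class Config {}
  interface Reporter {}

  static final class NoopReporter implements Reporter {
    NoopReporter(Config conf) {}
  }

  // Instantiates clazz through its single-argument (Config) constructor.
  static Reporter instantiate(Class<?> clazz, Config conf) {
    try {
      var constructor = clazz.getDeclaredConstructor(Config.class);
      return (Reporter) constructor.newInstance(conf);
    } catch (NoSuchMethodException | InstantiationException | IllegalAccessException
        | InvocationTargetException | ClassCastException e) {
      // Plain string concatenation: no "{}" placeholder, since this is not a logger call.
      throw new IllegalArgumentException("Failed to instantiate reporter: " + clazz.getName(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(instantiate(NoopReporter.class, new Config()).getClass().getSimpleName());
    try {
      instantiate(String.class, new Config());
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

`java.lang.String` has no constructor taking the configuration type, so the lookup fails with `NoSuchMethodException`, which is wrapped in the `IllegalArgumentException`.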
| public void close() { | ||
| public void close() throws IOException { | ||
| // The caller is responsible for closing the underlying catalog backing this REST catalog. | ||
| for (IcebergMetricsReporter reporter : metricsReporters) { |
If multiple reporters are present and one of them throws an exception while closing, the remaining reporters will not be closed. Can we do something like this?
True. I also found that the close method would not be invoked at all, so I added a Servlet lifecycle method and logging for safety.
69c2efd
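The reviewer's suggested snippet is not preserved in this thread. As a rough illustration of the concern, the following sketch closes every reporter even when one of them fails, logging each failure instead of aborting the loop. The class and method names are hypothetical, and `System.err` stands in for the project's logger.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

public class CloseAllSketch {
  // Closes every element and returns how many of them failed to close.
  static int closeAll(List<? extends Closeable> reporters) {
    int failures = 0;
    for (Closeable reporter : reporters) {
      try {
        reporter.close();
      } catch (IOException | RuntimeException e) {
        // Log and continue so that one bad reporter cannot block the others.
        failures++;
        System.err.println("Failed to close " + reporter.getClass().getName() + ": " + e);
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    Closeable failing = () -> { throw new IOException("boom"); };
    Closeable healthy = () -> System.out.println("closed");
    System.out.println("failures=" + closeAll(List.of(failing, healthy)));
  }
}
```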
| // nothing to do here other than checking that we're getting the correct request | ||
| castRequest(ReportMetricsRequest.class, body); | ||
| private RESTResponse reportMetrics(Map<String, String> vars, Object body) { | ||
| final var ident = identFromPathVars(vars); |
Just nitpicking: can we use an explicit type instead of var here? In the reflection code below, var is totally understandable, but here it is hard for me to infer what identFromPathVars returns without an IDE.
| throw new RuntimeException("The default class " + var.defaultVal + " does not exist"); | ||
| } | ||
| String val = conf.get(var.varname); | ||
| return val == null ? conf.getClasses(var.hiveName, defaultClass) : conf.getClasses(var.varname, defaultClass); |
Added this method so that HMS can pick up fallback variables.
% docker run --rm -p 9001:9001 --env SERVICE_OPTS='-Dmetastore.iceberg.catalog.metrics.reporters=org.apache.iceberg.rest.metrics.LoggingMetricsReporter,org.apache.iceberg.rest.metrics.LoggingMetricsReporter' apache/hive:standalone-metastore-4.3.0-SNAPSHOT
...
2026-05-05T10:41:32,238 INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:41:32.238290053Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
2026-05-05T10:41:32,244 INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:41:32.238290053Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
...
2026-05-05T10:41:36,132 INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter
2026-05-05T10:41:36,133 INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter
$ docker run --rm -p 9001:9001 --env SERVICE_OPTS='-Dhive.metastore.iceberg.catalog.metrics.reporters=org.apache.iceberg.rest.metrics.LoggingMetricsReporter,org.apache.iceberg.rest.metrics.LoggingMetricsReporter' apache/hive:standalone-metastore-4.3.0-SNAPSHOT
...
2026-05-05T10:50:10,614 INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:50:10.614012292Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
2026-05-05T10:50:10,619 INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:50:10.614012292Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
...
2026-05-05T10:50:13,075 INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter
2026-05-05T10:50:13,075 INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter
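The fallback shown in the two runs above (the `metastore.`-prefixed key first, then the `hive.metastore.`-prefixed key) can be sketched with a plain map in place of the Hadoop `Configuration`. The class name, method name, and default value below are hypothetical. Only the two property names come from the commands above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class FallbackKeySketch {
  // Reads a comma-separated class list from varname, or from hiveName if varname is unset.
  static List<String> classNames(
      Map<String, String> conf, String varname, String hiveName, String defaultVal) {
    String key = conf.get(varname) == null ? hiveName : varname;
    String raw = conf.getOrDefault(key, defaultVal);
    return Arrays.stream(raw.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .toList();
  }

  public static void main(String[] args) {
    String varname = "metastore.iceberg.catalog.metrics.reporters";
    String hiveName = "hive.metastore.iceberg.catalog.metrics.reporters";
    // Only the hive.-prefixed key is set, so the lookup falls back to it.
    Map<String, String> conf = Map.of(hiveName, "com.example.A,com.example.B");
    System.out.println(classNames(conf, varname, hiveName, "com.example.Default"));
  }
}
```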
Merged. @deniskuzZ @Aggarwal-Raghav Thanks for your reviews!



What changes were proposed in this pull request?
Complete the implementation of /v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics.
https://issues.apache.org/jira/browse/HIVE-29593
Why are the changes needed?
The current endpoint is a no-op, so we can't leverage the reported metrics for further optimization.
Does this PR introduce any user-facing change?
No, except that HMS administrators will see more logs by default.
How was this patch tested?
Strictly speaking, the status code should be 204. This is an existing problem, and I'll address it later.
https://issues.apache.org/jira/browse/HIVE-29594