Skip to content

HIVE-29593: Server-side Metrics Reporting for Iceberg operations#6461

Merged
okumin merged 7 commits intoapache:masterfrom
okumin:HIVE-29593-iceberg-metrics-reporter
May 6, 2026
Merged

HIVE-29593: Server-side Metrics Reporting for Iceberg operations#6461
okumin merged 7 commits intoapache:masterfrom
okumin:HIVE-29593-iceberg-metrics-reporter

Conversation

@okumin
Copy link
Copy Markdown
Contributor

@okumin okumin commented May 4, 2026

What changes were proposed in this pull request?

Complete implementing /v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics.

https://issues.apache.org/jira/browse/HIVE-29593

Why are the changes needed?

The current endpoint is NOOP, meaning we can't leverage the reported metrics for further optimization.

Does this PR introduce any user-facing change?

No. By default, HMS administrators can see more logs.

How was this patch tested?

  1. Launch HMS.
$ docker run --rm -p 9001:9001 apache/hive:standalone-metastore-4.3.0-SNAPSHOT
  1. Send events.
$ curl -i -X POST \
  "http://localhost:9001/iceberg/v1/namespaces/default/tables/test/metrics" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "report-type": "scan-report",
  "table-name": "default.test",
  "snapshot-id": 1000000000001,
  "filter": true,
  "schema-id": 0,
  "projected-field-ids": [1],
  "projected-field-names": ["id"],
  "metrics": {
    "total-planning-duration": {
      "count": 1,
      "time-unit": "nanoseconds",
      "total-duration": 2644235116
    },
    "result-data-files": {
      "unit": "count",
      "value": 3
    },
    "result-delete-files": {
      "unit": "count",
      "value": 0
    },
    "total-data-manifests": {
      "unit": "count",
      "value": 1
    },
    "total-delete-manifests": {
      "unit": "count",
      "value": 0
    },
    "scanned-data-manifests": {
      "unit": "count",
      "value": 1
    },
    "skipped-data-manifests": {
      "unit": "count",
      "value": 0
    },
    "total-file-size-bytes": {
      "unit": "bytes",
      "value": 1048576
    },
    "total-delete-file-size-bytes": {
      "unit": "bytes",
      "value": 0
    }
  },
  "metadata": {
    "source": "dummy-curl",
    "user": "test-user",
    "trace-id": "dummy-trace-001"
  }
}
JSON

$ curl -i -X POST \
  "http://localhost:9001/iceberg/v1/namespaces/default/tables/test/metrics" \
  -H "Content-Type: application/json" \
  --data-binary @- <<'JSON'
{
  "report-type": "commit-report",
  "table-name": "default.test",
  "snapshot-id": 1000000000002,
  "sequence-number": 1,
  "operation": "append",
  "metrics": {
    "total-duration": {
      "count": 1,
      "time-unit": "nanoseconds",
      "total-duration": 1234567890
    },
    "attempts": {
      "unit": "count",
      "value": 1
    },
    "added-data-files": {
      "unit": "count",
      "value": 1
    },
    "removed-data-files": {
      "unit": "count",
      "value": 0
    },
    "added-records": {
      "unit": "count",
      "value": 100
    },
    "added-files-size-bytes": {
      "unit": "bytes",
      "value": 4096
    }
  },
  "metadata": {
    "source": "dummy-curl",
    "user": "test-user",
    "trace-id": "dummy-trace-002"
  }
}
JSON
  1. I can see the logs
2026-05-04T09:20:12,525  INFO [qtp1859459396-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-04T09:20:12.524872717Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
...
2026-05-04T09:20:24,841  INFO [qtp1859459396-59] metrics.LoggingMetricsReporter: Event reported at 2026-05-04T09:20:24.841145417Z: catalog=hive, table=default.test, report=CommitReport{tableName=default.test, snapshotId=1000000000002, sequenceNumber=1, operation=append, commitMetrics=CommitMetricsResult{totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT1.23456789S, count=1}, attempts=CounterResult{unit=COUNT, value=1}, addedDataFiles=CounterResult{unit=COUNT, value=1}, removedDataFiles=CounterResult{unit=COUNT, value=0}, totalDataFiles=null, addedDeleteFiles=null, addedEqualityDeleteFiles=null, addedPositionalDeleteFiles=null, addedDVs=null, removedDeleteFiles=null, removedEqualityDeleteFiles=null, removedPositionalDeleteFiles=null, removedDVs=null, totalDeleteFiles=null, addedRecords=CounterResult{unit=COUNT, value=100}, removedRecords=null, totalRecords=null, addedFilesSizeInBytes=CounterResult{unit=BYTES, value=4096}, removedFilesSizeInBytes=null, totalFilesSizeInBytes=null, addedPositionalDeletes=null, removedPositionalDeletes=null, totalPositionalDeletes=null, addedEqualityDeletes=null, removedEqualityDeletes=null, totalEqualityDeletes=null, manifestsCreated=null, manifestsReplaced=null, manifestsKept=null, manifestEntriesProcessed=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-002}}

Precisely, the status code should be 204. It is an existing problem, and I'll address it later.
https://issues.apache.org/jira/browse/HIVE-29594

@okumin okumin force-pushed the HIVE-29593-iceberg-metrics-reporter branch from 43a146c to 0b7b00d Compare May 4, 2026 09:22
@okumin okumin force-pushed the HIVE-29593-iceberg-metrics-reporter branch from 0b7b00d to 1e82794 Compare May 4, 2026 10:02
* @param report the event
* @param receivedAt the timestamp when the Iceberg REST Catalog received the event
*/
void report(String catalog, TableIdentifier identifier, MetricsReport report, Instant receivedAt);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.responseCode(400)
.withType(e.getClass().getSimpleName())
.withMessage(e.getMessage())
.build());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I happened to find an invalid body throws a 500 error

warehouseDir = Path.of(MetaStoreTestUtils.getTestWarehouseDir(uniqueTestKey));
Files.createDirectories(warehouseDir);
System.setProperty("derby.stream.error.file",
warehouseDir.resolve("derby.log").toAbsolutePath().toString());
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log was generated in standalone-metastore/metastore-rest-catalog/derby.log originally, which is not git-ignored.

@okumin okumin marked this pull request as ready for review May 4, 2026 10:04
Copy link
Copy Markdown
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

final var constructor = clazz.getDeclaredConstructor(Configuration.class);
return (IcebergMetricsReporter) constructor.newInstance(configuration);
} catch (NoSuchMethodException | InstantiationException | IllegalAccessException | InvocationTargetException e) {
throw new IllegalArgumentException("Failed to instantiate IcebergMetricsReporter: {}" + clazz.getName(), e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the {}
throw new IllegalArgumentException("Failed to instantiate IcebergMetricsReporter: " + clazz.getName(), e);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I addressed and tested it.
49704bf

$ docker run --rm -p 9001:9001 --env SERVICE_OPTS='-Dmetastore.iceberg.catalog.metrics.reporters=java.lang.String' apache/hive:standalone-metastore-4.3.0-SNAPSHOT
...
Caused by: java.lang.IllegalArgumentException: Failed to instantiate IcebergMetricsReporter: java.lang.String

public void close() {
public void close() throws IOException {
// The caller is responsible for closing the underlying catalog backing this REST catalog.
for (IcebergMetricsReporter reporter : metricsReporters) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case if multiple reported are present and there is exception in closing 1 reporter, rest all will not be closed. Can we do something like this ?

public void close() {
threadLocalMap.values().forEach(this::closeQuietly);
}
private void closeQuietly(AutoCloseable autoCloseable) {
try {
autoCloseable.close();
} catch (Exception e) {
LOG.warn("Error while closing resource.", e);
}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I found the close method would not be invoked, so I added a Servlet lifecycle method and logging for safety.
69c2efd

// nothing to do here other than checking that we're getting the correct request
castRequest(ReportMetricsRequest.class, body);
private RESTResponse reportMetrics(Map<String, String> vars, Object body) {
final var ident = identFromPathVars(vars);
Copy link
Copy Markdown
Contributor

@Aggarwal-Raghav Aggarwal-Raghav May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just nitpicking, can we provide explicit type instead of var here? In reflection code below that is totally understandable but what identFromPathVars returns makes it difficult for me to infer without IDE

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a fair opinion: d896dfa

throw new RuntimeException("The default class " + var.defaultVal + " does not exist");
}
String val = conf.get(var.varname);
return val == null ? conf.getClasses(var.hiveName, defaultClass) : conf.getClasses(var.varname, defaultClass);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this method so that HMS can pick up fallback variables.

% docker run --rm -p 9001:9001 --env SERVICE_OPTS='-Dmetastore.iceberg.catalog.metrics.reporters=org.apache.iceberg.rest.metrics.LoggingMetricsReporter,org.apache.iceberg.rest.metrics.LoggingMetricsReporter' apache/hive:standalone-metastore-4.3.0-SNAPSHOT
...
2026-05-05T10:41:32,238  INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:41:32.238290053Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
2026-05-05T10:41:32,244  INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:41:32.238290053Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
...
2026-05-05T10:41:36,132  INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter
2026-05-05T10:41:36,133  INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter
$ docker run --rm -p 9001:9001 --env SERVICE_OPTS='-Dhive.metastore.iceberg.catalog.metrics.reporters=org.apache.iceberg.rest.metrics.LoggingMetricsReporter,org.apache.iceberg.rest.metrics.LoggingMetricsReporter' apache/hive:standalone-metastore-4.3.0-SNAPSHOT
...
2026-05-05T10:50:10,614  INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:50:10.614012292Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
2026-05-05T10:50:10,619  INFO [qtp515442419-63] metrics.LoggingMetricsReporter: Event reported at 2026-05-05T10:50:10.614012292Z: catalog=hive, table=default.test, report=ScanReport{tableName=default.test, snapshotId=1000000000001, filter=true, schemaId=0, projectedFieldIds=[1], projectedFieldNames=[id], scanMetrics=ScanMetricsResult{totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT2.644235116S, count=1}, resultDataFiles=CounterResult{unit=COUNT, value=3}, resultDeleteFiles=CounterResult{unit=COUNT, value=0}, totalDataManifests=CounterResult{unit=COUNT, value=1}, totalDeleteManifests=CounterResult{unit=COUNT, value=0}, scannedDataManifests=CounterResult{unit=COUNT, value=1}, skippedDataManifests=CounterResult{unit=COUNT, value=0}, totalFileSizeInBytes=null, totalDeleteFileSizeInBytes=null, skippedDataFiles=null, skippedDeleteFiles=null, scannedDeleteManifests=null, skippedDeleteManifests=null, indexedDeleteFiles=null, equalityDeleteFiles=null, positionalDeleteFiles=null, dvs=null}, metadata={source=dummy-curl, user=test-user, trace-id=dummy-trace-001}}
...
2026-05-05T10:50:13,075  INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter
2026-05-05T10:50:13,075  INFO [JettyShutdownThread] metrics.LoggingMetricsReporter: Closing org.apache.iceberg.rest.metrics.LoggingMetricsReporter

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 5, 2026

@okumin okumin merged commit bbf73e9 into apache:master May 6, 2026
4 checks passed
@okumin okumin deleted the HIVE-29593-iceberg-metrics-reporter branch May 6, 2026 03:38
@okumin
Copy link
Copy Markdown
Contributor Author

okumin commented May 6, 2026

Merged. @deniskuzZ @Aggarwal-Raghav Thanks for your reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants