
Display Spark read metrics on Spark SQL UI #7447

Merged
merged 14 commits into apache:master on Jul 20, 2023

Conversation

karuppayya (Contributor):

(Screenshot attachment: Screen Shot 2023-04-27 at 6 42 38 AM)

seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
Map<String, SQLMetric> metrics = sparkPlans.get(0).metrics();

Assert.assertTrue(metrics.contains(TotalFileSize.name));
Contributor:

Suggested change:
- Assert.assertTrue(metrics.contains(TotalFileSize.name));
+ Assertions.assertThat(metrics).contains(TotalFileSize.name);

Contributor:

as that will show the content of the map in case the assertion ever fails

public CustomTaskMetric[] reportDriverMetrics() {
List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
MetricsReport metricsReport = sparkReadMetricReporter.getMetricsReport();
ScanReport scanReport = (ScanReport) metricsReport;
Contributor:

Rather than casting here, I think it would be better to provide the necessary type of metrics report in the SparkReadMetricsReporter, because report(..) doesn't guarantee that you will only get a ScanReport.
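
A minimal sketch of that direction (method name and the null fallback are assumptions here): the reporter keeps the raw report and only hands out a ScanReport when the intercepted report actually is one.

import org.apache.iceberg.metrics.MetricsReport;
import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.metrics.ScanReport;

public class SparkReadMetricReporter implements MetricsReporter {

  private volatile MetricsReport metricsReport;

  @Override
  public void report(MetricsReport report) {
    // store whatever report arrives; it may or may not be a ScanReport
    this.metricsReport = report;
  }

  public ScanReport scanReport() {
    // expose the report only when it is actually a scan report
    return metricsReport instanceof ScanReport ? (ScanReport) metricsReport : null;
  }
}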


public class TotalPlanningDuration implements CustomMetric {

static final String name = "planningDuration";
Contributor:

should this be totalPlanningDuration instead? Same in the description

}

public Optional<ScanReport> getScanReport() {
return Optional.ofNullable((ScanReport) metricsReport);
Contributor:

Suggested change:
- return Optional.ofNullable((ScanReport) metricsReport);
+ return metricsReport instanceof ScanReport
+     ? Optional.of((ScanReport) metricsReport)
+     : Optional.empty();

@@ -256,4 +275,56 @@ public String toString() {
runtimeFilterExpressions,
caseSensitive());
}

@Override
public CustomTaskMetric[] reportDriverMetrics() {
Contributor:

Why implement this only in SparkBatchQueryScan? Can we do this in SparkScan?

Contributor Author:

done

@@ -65,6 +82,7 @@ class SparkBatchQueryScan extends SparkPartitioningAwareScan<PartitionScanTask>
private final Long endSnapshotId;
private final Long asOfTimestamp;
private final String tag;
private final SparkReadMetricReporter sparkReadMetricReporter;
Contributor:

What about using Supplier<ScanReport> scanReportSupplier to make this independent from a particular metrics reporter? We can use metricsReporter::scanReport closure to construct it.

InMemoryMetricsReporter metricsReporter = new InMemoryMetricsReporter();
...
scan.metricsReporter(metricsReporter)
...
return new SparkBatchQueryScan(..., metricsReporter::scanReport);

Contributor Author:

done

@Override
public CustomTaskMetric[] reportDriverMetrics() {
List<CustomTaskMetric> customTaskMetrics = Lists.newArrayList();
Optional<ScanReport> scanReportOptional = sparkReadMetricReporter.getScanReport();
Contributor:

Iceberg historically uses null instead of Optional and I think we should continue to follow that. Also, Spotless makes these closures really hard to read.

What about something like this?

@Override
public CustomTaskMetric[] reportDriverMetrics() {
  ScanReport scanReport = scanReportSupplier.get();

  if (scanReport == null) {
    return new CustomTaskMetric[0];
  }

  List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
  driverMetrics.add(TaskTotalFileSize.from(scanReport));
  ...
  return driverMetrics.toArray(new CustomTaskMetric[0]);
}

Where TaskTotalFileSize would be defined as follows:

public class TaskTotalFileSize implements CustomTaskMetric {

  private final long value;

  public static TaskTotalFileSize from(ScanReport scanReport) {
    CounterResult counter = scanReport.scanMetrics().totalFileSizeInBytes();
    long value = counter != null ? counter.value() : -1;
    return new TaskTotalFileSize(value);
  }

  private TaskTotalFileSize(long value) {
    this.value = value;
  }

  @Override
  public String name() {
    return TotalFileSize.NAME;
  }

  @Override
  public long value() {
    return value;
  }
}

Contributor Author:

done

}

@Override
public CustomMetric[] supportedCustomMetrics() {
Contributor:

This overrides supportedCustomMetrics() in SparkScan and breaks it. Can we move this logic there and combine with existing metrics?
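
A rough sketch of what combining could look like in SparkScan (the pre-existing metric class names are assumptions; the others are the metric classes added in this PR):

@Override
public CustomMetric[] supportedCustomMetrics() {
  return new CustomMetric[] {
    // metrics SparkScan already reported before this change (names assumed)
    new NumSplits(),
    new NumDeletes(),
    // driver-side scan metrics introduced by this PR
    new TotalFileSize(),
    new TotalPlanningDuration(),
    new ScannedDataManifests(),
    new SkippedDataManifests(),
    new ScannedDataFiles(),
    new SkippedDataFiles()
  };
}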

Contributor Author:

done

@@ -420,12 +423,15 @@ private Scan buildBatchScan() {
private Scan buildBatchScan(Long snapshotId, Long asOfTimestamp, String branch, String tag) {
Schema expectedSchema = schemaWithMetadataColumns();

sparkReadMetricReporter = new SparkReadMetricReporter();
Contributor:

Why init it here again? We should either use local vars or not init it here.

Contributor Author:

removed


@Override
public String description() {
return "Result data files";
Contributor:

We should follow what Spark does in other places. Specifically, its metric descriptions do not start with capital letters.

Contributor Author:

done

import java.util.Arrays;
import org.apache.spark.sql.connector.metric.CustomMetric;

public class ScannedDataManifests implements CustomMetric {
Contributor:

Comments above apply to all metrics below.

import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.metrics.ScanReport;

public class SparkReadMetricReporter implements MetricsReporter {
Contributor:

Is there a better name? Shall we call it InMemoryMetricsReporter and move it to core?

this.metricsReport = report;
}

public Optional<ScanReport> getScanReport() {
Contributor:

Let's not use Optional and getXXX prefix.

public ScanReport scanReport() {
  return (ScanReport) metricsReport;
}

We can check if the value is null later.


@Override
public ScanReport get() {
return (ScanReport) metricsReport;
Contributor:

there's no guarantee that this will always return a ScanReport. There are also other types of reports

Contributor Author:

Added code to check type

@aokolnychyi (Contributor) left a comment:

This seems close. I did another round.


super(spark, table, scan, readConf, expectedSchema, filters);
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
Contributor:

What about calling it scanReportSupplier to be a bit specific?

Contributor:

It is actually called scanReportSupplier in other places, let's make it consistent.

List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
this.metricsReportSupplier = metricsReportSupplier;
Contributor:

Can we add this assignment as the last line in the constructors to follow the existing style?
Also, let's call it scanReportSupplier.

Table table,
SparkReadConf readConf,
Supplier<ScanReport> scanReportSupplier) {
super(spark, table, readConf, table.schema(), ImmutableList.of(), scanReportSupplier);
Contributor:

Shall we simply pass null here and remove the supplier from the SparkStagedScan constructor? We know there would be no metrics report available as it is a staged scan.
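
A small sketch of that simplification (constructor shape assumed from the surrounding diff): SparkStagedScan passes null up, so the scan-report handling in SparkScan needs to tolerate a missing supplier.

SparkStagedScan(SparkSession spark, Table table, SparkReadConf readConf) {
  // staged scans never produce a scan report, so no supplier is needed
  super(spark, table, readConf, table.schema(), ImmutableList.of(), null);
}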

@@ -39,6 +40,7 @@ class SparkStagedScanBuilder implements ScanBuilder {

@Override
public Scan build() {
return new SparkStagedScan(spark, table, readConf);
return new SparkStagedScan(
spark, table, readConf, (new InMemoryReadMetricReporter())::scanReport);
Contributor:

This change would not be needed if we pass null to parent constructor in SparkStagedScan.

import java.util.Arrays;
import org.apache.spark.sql.connector.metric.CustomMetric;

public class TotalFileSize implements CustomMetric {
Contributor:

I am not sure whether I already asked, can we extend CustomSumMetric and rely on built-in method for aggregating the result? It applies to all our CustomMetric implementations.
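
A minimal sketch of one such metric on top of CustomSumMetric (the name constant and description are illustrative, following suggestions made elsewhere in this review):

import org.apache.spark.sql.connector.metric.CustomSumMetric;

public class TotalFileSize extends CustomSumMetric {

  static final String NAME = "totalFileSize";

  @Override
  public String name() {
    return NAME;
  }

  @Override
  public String description() {
    return "total file size (bytes)";
  }

  // aggregateTaskMetrics(long[]) is inherited from CustomSumMetric and sums the task values
}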

}

List<CustomTaskMetric> driverMetrics = Lists.newArrayList();
driverMetrics.add(TaskTotalFileSize.from(scanReport));
Contributor:

Is there a reason why we don't include all metrics from ScanMetricsResult?

Contributor Author:

I had added all metrics from ScanMetricsResult at the time this change was done.
Should we add the remaining ones now, or take them in a different PR (since this change is almost fully reviewed)?

Contributor:

Here is a list of the metrics:

TimerResult totalPlanningDuration(); // DONE
CounterResult resultDataFiles(); // DONE
CounterResult resultDeleteFiles(); // MISSING
CounterResult totalDataManifests(); // MISSING
CounterResult totalDeleteManifests(); // MISSING
CounterResult scannedDataManifests(); // DONE
CounterResult skippedDataManifests(); // DONE
CounterResult totalFileSizeInBytes(); // DONE
CounterResult totalDeleteFileSizeInBytes(); // MISSING
CounterResult skippedDataFiles(); // DONE
CounterResult skippedDeleteFiles(); // MISSING
CounterResult scannedDeleteManifests(); // MISSING
CounterResult skippedDeleteManifests(); // MISSING
CounterResult indexedDeleteFiles(); // MISSING
CounterResult equalityDeleteFiles(); // MISSING
CounterResult positionalDeleteFiles(); // MISSING

Contributor:

Can be done in a follow-up.

@@ -96,6 +97,7 @@ public class SparkScanBuilder
private boolean caseSensitive;
private List<Expression> filterExpressions = null;
private Filter[] pushedFilters = NO_FILTERS;
private final InMemoryReadMetricReporter metricsReporter;
Contributor:

Minor: This should go to the block with final variables above; you may init it in the definition (up to you).

private final SparkReadConf readConf;
private final List<String> metaColumns = Lists.newArrayList();
private final InMemoryMetricsReporter metricsReporter = new InMemoryMetricsReporter();

Contributor:

I think this still applies.

}

@Test
public void testReadMetrics() throws NoSuchTableException {
Contributor:

Can we add proper tests? I think the approach is correct, so we can add tests now.
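
A rough sketch of such a test (table setup, metric keys, and expected values are illustrative; the sql/tableName helpers are assumed to come from SparkTestBaseWithCatalog):

@Test
public void testReadMetricsOnSqlUi() throws NoSuchTableException {
  sql("CREATE TABLE %s (id BIGINT) USING iceberg", tableName);
  sql("INSERT INTO %s VALUES (1), (2), (3)", tableName);

  Dataset<Row> df = spark.sql(String.format("SELECT * FROM %s", tableName));
  df.collect();

  List<SparkPlan> sparkPlans =
      seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
  Map<String, SQLMetric> metricsMap =
      JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();

  // the driver-side scan metrics should be registered on the leaf plan and populated
  Assertions.assertThat(metricsMap).containsKey("scannedDataManifests");
  Assertions.assertThat(metricsMap.get("totalFileSize").value()).isGreaterThan(0);
}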


@Override
public void report(MetricsReport report) {
this.metricsReport = (ScanReport) report;
Contributor:

I think this cast needs to be removed

Contributor:

+1, this seems to be a generic container allowing to intercept a report.

I'd also call it InMemoryMetricsReporter, there is nothing specific to reads right now.

Contributor:

We also should use Metrics vs Metric in the name to match the interface.

Contributor:

+1 to both suggestions


public static TaskScannedDataManifests from(ScanReport scanReport) {
CounterResult counter = scanReport.scanMetrics().scannedDataManifests();
long value = counter != null ? counter.value() : -1;
@nastra (Contributor), Jul 6, 2023:

is -1 typical for Spark metrics to indicate that no data was available?

Contributor Author:

I don't see any CustomTaskMetric implementation in Spark handling missing data.
Should we send 0 here?
(I think "N/A" might not be a good idea and might convey that this data scan manifest was not relevant, WDYT?)

Contributor:

We would probably need to check built-in Spark metrics as a reference. It will probably be 0 in other cases as Spark passes an empty array of task values and then does sum on it.

Contributor Author:

Making it 0 based on this

public class TaskScannedDataManifests implements CustomTaskMetric {
private final long value;

public TaskScannedDataManifests(long value) {
@nastra (Contributor), Jul 6, 2023:

This should be private because there's the from(ScanReport) factory method, I would have assumed? Same applies to all the other constructors.

seqAsJavaListConverter(df.queryExecution().executedPlan().collectLeaves()).asJava();
Map<String, SQLMetric> metricsMap =
JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertEquals(1, metricsMap.get("scannedDataManifests").value());
Contributor:

We're currently moving away from JUnit assertions to AssertJ, can you please update this to Assertions.assertThat(...).isEqualTo(1)?

Contributor:

also would it make sense to add some more metric checks here?

Contributor Author:

Changed it to use org.assertj.core.api.Assertions
Added more checks

@@ -74,9 +76,10 @@ abstract class SparkPartitioningAwareScan<T extends PartitionScanTask> extends S
Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
SparkReadConf readConf,
Schema expectedSchema,
List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
Contributor:

Minor: metricsReportSupplier -> scanReportSupplier

@@ -67,7 +84,8 @@ abstract class SparkScan implements Scan, SupportsReportStatistics {
Table table,
SparkReadConf readConf,
Schema expectedSchema,
List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> metricsReportSupplier) {
Contributor:

Minor: metricsReportSupplier -> scanReportSupplier


@Override
public String description() {
return "result data files";
Contributor:

Would it be easier to understand if we show number of scanned data files in the UI?


@Override
public String description() {
return "num scanned data manifests";
Contributor:

Is Spark using number of ... for output records? If so, can we match whatever Spark does?

Contributor Author:

based on DataSourceScanExec, changing the description


import org.junit.jupiter.api.Assertions;
import scala.collection.JavaConverters;

public class TestSparkReadMetrics extends SparkTestBaseWithCatalog {
Contributor Author:

Added tests for v1 and v2 tables.
Metadata tables (apart from the positional deletes table) don't update org.apache.iceberg.SnapshotScan#scanMetrics.
Since we don't have a delete manifest metric in this PR, skipping the test for the positional deletes metadata table.

@aokolnychyi (Contributor) left a comment:

Final minor comments and should be good to go. We will need to add tests for CoW and MoR plans in a separate PR.

@@ -74,9 +76,10 @@ abstract class SparkPartitioningAwareScan<T extends PartitionScanTask> extends S
Scan<?, ? extends ScanTask, ? extends ScanTaskGroup<?>> scan,
SparkReadConf readConf,
Schema expectedSchema,
List<Expression> filters) {
List<Expression> filters,
Supplier<ScanReport> scanReportSupplier) {

Contributor:

Same comment about the empty line here.

}

@Override
public String aggregateTaskMetrics(long[] taskMetrics) {
Contributor:

This seems redundant as we extend CustomSumMetric?

Contributor Author:

Added for some debugging, removed.


import org.apache.spark.sql.connector.metric.CustomSumMetric;

public class scannedDataFiles extends CustomSumMetric {
Contributor:

Seems like a typo? Should start with a capital letter?


public static TaskTotalPlanningDuration from(ScanReport scanReport) {
TimerResult timerResult = scanReport.scanMetrics().totalPlanningDuration();
long value = timerResult != null ? timerResult.count() : -1;
Contributor:

Shouldn't this be totalDuration().toMillis()? Shall we also use 0 as the default value?
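
A small sketch of the suggested change (the private constructor is assumed, mirroring the factory pattern used for the other task metrics in this PR):

public static TaskTotalPlanningDuration from(ScanReport scanReport) {
  TimerResult timerResult = scanReport.scanMetrics().totalPlanningDuration();
  // report the total planning time in milliseconds rather than the timer's invocation count
  long value = timerResult != null ? timerResult.totalDuration().toMillis() : -1;
  return new TaskTotalPlanningDuration(value);
}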

Contributor Author:

Keeping the default at -1 based on Spark default

Contributor:

I think it would be helpful to add a comment on why this particular default value was chosen, with a reference to the Spark default (here and at the other place).


@Override
public String description() {
return "total planning duration";
Contributor:

What about adding info on what values we show?

total planning duration (ms)


@Override
public String description() {
return "total file size";
Contributor:

What about total file size (bytes)?

JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
Assertions.assertThat(metricsMap.get("resultDataFiles").value()).isEqualTo(1);
Contributor:

I guess some of these were renamed.

@karuppayya (Contributor Author):

The test failures don't seem to be related.

public ScanReport scanReport() {
Preconditions.checkArgument(
metricsReport == null || metricsReport instanceof ScanReport,
"Metric report is not a scan report");
Contributor:

Suggested change:
- "Metric report is not a scan report");
+ "Metrics report is not a scan report");

JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
Assertions.assertThat(metricsMap.get("scannedDataFiles").value()).isEqualTo(1);
Contributor:

can you add checks for all the other metrics here as well (even if they are 0)?

JavaConverters.mapAsJavaMapConverter(sparkPlans.get(0).metrics()).asJava();
Assertions.assertThat(metricsMap.get("skippedDataFiles").value()).isEqualTo(1);
Assertions.assertThat(metricsMap.get("scannedDataManifests").value()).isEqualTo(2);
Assertions.assertThat(metricsMap.get("scannedDataFiles").value()).isEqualTo(1);
Contributor:

same as above

@nastra (Contributor) commented on Jul 20, 2023:

This LGTM once CI passes. @karuppayya, could you rebase onto latest master please?

@aokolnychyi (Contributor) left a comment:

LGTM. @karuppayya, could you rebase to fix CI?

Can we also follow up to add missing metrics discussed here and add tests for row-level operations?

@@ -449,8 +453,14 @@ private Scan buildBatchScan(Long snapshotId, Long asOfTimestamp, String branch,
}

scan = configureSplitPlanning(scan);

Contributor:

Can we keep this empty line?

@aokolnychyi merged commit 8e89f6b into apache:master on Jul 20, 2023
41 checks passed
@aokolnychyi (Contributor):

It is awesome to get this done, @karuppayya! Thanks for reviewing, @nastra!

@puchengy (Contributor):

@karuppayya thanks for the contributions, would you mind porting these to lower Spark versions? (I am particularly interested in Spark 3.2.) Thanks

@karuppayya (Contributor Author):

@puchengy This cannot be ported, as it depends on Spark changes to support driver metrics, which are available only from Spark 3.4.

@puchengy (Contributor):

@karuppayya Thanks for sharing! It seems too much work to first backport the Spark changes to Spark 3.2, make a release, and then backport this change to Spark 3.2 in the Iceberg repo.

@frankliee (Contributor):

Thanks for this PR. Could I backport this to Spark 3.3, @karuppayya?
