Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -350,9 +350,15 @@ case class ReplaceDataExec(
// One of the metrics couldn't be found, also mark numDeletedRows as not found.
-1L
}

// SQLMetric.set call is a no-op if value is -1. Override numDeletedRows value in summary.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, does it mean the values in UI would differ from the commit info?
Take a closer look at SQLMetric. Shall we create them with -1 to indicate they are invalid? Today, we create them with 0 initial value making them valid metrics by default.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class SQLMetric(
    val metricType: String,
    initValue: Long = 0L) extends AccumulatorV2[Long, Long] {
  // initValue defines the initial value of the metric. 0 is the lowest value considered valid.
  // If a SQLMetric is invalid, it is set to 0 upon receiving any updates, and it also reports
  // 0 as its value to avoid exposing it to the user programmatically.
  //
  // For many SQLMetrics, we use initValue = -1 to indicate that the metric is by default invalid.
  // At the end of a task, we will update the metric making it valid, and the invalid metrics will
  // be filtered out when calculating min, max, etc. as a workaround
  // for SPARK-11013.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my understanding is correct, setting the initial value as -1 will result it in the final numDeletedRows metric value being equal to -1 x num_of_tasks as no task calls .add() on it and each task will send -1 as the value and finally sum these up. I'm not sure how it behaves with task failures etc.

There are also places where we legitimately rely on metric values being initialized to 0 unless updated. For instance, MERGE numTargetRowsCopied should stay as 0, unless we have a row that is being copied and then we increment it.

Since the meaning of -1 is only documented at Summary classes as not found, I don't think we have to make the metrics behave the same way. It looks like metrics were not built to support negative values properly.

metrics("numDeletedRows").set(numDeletedRows)
super.getWriteSummary(query).map {
case d: DeleteSummaryImpl => d.copy(numDeletedRows = numDeletedRows)
}
} else {
super.getWriteSummary(query)
}
super.getWriteSummary(query)
}
}

Expand Down