[SPARK-56509][SQL] SparkSQL Last Attempt Metrics #55371
```diff
@@ -1858,6 +1858,11 @@ private[spark] class DAGScheduler(
           throw SparkCoreErrors.accessNonExistentAccumulatorError(id)
         }
         acc.merge(updates.asInstanceOf[AccumulatorV2[Any, Any]])
+        if (acc.isInstanceOf[LastAttemptAccumulator[_, _, _]]) {
+          acc.asInstanceOf[LastAttemptAccumulator[_, _, _]].mergeLastAttempt(
+            updates, stage.rdd, event.taskInfo,
+            task.stageId, task.stageAttemptId, task.localProperties)
+        }
         // To avoid UI cruft, ignore cases where value wasn't updated
         if (acc.name.isDefined && !updates.isZero) {
           stage.latestInfo.accumulables(id) = acc.toInfo(None, Some(acc.value))
```
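For orientation, here is a hedged, self-contained Scala sketch of the dispatch pattern the hunk above adds: every accumulator is merged as before, and accumulators that mix in a last-attempt trait additionally receive task metadata through an extra call. The trait, class, and method names below are illustrative stand-ins, not the PR's actual `LastAttemptAccumulator` or `AccumulatorV2` declarations.

```scala
// Minimal stand-in for an accumulator interface (NOT Spark's AccumulatorV2).
trait Accum {
  def merge(v: Long): Unit
  def value: Long
}

// Hypothetical mixin echoing the LastAttemptAccumulator idea: an extra merge
// variant that also records which stage attempt produced the update.
trait LastAttemptAware {
  var lastAttempt: Option[(Int, Int)] = None
  def mergeLastAttempt(v: Long, stageId: Int, stageAttemptId: Int): Unit =
    lastAttempt = Some((stageId, stageAttemptId))
}

class SumAccum extends Accum with LastAttemptAware {
  private var s = 0L
  def merge(v: Long): Unit = s += v
  def value: Long = s
}

// Scheduler-side pattern from the diff: unconditional merge, then an extra,
// type-checked call only for accumulators that carry the mixin.
def driverSideMerge(acc: Accum, v: Long, stageId: Int, stageAttemptId: Int): Unit = {
  acc.merge(v)
  acc match {
    case la: LastAttemptAware => la.mergeLastAttempt(v, stageId, stageAttemptId)
    case _                    => // regular accumulators: nothing extra to do
  }
}
```

A quick usage check: merging `10L` into a `SumAccum` for stage 3, attempt 1 leaves `value == 10` and `lastAttempt == Some((3, 1))`, while a plain `Accum` implementation would be merged without any metadata bookkeeping.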
@@ -2333,6 +2338,19 @@ private[spark] class DAGScheduler( | |
| // The epoch of the task is acceptable (i.e., the task was launched after the most | ||
| // recent failure we're aware of for the executor), so mark the task's output as | ||
| // available. | ||
| // For testing purposes, inject fetch failures controlled from the driver-side by | ||
| // supplying an invalid location. | ||
| if (Utils.isTesting && | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Injecting invalid
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are about 100 code paths in Spark that invoke extra checks or trigger extra failures when |
||
| sc.conf.get(config.Tests.INJECT_SHUFFLE_FETCH_FAILURES) && | ||
| task.stageAttemptId == 0) { | ||
| val currentLocation = status.location | ||
| val invalidLocation = BlockManagerId( | ||
| execId = BlockManagerId.INVALID_EXECUTOR_ID, | ||
| host = currentLocation.host, | ||
| port = currentLocation.port, | ||
| topologyInfo = currentLocation.topologyInfo) | ||
| status.updateLocation(invalidLocation) | ||
| } | ||
| val isChecksumMismatched = mapOutputTracker.registerMapOutput( | ||
| shuffleStage.shuffleDep.shuffleId, smt.partitionId, status) | ||
| if (isChecksumMismatched) { | ||
|
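The injection in the hunk above can be pictured with a small stand-alone sketch. `Location` and `Status` below are simplified stand-ins for Spark's `BlockManagerId` and `MapStatus`, and the boolean arguments stand in for `Utils.isTesting` and the `INJECT_SHUFFLE_FETCH_FAILURES` conf; this is an illustration of the idea, not the PR's code.

```scala
// Simplified stand-ins for BlockManagerId and MapStatus.
final case class Location(execId: String, host: String, port: Int)
final class Status(var location: Location) {
  def updateLocation(l: Location): Unit = location = l
}

// Stand-in for BlockManagerId.INVALID_EXECUTOR_ID.
val InvalidExecutorId = "invalid"

// When the test flag is set, rewrite a freshly registered map output's
// location to an invalid executor, so the first reducer fetch of that output
// fails and the fetch-failure / stage-retry path is exercised. Only attempt 0
// is poisoned, so the retried stage attempt completes normally.
def maybeInjectFetchFailure(status: Status,
                            isTesting: Boolean,
                            injectEnabled: Boolean,
                            stageAttemptId: Int): Unit = {
  if (isTesting && injectEnabled && stageAttemptId == 0) {
    val cur = status.location
    status.updateLocation(Location(InvalidExecutorId, cur.host, cur.port))
  }
}
```

Note the design choice visible in the diff: only host/port stay real while the executor id is invalidated, and the guard on `stageAttemptId == 0` guarantees the injected failure is transient rather than an infinite retry loop.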
This introduces coupling between DAGScheduler and the SLAM concept. An alternative: add an overridable method to `AccumulatorV2`, like `def mergeWithTaskMetadata(other, rdd, taskInfo, stageId, stageAttemptId, props): Unit = {}`, that SLAM overrides. Then DAGScheduler calls it unconditionally (a no-op for regular accumulators) without the instanceof check.

`LastAttemptAccumulator` follows a mixin pattern, where the `LastAttemptAccumulator` mixin adds the extra functionality to the accumulator. The code doesn't change `AccumulatorV2` at all, only adds extra visibility to two fields in `SQLMetrics`, and the plug-in into existing production code is limited to these few lines in DAGScheduler, plus some testing utils, plus tiny fixes to RDD scoping in collector and shuffle. If we added this method to the base class, it would be a no-op for any metric other than a `LastAttemptAccumulator`.

This kind of mixin follows the same pattern as e.g. most of the DSv2 interfaces, where, instead of adding empty methods to many interfaces, we have mixins like `SupportsDelta`, `RequiresDistributionAndOrdering`, etc., and the various places that interact with them plug in the awareness based on a type check.

I think it's a good pattern.
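To make the trade-off in this thread concrete, here is a minimal sketch of the reviewer's base-class alternative, using illustrative names rather than Spark's real `AccumulatorV2` API: the hook is a no-op by default, so the scheduler calls it unconditionally and never needs a type check.

```scala
// Hypothetical base class (NOT Spark's AccumulatorV2) with the suggested hook.
abstract class BaseAccum {
  def merge(v: Long): Unit
  // No-op by default; a metadata-aware accumulator overrides it.
  def mergeWithTaskMetadata(v: Long, stageId: Int, stageAttemptId: Int): Unit = {}
}

class PlainAccum extends BaseAccum {
  var value = 0L
  def merge(v: Long): Unit = value += v
}

class AttemptAccum extends BaseAccum {
  var value = 0L
  var last: Option[(Int, Int)] = None
  def merge(v: Long): Unit = value += v
  override def mergeWithTaskMetadata(v: Long, stageId: Int, stageAttemptId: Int): Unit =
    last = Some((stageId, stageAttemptId))
}

// Scheduler side: no instanceof check needed, every accumulator gets both calls.
def schedulerMerge(acc: BaseAccum, v: Long, stageId: Int, stageAttemptId: Int): Unit = {
  acc.merge(v)
  acc.mergeWithTaskMetadata(v, stageId, stageAttemptId)
}
```

The PR's mixin approach keeps the base class untouched at the cost of one type check at the call site; the hook approach keeps the call site unconditional at the cost of widening the base interface with a method most subclasses never implement.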