-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3288] All fields in TaskMetrics should be private and use getters/setters #4020
Conversation
Test build #25473 has started for PR 4020 at commit
|
Test build #25473 has finished for PR 4020 at commit
|
Test PASSed. |
|
||
private var _hostname: String = _ | ||
def hostname = _hostname | ||
def setHostname(value : String) = _hostname = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be made private[spark]
?
Thanks for helping with this clean-up. I made some small suggestions. |
Test build #25619 has started for PR 4020 at commit
|
Test build #25619 has finished for PR 4020 at commit
|
Test FAILed. |
@@ -257,8 +257,8 @@ private[spark] class Executor( | |||
val serviceTime = System.currentTimeMillis() - taskStart | |||
val metrics = attemptedTask.flatMap(t => t.metrics) | |||
for (m <- metrics) { | |||
m.executorRunTime = serviceTime | |||
m.jvmGCTime = gcTime - startGCTime | |||
m.incExecutorRunTime(serviceTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this replace =
with +=
? This applies in a couple places above as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether the original behavior is necessarily correct. If the goal is to track total run time for the task, why does it make sense to do an assignment anywhere instead of an accumulation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Task GC time is measured by looking at ((total time spent in GC since process started) - (total time spent in GC since process started, snapshotted when task started)). Incrementing every time we update the metric would end up double-counting certain time regions. A few other metrics work this way as well. If you do notice issues with the logic, it might be good to propose fixes in a separate JIRA, so that we can give them careful consideration on their own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. Do you think it would be best to just add a setter function to address these cases directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this original patch was to just lock down the visibility, so I think it's fine to have -package-private setters in cases where we don't simply increment/decrement. We could also just only have setters (and not inc/dec). Basically as long as they are private, I don't mind. Maybe there is some value to restricting certain ones to inc/dec to provide more safety.
Test build #25621 has started for PR 4020 at commit
|
Had a look over, and this mostly looks good, but it looks like there are many places where the patch replaces assigning with incrementing. It would be good to take a close look and pull all these out. |
/** | ||
* Time the task spent blocking on writes to disk or buffer cache, in nanoseconds | ||
*/ | ||
@volatile var shuffleWriteTime: Long = _ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these AtomicLong's so that the incrementing can be threadsafe. I have a pr out that does this:
https://github.com/apache/spark/pull/3120/files#diff-1bd3dc38f6306e0a822f93d62c32b1d0R226 for input metrics. I'd be good to do this throughout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_shuffleWriteTime
can only get incremented from a single thread. It's marked volatile so that other threads can read from it. Using an AtomicLong
would unnecessarily bloat the size of the task results. I probably should have documented this access pattern better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, but it is hard to guarantee that only one thread will increment the value. We could mark the class as not thread safe by docs but it might be a ticking time bomb. Is the overhead of AtomicLong that concerning to risk concurrency issues down the line?
Speaking of shuffleMetrics, can we get rid of the array of shuffleReadMetrics (depsShuffleReadMetrics) and the merge step in favor of using AtomicLongs in ShuffleReadMetrics? That way the TaskMetrics can just return the same threadsafe ShuffleReadMetrics to the tasks and there wouldn't need to be a need to call updateShuffleReadMetrics periodically in the Executor heartbeat code.
Btw - this may conflict with #3120 so maybe we should hold off until that's merge. |
Test build #25621 has finished for PR 4020 at commit
|
Test PASSed. |
Conflicts abound! |
Hi Patrick - I did look over 3120. That one will definitely need to be merged first and then we can finish this. Thanks. |
Test build #25675 has started for PR 4020 at commit
|
Updated to resolve merge conflicts with #3120, fix some small style issues, and correctly use setters instead of inc/dec in certain places in the code. |
Test build #25675 has finished for PR 4020 at commit
|
Test PASSed. |
|
||
private var _hostname: String = _ | ||
def hostname = _hostname | ||
private[spark] def setHostname(value : String) = _hostname = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be "value: String
" with no space (not sure why our style checks didn't catch this).
LGTM - @sryza and @ksakellis look okay to you? |
Okay I'm pulling this in with the minor style fix. |
@pwendell sorry, was out for the weekend, but this LGTM. |
@pwendell yeah, looks good. Sorry I was also away. |
I've updated the fields and all usages of these fields in the Spark code. I've verified that this did not break anything on my local repo.