-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-3179. Add task OutputMetrics. #2968
Conversation
Test build #22314 has started for PR 2968 at commit
|
Test build #22314 has finished for PR 2968 at commit
|
Test PASSed. |
1a793e8
to
94b7c3f
Compare
Updated patch rebases on master and adds output metrics to a couple places on the UI they were missing. |
Verified that output metrics show up every (and while tasks are running) on a pseudo-distributed cluster. |
Test build #22657 has started for PR 2968 at commit
|
Test build #22657 has finished for PR 2968 at commit
|
Test PASSed. |
/** | ||
* Total bytes written | ||
*/ | ||
var bytesWritten: Long = 0L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you anticipate adding other things here later (like time blocked on writing output data)? Or should this just be a long inside the metrics class (rather than having those whole output metrics class)?
@sryza this mostly looks good and super helpful!! One more comment I discussed with @pwendell: I think it would be good to add an output type field to output metrics (similar to what we have for input metrics) that currently can only be Hadoop. The motivation for this is that there's some annoying overhead associated with changing stuff we write out to Json, and it seems like we're likely to want to add metrics for other output types (e.g., storing output data in the block manager) later on. |
Thanks for taking a look @kayousterhout . I'll add in an output type. |
94b7c3f
to
92c772c
Compare
Test build #22997 has started for PR 2968 at commit
|
Test build #22997 has finished for PR 2968 at commit
|
Test FAILed. |
Jenkins, retest this please |
Test build #23018 has started for PR 2968 at commit
|
Test build #23018 has finished for PR 2968 at commit
|
Test PASSed. |
: Option[() => Long] = { | ||
try { | ||
val threadStats = getFileSystemThreadStatistics(path, conf) | ||
val getBytesReadMethod = getFileSystemThreadStatisticsMethod("getBytesWritten") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getBytesWrittenMethod? And same with the below naming?
Sorry for not noticing the two other things earlier!! |
val fs = FileSystem.getLocal(new Configuration()) | ||
val outPath = new Path(fs.getWorkingDirectory, "outdir") | ||
|
||
if (SparkHadoopUtil.get.getFSBytesWrittenOnThreadCallback(outPath, fs.getConf).isDefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is a little hairy because it could break without anyone noticing (since I don't think Jenkins runs a new enough version for this code to be run?). But I don't have a great solution -- you could mock out a bunch of the function calls, but then you're left not really testing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I had this same thought, but I can't think of any good way to prevent this. Note that the other tests in this class have the same problem.
92c772c
to
dce4784
Compare
Test build #23107 has started for PR 2968 at commit
|
Test build #23107 has finished for PR 2968 at commit
|
Test PASSed. |
Thanks Sandy! I've merged this into master and 1.2. |
Author: Sandy Ryza <sandy@cloudera.com> This patch had conflicts when merged, resolved by Committer: Kay Ousterhout <kayousterhout@gmail.com> Closes #2968 from sryza/sandy-spark-3179 and squashes the following commits: dce4784 [Sandy Ryza] More review feedback 8d350d1 [Sandy Ryza] Fix test against Hadoop 2.5+ e7c74d0 [Sandy Ryza] More review feedback 6cff9c4 [Sandy Ryza] Review feedback fb2dde0 [Sandy Ryza] SPARK-3179 (cherry picked from commit 3c2cff4) Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
No description provided.