Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARK-3179. Add task OutputMetrics. #2968

Closed
wants to merge 5 commits into from

Conversation

sryza
Copy link
Contributor

@sryza sryza commented Oct 28, 2014

No description provided.

@SparkQA
Copy link

SparkQA commented Oct 28, 2014

Test build #22314 has started for PR 2968 at commit 1a793e8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 28, 2014

Test build #22314 has finished for PR 2968 at commit 1a793e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class OutputMetrics()

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22314/
Test PASSed.

@sryza
Copy link
Contributor Author

sryza commented Oct 31, 2014

Updated patch rebases on master and adds output metrics to a couple places on the UI they were missing.

@sryza
Copy link
Contributor Author

sryza commented Oct 31, 2014

Verified that output metrics show up every (and while tasks are running) on a pseudo-distributed cluster.

@SparkQA
Copy link

SparkQA commented Oct 31, 2014

Test build #22657 has started for PR 2968 at commit 94b7c3f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 1, 2014

Test build #22657 has finished for PR 2968 at commit 94b7c3f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22657/
Test PASSed.

/**
* Total bytes written
*/
var bytesWritten: Long = 0L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you anticipate adding other things here later (like time blocked on writing output data)? Or should this just be a long inside the metrics class (rather than having those whole output metrics class)?

@kayousterhout
Copy link
Contributor

@sryza this mostly looks good and super helpful!! One more comment I discussed with @pwendell: I think it would be good to add an output type field to output metrics (similar to what we have for input metrics) that currently can only be Hadoop. The motivation for this is that there's some annoying overhead associated with changing stuff we write out to Json, and it seems like we're likely to want to add metrics for other output types (e.g., storing output data in the block manager) later on.

@sryza
Copy link
Contributor Author

sryza commented Nov 6, 2014

Thanks for taking a look @kayousterhout . I'll add in an output type.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22997 has started for PR 2968 at commit 92c772c.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #22997 has finished for PR 2968 at commit 92c772c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class OutputMetrics(writeMethod: DataWriteMethod.Value)

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22997/
Test FAILed.

@kayousterhout
Copy link
Contributor

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23018 has started for PR 2968 at commit 92c772c.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 6, 2014

Test build #23018 has finished for PR 2968 at commit 92c772c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class OutputMetrics(writeMethod: DataWriteMethod.Value)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23018/
Test PASSed.

: Option[() => Long] = {
try {
val threadStats = getFileSystemThreadStatistics(path, conf)
val getBytesReadMethod = getFileSystemThreadStatisticsMethod("getBytesWritten")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getBytesWrittenMethod? And same with the below naming?

@kayousterhout
Copy link
Contributor

Sorry for not noticing the two other things earlier!!

val fs = FileSystem.getLocal(new Configuration())
val outPath = new Path(fs.getWorkingDirectory, "outdir")

if (SparkHadoopUtil.get.getFSBytesWrittenOnThreadCallback(outPath, fs.getConf).isDefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is a little hairy because it could break without anyone noticing (since I don't think Jenkins runs a new enough version for this code to be run?). But I don't have a great solution -- you could mock out a bunch of the function calls, but then you're left not really testing anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had this same thought, but I can't think of any good way to prevent this. Note that the other tests in this class have the same problem.

@SparkQA
Copy link

SparkQA commented Nov 8, 2014

Test build #23107 has started for PR 2968 at commit dce4784.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 9, 2014

Test build #23107 has finished for PR 2968 at commit dce4784.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class OutputMetrics(writeMethod: DataWriteMethod.Value)

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23107/
Test PASSed.

@kayousterhout
Copy link
Contributor

Thanks Sandy! I've merged this into master and 1.2.

asfgit pushed a commit that referenced this pull request Nov 10, 2014
Author: Sandy Ryza <sandy@cloudera.com>

This patch had conflicts when merged, resolved by
Committer: Kay Ousterhout <kayousterhout@gmail.com>

Closes #2968 from sryza/sandy-spark-3179 and squashes the following commits:

dce4784 [Sandy Ryza] More review feedback
8d350d1 [Sandy Ryza] Fix test against Hadoop 2.5+
e7c74d0 [Sandy Ryza] More review feedback
6cff9c4 [Sandy Ryza] Review feedback
fb2dde0 [Sandy Ryza] SPARK-3179

(cherry picked from commit 3c2cff4)
Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
@asfgit asfgit closed this in 3c2cff4 Nov 10, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants