[SPARK-2778] [yarn] Add yarn integration tests. #2257

Closed
wants to merge 9 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Sep 3, 2014

This patch adds a couple of integration tests, currently very simple,
to make sure both client and cluster modes are working. The tests don't
do much yet other than run a simple job, but the plan is to enhance
them after we get the framework in.

The cluster tests are noisy, so redirect all log output to a file
like other tests do. Copying the conf around sucks but it's less
work than messing with maven/sbt and having to clean up other
projects.

Note the test is only added for yarn-stable. The code compiles
against yarn-alpha but there are two issues I ran into that I
could not overcome:

  • an old netty dependency kept creeping into the classpath and
    causing akka to not work, when using sbt; the old netty was
    correctly suppressed under maven.
  • MiniYARNCluster kept failing to execute containers because it
    did not create the NM's local dir itself; this is apparently
    a known behavior, but I'm not sure how to work around it.

Neither of those issues is present with stable YARN.

Also, these tests are a little slow to run. Apparently Spark doesn't
yet tag tests (so that these could be isolated in a "slow" batch),
so this is something to keep in mind.

@vanzin
Copy link
Contributor Author

vanzin commented Sep 3, 2014

Suggestions about making this work in yarn-alpha are welcome; I was hoping this would allow me to test yarn-alpha changes, but I failed. :-/

@vanzin
Copy link
Contributor Author

vanzin commented Sep 5, 2014

Jenkins, test this please.

@vanzin
Copy link
Contributor Author

vanzin commented Sep 6, 2014

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have started for PR 2257 at commit add8416.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 6, 2014

QA tests have finished for PR 2257 at commit add8416.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers

@vanzin
Copy link
Contributor Author

vanzin commented Sep 8, 2014

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 8, 2014

QA tests have started for PR 2257 at commit add8416.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 8, 2014

QA tests have finished for PR 2257 at commit add8416.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor Author

vanzin commented Sep 9, 2014

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 9, 2014

QA tests have started for PR 2257 at commit 68fbbbf.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 9, 2014

QA tests have finished for PR 2257 at commit 68fbbbf.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tgravescs
Copy link
Contributor

I think it's OK to not have them for yarn-alpha.

How slow are they? Couldn't you find a way to run them separately, rather than with the normal unit tests?

@vanzin
Copy link
Contributor Author

vanzin commented Sep 9, 2014

They take about 1min to run on my machine. Most of the time is setting up and tearing down the MiniYARNCluster instance, which is only done once, so I hope adding more tests won't add so much on top of that.

scalatest has tags that can be used to choose which tests to run, but Spark doesn't use them anywhere. I can look at adding them to these (and disabling them in the default run) if people think that's too long.
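
For reference, tagging in ScalaTest looks roughly like the sketch below. It assumes the ScalaTest 2.x `FunSuite` API that the suite in this PR extends; the tag name `org.apache.spark.tags.SlowTest` and the suite `ExampleYarnSuite` are made up for illustration and are not part of the patch.

```scala
import org.scalatest.{FunSuite, Tag}

// Hypothetical tag; the string is the name used to select or exclude tests.
object SlowTest extends Tag("org.apache.spark.tags.SlowTest")

class ExampleYarnSuite extends FunSuite {

  // Tagged test: can be left out of the default run.
  test("run a job in yarn-cluster mode", SlowTest) {
    assert((1 to 4).sum === 10)
  }

  // Untagged test: not affected by excluding SlowTest.
  test("fast sanity check") {
    assert(1 + 1 === 2)
  }
}
```

The ScalaTest runner accepts `-l <tag name>` to exclude tagged tests and `-n <tag name>` to run only tagged tests, so a default run could pass `-l org.apache.spark.tags.SlowTest` and a separate "slow" batch could pass `-n` with the same name.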

@andrewor14
Copy link
Contributor

Thanks @vanzin, this is really cool. If it takes a long time to run this test, we can always enable it only if yarn files are modified (SQL already does this). Then we don't have to worry about further inflating the test time if we want to test more features.

@tgravescs
Copy link
Contributor

@vanzin this mostly looks good. I've been trying to run it locally but I have been having lots of issues with the unit tests lately. Hopefully I'll have this done and checked in later today.

Thanks for working on this, we definitely need more tests on the yarn side. We may need to be careful about adding more in the future if they take too long; perhaps we can look at supporting the different types as you mention.

@tgravescs
Copy link
Contributor

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Sep 12, 2014

QA tests have started for PR 2257 at commit 68fbbbf.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 12, 2014

QA tests have finished for PR 2257 at commit 68fbbbf.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class YarnClusterSuite extends FunSuite with BeforeAndAfterAll with Matchers

yarnCluster.start()

val sysProps = sys.props.map { case (k, v) => (k, v) }
sysProps.foreach { case (k, v) =>
Copy link
Contributor

Is this just making a copy of sys.props? Why not just do toMap?

Copy link
Contributor

Actually, why not just take a snapshot of sys.props here and restore it later? Might be simpler

Copy link
Contributor Author

I guess I was trying to be paranoid and clean up any "spark.*" options from sys.props before running these tests. That may not be the best idea (since I'd be cleaning up "spark.test" properties set by the build scripts), so I went with the copy instead.
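
For reference, the snapshot-and-restore idea suggested in this thread could look roughly like the sketch below. It is not the code in the patch (which, per the comment above, went with the copy); `SysPropsSnapshot` and `withSavedSysProps` are hypothetical names, and only the Scala standard library is used.

```scala
object SysPropsSnapshot {

  /** Runs `body`, then puts system properties back the way they were before the call. */
  def withSavedSysProps[T](body: => T): T = {
    // Immutable copy of the current properties; later changes do not affect it.
    val saved: Map[String, String] = sys.props.toMap
    try body
    finally {
      // Remove properties that were added while `body` ran.
      // The keys are copied to a List first so the map is not modified while iterating.
      sys.props.keys.toList.filterNot(saved.contains).foreach(k => sys.props -= k)
      // Restore the original value of every property that existed before.
      saved.foreach { case (k, v) => sys.props(k) = v }
    }
  }
}
```

In a suite, the same two steps could instead be split between `beforeAll` (take the snapshot) and `afterAll` (restore it).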

@pwendell
Copy link
Contributor

It's great to see us adding tests here. @vanzin how long do these tests take, roughly? We might have to only run these in certain situations if they take a long time.

@tgravescs
Copy link
Contributor

@pwendell see the comments above:

They take about 1min to run on my machine. Most of the time is setting up and tearing down the MiniYARNCluster instance, which is only done once, so I hope adding more tests won't add so much on top of that.

scalatest has tags that can be used to choose which tests to run, but Spark doesn't use them anywhere. I can look at adding them to these (and disabling them in the default run) if people think that's too long.

@vanzin
Copy link
Contributor Author

vanzin commented Sep 15, 2014

I reduced the number of executors from the cluster test (from 4 to 1, something I meant to do before but forgot) and shaved ~15s on my machine. Tests still run sort of slow, but it's a little better now.

@SparkQA
Copy link

SparkQA commented Sep 15, 2014

QA tests have started for PR 2257 at commit f01517c.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 15, 2014

QA tests have finished for PR 2257 at commit f01517c.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val status = new File(args(1))
var result = "failure"
try {
val data = sc.parallelize(1 to 4).map(i => i).collect().toSet
Copy link
Contributor

Does the map to identity here do anything?

Copy link
Contributor Author

It was just a way to trigger an actual job. I guess I could do parallelize(1 to 4, 4).collect() to achieve the same thing.

@andrewor14
Copy link
Contributor

Hey @vanzin I left a few minor comments, but this LGTM overall. I haven't verified the dependency logic, however.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

QA tests have started for PR 2257 at commit 67f5b02.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

QA tests have finished for PR 2257 at commit 67f5b02.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20729/

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20731/

@vanzin
Copy link
Contributor Author

vanzin commented Sep 23, 2014

Jenkins, retest this please.

Is there any way to access unit-tests.log from jenkins?

@andrewor14
Copy link
Contributor

I think you have to SSH into the machines to get them, but even if you have them they're kinda jumbled because we log everything to the same file. Actually here we might want to do something like what we did in #2108 so we know what's wrong when the test fails.

@vanzin
Copy link
Contributor Author

vanzin commented Sep 23, 2014

Having everything in the same file is ok. I think the trick in this case is to convince the child processes launched by Yarn to not use the common log4j configuration, and instead log to stderr (so that Yarn will take care of the logs). Let me look at doing that.
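
As an illustration of the "log to stderr" part, a log4j 1.x configuration along these lines sends all output to the process's stderr, which YARN captures per container. This is only a sketch: the log level and pattern are assumptions, and how the patch actually gets such a configuration onto the child processes' classpath is not shown in this thread.

```properties
# Send everything to a single console appender.
log4j.rootCategory=INFO, console

# ConsoleAppender writes to System.out by default; point it at stderr instead.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```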

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

QA tests have started for PR 2257 at commit 5c2b56f.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have finished for PR 2257 at commit 5c2b56f.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20733/

@vanzin
Copy link
Contributor Author

vanzin commented Sep 24, 2014

I'll merge with master and see if I can reproduce the failure...

@vanzin
Copy link
Contributor Author

vanzin commented Sep 24, 2014

Yep, fails locally too after the merge. Let me look.

This was added by the fix to SPARK-2668: a stray equal sign was
creating a bad system property, and the Jetty initialization
code was tripping on it.

Also fixed a "MatchError" that could be hit in ApplicationMaster.
@vanzin
Copy link
Contributor Author

vanzin commented Sep 24, 2014

I found the problem - it was caused by a recent PR that basically broke yarn-cluster mode...

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have started for PR 2257 at commit 6d5b84e.

  • This patch merges cleanly.

@andrewor14
Copy link
Contributor

Ah good catch. The latest changes LGTM if you get the tests to pass.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

QA tests have finished for PR 2257 at commit 6d5b84e.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 24, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20758/

@vanzin
Copy link
Contributor Author

vanzin commented Sep 24, 2014

Yay, and the test already found a bug before even being checked in.

@pwendell
Copy link
Contributor

Thanks Marcelo and @andrewor14 for review - I'll merge this.

@asfgit asfgit closed this in b848771 Sep 25, 2014
@vanzin vanzin deleted the yarn-tests branch September 30, 2014 21:05