
[SPARK-4194] [core] Make SparkContext initialization exception-safe. #5335

Closed
wants to merge 12 commits

Conversation

vanzin
Contributor

vanzin commented Apr 2, 2015

SparkContext has a very long constructor, where multiple things are
initialized, multiple threads are spawned, and multiple opportunities
for exceptions to be thrown exist. If one of these happens at an
inopportune time, lots of garbage tends to stick around.

This patch re-organizes SparkContext so that its internal state is
initialized in a big "try" block. The fields keeping state are now
completely private to SparkContext, and are "vars", because Scala
doesn't allow you to initialize a val later. The existing API interface
is kept by turning vals into defs (which works because Scala guarantees
the same binary interface for those).

On top of that, a few things in other areas were changed to avoid more
things leaking:

  • Executor was changed to explicitly wait for the heartbeat thread to
    stop. LocalBackend was changed to wait for the "StopExecutor"
    message to be received, since otherwise there could be a race
    between that message arriving and the actor system being shut down.
  • ConnectionManager could possibly hang during shutdown, because an
    interrupt at the wrong moment could cause the selector thread to
    still call select and then wait forever. So also wake up the
    selector so that this situation is avoided.
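As a rough illustration of the constructor pattern described above (the field and method names here are hypothetical, not the actual SparkContext members), the reorganization looks something like this:

```scala
import scala.util.control.NonFatal

// Sketch of the pattern: private vars assigned inside one big try block,
// with public `def`s preserving the old `val`-based API.
class Ctx(userConf: String) {
  // Backing state is a private var only because Scala cannot assign a
  // val after its declaration point.
  private var _conf: String = _
  private var _started: Boolean = false

  // What used to be a public `val` becomes a `def` over the private var,
  // which Scala compiles to the same binary interface.
  def conf: String = _conf

  try {
    _conf = userConf
    _started = true
  } catch {
    case NonFatal(e) =>
      // Tear down whatever was initialized before the failure, then rethrow.
      stop()
      throw e
  }

  def stop(): Unit = {
    _started = false
  }
}
```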

Marcelo Vanzin added 5 commits April 2, 2015 12:27
This fixes the thread leak. I also changed the unit test to keep track
of allocated contexts and to make sure they're closed after tests are
run; this is needed since some tests use this pattern:

    val sc = createContext()
    doSomethingThatMayThrow()
    sc.stop()
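If doSomethingThatMayThrow() fails, sc.stop() is never reached and the context leaks. A sketch of what such tracking might look like (the tracker object and its names are hypothetical, not the suite's actual code):

```scala
import scala.collection.mutable

// Hypothetical sketch: record every created context so the suite can
// stop leftovers even when a test throws before reaching sc.stop().
object ContextTracker {
  private val active = mutable.Set.empty[AutoCloseable]

  // Wrap context creation so every context is registered.
  def track[T <: AutoCloseable](ctx: T): T = {
    active += ctx
    ctx
  }

  // Run after each test (or after the whole suite): close anything left behind.
  def cleanup(): Unit = {
    active.foreach(_.close())
    active.clear()
  }

  def activeCount: Int = active.size
}
```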
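The ConnectionManager fix follows a standard java.nio idiom; a minimal sketch of the race and the remedy (not the actual ConnectionManager code):

```scala
import java.nio.channels.Selector

// Sketch of the shutdown race: a thread parked in select() will not see
// the stop flag until select() returns, so shutdown() must wake it up.
class SelectorLoop {
  private val selector = Selector.open()
  @volatile private var running = true

  def run(): Unit = {
    while (running) {
      selector.select() // returns early if wakeup() was called
      // ... process selector.selectedKeys() here ...
    }
    selector.close()
  }

  def shutdown(): Unit = {
    running = false
    selector.wakeup() // without this, run() could stay blocked in select() forever
  }
}
```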
@vanzin
Contributor Author

vanzin commented Apr 2, 2015

Note the PR contains the commits from #5311; I hope that once that one is pushed, GitHub will figure things out (if not, I'll rebase manually). So if you want to skip those changes, just look at the last commit in the list.

With both of these PRs, I was able to run the core/ unit tests and verify that:

  • no executor allocator threads were left behind
  • no driver heartbeater threads were left behind
  • no "AkkaAskTimeout" exceptions showed up in the unit test logs

I tried to run MiMA checks locally and they look ok, but let's see what jenkins says.

@SparkQA

SparkQA commented Apr 2, 2015

Test build #29621 has finished for PR 5335 at commit 8caa8b3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala
@vanzin
Contributor Author

vanzin commented Apr 3, 2015

Oops, borked merge, fixing...

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29684 has finished for PR 5335 at commit c671c46.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@vanzin
Contributor Author

vanzin commented Apr 3, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29685 has finished for PR 5335 at commit 6b73fcb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
@SparkQA

SparkQA commented Apr 3, 2015

Test build #29686 has finished for PR 5335 at commit 6b73fcb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@SparkQA

SparkQA commented Apr 3, 2015

Test build #29692 has finished for PR 5335 at commit 2621609.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

Conflicts:
	core/src/main/scala/org/apache/spark/SparkContext.scala
	core/src/main/scala/org/apache/spark/executor/Executor.scala
	core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala
@SparkQA

SparkQA commented Apr 7, 2015

Test build #29764 has finished for PR 5335 at commit 408dada.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@vanzin
Contributor Author

vanzin commented Apr 10, 2015

Ping?

@srowen
Member

srowen commented Apr 10, 2015

I tend to trust your hand on this. This is a big change and it's hard to match up the existing logic with new logic in the diff, though it looks like that was the intent and effect from spot-checking some elements. That the tests have passed several times is a good sign. The reorganization is significantly positive since the fields, and their initialization, are clearly grouped now.

One minor style thing: why the _member naming syntax, when it's not otherwise used in Spark? Just to make it crystal clear here what's a member?

I think the additional changes look reasonable, like using a Java Executor in the, well, Executor.

I favor this though weakly on the grounds that I'm mostly relying on tests for correctness. The intent is sound. @rxin @pwendell do you have any thoughts on this one?

@vanzin
Contributor Author

vanzin commented Apr 10, 2015

One minor style thing, why the _member naming syntax

That's actually used in lots of places in Spark. It's used when some variable / field name conflicts with a def, which is the case in this change.
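For illustration, the clash and the convention look like this (names are made up for the example):

```scala
// The underscore prefix keeps the private backing field from colliding
// with the public accessor of the same name.
class Holder {
  private var _value: Int = 42 // backing var
  def value: Int = _value      // accessor; naming both `value` would not compile
}
```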

@srowen
Member

srowen commented Apr 10, 2015

OK right, and that's true of all of them here.

@pwendell
Contributor

ping @JoshRosen - I think he's proposed this exact change to me in the past.

@JoshRosen
Contributor

See the description of PR #3121 for my previous discussion of this. If we want to avoid introducing vars in SparkContext, one alternative would be to move the creation of these components outside of the SparkContext constructor and into a method in the companion object, putting the try / catch blocks there. This lets us isolate the mutability into a single method, so things are mutable while we're constructing SparkContext but are immutable once the object has been fully constructed. I'm not sure whether this approach would be easier or harder to understand than the mutable vars + getters used here.
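A rough sketch of that alternative (names hypothetical): the companion object builds the components under try/catch and hands the finished values to a private constructor, so the fields can stay vals:

```scala
// Hypothetical sketch of the companion-object factory alternative.
class Ctx2 private (val conf: String, val env: String) {
  def stop(): Unit = { /* tear down env */ }
}

object Ctx2 {
  def apply(userConf: String): Ctx2 = {
    var env: String = null
    try {
      env = initEnv(userConf)          // mutability confined to this method
      new Ctx2(userConf, env)          // fields are immutable vals afterwards
    } catch {
      case e: Throwable =>
        if (env != null) tearDown(env) // cleanup logic must also live here
        throw e
    }
  }

  private def initEnv(conf: String): String = conf + "-env"
  private def tearDown(env: String): Unit = ()
}
```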

@vanzin
Contributor Author

vanzin commented Apr 12, 2015

That approach would be a lot more complicated. The first reason why it would be complicated is that you'd need an uber-constructor in SparkContext that takes all the initialized internal values. Unless there's some fancy Scala feature I'm not aware of, that in itself is scary as hell, and would mean the other constructors would be similarly ugly in that they'd have to call the companion object.

It would also cause (even more) duplication of the declaration of these things, since they'd have to be declared in the companion object's method too.

Finally, it would complicate stop(), because it would have to either be copy & pasted into the companion object so that it cleans up after an exception, or you'd need a stop() method in the companion object that takes all arguments as parameters, so that the SparkContext class can call it.

So while I would love to simplify the code in SparkContext, the alternative suggestion, as far as I can see, does nothing towards that.

And that's why I chose private vars. It's not optimal, and I really wish Scala would allow me to initialize a val after its declaration, like Java does. But it's the easiest approach, and it doesn't expose any mutable SparkContext state that wasn't already exposed before.

@pwendell
Contributor

I also feel that the current approach makes more sense than Josh's alternative. Changes to SparkContext get a lot of scrutiny during code review, so clear documentation, IMO, is sufficient to ensure this is followed correctly (famous last words). I didn't have time to dive into this to make sure there are no correctness issues, but the broad approach looks good to me... I think it's worth fixing this up.

@srowen
Member

srowen commented Apr 14, 2015

Seems like there's support for this change, though needs a rebase @vanzin . Any objections to proceeding after that?

Conflicts:
	core/src/main/scala/org/apache/spark/executor/Executor.scala
	core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
@SparkQA

SparkQA commented Apr 14, 2015

Test build #30260 has finished for PR 5335 at commit 80fc00e.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch removes the following dependencies:
    • RoaringBitmap-0.4.5.jar
    • activation-1.1.jar
    • akka-actor_2.10-2.3.4-spark.jar
    • akka-remote_2.10-2.3.4-spark.jar
    • akka-slf4j_2.10-2.3.4-spark.jar
    • aopalliance-1.0.jar
    • arpack_combined_all-0.1.jar
    • avro-1.7.7.jar
    • breeze-macros_2.10-0.11.2.jar
    • breeze_2.10-0.11.2.jar
    • chill-java-0.5.0.jar
    • chill_2.10-0.5.0.jar
    • commons-beanutils-1.7.0.jar
    • commons-beanutils-core-1.8.0.jar
    • commons-cli-1.2.jar
    • commons-codec-1.10.jar
    • commons-collections-3.2.1.jar
    • commons-compress-1.4.1.jar
    • commons-configuration-1.6.jar
    • commons-digester-1.8.jar
    • commons-httpclient-3.1.jar
    • commons-io-2.1.jar
    • commons-lang-2.5.jar
    • commons-lang3-3.3.2.jar
    • commons-math-2.1.jar
    • commons-math3-3.1.1.jar
    • commons-net-2.2.jar
    • compress-lzf-1.0.0.jar
    • config-1.2.1.jar
    • core-1.1.2.jar
    • curator-client-2.4.0.jar
    • curator-framework-2.4.0.jar
    • curator-recipes-2.4.0.jar
    • gmbal-api-only-3.0.0-b023.jar
    • grizzly-framework-2.1.2.jar
    • grizzly-http-2.1.2.jar
    • grizzly-http-server-2.1.2.jar
    • grizzly-http-servlet-2.1.2.jar
    • grizzly-rcm-2.1.2.jar
    • groovy-all-2.3.7.jar
    • guava-14.0.1.jar
    • guice-3.0.jar
    • hadoop-annotations-2.2.0.jar
    • hadoop-auth-2.2.0.jar
    • hadoop-client-2.2.0.jar
    • hadoop-common-2.2.0.jar
    • hadoop-hdfs-2.2.0.jar
    • hadoop-mapreduce-client-app-2.2.0.jar
    • hadoop-mapreduce-client-common-2.2.0.jar
    • hadoop-mapreduce-client-core-2.2.0.jar
    • hadoop-mapreduce-client-jobclient-2.2.0.jar
    • hadoop-mapreduce-client-shuffle-2.2.0.jar
    • hadoop-yarn-api-2.2.0.jar
    • hadoop-yarn-client-2.2.0.jar
    • hadoop-yarn-common-2.2.0.jar
    • hadoop-yarn-server-common-2.2.0.jar
    • ivy-2.4.0.jar
    • jackson-annotations-2.4.0.jar
    • jackson-core-2.4.4.jar
    • jackson-core-asl-1.8.8.jar
    • jackson-databind-2.4.4.jar
    • jackson-jaxrs-1.8.8.jar
    • jackson-mapper-asl-1.8.8.jar
    • jackson-module-scala_2.10-2.4.4.jar
    • jackson-xc-1.8.8.jar
    • jansi-1.4.jar
    • javax.inject-1.jar
    • javax.servlet-3.0.0.v201112011016.jar
    • javax.servlet-3.1.jar
    • javax.servlet-api-3.0.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar
    • jcl-over-slf4j-1.7.10.jar
    • jersey-client-1.9.jar
    • jersey-core-1.9.jar
    • jersey-grizzly2-1.9.jar
    • jersey-guice-1.9.jar
    • jersey-json-1.9.jar
    • jersey-server-1.9.jar
    • jersey-test-framework-core-1.9.jar
    • jersey-test-framework-grizzly2-1.9.jar
    • jets3t-0.7.1.jar
    • jettison-1.1.jar
    • jetty-util-6.1.26.jar
    • jline-0.9.94.jar
    • jline-2.10.4.jar
    • jodd-core-3.6.3.jar
    • json4s-ast_2.10-3.2.10.jar
    • json4s-core_2.10-3.2.10.jar
    • json4s-jackson_2.10-3.2.10.jar
    • jsr305-1.3.9.jar
    • jtransforms-2.4.0.jar
    • jul-to-slf4j-1.7.10.jar
    • kryo-2.21.jar
    • log4j-1.2.17.jar
    • lz4-1.2.0.jar
    • management-api-3.0.0-b012.jar
    • mesos-0.21.0-shaded-protobuf.jar
    • metrics-core-3.1.0.jar
    • metrics-graphite-3.1.0.jar
    • metrics-json-3.1.0.jar
    • metrics-jvm-3.1.0.jar
    • minlog-1.2.jar
    • netty-3.8.0.Final.jar
    • netty-all-4.0.23.Final.jar
    • objenesis-1.2.jar
    • opencsv-2.3.jar
    • oro-2.0.8.jar
    • paranamer-2.6.jar
    • parquet-column-1.6.0rc3.jar
    • parquet-common-1.6.0rc3.jar
    • parquet-encoding-1.6.0rc3.jar
    • parquet-format-2.2.0-rc1.jar
    • parquet-generator-1.6.0rc3.jar
    • parquet-hadoop-1.6.0rc3.jar
    • parquet-jackson-1.6.0rc3.jar
    • protobuf-java-2.4.1.jar
    • protobuf-java-2.5.0-spark.jar
    • py4j-0.8.2.1.jar
    • pyrolite-2.0.1.jar
    • quasiquotes_2.10-2.0.1.jar
    • reflectasm-1.07-shaded.jar
    • scala-compiler-2.10.4.jar
    • scala-library-2.10.4.jar
    • scala-reflect-2.10.4.jar
    • scalap-2.10.4.jar
    • scalatest_2.10-2.2.1.jar
    • slf4j-api-1.7.10.jar
    • slf4j-log4j12-1.7.10.jar
    • snappy-java-1.1.1.6.jar
    • spark-bagel_2.10-1.4.0-SNAPSHOT.jar
    • spark-catalyst_2.10-1.4.0-SNAPSHOT.jar
    • spark-core_2.10-1.4.0-SNAPSHOT.jar
    • spark-graphx_2.10-1.4.0-SNAPSHOT.jar
    • spark-launcher_2.10-1.4.0-SNAPSHOT.jar
    • spark-mllib_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-common_2.10-1.4.0-SNAPSHOT.jar
    • spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar
    • spark-repl_2.10-1.4.0-SNAPSHOT.jar
    • spark-sql_2.10-1.4.0-SNAPSHOT.jar
    • spark-streaming_2.10-1.4.0-SNAPSHOT.jar
    • spire-macros_2.10-0.7.4.jar
    • spire_2.10-0.7.4.jar
    • stax-api-1.0.1.jar
    • stream-2.7.0.jar
    • tachyon-0.5.0.jar
    • tachyon-client-0.5.0.jar
    • uncommons-maths-1.2.2a.jar
    • unused-1.0.0.jar
    • xmlenc-0.52.jar
    • xz-1.0.jar
    • zookeeper-3.4.5.jar

@SparkQA

SparkQA commented Apr 15, 2015

Test build #30293 has finished for PR 5335 at commit 746b661.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.
