
[SPARK-20247][CORE] Adding a jar that goes missing later shouldn't affect jobs that don't use it #17558

Closed
wants to merge 1 commit

Conversation

wangyum
Member

wangyum commented Apr 7, 2017

What changes were proposed in this pull request?

Catch the exception when a jar is missing, as described in SPARK-20247.
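Roughly, a minimal sketch of the idea, assuming the change wraps the jar fetch in Executor.updateDependencies (the exact change is in the commit):

  try {
    logInfo("Fetching " + name + " with timestamp " + timestamp)
    // Utils.fetchFile downloads the added jar into the executor's working dir.
    Utils.fetchFile(name, new File(SparkFiles.getRootDirectory()), conf,
      env.securityManager, hadoopConf, timestamp, useCache = !isLocal)
    currentJars(name) = timestamp
  } catch {
    case e: Exception =>
      // The jar may have been deleted after sc.addJar. Tasks that never load
      // classes from it can still run; tasks that do need it will still fail
      // later, e.g. with ClassNotFoundException.
      logWarning("Failed to fetch " + name + ", skipping it", e)
  }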

How was this patch tested?

Unit tests and manual tests.

@SparkQA

SparkQA commented Apr 7, 2017

Test build #75590 has started for PR 17558 at commit de5b5fe.

@jerryshao
Contributor

jerryshao commented Apr 7, 2017

@wangyum what if the task requires that jar? From your fix, what I understand is that you catch the exception and turn it into a warning log instead. But if a task does require the jar, will your fix suppress the exception, or just defer it to something like a ClassNotFoundException at task runtime?

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75590/
Test FAILed.

@felixcheung
Member

Agreed, why would the jar be missing?

@wangyum
Member Author

wangyum commented Apr 9, 2017

SparkContext supports adding a jar, but not removing one. Imagine I have a spark-sql or spark-thriftserver deployment and I need to upgrade my UDFs from spark-udfs-v1.0.0.jar to spark-udfs-v1.1.0.jar: today I must restart the Thrift server to make sure spark-udfs-v1.1.0.jar is used.
With this PR, I can manually delete spark-udfs-v1.0.0.jar to take it out of service, without restarting the Thrift server.
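For illustration, the workflow I have in mind (paths here are hypothetical):

  sc.addJar("/opt/udfs/spark-udfs-v1.0.0.jar")  // register the old UDF jar
  // Later: delete spark-udfs-v1.0.0.jar on disk so executors can no longer
  // fetch it (with this PR that becomes a warning, not a task failure), then:
  sc.addJar("/opt/udfs/spark-udfs-v1.1.0.jar")  // register the new version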

@jerryshao
Contributor

@wangyum, the fix in your PR is more of a bug fix, whereas the comment above is really a feature request; the two don't quite match. I would suggest focusing on the Thrift server side to address your requirement, rather than a band-aid-like fix on the executor side.

@HyukjinKwon
Member

Gentle ping @wangyum, any opinion on the comment above?

@wangyum
Member Author

wangyum commented May 11, 2017

@HyukjinKwon
This implementation is not elegant, but it solves my problem; I'll apply it to my own branch later.

wangyum closed this May 14, 2017
@mridulm
Contributor

mridulm commented May 14, 2017

Just for clarity, this approach is broken: removing the jar will not cause classes already loaded in the JVM to be unloaded. Pursuing this approach is brittle and will definitely break when classes from the jar have already been used, causing runtime failures.

The proper resolution would be to drop the appropriate classloader and create a new one with the replacement jar (refer to how J2EE containers handle this scenario for WAR files, for example).
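To make that concrete, here is a minimal standalone JVM sketch of the classloader-replacement idea (plain Scala, not Spark's internals; paths are hypothetical):

  import java.net.{URL, URLClassLoader}

  object UdfLoaderHolder {
    // Drop the old loader and build a fresh one over the replacement jar.
    // Classes already loaded through the old loader keep their definitions;
    // only lookups that go through the new loader see the new jar.
    var udfLoader = new URLClassLoader(
      Array(new URL("file:/opt/udfs/spark-udfs-v1.0.0.jar")),
      getClass.getClassLoader)

    def replaceJar(newJarPath: String): Unit = {
      udfLoader.close()  // release the old jar's file handle (Java 7+)
      udfLoader = new URLClassLoader(
        Array(new URL("file:" + newJarPath)), getClass.getClassLoader)
    }
  }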

@barrybecker4

barrybecker4 commented May 20, 2017

I just checked out the Spark branch-2.1 branch. I can build everything successfully if I skip the tests, but when I run them, I see this unit test failure:

 add jar with invalid path *** FAILED ***
  java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
  at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:195)
  at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
  at org.apache.spark.storage.DiskBlockManager.addShutdownHook(DiskBlockManager.scala:145)
  at org.apache.spark.storage.DiskBlockManager.<init>(DiskBlockManager.scala:51)
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:85)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:349)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:174)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
  at org.apache.spark.SparkContextSuite$$anonfun$14.apply$mcV$sp(SparkContextSuite.scala:298)
  ...
- Cancelling job group should not cause SparkContext to shutdown (SPARK-6414) *** FAILED ***
  java.lang.NullPointerException:
  at org.apache.spark.SparkContextSuite$$anonfun$15.apply$mcV$sp(SparkContextSuite.scala:321)
  at org.apache.spark.SparkContextSuite$$anonfun$15.apply(SparkContextSuite.scala:311)
  at org.apache.spark.SparkContextSuite$$anonfun$15.apply(SparkContextSuite.scala:311)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
  ...
- Comma separated paths for newAPIHadoopFile/wholeTextFiles/binaryFiles (SPARK-7155) *** FAILED ***
  java.lang.NullPointerException:
  at org.apache.spark.SparkContextSuite$$anonfun$17.apply$mcV$sp(SparkContextSuite.scala:381)
  at org.apache.spark.SparkContextSuite$$anonfun$17.apply(SparkContextSuite.scala:325)
  at org.apache.spark.SparkContextSuite$$anonfun$17.apply(SparkContextSuite.scala:325)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
  ...
- Default path for file based RDDs is properly set (SPARK-12517) *** FAILED ***
  java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
  at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:195)
  at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:153)
  at org.apache.spark.storage.DiskBlockManager.addShutdownHook(DiskBlockManager.scala:145)
  at org.apache.spark.storage.DiskBlockManager.<init>(DiskBlockManager.scala:51)
  at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:85)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:349)
  at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:174)
  at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
  at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
  at org.apache.spark.SparkContextSuite$$anonfun$18.apply$mcV$sp(SparkContextSuite.scala:386)

Is there a problem with the build, or is there something I can do to fix this locally?

@HyukjinKwon
Member

Just to make sure, are they failing consistently, or are they just flaky tests?

@barrybecker4

I ran it again and got a different failure this time. It's still in the core module, but I'm not sure whether it comes before or after the tests that failed the first time.

- caching in memory, replicated
- caching in memory, serialized, replicated
- caching on disk, replicated *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 2 executors before 30000 milliseconds elapsed
  at org.apache.spark.ui.jobs.JobProgressListener.waitUntilExecutorsUp(JobProgressListener.scala:584)
  at org.apache.spark.DistributedSuite.org$apache$spark$DistributedSuite$$testCaching(DistributedSuite.scala:154)
  at org.apache.spark.DistributedSuite$$anonfun$32$$anonfun$apply$1.apply$mcV$sp(DistributedSuite.scala:191)
  at org.apache.spark.DistributedSuite$$anonfun$32$$anonfun$apply$1.apply(DistributedSuite.scala:191)
  at org.apache.spark.DistributedSuite$$anonfun$32$$anonfun$apply$1.apply(DistributedSuite.scala:191)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  ...

I'll try again. It takes a long time to run each time: over 20 minutes just to get to the failed test, and that's not even a third of the way through all the tests.

@barrybecker4

The third time I ran it, it ran for 42 minutes and failed further on, in the Catalyst tests. Like you say, the tests do seem to be flaky, but why? The failures seem so random.

- GenerateOrdering with FloatType
- GenerateOrdering with ShortType
- SPARK-16845: GeneratedClass$SpecificOrdering grows beyond 64 KB *** FAILED ***
  com.google.common.util.concurrent.ExecutionError: java.lang.StackOverflowError
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
  at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
  at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
  at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:905)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:188)
  at org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:43)
  at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:889)
  at org.apache.spark.sql.catalyst.expressions.OrderingSuite$$anonfun$1.apply$mcV$sp(OrderingSuite.scala:138)
  at org.apache.spark.sql.catalyst.expressions.OrderingSuite$$anonfun$1.apply(OrderingSuite.scala:131)
  ...
  Cause: java.lang.StackOverflowError:
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:370)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:541)
  ...

@barrybecker4

The fourth time, it failed here again:

- caching on disk, replicated
- caching in memory and disk, replicated *** FAILED ***
  java.util.concurrent.TimeoutException: Can't find 2 executors before 30000 milliseconds elapsed
  at org.apache.spark.ui.jobs.JobProgressListener.waitUntilExecutorsUp(JobProgressListener.scala:584)
  at org.apache.spark.DistributedSuite.org$apache$spark$DistributedSuite$$testCaching(DistributedSuite.scala:154)
  at org.apache.spark.DistributedSuite$$anonfun$32$$anonfun$apply$1.apply$mcV$sp(DistributedSuite.scala:191)
  at org.apache.spark.DistributedSuite$$anonfun$32$$anonfun$apply$1.apply(DistributedSuite.scala:191)
  at org.apache.spark.DistributedSuite$$anonfun$32$$anonfun$apply$1.apply(DistributedSuite.scala:191)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)

@barrybecker4

It continues to fail with one of the above errors. Here is the command I use to build:
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.5 package
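To isolate the flakiness, I could also run just one of the failing suites via the scalatest-maven-plugin that Spark uses (assuming the other modules are already built), e.g.:
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.5 -DwildcardSuites=org.apache.spark.DistributedSuite -Dtest=none test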

@barrybecker4

I ran it one more time and the tests hung on the serializer manager integration test:

- shuffle encryption key length should be 128 by default
- create 256-bit key
- create key with invalid length
- serializer manager integration

I will check out the 2.1.1 tag next; that should be stable.
