BQSR on C835.HCC1143_BL.4 uses excessive amount of driver memory #714

Closed
beaunorgeot opened this Issue Jun 12, 2015 · 3 comments

Contributor

beaunorgeot commented Jun 12, 2015

Running BQSR with ADAM release 2.10_17.0 on an 8 GB input file, on 5 r3.2xlarge slaves, fails roughly a quarter of the way through with:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (1024.8 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

For reference, the GATK BQSR recal.table is only 12 MB for a 234 GB input file.

Setting spark.driver.maxResultSize=10g allows BQSR to complete successfully.
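
In case it helps anyone hitting the same limit, here is a minimal sketch of raising the cap programmatically before the SparkContext is created (this assumes a driver-side Scala entry point rather than the adam-submit CLI; the Spark default for spark.driver.maxResultSize is 1g):

    import org.apache.spark.{SparkConf, SparkContext}

    object BqsrJob {
      def main(args: Array[String]): Unit = {
        // Sketch only: raise the cap on total serialized task results returned
        // to the driver before the SparkContext exists. The Spark default is 1g;
        // 10g is simply the value that let this BQSR run finish. Setting it to
        // "0" removes the limit entirely, at the risk of an OOM on the driver.
        val conf = new SparkConf()
          .setAppName("adam-bqsr")
          .set("spark.driver.maxResultSize", "10g")
        val sc = new SparkContext(conf)

        // ... run the BQSR stages against sc ...

        sc.stop()
      }
    }

The same setting can also be passed at launch via --conf spark.driver.maxResultSize=10g on spark-submit.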


Member

fnothaft commented Jun 13, 2015

Can you send the full log?


Contributor

beaunorgeot commented Jun 15, 2015

15/06/08 20:17:22 INFO scheduler.DAGScheduler: Job 1 failed: aggregate at BaseQualityRecalibration.scala:83, took 447.972250 s

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 48 tasks (1024.8 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)

    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)

    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)

    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)

    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)

    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)

    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)

    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)

    at scala.Option.foreach(Option.scala:236)

    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)

    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)

    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)

    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)

    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)

    at akka.actor.ActorCell.invoke(ActorCell.scala:487)

    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)

    at akka.dispatch.Mailbox.run(Mailbox.scala:220)

    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)

    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

15/06/08 20:17:22 WARN scheduler.TaskSetManager: Lost task 58.0 in stage 1.0 (TID 99, ip-172-31-22-81.us-west-2.compute.internal): TaskKilled (killed intentionally)

15/06/08 20:17:22 WARN scheduler.TaskSetManager: Lost task 57.0 in stage 1.0 (TID 98, ip-172-31-22-81.us-west-2.compute.internal): TaskKilled (killed intentionally)
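
For context on where the driver memory goes: the failing job is the aggregate at BaseQualityRecalibration.scala:83, which ships one partial result per task back to the driver and merges them there. The snippet below is not ADAM's code, just an illustration of the RDD.aggregate pattern and of why the serialized result size scales with the task count (48 tasks at roughly 21 MB apiece lands just over the 1024 MB cap):

    import org.apache.spark.rdd.RDD

    // Illustration only, not ADAM's implementation: an aggregate like this
    // builds one partial table per task, serializes each of them back to the
    // driver, and merges them there with combOp. The total serialized result
    // therefore grows with the number of tasks, independent of the final
    // table's size.
    def buildTable(reads: RDD[String]): Map[String, Long] =
      reads.aggregate(Map.empty[String, Long])(
        // seqOp: fold each record into the partition-local table
        (table, read) => table.updated(read, table.getOrElse(read, 0L) + 1L),
        // combOp: merge the per-task tables once they arrive on the driver
        (a, b) => b.foldLeft(a) { case (acc, (k, v)) =>
          acc.updated(k, acc.getOrElse(k, 0L) + v)
        }
      )

Raising spark.driver.maxResultSize works here because that limit applies to the total serialized task results, separately from the driver heap itself.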

fnothaft added the wontfix label Jul 6, 2016


Member

fnothaft commented Jul 6, 2016

Closing as won't fix. BQSR is expected to use a significant amount of driver memory.

fnothaft closed this Jul 6, 2016
