BAM header is not getting set on partition 0 with headerless BAM output format #916

Closed
fnothaft opened this Issue Jan 12, 2016 · 2 comments

fnothaft commented Jan 12, 2016

The bug that will not die... Reported by @almussel. See #676, #691, #711, #712, #721...

WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:       16/01/11 22:40:30 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 4, 172.31.16.110): java.lang.AssertionError: assertion failed: Cannot return header if not attached.
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at scala.Predef$.assert(Predef.scala:179)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.bdgenomics.adam.rdd.read.ADAMBAMOutputFormat$.getHeader(ADAMBAMOutputFormat.scala:60)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.bdgenomics.adam.rdd.read.ADAMBAMOutputFormat.<init>(ADAMBAMOutputFormat.scala:68)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.Class.newInstance(Class.java:383)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.rdd.InstrumentedOutputFormat.<init>(InstrumentedOutputFormat.scala:33)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.bdgenomics.adam.rdd.read.InstrumentedADAMBAMOutputFormat.<init>(ADAMBAMOutputFormat.scala:71)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.Class.newInstance(Class.java:383)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsNewAPIHadoopDataset$1$anonfun$12.apply(PairRDDFunctions.scala:1020)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsNewAPIHadoopDataset$1$anonfun$12.apply(PairRDDFunctions.scala:1014)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.scheduler.Task.run(Task.scala:88)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.Thread.run(Thread.java:745)

@almussel will get us Spark logs tomorrow.

heuermh commented Jan 12, 2016

FYI @tomwhite added support for merging BAMs in GATK4, see ReadsSparkSink.java

fnothaft commented Jan 12, 2016

I've got a fix for this prepped; just cleaning up a unit test failure, should be good to go in 15min.

fnothaft added a commit to fnothaft/adam that referenced this issue Jan 12, 2016

[ADAM-916] New strategy for writing header.
Resolves #916. Makes several modifications that should eliminate the header
attach issue when writing back to SAM/BAM:

* Writes the SAM/BAM header as a single file.
* Instead of trying to attach the SAM/BAM header to the output format via a
  singleton object, we pass the path to the SAM/BAM header file via the Hadoop
  configuration.
* The output format reads the header from HDFS when creating the record writer.
* At the end, once we've written the full RDD and the header file, we merge all
  via Hadoop's FsUtil.
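
The four steps in this commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the actual fix: `java.util.Properties` stands in for the Hadoop configuration, a local temporary directory stands in for HDFS, plain text lines stand in for SAM/BAM encoding, and the class name, method names, and the `sketch.sam.header.path` key are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class HeaderViaConfigSketch {
    // Hypothetical configuration key; not the key ADAM uses.
    static final String HEADER_PATH_KEY = "sketch.sam.header.path";

    // Step 1: write the header once, as its own file.
    static Path writeHeader(Path dir, String header) {
        try {
            return Files.write(dir.resolve("_head"), header.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 3: the writer looks up the header path in the configuration and
    // reads the file, instead of asking a process-local singleton for it.
    static String readHeader(Properties conf) {
        String path = conf.getProperty(HEADER_PATH_KEY);
        if (path == null) {
            throw new IllegalStateException("No header path set under " + HEADER_PATH_KEY);
        }
        try {
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each partition writes a headerless part file. The header is still read
    // first, because a real BAM writer needs it to encode records.
    static Path writePart(Properties conf, Path dir, int index, List<String> records) {
        readHeader(conf); // fails fast if the header cannot be found
        StringBuilder body = new StringBuilder();
        for (String record : records) {
            body.append(record).append('\n');
        }
        try {
            Path part = dir.resolve(String.format("part-%05d", index));
            return Files.write(part, body.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 4: concatenate the header file and the part files, in order.
    static Path merge(Path header, List<Path> parts, Path out) {
        try {
            Files.write(out, Files.readAllBytes(header));
            for (Path part : parts) {
                Files.write(out, Files.readAllBytes(part), StandardOpenOption.APPEND);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs all four steps in a temporary directory and returns the merged text.
    static String runDemo() {
        try {
            Path dir = Files.createTempDirectory("header-sketch");
            Path header = writeHeader(dir, "@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:1000\n");

            // Step 2: pass only the header's path through the configuration.
            Properties conf = new Properties();
            conf.setProperty(HEADER_PATH_KEY, header.toString());

            List<Path> parts = new ArrayList<>();
            parts.add(writePart(conf, dir, 0, Arrays.asList("read1", "read2")));
            parts.add(writePart(conf, dir, 1, Arrays.asList("read3")));

            Path merged = merge(header, parts, dir.resolve("merged.sam"));
            return new String(Files.readAllBytes(merged), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(runDemo());
    }
}
```

The point of the design is that a path string in the configuration travels with the job to every task, whereas a header attached to a singleton object exists only in the process where it was attached. In the sketch, partition 0 is treated exactly like every other partition, so no part file needs special handling for the header.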

fnothaft added eight more commits with the same message to fnothaft/adam that referenced this issue: one on Jan 12, two on Jan 13, and five on Jan 14, 2016.

@heuermh heuermh closed this in #917 Jan 14, 2016
