BAM header is not getting set on partition 0 with headerless BAM output format #916

Closed
fnothaft opened this issue Jan 12, 2016 · 2 comments

Comments

@fnothaft
Member

The bug that will not die... Reported by @almussel. See #676, #691, #711, #712, #721...

WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:       16/01/11 22:40:30 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 4, 172.31.16.110): java.lang.AssertionError: assertion failed: Cannot return header if not attached.
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at scala.Predef$.assert(Predef.scala:179)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.bdgenomics.adam.rdd.read.ADAMBAMOutputFormat$.getHeader(ADAMBAMOutputFormat.scala:60)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.bdgenomics.adam.rdd.read.ADAMBAMOutputFormat.<init>(ADAMBAMOutputFormat.scala:68)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.Class.newInstance(Class.java:383)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.rdd.InstrumentedOutputFormat.<init>(InstrumentedOutputFormat.scala:33)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.bdgenomics.adam.rdd.read.InstrumentedADAMBAMOutputFormat.<init>(ADAMBAMOutputFormat.scala:71)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.Class.newInstance(Class.java:383)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsNewAPIHadoopDataset$1$anonfun$12.apply(PairRDDFunctions.scala:1020)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsNewAPIHadoopDataset$1$anonfun$12.apply(PairRDDFunctions.scala:1014)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.scheduler.Task.run(Task.scala:88)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
WARNING:toil.leader:8ecfbeab-88ba-45e6-a3a4-ff1270e13051:               at java.lang.Thread.run(Thread.java:745)

@almussel will get us Spark logs tomorrow.

@heuermh
Member

heuermh commented Jan 12, 2016

FYI @tomwhite added support for merging BAMs in GATK4, see ReadsSparkSink.java

@fnothaft
Member Author

I've got a fix for this prepped; just cleaning up a unit test failure, should be good to go in 15min.

fnothaft added a commit to fnothaft/adam that referenced this issue Jan 12, 2016
Resolves bigdatagenomics#916. Makes several modifications that should eliminate the header
attach issue when writing back to SAM/BAM:

* Writes the SAM/BAM header as a single file.
* Instead of trying to attach the SAM/BAM header to the output format via a
  singleton object, we pass the path to the SAM/BAM header file via the Hadoop
  configuration.
* The output format reads the header from HDFS when creating the record writer.
* At the end, once we've written the full RDD and the header file, we merge all
  of the files via Hadoop's FileUtil.
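
For readers following along, the scheme described in the commit message can be sketched roughly as below. This is not ADAM's code: it is a plain-Python illustration on the local filesystem. A dict stands in for the Hadoop configuration, the key `example.sam.header.path` and all function names are made up for the example, and the records are simplified three-column lines rather than real SAM/BAM records.

```python
import os
import shutil
import tempfile

# Hypothetical configuration key; the key ADAM actually uses is not given in this thread.
HEADER_PATH_KEY = "example.sam.header.path"


def write_header(header_text, header_path):
    """Write the header once, as its own file."""
    with open(header_path, "w") as f:
        f.write(header_text)


def write_partition(conf, records, part_path):
    """Stand-in for a per-task record writer.

    The header is looked up through the path stored in `conf`, so the writer
    does not depend on state that was attached in some other process.
    """
    with open(conf[HEADER_PATH_KEY]) as f:
        known_refs = {
            field[3:]
            for line in f if line.startswith("@SQ")
            for field in line.rstrip("\n").split("\t") if field.startswith("SN:")
        }
    # Each partition is written without a header of its own.
    with open(part_path, "w") as out:
        for name, ref, pos in records:
            if ref not in known_refs:
                raise ValueError(f"unknown reference {ref!r}")
            out.write(f"{name}\t{ref}\t{pos}\n")


def merge(header_path, part_paths, merged_path):
    """Concatenate the header file and then the parts, in partition order."""
    with open(merged_path, "wb") as out:
        for path in [header_path] + sorted(part_paths):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def save_as_single_file(header_text, partitions, work_dir):
    header_path = os.path.join(work_dir, "header.sam")
    write_header(header_text, header_path)
    conf = {HEADER_PATH_KEY: header_path}  # pass the path, not the header object

    part_paths = []
    for i, records in enumerate(partitions):
        part_path = os.path.join(work_dir, f"part-{i:05d}")
        write_partition(conf, records, part_path)
        part_paths.append(part_path)

    merged_path = os.path.join(work_dir, "merged.sam")
    merge(header_path, part_paths, merged_path)
    return merged_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        header = "@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:1000\n"
        partitions = [[("r1", "chr1", 10)], [("r2", "chr1", 20)]]
        with open(save_as_single_file(header, partitions, work_dir)) as f:
            print(f.read(), end="")
```

The point of the sketch is the data flow: each writer only needs a path it can read from shared storage, which is why a header that was never "attached" on a given executor can no longer trip the assertion shown in the stack trace above.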
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 12, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 13, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 13, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 14, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 14, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 14, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 14, 2016
fnothaft added a commit to fnothaft/adam that referenced this issue Jan 14, 2016