Unable to load bwa index error when running BwaSpark under version alpha.2-45-ga30af5a #2171
Comments
Hi @huangk3. This is a known bug. The Spark pipeline uses a 2bit reference but bwa needs a fasta reference. There's a branch in PR #1981 that adds a
Actually, on further consideration, I think I gave you the wrong advice. Try using a fasta directly instead of a 2bit in the --reference parameter. I think the 2bit is only necessary for the complete reads pipeline.
Hi @lbergelson Thanks for the quick response! I just used the fasta as the reference, and it still doesn't work. The log only says "null":
Using GATK wrapper script /home/kh3/Softwares/gatk/build/install/gatk/bin/gatk
null
Hmm, that might be due to bad default filters. Could you try with
Hi @lbergelson I added --disableAllReadFilters, and the log still says "null":
Using GATK wrapper script /home/kh3/Softwares/gatk/build/install/gatk/bin/gatk
null
@huangk3 Hmm. I'm not sure what's going on then. I'd have to take some time to look into it. Unfortunately, this is my last week before I head out for several weeks off, and I don't know if I'll be able to get to it before I leave. BWA support is still very experimental and it seems we have quite a few existing issues with it. We're going to be putting a lot of effort into the Spark tools next quarter, but until then it may not get the attention it needs.
@huangk3 When you replaced the 2bit reference with the fasta reference, did you also have matching index files (amb, ann, bwt, pac and sa) in the same directory as the fasta reference? I believe these index files, and also the
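The check described in the comment above can be sketched in a few lines of Python. This is only an illustration, not part of GATK or bwa: the helper name `missing_bwa_index_files` is hypothetical, and it assumes that the path given to `--reference` is handed to bwa unchanged as the index prefix, so that bwa looks for `<reference>.amb`, `<reference>.ann`, and so on.

```python
from pathlib import Path

# Suffixes of the five index files named in the comment above.
BWA_INDEX_SUFFIXES = ("amb", "ann", "bwt", "pac", "sa")


def missing_bwa_index_files(reference):
    """Return the index files expected next to `reference` that do not exist.

    Assumption: the index prefix is the full reference file name, so the
    index for genome.fa is genome.fa.amb, genome.fa.ann, and so on.
    """
    ref = Path(reference)
    expected = [ref.with_name(ref.name + "." + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Hypothetical usage with the two reference paths from this issue.
    for candidate in ("genome.fa", "genome.2bit"):
        missing = missing_bwa_index_files(candidate)
        print(candidate, "missing:", [path.name for path in missing])
```

Under that assumption, and given the directory listing in this issue, `genome.fa` would have all five files present, while `genome.2bit` would report five missing files, because the index was built against `genome.fa` and carries that prefix.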
Hi @sooheelee I did try using BWA to index the 2bit reference, but that doesn't work either. @lbergelson Do you think the reference loading issue can be fixed soon, for example this month?
@huangk3 It's not likely that the reference loading issue will be fixed this month. I'm answering in @lbergelson's stead as he's out for a few weeks. This is on the team's radar, so there is some chance that it could be, but we are unable to say for sure.
@tedsharpe Think this one is fixed with your new BWA bindings?
Yes, but... It looks to me as if the index files, which appear to be in the master node's Linux file system in this failing example, are probably not available to the worker nodes. You'd have to copy each of the 5 index files to each of the workers, putting them in the same location on each. The same problem would occur with the new version: The single-image index file will still need to be available to all workers. You could distribute this file with: So, instead, you could copy it to a fixed path, identical on each worker, once up front, and then run your alignment jobs to your heart's content. The new version is a little simpler, because there's just one index file, but otherwise suffers from the same issue: bwa mem only knows how to deal with ordinary file system files -- not HDFS, not GCS -- and so the file must be copied to each worker machine in the cluster.
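The "copy once to a fixed path on every worker" approach from the comment above could be scripted roughly as follows. This is a minimal sketch, not a GATK feature: the function name, the worker host names, and the destination directory are all hypothetical, and it assumes `scp` access to each worker and that the destination directory already exists there. The function only builds the commands so they can be reviewed before anything is run.

```python
import shlex


def build_copy_commands(index_files, workers, dest_dir):
    """Build one scp command per (worker, file) pair; nothing is executed.

    Every worker receives the files under the same `dest_dir`, so the index
    path is identical across the cluster.
    """
    commands = []
    for worker in workers:
        for path in index_files:
            commands.append(["scp", path, f"{worker}:{dest_dir}/"])
    return commands


if __name__ == "__main__":
    # Hypothetical host names and destination directory, for illustration only.
    files = ["genome.fa." + suffix for suffix in ("amb", "ann", "bwt", "pac", "sa")]
    for command in build_copy_commands(files, ["worker1", "worker2"], "/data/bwa_index"):
        print(shlex.join(command))
```

Each printed line can be run by hand or passed to `subprocess.run` once the host names and paths have been checked.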
Did you solve this issue @huangk3?
Closing this since there hasn't been any response in a long time. Feel free to re-open if there are updates.
Below are the contents of my reference folder. The index is there, but I don't know why the tool can't recognize it. Please help, thanks!
kh3@rgcaahauva08091 ~/Resources/genome_b37 $> ls -l genome.*
-rw-rw---- 1 kh3 kh3 784809415 Sep 16 10:16 genome.2bit
-rw-rw---- 1 kh3 kh3 3168829906 Feb 4 2014 genome.fa
-rw-r----- 1 kh3 kh3 106669 Sep 16 11:32 genome.fa.amb
-rw-r----- 1 kh3 kh3 3276 Sep 16 11:32 genome.fa.ann
-rw-r----- 1 kh3 kh3 3137454592 Sep 16 11:31 genome.fa.bwt
-rw-rw---- 1 kh3 kh3 2984 Feb 4 2014 genome.fa.fai
-rw-rw---- 1 kh3 kh3 2984 Sep 16 13:18 genome.fai
-rw-r----- 1 kh3 kh3 784363628 Sep 16 11:32 genome.fa.pac
-rw-r----- 1 kh3 kh3 1568727304 Sep 16 11:44 genome.fa.sa
Using GATK wrapper script /home/kh3/Softwares/gatk/build/install/gatk/bin/gatk
Running:
/home/kh3/Softwares/gatk/build/install/gatk/bin/gatk BwaAndMarkDuplicatesPipelineSpark -I /home/kh3/data/Illumina/GATK4/Platinum/TEST/test.spark.bam -R /home/kh3/Resources/genome_b37/genome.2bit --disableSequenceDictionaryValidation true -t 16 -O /home/kh3/data/Illumina/GATK4/Platinum/TEST/test.spark.aligned.bam
15:47:28.760 INFO IntelGKLUtils - Trying to load Intel GKL library from:
jar:file:/home/kh3/Softwares/gatk/build/install/gatk/lib/gkl-0.1.2.jar!/com/intel/gkl/native/libIntelGKL.so
15:47:28.809 INFO IntelGKLUtils - Intel GKL library loaded from classpath.
[September 16, 2016 3:47:28 PM EDT] org.broadinstitute.hellbender.tools.spark.pipelines.BwaAndMarkDuplicatesPipelineSpark --threads 16 --output /home/kh3/data/Illumina/GATK4/Platinum/TEST/test.spark.aligned.bam --reference /home/kh3/Resources/genome_b37/genome.2bit --input /home/kh3/data/Illumina/GATK4/Platinum/TEST/test.spark.bam --disableSequenceDictionaryValidation true --fixedChunkSize 100000 --duplicates_scoring_strategy SUM_OF_BASE_QUALITIES --readValidationStringency SILENT --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --bamPartitionSize 0 --shardedOutput false --numReducers 0 --sparkMaster local[*] --help false --version false --verbosity INFO --QUIET false --use_jdk_deflater false --disableAllReadFilters false
[September 16, 2016 3:47:28 PM EDT] Executing as kh3@rgcaahauva08091.rgc.aws.com on Linux 3.13.0-91-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13; Version: Version:4.alpha.2-45-ga30af5a-SNAPSHOT
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.BUFFER_SIZE : 131072
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.COMPRESSION_LEVEL : 1
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.CREATE_INDEX : false
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.CREATE_MD5 : false
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.CUSTOM_READER_FACTORY :
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.EBI_REFERENCE_SERVICE_URL_MASK : http://www.ebi.ac.uk/ena/cram/md5/%s
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.NON_ZERO_BUFFER_SIZE : 131072
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.REFERENCE_FASTA : null
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Defaults.USE_CRAM_REF_DOWNLOAD : false
15:47:28.835 INFO BwaAndMarkDuplicatesPipelineSpark - Deflater IntelDeflater
15:47:28.836 INFO BwaAndMarkDuplicatesPipelineSpark - Initializing engine
15:47:28.836 INFO BwaAndMarkDuplicatesPipelineSpark - Done initializing engine
15:47:29.287 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
[E::bwa_idx_load_from_disk] fail to locate the index files
15:47:34.944 ERROR Executor:95 - Exception in task 5.0 in stage 0.0 (TID 5)
org.broadinstitute.hellbender.exceptions.GATKException: Cannot run BWA-MEM
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine.lambda$null$1(BwaSparkEngine.java:113)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:159)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:159)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: bwa_idx_load failed
at com.github.lindenb.jbwa.jni.BwaIndex._open(Native Method)
at com.github.lindenb.jbwa.jni.BwaIndex.<init>(BwaIndex.java:216)
at org.broadinstitute.hellbender.tools.spark.bwa.BwaSparkEngine.lambda$null$1(BwaSparkEngine.java:109)
... 32 more