Where is BwaMemAligner.java? #3186
Comments
This should be in the gatk-bwamem-jni dependency. Maybe the artifact is not correctly packaged... but it is in the build.gradle... |
You are right. It's built and packed in |
I'm not familiar with this code, so I cannot help fix the problem. After exploring the code a bit, it looks like the image file is null. Because there is a single instance of the index file, it looks like it was closed at some point, and thus the global instance is null. Maybe this will be helpful for debugging or fixing the problem... |
@hliang , I broke it into several lines:
And the line
is the problem. The image file has to live on a regular file system that is available on all the worker nodes, NOT HDFS. |
So the null pointer is caused by bwa mem complaining that it cannot load the index from HDFS. |
Could it be possible to read the file from a |
Thank you @SHuang-Broad. The error was gone after I copied the bwaindeximage file to a Lustre file system, which can be accessed by all worker nodes.
|
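For anyone hitting the same null pointer, a quick pre-flight check along these lines may help catch an index image path that points at HDFS before the job is submitted. This is only a minimal sketch: the function name `is_local_regular_file` is hypothetical and not part of GATK, and it only checks the node it runs on, so the path still has to be visible on every worker node (for example via a shared mount).

```python
import os
from urllib.parse import urlparse


def is_local_regular_file(path: str) -> bool:
    """Return True if `path` names an existing regular file on a mounted file system.

    Hypothetical helper for illustration only. Paths with a non-file URI
    scheme (such as hdfs://) are rejected without touching the file system.
    """
    parsed = urlparse(path)
    if parsed.scheme not in ("", "file"):
        # e.g. hdfs://namenode/ref.img, which bwa mem cannot open directly
        return False
    local_path = parsed.path if parsed.scheme == "file" else path
    return os.path.isfile(local_path)
```

Running the same check on each worker (or pointing it at a shared mount such as the Lustre path mentioned above) is left to the reader, since that depends on the cluster setup.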
This likely has to do with your spark configuration. Check on the Spark job's progress through the web interface, which should be something like http://<driver_address>:4040 (see https://spark.apache.org/docs/latest/monitoring.html). If your BAM is very small, you can also try increasing the number of partitions by reducing --bamPartitionSize. |
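As a rough illustration of why reducing --bamPartitionSize increases parallelism: assuming the input is split into chunks of at most the partition size (an assumption about the splitting behaviour, not something confirmed in this thread), the partition count is approximately the file size divided by the partition size. The helper name below is hypothetical.

```python
import math


def estimate_partitions(bam_size_bytes: int, partition_size_bytes: int) -> int:
    """Approximate partition count, assuming splits of at most `partition_size_bytes`.

    Hypothetical helper for illustration; the real split logic may differ.
    """
    if partition_size_bytes <= 0:
        raise ValueError("partition size must be positive")
    return math.ceil(bam_size_bytes / partition_size_bytes)


# Halving the partition size doubles the estimated number of partitions.
seven_gib = 7 * 1024**3
print(estimate_partitions(seven_gib, 64 * 1024**2))
print(estimate_partitions(seven_gib, 32 * 1024**2))
```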
@magicDGS . I am afraid this is not easy. I didn't write the binding (@tedsharpe did), but I would assume the limitation comes from bwa mem itself, not the binding, as the binding is a thin wrapper that delegates the loading of the index files (or the image that combines all 5 index files in this case) to bwa. The SV team here have a script ( |
@hliang , the suggestion by @mwalker174 might be your solution. |
Thanks for the answer @SHuang-Broad. It would be nice if the bwa-mem C library had the option to take streams instead of files for the index, which would allow passing both in-memory and file-based indexes (in whatever file system abstraction). I will try to look at the code and see if I can submit a patch, but I need to refresh my C++ for that... |
Thank you @mwalker174 . The input bamfile is about 7 GB. If no
I have to look more into BwaAndMarkDuplicatesPipelineSpark. The good news is at least we get BwaSpark working now: |
@hliang I see that many of the tasks are failing and it looks like one of the executors crashed. To find the cause, you can check the error logs of these tasks through the web UI. I suspect increasing executor memory will fix the problem. Heartbeat timeouts usually occur when an executor JVM runs out of memory or requests more memory than the node will allow. |
Thank you @mwalker174 for the suggestions. I ended up writing for loops to test which configurations work. Driver memory: 2-50g; executor memory: 2-50g; executor cores: 1-20; bamPartitionSize: 1-64m. Some combinations failed in minutes, some failed in hours, and some finished without errors. Below are three that work for ~33X WGS data:
Hope someone will find this helpful. |
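The for-loop search described above could be sketched roughly as follows. This is not the script that was actually used: the function names, the dictionary keys, and the `run_job` callable are all hypothetical, and how each setting is turned into a real command line is deliberately left to the caller.

```python
from itertools import product
from typing import Callable, Dict, Iterable, List

Config = Dict[str, int]


def build_grid(
    driver_mem_gb: Iterable[int],
    executor_mem_gb: Iterable[int],
    executor_cores: Iterable[int],
    partition_size_mb: Iterable[int],
) -> List[Config]:
    """Return every combination of the given settings (a full Cartesian grid)."""
    keys = ("driver_mem_gb", "executor_mem_gb", "executor_cores", "partition_size_mb")
    values = product(driver_mem_gb, executor_mem_gb, executor_cores, partition_size_mb)
    return [dict(zip(keys, combo)) for combo in values]


def sweep(configs: Iterable[Config], run_job: Callable[[Config], bool]) -> List[Config]:
    """Run `run_job` on each configuration and keep the ones that report success.

    `run_job` is supplied by the caller (hypothetical); it should launch the
    pipeline with the given settings and return True if it finished cleanly.
    """
    return [config for config in configs if run_job(config)]
```

Since some combinations reportedly failed only after hours, a coarse grid (a few values per setting) is probably a more practical starting point than sweeping every value in the ranges.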
Just found the log4j errors could be fixed by editing |
Got a NullPointerException while trying to run BwaAndMarkDuplicatesPipelineSpark.
And I notice
gatk/src/main/java/org/broadinstitute/hellbender/utils/bwa/BwaMemAligner.java
doesn't exist, and there is no class file gatk/build/classes/main/org/broadinstitute/hellbender/utils/bwa/BwaMemAligner.class
either. Is that causing this error?