Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) #1311
Are you planning to support read alignment and variant calling in Adam? For example with BWA and Freebayes?
As far as I know most development work in Adam is focused on:
And that the focus is not on developing new software for read alignment or variant calling.
Are you planning to support / test / develop read alignment + variant calling pipelines on Spark + Adam that make use of external read aligners / variant callers + your own data formats + BAM post-processing tools?
For Spark + Adam to be a real alternative to a normal HPC cluster for genomics data analysis, read alignment + variant calling support is essential.
Long term, perhaps the cloud-scale-bwamem repository may migrate under the bigdatagenomics organization, to facilitate support and tighter integration with ADAM release cycles. It is not likely the code will be migrated into the adam repository though, as most applications are developed as separate repositories.
Note there are a few other options for integrating BWA and ADAM:
BWA with ADAM on Apache Spark using workflow engine
BWA and ADAM can be run as part of the same pipeline, as is demonstrated here, with Toil as the workflow engine and Docker as the container technology:
ADAM on Apache Spark with BWA using ADAM Pipe API
An alternative execution model is being developed in the cannoli repository, where the data are partitioned using Apache Spark and ADAM and then streamed over pipes to an external BWA process on each compute node.
This takes advantage of the ADAM Pipe API, which in turn builds on Apache Spark's
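To make the piping model above concrete, here is a rough sketch in plain Python. It is not the ADAM Pipe API or cannoli code: `partition`, `pipe_partition`, and `pipe_all` are hypothetical helpers, and the external process is a small stand-in that upper-cases its input rather than a real aligner. It only illustrates the idea of splitting records into partitions and streaming each partition through an external process over stdin/stdout.

```python
import subprocess
import sys

# Stand-in for an external tool (assumption: a real pipeline would launch the
# aligner's own command line here). This child process simply upper-cases
# every line it reads from stdin and writes it to stdout.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def partition(records, num_partitions):
    """Split records into at most num_partitions contiguous chunks (hypothetical helper)."""
    size = max(1, -(-len(records) // num_partitions))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def pipe_partition(records, command):
    """Stream one partition to the command's stdin and return its stdout lines."""
    payload = "".join(record + "\n" for record in records)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def pipe_all(records, num_partitions, command=STAND_IN_COMMAND):
    """Pipe every partition through the external command and concatenate the results."""
    output = []
    for chunk in partition(records, num_partitions):
        output.extend(pipe_partition(chunk, command))
    return output


if __name__ == "__main__":
    reads = ["acgt", "ttga", "ccat", "ggta", "tacg"]
    for line in pipe_all(reads, num_partitions=2):
        print(line)
```

In this sketch the partitions are processed one after another in a single process; on a Spark cluster each partition would be handled by a separate task, with the external process running on whichever compute node holds that partition.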
Reimplement BWA algorithm on ADAM on Apache Spark
Another option would be to reimplement the BWA algorithm in Scala on ADAM on Apache Spark. We currently have no plans to do this. If someone is interested and willing however, ... :)
I look forward to trying one or more of these options later this year to run a read alignment (BWA) and variant calling (Freebayes/GATK) pipeline on a Spark cluster. I see that GATK is supported downstream and that a Freebayes wrapper is also being developed in the cannoli repository.
Thank you @NeillGibson for asking good questions! Ping us when you're ready to give things a go, maybe the story will be clearer by then.
Meanwhile, if you might be interested, we host a weekly video call for our team and collaborators. Email my username at berkeley.edu for details.