Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) #1311

Closed
NeillGibson opened this issue Dec 12, 2016 · 9 comments

Contributor

@NeillGibson NeillGibson commented Dec 12, 2016

Hi,

Are you planning to support read alignment and variant calling in Adam? For example with BWA and Freebayes?

As far as I know, most development work in ADAM is focused on:

  • porting genomics data formats (FASTQ, BAM, VCF, BED) to HDFS- and MapReduce-friendly formats
  • developing BAM post-processing tools like MarkDuplicates, RealignIndels, and BQSR.

The focus is not on developing new software for read alignment or variant calling.
I did see that work was done on adding pipes for streaming FASTQ, BAM, and VCF data to legacy tools:
#1112

Are you planning to support, test, or develop read alignment and variant calling pipelines on Spark and ADAM that combine external read aligners and variant callers with your own data formats and BAM post-processing tools?

For Spark and ADAM to be a real alternative to a conventional HPC cluster for genomics data analysis, support for read alignment and variant calling is essential.

Thank you.


@waltermblair waltermblair commented Mar 17, 2017

Check out CS-BWAMEM. It needs some updating, but it is an implementation of BWA on Spark/ADAM.

Member

@fnothaft fnothaft commented Mar 17, 2017

+1 @waltermblair. I've got a WIP update PR at ytchen0323/cloud-scale-bwamem#9


@ghost ghost commented Mar 22, 2017

Will this be integrated with the ADAM project itself? Alignment with BWA is the critical missing link in ADAM.

Member

@heuermh heuermh commented Mar 22, 2017

Will this be integrated with the ADAM project itself?

In the long term, the cloud-scale-bwamem repository may migrate under the bigdatagenomics organization to facilitate support and tighter integration with ADAM release cycles. It is not likely that the code will be migrated into the adam repository itself, though, as most applications are developed in separate repositories.

Note that there are a few other options for integrating BWA and ADAM:

BWA with ADAM on Apache Spark using workflow engine

BWA and ADAM can be run as part of the same pipeline, as is demonstrated here, with Toil as the workflow engine and Docker as the container technology:

https://github.com/BD2KGenomics/toil-scripts/blob/master/src/toil_scripts/adam_gatk_pipeline/align_and_call.py

Docker images for this pipeline are developed in the cgl-docker-lib repository and hosted on quay.io.

ADAM on Apache Spark with BWA using ADAM Pipe API

An alternative execution model is being developed in the cannoli repository, where the data are partitioned using Apache Spark and ADAM and then streamed over pipes to an external BWA process on each compute node.

This takes advantage of the ADAM Pipe API, which in turn builds on Apache Spark's RDD.pipe API.
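To make this execution model concrete, here is a minimal, hypothetical sketch in plain Python (no Spark or ADAM involved) that mimics what `RDD.pipe` does. Whole FASTQ records are grouped into partitions, and each partition is written to the standard input of one external process, whose standard output lines become that partition's results. The helper names (`fastq_records`, `partition`, `pipe_partition`, `STAND_IN_ALIGNER`) are illustrative only and are not ADAM, Cannoli, or Spark APIs. A tiny Python script stands in for BWA so the example runs anywhere, and the sketch assumes unwrapped four-line FASTQ records.

```python
import subprocess
import sys
from typing import Iterator, List

Record = List[str]  # one FASTQ record: [header, sequence, "+" line, quality]


def fastq_records(lines: List[str]) -> Iterator[Record]:
    """Group unwrapped FASTQ lines into four-line records."""
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of 4 lines (unwrapped FASTQ)")
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield record


def partition(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Assign whole records to partitions round-robin, so no record is split."""
    parts: List[List[Record]] = [[] for _ in range(num_partitions)]
    for idx, record in enumerate(records):
        parts[idx % num_partitions].append(record)
    return parts


def pipe_partition(records: List[Record], command: List[str]) -> List[str]:
    """Mimic RDD.pipe for one partition: stream the records to the child's
    stdin, one line per FASTQ line, and return the child's stdout lines."""
    payload = "".join(line + "\n" for record in records for line in record)
    result = subprocess.run(
        command, input=payload, capture_output=True, text=True, check=True
    )  # check=True: a failing tool raises instead of silently returning nothing
    return result.stdout.splitlines()


# Hypothetical stand-in for an aligner: emits one line (the read name) per record.
STAND_IN_ALIGNER = (
    "import sys\n"
    "lines = sys.stdin.read().splitlines()\n"
    "for i in range(0, len(lines), 4):\n"
    "    print(lines[i][1:])\n"
)

if __name__ == "__main__":
    fastq = []
    for n in range(1, 6):
        fastq += [f"@read{n}", "ACGT", "+", "IIII"]
    parts = partition(list(fastq_records(fastq)), num_partitions=2)
    for i, part in enumerate(parts):
        # In Spark, each partition would be piped on whichever executor holds it.
        print(i, pipe_partition(part, [sys.executable, "-c", STAND_IN_ALIGNER]))
```

Keeping each record intact within a single partition matters because the external tool only sees its own partition's stream. A record split across two partitions would reach the aligner as malformed input.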

Reimplement BWA algorithm on ADAM on Apache Spark

Another option would be to reimplement the BWA algorithm in Scala on ADAM on Apache Spark. We currently have no plans to do this. If someone is interested and willing, however... :)

Member

@fnothaft fnothaft commented Mar 22, 2017

In addition to calling the native BWA code, CS-bwamem has a Scala implementation of several of the core BWA algos.

Contributor Author

@NeillGibson NeillGibson commented Mar 22, 2017

Thank you @fnothaft and @heuermh for this information on how to run BWA and Adam together on a Spark cluster.

I look forward to trying one or more of these options later this year to run a read alignment (BWA) and variant calling (FreeBayes/GATK) pipeline on a Spark cluster. I see that GATK is supported downstream and that a FreeBayes wrapper is also being developed in the Cannoli repository.

Member

@heuermh heuermh commented Mar 22, 2017

Thank you @NeillGibson for asking good questions! Ping us when you're ready to give things a go; maybe the story will be clearer by then.

Meanwhile, in case you are interested, we host a weekly video call for our team and collaborators. Email my username at berkeley.edu for details.

@fnothaft fnothaft added the discussion label May 12, 2017
Member

@fnothaft fnothaft commented May 12, 2017

Closing as the alignment steps are downstream in Cannoli (e.g., bwa) and variant calling is in Avocado.

@fnothaft fnothaft closed this May 12, 2017

@rajputakhil rajputakhil commented May 19, 2017

How can I implement Cannoli in ADAM? Please help.

@heuermh heuermh modified the milestone: 0.23.0 Jul 22, 2017
@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
5 participants