Support for read alignment and variant calling in Adam? (e.g. BWA + Freebayes) #1311

Closed
NeillGibson opened this Issue Dec 12, 2016 · 9 comments

NeillGibson commented Dec 12, 2016

Hi,

Are you planning to support read alignment and variant calling in Adam? For example with BWA and Freebayes?

As far as I know most development work in Adam is focused on:

  • porting genomics data formats (FASTQ, BAM, VCF, BED) to HDFS + MapReduce friendly formats
  • developing BAM post-processing tools like MarkDuplicates, RealignIndels and BQSR.

The focus does not seem to be on developing new software for read alignment or variant calling.
I did see that work was done on adding pipes for streaming FASTQ, BAM and VCF to legacy tools.
#1112

Are you planning to support / test / develop read alignment + variant calling pipelines on Spark + Adam that make use of external read aligners / variant callers + your own data formats + BAM post-processing tools?

For Spark + Adam to be a real alternative to a normal HPC cluster for genomics data analysis, read alignment + variant calling support is essential.

Thank you.

waltermblair commented Mar 17, 2017

Check out CS-BWAMEM; it needs some updating, but it is an implementation of BWA via Spark/ADAM.

Member

fnothaft commented Mar 17, 2017

+1 @waltermblair. I've got a WIP update PR at ytchen0323/cloud-scale-bwamem#9

ghost commented Mar 22, 2017

Will this be integrated with the ADAM project itself? Alignment with BWA is the critical missing link in ADAM.

Member

heuermh commented Mar 22, 2017

Will this be integrated with the ADAM project itself?

Long term, the cloud-scale-bwamem repository may migrate under the bigdatagenomics organization, to facilitate support and tighter integration with ADAM release cycles. It is not likely that the code will be migrated into the adam repository though, as most applications are developed as separate repositories.

Note there are a few other options for integrating BWA and ADAM:

BWA with ADAM on Apache Spark using workflow engine

BWA and ADAM can be run as part of the same pipeline, as is demonstrated here, with Toil as the workflow engine and Docker as the container technology:

https://github.com/BD2KGenomics/toil-scripts/blob/master/src/toil_scripts/adam_gatk_pipeline/align_and_call.py

Docker images for this pipeline are developed in the cgl-docker-lib repository and hosted on quay.io.

ADAM on Apache Spark with BWA using ADAM Pipe API

An alternative execution model is being developed in the cannoli repository, where the data are partitioned using Apache Spark and ADAM and then streamed over pipes to an external BWA process on each compute node.

This takes advantage of the ADAM Pipe API, which in turn builds on Apache Spark's RDD.pipe API.
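
To make the execution model concrete, here is a minimal sketch of the partition-then-pipe idea in plain Python. It is not the ADAM Pipe API or Spark's RDD.pipe. It only mimics the pattern, under these assumptions: the records are short strings, the "external tool" is a small child Python process standing in for BWA, and the names `partition`, `pipe_partition`, `pipe_all` and `EXTERNAL_CMD` are hypothetical helpers invented for this illustration. Partitions are processed one after another here, whereas on Spark each partition would be piped on whichever compute node holds it.

```python
import subprocess
import sys

# Stand-in for an external tool such as BWA (an assumption for illustration):
# a child process that reads records on stdin and writes one transformed
# record per line on stdout. A real aligner would have its own command line.
EXTERNAL_CMD = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())",
]


def partition(records, num_partitions):
    """Split records into num_partitions roughly equal, contiguous chunks."""
    size, extra = divmod(len(records), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def pipe_partition(records, cmd):
    """Stream one partition to an external process and collect its output lines."""
    payload = "".join(r + "\n" for r in records)
    result = subprocess.run(
        cmd, input=payload, capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def pipe_all(records, num_partitions, cmd=EXTERNAL_CMD):
    """Pipe every partition through its own process and concatenate the results."""
    out = []
    for chunk in partition(records, num_partitions):
        out.extend(pipe_partition(chunk, cmd))
    return out
```

Calling `pipe_all(reads, 2)` on a list of strings starts one child process per partition and returns the combined output in partition order. Swapping `EXTERNAL_CMD` for a real tool would also require converting records to and from that tool's input and output formats, which this sketch does not attempt.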

Reimplement BWA algorithm on ADAM on Apache Spark

Another option would be to reimplement the BWA algorithm in Scala on ADAM on Apache Spark. We currently have no plans to do this. If someone is interested and willing however, ... :)

Member

fnothaft commented Mar 22, 2017

In addition to calling the native BWA code, CS-bwamem has a Scala implementation of several of the core BWA algos.

Contributor

NeillGibson commented Mar 22, 2017

Thank you @fnothaft and @heuermh for this information on how to run BWA and Adam together on a Spark cluster.

I look forward to trying one or more of these options later this year to run a read alignment (bwa) and variant calling (freebayes/gatk) pipeline on a Spark cluster. I see that GATK is supported downstream and that a Freebayes wrapper is also being developed in the cannoli repository.

Member

heuermh commented Mar 22, 2017

Thank you @NeillGibson for asking good questions! Ping us when you're ready to give things a go, maybe the story will be clearer by then.

Meanwhile, if you might be interested, we host a weekly video call for our team and collaborators. Email my username at berkeley.edu for details.

@fnothaft fnothaft added the discussion label May 12, 2017

Member

fnothaft commented May 12, 2017

Closing as the alignment steps are downstream in Cannoli (e.g., bwa) and variant calling is in Avocado.

@fnothaft fnothaft closed this May 12, 2017

caspase8 commented May 19, 2017

How can I implement Cannoli in ADAM? Please help.

@heuermh heuermh modified the milestone: 0.23.0 Jul 22, 2017

@heuermh heuermh added this to Completed in Release 0.23.0 Jan 4, 2018
