
Life Science Apps on AWS

mndoci edited this page Jul 15, 2012 · 19 revisions

Here is a list of life-science-focused applications that I know of that run on AWS or Amazon Elastic MapReduce. Note that Elastic MapReduce apps can also be run on a "roll it yourself" Hadoop cluster on EC2.

Feel free to fork this and send me pull requests.

Note that AMI IDs might be out of date as new AMIs are released.

  • Application: NCBI BLAST

    • Type: AMI
    • AMI ID: Updated often (sometimes daily)
    • Notes: Official NCBI BLAST AMI. Daily snapshot of NCBI BLAST with current data. Search for BLAST and choose the latest AMI from NCBI.
  • Application: Galaxy

    • Type: AMI
    • AMI ID: ami-9724c7fe & ami-1430d27
    • Notes: One-stop shop for sequence analysis from Anton Nekrutenko at Penn State and James Taylor at Emory. Also take a look at http://usegalaxy.org/cloud. CloudMan is a manager of cloud resources that works with Galaxy. BioCloudCentral makes it easy to use Galaxy, CloudMan and Cloud BioLinux together.
  • Application: Cloud BioLinux

    • Type: AMI
    • AMI ID: ami-6953b200
    • Notes: From JCVI and others. Contains a variety of tools, including Celera Assembler, HMMER, EMBOSS and Jalview. Also see here
  • Application: Crossbow

    • Type: AMI, Elastic MapReduce
    • AMI ID: ami-6aa34003 & ami-f85fbf91
    • Notes: From Ben Langmead and Michael Schatz. Combines Bowtie and SOAPsnp. Optimized as an Elastic MapReduce application.
  • Application: CloudBurst

    • Type: Elastic MapReduce
    • Notes: From Mike Schatz. RMAP-like read-mapping Hadoop application.
  • Application: Myrna

    • Type: Elastic MapReduce
    • Notes: From Ben Langmead. Includes a web UI.
  • Application: CloVR

    • Type: AMI
    • AMI ID: ami-59a34d30
    • Notes: CloVR is a virtual appliance that integrates genomics tools by providing a set of push-button pipelines for applications in viral, prokaryotic, metagenomic and eukaryotic sequencing projects.
  • Application: BioPerl Max

    • Type: AMI
    • AMI ID: ami-1ad03273
    • Notes: From Fortinbras
  • Application: VIPDAC

    • Type: AMI
    • AMI ID: ami-52f5123b
    • Notes: From Simon Twigger et al. at MCW
  • Application: Superfamily

    • Type: AMI
    • AMI ID: Please see
    • Notes: Free for academic and commercial use. Need to register for a license first.
  • Application: Cloud-Coffee

    • Type: AMI
    • AMI ID: The AMIs described in the paper don't seem to work; contact the authors.
    • Notes: Paper
  • Application: BioNimbus AMI

    • Type: AMI
    • AMI ID: ami-aead58c7
    • Notes: About BioNimbus. The EC2 AMI includes many common peak-calling pipelines.
  • Application: GMOD

    • Type: AMI
    • AMI ID: ami-4b599222
    • Notes: There is a demo version as well
  • Application: CloudAligner

    • Type: Hadoop Application, Elastic MapReduce
    • Notes: Similar to CloudBurst. Application jarfile
  • Application: CRdata

    • Type: Open Source Service
    • Notes: Source Code, Paper. CRdata is currently down.
  • Application: SeqWare

  • Application: Blend

    • Type: Open Source Library
    • Notes: Blend is a Python (2.6 or higher) library for interacting with BioCloudCentral.org, CloudMan, and Galaxy's API. Conceptually, it makes it possible to script and automate cloud infrastructure provisioning and scaling, as well as running analyses within Galaxy.
  • Application: GenomeSpace

    • Type: Open Source Service
    • Notes: GenomeSpace allows you to store your data files in the Amazon cloud and provides necessary file format transformations whenever you select an analysis or visualization within one of the tools.
  • Application: Sage Synapse

    • Type: Open Source Service
    • Notes: Synapse is a collaborative compute space that allows scientists to share and analyze data together.
  • Application: DNAnexus

    • Type: Commercial Service
    • Notes: "DNAnexus combines the scalability of the cloud with advanced sequence analysis and Web 2.0 technologies to provide a powerful, intuitive environment for next-generation DNA sequence analysis."
  • Application: Spiral Genetics

    • Type: Commercial Service
    • Notes: "Spiral is a genetic analysis software platform that puts the power of cloud computing at the fingertips of any genetic researcher. Our platform focuses on solving the most computationally challenging analysis, whole genome analysis."
  • Application: SeqCentral

    • Type: Commercial Service
    • Notes: "SeqCentral, LLC, aims to provide and harness the high-performance computational power of the cloud as a usable, collaborative, online service to all members of the computational genomics community."
  • Application: Nimbus Informatics

    • Type: Commercial Service
    • Notes: "Nimbus Informatics is an end-to-end solution for sequence data management and analysis on the cloud."
  • Application: Ion Flux

    • Type: Commercial Service
    • Notes: "The Ion Flux mission is to measure and catalog human genetic variation, assess its impact on human health, and transform the practice of medicine." They are heavy users of Hadoop and work closely with the Ion Torrent system.
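
The BLAST entry's advice (search for BLAST and choose the latest AMI) and the warning that AMI IDs go stale can both be handled by selecting the newest matching image at launch time instead of hard-coding an ID. Below is a minimal Python sketch, assuming image records shaped like the "Images" entries of EC2's DescribeImages response (ImageId, Name, CreationDate). The function name latest_ami and the sample records are hypothetical, and matching on the image name alone is an assumption: it does not verify that NCBI actually published the image.

```python
from datetime import datetime


def latest_ami(images, name_substring="BLAST"):
    """Return the newest image whose Name contains name_substring, or None.

    Assumption: each record is a dict like an entry in the "Images" list of
    EC2 DescribeImages output, with ImageId, Name and CreationDate keys.
    Matching is case-insensitive and by name only (owner is NOT checked).
    """
    needle = name_substring.lower()
    matches = [img for img in images if needle in img.get("Name", "").lower()]
    if not matches:
        return None

    def created(img):
        # Assumed CreationDate format, e.g. "2012-07-15T08:00:00.000Z"
        return datetime.strptime(img["CreationDate"], "%Y-%m-%dT%H:%M:%S.%fZ")

    # The newest image is the one with the latest creation timestamp.
    return max(matches, key=created)


if __name__ == "__main__":
    # Hypothetical records; real ones would come from DescribeImages.
    sample = [
        {"ImageId": "ami-00000001", "Name": "example-blast-2012-07-01",
         "CreationDate": "2012-07-01T08:00:00.000Z"},
        {"ImageId": "ami-00000002", "Name": "example-blast-2012-07-15",
         "CreationDate": "2012-07-15T08:00:00.000Z"},
        {"ImageId": "ami-00000003", "Name": "example-galaxy",
         "CreationDate": "2012-08-01T08:00:00.000Z"},
    ]
    newest = latest_ami(sample)
    print(newest["ImageId"] if newest else "no match")  # prints ami-00000002
```

To feed it live data, one option (assuming boto3 is installed and AWS credentials are configured) is boto3.client("ec2").describe_images(Filters=[{"Name": "name", "Values": ["*BLAST*"]}])["Images"]. Passing an Owners=[...] argument with the publisher's account ID would restrict results to official images, but that ID is not given on this page.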
