Next_Generation_Sequencing

Naohisa Goto edited this page Jun 14, 2016 · 1 revision

Main Idea

Ideas for a BioRuby Plugin to handle Next Generation Sequencing data, in particular RNA-seq data.

Repository

The Git repository is at https://github.com/helios/bioruby-ngs

Tools supported

The bio-ngs plugin will serve as a container for other NGS plugins that provide specific wrappers or bindings to existing tools. Here is a first list:

  • bio-bwa Burrows-Wheeler Aligner
  • bio-picard Picard
    • comprises Java-based command-line utilities that manipulate SAM files, and a Java API (SAM-JDK) for creating new programs that read and write SAM files.
  • bio-samtools SAM (Sequence Alignment/Map)
    • SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments.
  • bio-qseq (TODO) converts qseq files to FASTQ format.

The plugin will also include graphics libraries such as Rubyvis (http://rubyvis.rubyforge.org/) to generate reports on data quality, mapping results and other related statistics.
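As a rough illustration of the bio-qseq item above, the following sketch converts qseq lines to FASTQ records in plain Ruby. It assumes the usual 11-column tab-separated qseq layout (machine, run, lane, tile, x, y, index, read number, sequence, quality, filter) and Phred+64 qualities, which it shifts to Phred+33. The module name `QseqToFastq` and its methods are hypothetical and not part of any released bio-qseq API.

```ruby
# Hypothetical helper, not the bio-qseq API: a minimal qseq -> FASTQ converter.
module QseqToFastq
  QSEQ_FIELDS = 11
  # Assumption: qseq qualities are Phred+64; FASTQ (Sanger) uses Phred+33.
  OFFSET_SHIFT = 64 - 33

  # Convert one tab-separated qseq line into a four-line FASTQ record.
  # Returns nil for reads that failed the filter unless keep_failed is true.
  def self.convert_line(line, keep_failed: false)
    fields = line.chomp.split("\t", -1)
    unless fields.size == QSEQ_FIELDS
      raise ArgumentError, "expected #{QSEQ_FIELDS} fields, got #{fields.size}"
    end

    machine, run, lane, tile, x, y, index, read, seq, qual, filter = fields
    return nil if filter == "0" && !keep_failed

    header = "@#{machine}_#{run}:#{lane}:#{tile}:#{x}:#{y}##{index}/#{read}"
    sequence = seq.tr(".", "N") # qseq marks unknown bases with "."
    quality = qual.each_char.map { |c| (c.ord - OFFSET_SHIFT).chr }.join
    "#{header}\n#{sequence}\n+\n#{quality}\n"
  end

  # Convert every non-empty line of `input` (anything responding to each_line)
  # and append the resulting records to `output` (anything responding to <<).
  def self.convert(input, output, keep_failed: false)
    input.each_line do |line|
      next if line.strip.empty?
      record = convert_line(line, keep_failed: keep_failed)
      output << record if record
    end
    output
  end
end
```

With files, this could be used as `File.open("reads.fastq", "w") { |out| QseqToFastq.convert(File.open("s_1_1_0001_qseq.txt"), out) }`, where both file names are placeholders.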

The main idea is to wrap standard NGS tools in Ruby and, where possible, to include direct bindings to these tools.

This could be done, for example, for Picard via JRuby and for SAMtools using samtools-ruby (https://github.com/homonecloco/samtools-ruby). Every option needs to be tested to ensure good performance when handling large datasets.
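Where a direct binding is not available, the wrapping approach could be as simple as building an argument list and running the external program. The sketch below shows one possible shape for such a wrapper using Ruby's standard Open3 library. The class `ToolWrapper`, its methods and the option-to-flag convention are assumptions made for illustration and do not reflect the actual bio-bwa or bio-picard interfaces.

```ruby
require "open3"

# Hypothetical wrapper class; name and interface are illustrative only.
class ToolWrapper
  class ToolError < StandardError; end

  def initialize(program)
    @program = program
  end

  # Build the argument vector: program, subcommand, options, positional files.
  # Single-letter keys become "-k value", longer keys "--long-key value";
  # a value of true emits a bare flag, nil or false omits the option.
  def command(subcommand, options = {}, *files)
    argv = [@program, subcommand.to_s]
    options.each do |key, value|
      next if value.nil? || value == false
      name = key.to_s
      argv << (name.length == 1 ? "-#{name}" : "--#{name.tr('_', '-')}")
      argv << value.to_s unless value == true
    end
    argv.concat(files.map(&:to_s))
  end

  # Run the tool without a shell and return its standard output.
  # Raises ToolError with the tool's standard error on a non-zero exit.
  def run(subcommand, options = {}, *files)
    stdout, stderr, status = Open3.capture3(*command(subcommand, options, *files))
    raise ToolError, stderr unless status.success?
    stdout
  end
end
```

For example, `ToolWrapper.new("bwa").command("aln", { t: 4 }, "ref.fa", "reads.fq")` builds the argument list for `bwa aln -t 4 ref.fa reads.fq`. Passing arguments as an array avoids shell quoting problems with file names.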

bio-samtools

bio-samtools is a Ruby binding to the popular SAMtools library, and provides access to individual read alignments as well as BAM files, reference sequence and pileup information.

Source code is available on GitHub at https://github.com/helios/bioruby-samtools.

A tutorial is available here: Bio-samtools
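To make the notion of an individual read alignment concrete, the sketch below parses one line of a text SAM file into its 11 mandatory fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL) using plain Ruby. This is not the bio-samtools API, which exposes alignments through its own objects backed by the SAMtools library. `SamRecord` and `parse_sam_line` are hypothetical names.

```ruby
# Illustrative only; not the bio-samtools API.
SamRecord = Struct.new(:qname, :flag, :rname, :pos, :mapq, :cigar,
                       :rnext, :pnext, :tlen, :seq, :qual, :tags) do
  # FLAG bit 0x4: the read is unmapped.
  def unmapped?
    (flag & 0x4) != 0
  end

  # FLAG bit 0x10: the read maps to the reverse strand.
  def reverse_strand?
    (flag & 0x10) != 0
  end
end

# Parse a single SAM alignment line. Header lines (starting with "@")
# return nil; optional tag fields are kept as raw strings.
def parse_sam_line(line)
  return nil if line.start_with?("@")

  fields = line.chomp.split("\t")
  raise ArgumentError, "SAM line needs at least 11 fields" if fields.size < 11

  qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields
  SamRecord.new(qname, Integer(flag, 10), rname, Integer(pos, 10),
                Integer(mapq, 10), cigar, rnext, Integer(pnext, 10),
                Integer(tlen, 10), seq, qual, fields.drop(11))
end
```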

bio-bwa

bio-picard

NGS Workflows

see also Workflows

Using Rake or Thor to run NGS analyses

The bio-ngs plugin will implement a flexible Rake task system similar to the one in Rails, where custom tasks can be defined according to specific needs. Alternatively, Thor (https://github.com/wycats/thor) could be used instead of Rake.

This will allow bio-ngs users to run NGS analyses and pipelines directly with Rake, using the functionality provided by BioRuby and the other Bio* plugins.
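A minimal sketch of what such a task system might look like with Rake is shown here. The namespace `ngs`, the task names and the three pipeline steps are placeholders chosen for illustration, not the bio-ngs design. Each step calls a `runner` object instead of a real tool, so the dependency chain can be tried without any NGS software installed.

```ruby
require "rake"

# Hypothetical task layout; names and steps are placeholders.
module NgsPipeline
  extend Rake::DSL

  # Define a three-step pipeline under the :ngs namespace.
  # `runner` is any callable that receives a description of the step;
  # a real implementation would invoke the wrapped tools instead.
  def self.define_tasks(runner)
    namespace :ngs do
      desc "Convert qseq files to FASTQ"
      task :convert do
        runner.call("convert qseq to fastq")
      end

      desc "Align reads to the reference"
      task align: :convert do
        runner.call("align reads")
      end

      desc "Summarise mapping results"
      task report: :align do
        runner.call("build report")
      end
    end
  end
end

# Usage (for example from a Rakefile):
#   NgsPipeline.define_tasks(->(step) { puts "running: #{step}" })
#   Rake::Task["ngs:report"].invoke   # runs convert, align, then report
```

Because each task declares its prerequisite, invoking `ngs:report` runs the earlier steps first, and Rake runs each task at most once per invocation.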

Please add the people involved in this topic.

Active developers

so far:

  • bio-samtools: Raoul Bonnal

  • bio-bwa: Francesco Strozzi