A collection of tools for the things I work with can be found in this repo.

Community Reference

A shared collection of tools from community members around the globe, gathered here for organization and accessibility.

Workflow Tools

| Program | Description | Source |
|---|---|---|
| Artemis | A genome browser and annotation tool that allows visualization of sequence features, next-generation data and the results of analyses within the context of the sequence, and also its six-frame translation. | Download |
| BamTools | C++ API & command-line toolkit for working with BAM (Binary SAM file) data. Provides a programmer's API and an end-user's toolkit for handling BAM files. | Clone |
| BaseMount | Explore runs, projects, samples, app results and analyses by interacting directly with BaseSpace's API as a locally mounted file system. | Install |
| BaseSpace | The BaseSpace Sequence Hub is a cloud-based genomics analysis and storage platform that directly integrates with all Illumina sequencers. | N/A |
| BaseSpace CLI | Work with BaseSpace Sequence Hub data from the command line interface (CLI). Supports scripting and programmatic access to BaseSpace Sequence Hub for automation, bulk operations, and other routine functions. It can be used independently or in conjunction with BaseMount. | Install |
| bcl2fastq | Demultiplexes data and converts base calls in the per-cycle BCL files generated by Illumina sequencing systems to standard FASTQ files in a single step for downstream analysis. | Download |
| BLAST+ | Command-line application suite of BLAST tools that utilizes the NCBI C++ Toolkit. | Download |
| EDirect | An advanced method for accessing NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal. | N/A |
| E-utilities | The Entrez Programming Utilities (E-utilities) are a set of nine server-side programs that provide a stable interface into the Entrez query and database system at NCBI. | N/A |
| FastQC | A quality control tool for high-throughput sequence data. | Clone |
| IGV | Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations. The igvtools utility provides a set of tools for pre-processing data files. | |
| Martian | Martian is a language and framework for developing and executing complex computational pipelines. | Clone |
| Nextflow | Data-driven computational pipelines. Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting languages. | Clone |
| Samtools | A suite of programs for interacting with high-throughput sequencing (HTS) data. It consists of three separate repositories: Samtools (reading/writing/editing/indexing/viewing SAM/BAM/CRAM format); BCFtools (reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants); HTSlib (a C library for reading/writing high-throughput sequencing data). | |
| Seqtk | Fast and lightweight tool for processing sequences in the FASTA or FASTQ format. | Clone |
| SRA Toolkit | The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives. | Download |
| VCFtools | Package designed for working with complex genetic variation data in the form of VCF files. | Download |
| WebLogo | Creates sequence logos, graphical representations of an amino acid or nucleic acid multiple sequence alignment. | Clone |

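bcl2fastq writes FASTQ files, and FastQC and Seqtk read them. As a minimal sketch of what such a record looks like to a program, assuming the common four-line layout (no wrapped sequence lines) and Phred+33 quality encoding, the Python below parses FASTQ records and decodes per-base quality scores. It is not how any of the listed tools is implemented; `read_fastq`, `phred_scores` and `mean_quality` are illustrative names.

```python
from typing import Iterable, Iterator, List, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) quality encoding


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if len(rows) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def phred_scores(qual: str) -> List[int]:
    """Decode an ASCII quality string into integer Phred scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual]


def mean_quality(qual: str) -> float:
    scores = phred_scores(qual)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    example = ["@read1\n", "ACGT\n", "+\n", "II##\n"]
    for read_id, seq, qual in read_fastq(example):
        print(read_id, seq, mean_quality(qual))
```

With Phred+33, `I` decodes to Q40 and `#` to Q2, which is why per-read or per-position averages like this are the basic building block of quality reports.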


| Program | Description | Purpose | Source |
|---|---|---|---|
| AUGUSTUS | Ab initio, trainable gene prediction in eukaryotic genomic sequences. | Gene Prediction | Download |
| BUSCO | Assessing genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. BUSCO provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. | Assembly Quality Assessment | Download |
| Circlator | Predicts and automates assembly circularization and produces accurate linear representations of circular sequences. | Circularize Genome | Download |
| Clustal | Fast and scalable multiple sequence alignment (can align hundreds of thousands of sequences in hours). | MSA | Download |
| Galaxy | Web portal for accessible, reproducible, and transparent computational research. | Analysis Package | Download |
| HOMER | HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for motif discovery and next-gen sequencing analysis. | Prediction and Analysis | Download |
| HMMER | Searches sequence databases for homologs and builds sequence alignments using profile hidden Markov models. | Detect Homologs | Download |
| HTSeq | HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. | Analysis Package | Clone |
| Mauve | A system for constructing multiple genome alignments in the presence of large-scale evolutionary events such as rearrangement and inversion. | Genome Aligner | Download |
| Mothur | Expandable software to fill the bioinformatics needs of the microbial ecology community. | Microbial Ecology Pipeline | Download |
| MUMmer Package | Ultra-fast alignment of large-scale DNA and protein sequences; a system for rapidly aligning entire genomes, whether in complete or draft form. MUMmer uses a suffix tree algorithm to find maximal exact matches of some minimum length between two input sequences. NUCmer is a standard DNA sequence alignment pipeline that can align multiple reference and multiple query sequences in a many-vs-many fashion. PROmer is like NUCmer with one exception: all matching and alignment routines are performed on the six-frame amino acid translation of the DNA input sequence. | Genome Aligner | Download |
| MUSCLE | MUSCLE can align hundreds of sequences in seconds. | MSA | Download |
| Picard | Set of command-line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. | HTS Toolkit | Download |
| QIIME | Bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on Illumina or other platforms through to publication-quality graphics and statistics. | Microbial Ecology Pipeline | Install |
| QUAST | Evaluates genome assemblies. | Evaluate Genome Assemblies | Download |
| T-Coffee | A multiple sequence alignment package that can align sequences (protein, DNA, and RNA) or combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into one unique alignment (M-Coffee). It is also able to combine sequence information with protein structural information (3D-Coffee/Expresso), profile information (PSI-Coffee) or RNA secondary structures. | MSA | Download |
| ViennaRNA Package | Programs for the prediction and comparison of RNA secondary structures. | Prediction | Download |

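Artemis displays, and PROmer aligns on, the six-frame translation of a DNA sequence: three reading frames on the forward strand and three on the reverse complement. A minimal Python sketch, assuming the standard genetic code (NCBI translation table 1), with stop codons shown as `*` and codons containing unknown bases shown as `X`. The frame labels `+1` to `-3` follow one common convention; tools differ in how they number the reverse frames, and the function names are illustrative.

```python
from typing import Dict

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons only; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> Dict[str, str]:
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])  # forward strand
        frames[f"-{offset + 1}"] = translate(rc[offset:])   # reverse strand
    return frames


for name, protein in six_frames("ATGGCCTAA").items():
    print(name, protein)
```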
PacBio Sequencing

| Program | Description | Purpose | Source |
|---|---|---|---|
| BLASR | PacBio® long read aligner. | Sequence Aligner | Download |
| Canu | Fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION). | Genome Assembly | Download |
| Celera Assembler | Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler that can use any combination of platform reads. | Genome Assembly | Download |
| Cerulean | Cerulean extends contigs assembled from short-read datasets (such as Illumina paired-end reads) using long reads (such as PacBio RS reads). | Hybrid Assembly | Download |
| PBSuite | PBJelly is a highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies. PBJelly fills or reduces as many captured gaps as possible to produce upgraded draft genomes. PBHoney is an implementation of two variant-identification approaches designed to exploit the high mappability of long reads (i.e., longer than 10,000 bp). PBHoney considers both intra-read discordance and soft-clipped tails of long reads to identify structural variants. | Reference Mapping; Variant Calling | |
| SMRT Analysis | Self-contained software suite designed for use with Single Molecule, Real-Time (SMRT) Sequencing data. | Analysis Package | Download |
| SPAdes | Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads; it can also produce hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. | Hybrid Assembly | Download |
| Sprai | Sprai (single-pass read accuracy improver) is a tool to correct sequencing errors in single-pass reads for de novo assembly. | Sequencing Error-correction | Download |

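PBHoney uses soft-clipped tails of long-read alignments as one structural-variant signal. The sketch below only shows how such tails can be read off a SAM CIGAR string; it is not PBHoney's algorithm, and `MIN_TAIL` is a hypothetical threshold chosen for illustration, not a PBHoney default.

```python
import re
from typing import List, Tuple

# The nine CIGAR operations defined by the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")

MIN_TAIL = 500  # hypothetical threshold for illustration only


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '30S100M2I50M' into (length, op) pairs."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed or unavailable CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def soft_clips(cigar: str) -> Tuple[int, int]:
    """Return (left, right) soft-clip lengths; hard clips (H) sit outside them."""
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return 0, 0
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return left, right


def has_long_tail(cigar: str, min_tail: int = MIN_TAIL) -> bool:
    """Flag alignments where either end leaves a long unaligned (soft-clipped) tail."""
    return max(soft_clips(cigar)) >= min_tail


print(soft_clips("5H30S100M2I50M700S"), has_long_tail("5H30S100M2I50M700S"))
```

Long reads make this signal useful because a tail of hundreds of bases that refuses to align locally often aligns elsewhere, pointing at a breakpoint.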
Illumina Sequencing


| Program | Description | Purpose | Source |
|---|---|---|---|
| Bowtie2 | An ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. | Reference Aligner | Download |
| BWA | Mapping DNA sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | Reference Mapping | Download |
| HISAT2 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). | Whole-Genome Mapping | Clone |

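BWA and Bowtie2 index the reference with the Burrows-Wheeler transform (BWT) and an FM-index, which lets them count and locate exact matches by "backward search" over the pattern. The toy Python version below builds the index naively (full suffix sort, linear-scan occurrence counts), so it only illustrates the idea; production aligners use compressed, sampled structures and also handle mismatches and gaps, which this sketch does not.

```python
from typing import Dict, List, Tuple


def build_fm_index(reference: str) -> Tuple[List[int], str, Dict[str, int]]:
    text = reference + "$"  # '$' marks the end and sorts before A, C, G, T
    # Naive suffix array: fine for a toy string, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT = character preceding each sorted suffix (text[-1] wraps to '$').
    bwt = "".join(text[i - 1] for i in sa)
    # C[c] = number of characters in the text that sort strictly before c.
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    return sa, bwt, C


def locate(pattern: str, sa: List[int], bwt: str, C: Dict[str, int]) -> List[int]:
    """Backward search: return sorted 0-based start positions of exact matches."""
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in C:
            return []
        # Occ(c, k) = count of c in bwt[:k]; real FM-indexes sample this table.
        lo = C[c] + bwt[:lo].count(c)
        hi = C[c] + bwt[:hi].count(c)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, C = build_fm_index("GATTACAGATTACA")
print(locate("ATTA", sa, bwt, C))
```

The rows `[lo, hi)` of the suffix array always hold the suffixes that start with the part of the pattern processed so far, so the search costs one step per pattern character regardless of reference length (apart from the naive `count` calls used here).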
De novo

| Program | Description | Purpose | Source |
|---|---|---|---|
| ABySS | De novo, parallel, paired-end sequence assembler designed for short reads and large genomes. | Genome Assembly | Download |
| ALLPATHS-LG | Short-read assembler that works on both small and large (mammalian-sized) genomes. | Genome Assembly | Download |
| DISCOVAR | Genome assembler and variant caller. | Genome Assembly | Download |
| SOAPdenovo | Novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. | Genome Assembly | Download |
| SPAdes | Genome assembler intended for both standard isolates and single-cell MDA bacteria assemblies using Illumina or IonTorrent reads; it can also produce hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. | Genome/Hybrid Assembly | Download |
| Velvet | Short-read de novo assembler using de Bruijn graphs. | Genome Assembly | Download |

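Velvet assembles reads through a de Bruijn graph. As a minimal sketch, assuming error-free reads from a single strand (real assemblers also merge reverse complements, correct sequencing errors and use coverage), the Python below turns reads into a graph whose nodes are (k-1)-mers and whose edges are k-mers, then spells contigs from maximal non-branching paths. The function names are illustrative, and isolated cycles are not reported.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_graph(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Node = (k-1)-mer; each distinct k-mer is an edge from its prefix to its suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def contigs(graph: Dict[str, List[str]]) -> List[str]:
    """Spell out maximal non-branching paths of the graph."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for node in targets:
            indegree[node] += 1
    nodes = set(graph) | set(indegree)

    def one_in_one_out(node: str) -> bool:
        return indegree[node] == 1 and len(graph.get(node, [])) == 1

    result = []
    for start in sorted(nodes):
        if one_in_one_out(start):
            continue  # interior node: reached by walking from a path start
        for nxt in graph.get(start, []):
            path = start + nxt[-1]
            while one_in_one_out(nxt):  # extend while the path cannot branch
                nxt = graph[nxt][0]
                path += nxt[-1]
            result.append(path)
    return result


print(contigs(build_graph(["ACGTTG", "GTTGCA"], 4)))
```

The two overlapping reads share the k-mer `GTTG`, so their edges join into one unbranched path and a single contig. When k-mers diverge after a shared prefix, the walk stops at the branching node and reports separate contigs.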

| Program | Description | Purpose | Source |
|---|---|---|---|
| Ballgown | A program for computing differentially expressed genes in two or more RNA-seq experiments, using the output of StringTie or Cufflinks. The Ballgown package provides functions to organize, visualize, and analyze expression measurements. | Differential Expression | Clone |
| Cufflinks | Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. | Transcriptome Assembly | Clone |
| DESeq2 | The DESeq2 package provides methods to test for differential expression using negative binomial generalized linear models. | Differential Expression | Clone |
| edgeR | Differential expression analysis of RNA-seq expression profiles with biological replication. It can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE. | Differential Expression | Bioconductor |
| HISAT2 | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (whole-genome, transcriptome, and exome sequencing data) against the general human population (as well as against a single reference genome). | Transcriptome Mapping | Clone |
| HTSeq | HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. | Analysis Package | Clone |
| STAR | Ultrafast universal RNA-seq aligner. | RNA-seq Aligner | Clone |
| StringTie | StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. | Transcriptome Assembly | Clone |
| Trinity | Trinity assembles transcript sequences from Illumina RNA-Seq data. | Transcriptome Assembly | Download |

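DESeq2 fits negative binomial GLMs to raw counts, but first estimates a per-sample size factor so that libraries sequenced to different depths are comparable. The sketch below is a Python transliteration of the median-of-ratios estimator that DESeq2 applies by default; it does not fit the GLM or test for differential expression, and the input layout (a list of per-gene rows of per-sample counts) is an assumption for illustration.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[int]]) -> List[float]:
    """Median-of-ratios size factors for a genes x samples count matrix."""
    n_samples = len(counts[0])
    # A gene with a zero in any sample has geometric mean 0, so it is skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("every gene has at least one zero count")
    # Log of the geometric mean across samples = mean of the log counts.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Each gene's ratio to its geometric mean; the median is robust to
        # the minority of genes that are truly differentially expressed.
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: Sequence[Sequence[int]], factors: Sequence[float]) -> List[List[float]]:
    """Divide each sample's counts by its size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = size_factors(counts)
print(factors, normalize(counts, factors))
```

Here the second sample is simply sequenced twice as deeply, so its size factor comes out twice the first one and the normalized counts agree.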
Single Cell

| Program | Description | Purpose | Source |
|---|---|---|---|
| Celda | Bayesian hierarchical modeling for clustering single-cell RNA-seq data. | Differential Expression | Clone |
| cellTree | This package computes a Latent Dirichlet Allocation (LDA) model of single-cell RNA-seq data and builds a compact tree modelling the relationship between individual cells over time or space. | Visualization | Bioconductor |
| Chromium Single Cell Software Suite | Package for analyzing and visualizing single cell 3’ RNA-seq data produced by the 10x Chromium Platform. Cell Ranger (Pipelines) is a set of analysis pipelines that perform sample demultiplexing, barcode processing, and single cell 3’ gene counting. Loupe™ Cell Browser is an interactive desktop application that helps find significant genes, cell types, and substructure within your single cell data. Cell Ranger (R Kit) is an R package for secondary analysis of Cell Ranger matrix data, including PCA and t-SNE projection, and k-means clustering. | Analysis Package | Clone |
| Pagoda | Framework which applies pathway and gene set overdispersion analysis to identify aspects of transcriptional heterogeneity among single cells. | Pathway/Gene Set Analysis | Clone |
| SCDE | The SCDE package implements a set of statistical methods for analyzing single cell RNA-seq data, including differential expression analysis and pathway and gene set overdispersion analysis (PAGODA). | Differential Expression | Clone |
| Seurat | R package designed for QC, analysis, and exploration of single cell RNA-seq data. | Differential Expression | Clone |
| SPRING | SPRING is a kinetic interface tool for uncovering high-dimensional structure in single cell gene expression data. | Visualization | Clone |
| Monocle | An analysis toolkit for single cell RNA-seq that performs differential expression and time-series analysis for single cell expression experiments. | Differential Expression | Clone |
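Cell Ranger's barcode processing and single cell 3’ gene counting come down to attributing each read to a cell barcode and a UMI, then counting distinct UMIs per gene in each cell so that PCR duplicates are counted once. The Python below is a minimal sketch of that counting step only. It assumes a Chromium v2-style read 1 (a 16 bp cell barcode followed by a 10 bp UMI) and that reads have already been assigned to genes by some upstream step; it performs none of Cell Ranger's barcode whitelisting or UMI error correction, and `count_umis` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed read 1 layout (Chromium single cell 3' v2 chemistry):
# 16 bp cell barcode followed by a 10 bp UMI (v3 chemistry uses a 12 bp UMI).
BARCODE_LEN = 16
UMI_LEN = 10


def count_umis(
    records: Iterable[Tuple[str, str]],
    barcode_len: int = BARCODE_LEN,
    umi_len: int = UMI_LEN,
) -> Dict[Tuple[str, str], int]:
    """Count distinct UMIs per (cell barcode, gene).

    Each record is (read1_sequence, gene), where `gene` is the feature the
    paired read 2 was assigned to upstream (hypothetical input format).
    """
    umis = defaultdict(set)
    for read1, gene in records:
        if len(read1) < barcode_len + umi_len:
            continue  # too short to hold a full barcode + UMI
        barcode = read1[:barcode_len]
        umi = read1[barcode_len:barcode_len + umi_len]
        # Duplicates share barcode, UMI and gene, so the set collapses them.
        umis[(barcode, gene)].add(umi)
    return {key: len(found) for key, found in umis.items()}


cell = "AAAACCCCGGGGTTTT"
reads = [(cell + "ACGTACGTAC", "GeneA"), (cell + "ACGTACGTAC", "GeneA"), (cell + "GGGGGGGGGG", "GeneA")]
print(count_umis(reads))
```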