Applications

Michal J. Gajda edited this page Oct 28, 2020 · 3 revisions

Installation

There are a few options for installing Haskell programs. The easiest is to install the GHC compiler and the cabal-install tool from your OS distribution, after which the commands

cabal update
cabal install <program>

should Just Work. For more information, see this page.

Binary packages

Some packages are available as Debian .deb packages, which may also work on Ubuntu and other Debian-derived systems. They are probably not quite up to date, but if they work they are probably less hassle than building from source. They can be found at http://malde.org/~ketil/debian.

List of Applications

A50

A50 is a utility to evaluate genome assemblies. As an alternative to N50, which only provides a single number, A50 generates a graph by ordering all contigs by size and plotting the cumulative size of the assembly against the contig number. This makes it easy to compare assemblies at a glance.

A50 uses the bio library.
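The curve described above can be sketched in a few lines of plain Haskell. This is not A50's own code and does not use the bio library. The function names (`cumulativeSizes`, `a50Curve`, `n50`) are hypothetical, and sorting the contigs from largest to smallest is an assumption about the ordering.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- | Running total of assembly size, adding contigs largest first
-- (the descending order is an assumption, not taken from A50).
cumulativeSizes :: [Int] -> [Int]
cumulativeSizes = scanl1 (+) . sortBy (comparing Down)

-- | Points (contig number, cumulative size) that could be handed to a plotter.
a50Curve :: [Int] -> [(Int, Int)]
a50Curve = zip [1 ..] . cumulativeSizes

-- | N50 for comparison: the length of the contig at which the running
-- total first reaches half of the total assembly size.
n50 :: [Int] -> Maybe Int
n50 lens =
  case dropWhile (\(_, c) -> 2 * c < total) (zip sorted (scanl1 (+) sorted)) of
    ((l, _) : _) -> Just l
    []           -> Nothing
  where
    sorted = sortBy (comparing Down) lens
    total  = sum lens
```

The single number returned by `n50` corresponds to one point on the curve from `a50Curve`, which is why the full curve carries more information when comparing assemblies.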

Flower reads SFF files from 454 sequencing and can produce a variety of outputs. This includes a textual format intended to make the raw flowgram information easily accessible. Flower is now part of the biosff package.

FlowSim is a simulator pipeline for 454 pyrosequencing. It comes with a separate tool, clonesim, which simulates clones as random fragments of the input sequences, and flowsim, which simulates the pyrosequencing reaction and generates the corresponding SFF file.

The development version (available from the darcs repo) supports quality clipping, non-uniform clone coverage, adapter sequences (which aren't always properly clipped by the 454 pipeline), and PCR mutations.
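To illustrate the idea of simulating clones as random fragments of an input sequence, here is a minimal sketch. It is not clonesim's implementation: the names `nextSeed`, `fragment` and `sampleClones` are hypothetical, fragments have a fixed length, start positions are roughly uniform, and the random numbers come from a toy linear congruential generator chosen only so that the example depends on nothing outside base.

```haskell
-- | One step of a toy linear congruential generator (illustrative only,
-- not suitable for real simulations).
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- | The fragment of the given length starting at the given offset.
fragment :: Int -> Int -> String -> String
fragment start len = take len . drop start

-- | Draw n fixed-length fragments at pseudo-random start positions.
-- Returns no fragments if the requested length does not fit the sequence.
sampleClones :: Int -> Int -> Int -> String -> [String]
sampleClones seed n len s
  | len <= 0 || len > length s = []
  | otherwise =
      [ fragment ((r `div` 65536) `mod` starts) len s
      | r <- take n (drop 1 (iterate nextSeed seed)) ]
  where
    starts = length s - len + 1  -- number of valid start positions
```

For example, `sampleClones 42 3 4 "ACGTACGTAC"` yields three substrings of length four. A realistic simulator would also draw fragment lengths from a distribution and, as noted above for the development version, allow non-uniform coverage.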

BAM files contain alignments of (short) reads against a reference. This tool helps to evaluate their quality by generating various statistics and plots.

transalign - more sensitive pairwise alignment

Transalign is a program to generate highly sensitive alignments (typically to some curated sequence database) by using a large intermediate database (typically NR or UniProt). See also this PLoS ONE article.

kmx - a k-mer counter

kmx is a k-mer counter. It reads nucleotide sequences in Fasta or FastQ format to generate an index, and can then extract various information from that index. It used to be called kmc, but somebody took that name for another k-mer counter.
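As a rough illustration of what counting k-mers means, the sketch below tallies every overlapping substring of length k in a single sequence. It is not how kmx works: it builds no index, does not parse Fasta or FastQ, and ignores reverse complements. The names `kmers` and `countKmers` are hypothetical.

```haskell
import Data.List (group, sort, tails)

-- | All overlapping substrings of length k, in order of occurrence.
kmers :: Int -> String -> [String]
kmers k s
  | k <= 0    = []
  | otherwise = takeWhile ((== k) . length) (map (take k) (tails s))

-- | Each distinct k-mer with its number of occurrences, sorted by k-mer.
countKmers :: Int -> String -> [(String, Int)]
countKmers k s = [ (x, length g) | g@(x : _) <- group (sort (kmers k s)) ]
```

For instance, `countKmers 2 "ACACG"` gives `[("AC",2),("CA",1),("CG",1)]`. Sorting and grouping keeps the sketch within base; a real counter for large inputs would use a more compact encoding and a hash table or on-disk index.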

varan - a variant analyzer

Varan is very much a work in progress, so although the wiki page and the README file are more or less up to date, the most reliable source of information is probably the --help output.

In addition to measures like Fst and nucleotide diversity, it also implements a set of measures based on allele frequency confidence intervals, including expected site information.
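For readers unfamiliar with the first two measures, the sketch below shows textbook forms for biallelic sites, given allele frequencies. These are assumptions for illustration only; the estimators varan actually uses (for example sample-size corrections or weighting of populations) may differ, and the function names are hypothetical.

```haskell
-- | Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p.
heterozygosity :: Double -> Double
heterozygosity p = 2 * p * (1 - p)

-- | Nucleotide diversity approximated as the mean per-site heterozygosity,
-- without any finite-sample correction.
nucleotideDiversity :: [Double] -> Maybe Double
nucleotideDiversity [] = Nothing
nucleotideDiversity ps =
  Just (sum (map heterozygosity ps) / fromIntegral (length ps))

-- | Fst for one site and two equally weighted populations:
-- (Ht - Hs) / Ht, where Hs is the mean within-population heterozygosity
-- and Ht the heterozygosity at the pooled allele frequency.
-- Undefined (Nothing) when the site is fixed in the pooled population.
fst2 :: Double -> Double -> Maybe Double
fst2 p1 p2
  | ht == 0   = Nothing
  | otherwise = Just ((ht - hs) / ht)
  where
    hs = (heterozygosity p1 + heterozygosity p2) / 2
    ht = heterozygosity ((p1 + p2) / 2)
```

Two populations fixed for different alleles give `fst2 0 1 == Just 1.0`, while identical frequencies give an Fst of zero.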

Simple examples

The bio library contains a set of small programs meant both to illustrate library usage and to provide useful bioinformatics tools.