.gitignore
Makefile
README.md
alignment.hpp
annotation_reader.cpp
annotation_reader.hpp
args.hpp
fasta_reader.cpp
fasta_reader.hpp
main.cpp
record.cpp
record.hpp
resistome.cpp
resistome.hpp
sam_reader.cpp
sam_reader.hpp
utility.cpp
utility.hpp

README.md

Rarefaction Analyzer

Overview

Rarefaction Analyzer is a simple program for performing rarefaction analysis. This analysis is particularly useful when you want to know what fraction of the target population is captured in your sequence data and what fraction remains undescribed because sequencing was not deep enough. It is a standard analysis in microbiome research and is typically recommended for any kind of metagenomics where counts are of interest. The tool takes three inputs: a FASTA-formatted reference database, a SAM-formatted alignment file, and a CSV-formatted annotation database.
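To make the idea concrete, here is a minimal Python sketch of rarefaction on toy data. It is not the implementation used by this tool: the function name `rarefy`, the input format (one gene name per aligned read), and the example reads are all assumptions made for illustration.

```python
import random

def rarefy(read_genes, proportion, rng):
    """Count unique genes in a random subsample of reads.

    read_genes: list with one gene name per aligned read (hypothetical input).
    proportion: fraction of reads to keep, between 0.0 and 1.0.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    k = round(len(read_genes) * proportion)
    sampled = rng.sample(read_genes, k)  # sample without replacement
    return len(set(sampled))

# Toy data: 10 aligned reads hitting 4 genes (made up for illustration).
reads = ["geneA"] * 5 + ["geneB"] * 3 + ["geneC", "geneD"]
rng = random.Random(0)  # fixed seed so repeated runs give the same curve

# One (percent sampled, unique genes) point per sampling level.
curve = [(p, rarefy(reads, p / 100, rng)) for p in range(10, 101, 10)]
for percent, unique_genes in curve:
    print(percent, unique_genes, sep="\t")
```

If the count of unique genes levels off well before 100% of the reads are sampled, deeper sequencing is unlikely to reveal many new genes. If the count is still rising at 100%, part of the target population has probably not been captured.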

Output

The tool writes four TSV text files, one each for the gene, group, mechanism, and class levels. In each file, the first column is the proportion of reads sampled and the second column is the number of unique genes, groups, mechanisms, or classes identified in the rarefied data. These outputs can be easily graphed with common R packages such as ggplot2 or Python libraries such as matplotlib. To graph, place the proportion of reads sampled (first column) on the x-axis and the number of unique features (second column) on the y-axis.
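A minimal plotting sketch in Python follows. It assumes each output file holds two tab-separated numeric columns as described above. Whether the files carry a header line is not stated here, so the reader simply skips any row that does not parse as numbers. The function names `read_curve` and `plot_curve` are hypothetical, and matplotlib is a third-party dependency that must be installed separately.

```python
import csv

def read_curve(path):
    """Read (proportion, unique_count) pairs from a two-column TSV file."""
    points = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            try:
                points.append((float(row[0]), float(row[1])))
            except ValueError:
                continue  # skip a header line, if the file has one
    return points

def plot_curve(points, out_path, level="genes"):
    """Draw a rarefaction curve and save it as an image file."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    xs, ys = zip(*points)  # requires at least one data point
    fig, ax = plt.subplots()
    ax.plot(xs, ys, marker="o")
    ax.set_xlabel("Proportion of reads sampled")
    ax.set_ylabel(f"Unique {level} identified")
    fig.savefig(out_path)
    plt.close(fig)
```

For example, `plot_curve(read_curve("gene.tsv"), "gene.png")` would plot the gene-level file produced by the usage example in this README.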

Installation

$ git clone https://github.com/cdeanj/RarefactionAnalyzer.git
$ cd RarefactionAnalyzer
$ make
$ cp rarefaction /usr/local/bin
$ ./rarefaction

Usage

$ ./rarefaction \
   -ref_fp ref.fa \
   -sam_fp alignments.sam \
   -annot_fp annotations.csv \
   -gene_fp gene.tsv \
   -group_fp group.tsv \
   -class_fp class.tsv \
   -mech_fp mech.tsv \
   -min 5 \
   -max 100 \
   -samples 1 \
   -t 80

Options

| Option | Type | Description |
|---|---|---|
| ref_fp | FILE | Path to FASTA-formatted reference database |
| annot_fp | FILE | Path to CSV-formatted annotation database |
| sam_fp | FILE | Path to SAM-formatted alignment file |
| gene_fp | FILE | File to write gene-level results to |
| group_fp | FILE | File to write group-level results to |
| class_fp | FILE | File to write class-level results to |
| mech_fp | FILE | File to write mechanism-level results to |
| min | INT | Starting sample level |
| max | INT | Ending sample level |
| skip | INT | Skip pattern |
| samples | INT | Number of samples to run |
| t | INT | Threshold to determine gene significance |
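The sketch below shows one plausible reading of how `min`, `max`, and `skip` combine into a list of sampling levels. It assumes that `skip` is the step between consecutive levels and that `max` is included. This README does not confirm either point, so treat the hypothetical `sample_levels` function as an illustration and not as the tool's actual behavior.

```python
def sample_levels(min_level, max_level, skip):
    """Return sampling levels from min_level to max_level in steps of skip.

    Assumption: skip is the step size and max_level is inclusive.
    """
    if skip <= 0:
        raise ValueError("skip must be a positive integer")
    if min_level > max_level:
        raise ValueError("min_level must not exceed max_level")
    return list(range(min_level, max_level + 1, skip))

# With -min 5 -max 100 and an assumed -skip 5: 5, 10, 15, and so on up to 100.
print(sample_levels(5, 100, 5))
```

Under this reading, a smaller `skip` gives a smoother curve at the cost of more subsampling rounds.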