Rarefaction Analyzer

Overview

Rarefaction Analyzer is a simple program for performing rarefaction analysis. This analysis is useful when you want to estimate how much of the target population is captured in your sequence data, versus how much remains undescribed because the sample was not sequenced deeply enough. Rarefaction is a standard analysis in microbiome research and is typically recommended for any metagenomic study in which counts are of interest. Input to this tool consists of a FASTA formatted reference database, a SAM formatted alignment file, and a CSV formatted annotation database.
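The following Python sketch illustrates the general idea. It is not this tool's C++ implementation. It subsamples aligned reads at several proportions and counts the distinct genes, groups, mechanisms, and classes in each subsample. The input layout is an assumption. `read_hits` is a hypothetical list holding the reference gene each read aligned to, such as the reference names taken from the SAM file. `annotations` is a hypothetical dictionary built from the annotation CSV.

```python
import random


def rarefy(read_hits, annotations, proportions, seed=0):
    """Subsample aligned reads and count distinct labels at each level.

    read_hits: list of gene IDs, one per aligned read (hypothetical input).
    annotations: dict mapping gene ID -> {"group": ..., "mechanism": ...,
        "class": ...} (hypothetical layout of the annotation CSV).
    proportions: fractions of reads to sample, each in (0, 1].
    Returns {level: [(proportion, distinct_count), ...]}.
    """
    rng = random.Random(seed)  # fixed seed for reproducible subsamples
    levels = ("gene", "group", "mechanism", "class")
    results = {level: [] for level in levels}
    n = len(read_hits)
    for p in proportions:
        k = min(n, round(p * n))
        sample = rng.sample(read_hits, k)  # sampling without replacement
        genes = set(sample)
        results["gene"].append((p, len(genes)))
        # Roll each detected gene up to its group, mechanism, and class.
        for level in levels[1:]:
            labels = {annotations[g][level] for g in genes if g in annotations}
            results[level].append((p, len(labels)))
    return results
```

Each subsample is drawn independently, so counts at neighbouring proportions are not guaranteed to increase monotonically. Averaging several replicate draws per proportion smooths the resulting curve.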

Output

The tool writes four TSV text files, one each for genes, groups, mechanisms, and classes. In each file, the first column holds the proportion of reads sampled and the second column holds the number of unique genes, groups, mechanisms, or classes identified in the rarefied data. These outputs can be easily graphed with common R packages like ggplot2 or Python libraries such as matplotlib. Place the proportion of reads sampled (first column) on the x-axis and the number of unique genes, groups, mechanisms, or classes (second column) on the y-axis.
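As a minimal sketch for preparing one of these files for plotting, the hypothetical helper `parse_rarefaction_tsv` below reads the two columns into x and y lists. It assumes each line is `<proportion><TAB><count>`. It skips any line whose first two fields are not numeric, so a header row, if present, is ignored.

```python
import csv


def parse_rarefaction_tsv(lines):
    """Return (proportions, counts) from an iterable of TSV lines.

    Hypothetical helper: assumes two tab-separated columns per line and
    silently skips blank lines and non-numeric rows such as a header.
    """
    xs, ys = [], []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # blank or malformed line
        try:
            x, y = float(row[0]), float(row[1])
        except ValueError:
            continue  # header or other non-numeric row
        xs.append(x)
        ys.append(y)
    return xs, ys
```

For example, `with open("gene.tsv", newline="") as f: xs, ys = parse_rarefaction_tsv(f)` gives the series for the gene-level curve, which can then be passed to `matplotlib.pyplot.plot(xs, ys)`.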

Installation

$ git clone https://github.com/cdeanj/RarefactionAnalyzer.git
$ cd RarefactionAnalyzer
$ make
$ cp rarefaction /usr/local/bin
$ ./rarefaction

Usage

$ ./rarefaction \
   -ref_fp ref.fa \
   -sam_fp alignments.sam \
   -annot_fp annotations.csv \
   -gene_fp gene.tsv \
   -group_fp group.tsv \
   -class_fp class.tsv \
   -mech_fp mech.tsv \
   -min 5 \
   -max 100 \
   -samples 1 \
   -t 80

Options

Option	Type	Description
ref_fp	FILE	Path to FASTA formatted reference database
annot_fp	FILE	Path to CSV formatted annotation database
sam_fp	FILE	Path to SAM formatted alignment file
gene_fp	FILE	File to write gene level results to
group_fp	FILE	File to write group level results to
class_fp	FILE	File to write class level results to
mech_fp	FILE	File to write mechanism level results to
min	INT	Starting sample level
max	INT	Ending sample level
skip	INT	Skip pattern
samples	INT	Number of samples to run
t	INT	Threshold to determine gene significance
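The option descriptions above are brief. The following Python sketch shows one plausible reading, and it is purely an assumption, so check the tool's source for its actual behavior. Under this reading, `-min` and `-max` are percentages of reads to sample and `-skip` is the step between successive levels. `-t` is the percentage of a gene's length that reads must cover before the gene is counted. Both functions, `sample_levels` and `gene_is_detected`, are hypothetical.

```python
def sample_levels(min_pct, max_pct, skip):
    """Hypothetical: sampling percentages from min_pct to max_pct, stepping by skip."""
    if skip <= 0:
        raise ValueError("skip must be a positive integer")
    return list(range(min_pct, max_pct + 1, skip))


def gene_is_detected(alignments, gene_length, threshold_pct):
    """Hypothetical gene-fraction test.

    alignments: (start, end) pairs, 0-based and end-exclusive, giving the
        positions on the gene covered by each aligned read.
    Returns True when at least threshold_pct percent of the gene's bases
    are covered by one or more reads.
    """
    covered = set()
    for start, end in alignments:
        # Clip each alignment to the gene bounds before recording coverage.
        covered.update(range(max(0, start), min(end, gene_length)))
    # Integer comparison avoids floating-point rounding at the threshold.
    return 100 * len(covered) >= threshold_pct * gene_length
```

Under these assumptions, `sample_levels(5, 100, 5)` yields `[5, 10, ..., 100]`, and with `-t 80` a 100 bp gene is counted only if reads cover at least 80 of its bases.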
