Home

Donovan Parks edited this page Sep 2, 2016

Introduction

Bugs and Feature Requests

Installation

The latest release of CheckM is v1.0.7 (Sept. 2, 2016).

Quick Start

Command Line Overview

  • Overview - list of all CheckM features

Workflow Overview

Genome Quality Commands

  • tree - place bins in the reference genome tree
  • tree_qa - assess phylogenetic markers found in each bin
  • lineage_set - infer lineage-specific marker sets for each bin
  • taxon_list - list available taxon-specific marker sets
  • taxon_set - infer a taxon-specific marker set
  • analyze - identify marker genes in bins
  • qa - assess bins for contamination and completeness
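The commands above are typically chained as tree, then optionally tree_qa, then lineage_set, analyze, and qa. The sketch below only builds the argument lists for that chain. It assumes the positional order `tree <bin folder> <output folder>`, `lineage_set <output folder> <marker file>`, `analyze <marker file> <bin folder> <output folder>`, and `qa <marker file> <output folder>`. It also assumes a marker file named `lineage.ms` inside the output folder. No optional flags are included. The helper names `lineage_workflow_commands` and `run_all` are hypothetical and are not part of CheckM.

```python
import subprocess
from typing import List


def lineage_workflow_commands(bin_dir: str, out_dir: str, marker_file: str) -> List[List[str]]:
    """Build (but do not run) argv lists for the lineage-specific workflow.

    Hypothetical helper: the argument order is an assumption based on the
    command descriptions above; optional CheckM flags are omitted.
    """
    return [
        ["checkm", "tree", bin_dir, out_dir],              # place bins in the reference tree
        ["checkm", "tree_qa", out_dir],                    # optional: assess markers per bin
        ["checkm", "lineage_set", out_dir, marker_file],   # infer lineage-specific marker sets
        ["checkm", "analyze", marker_file, bin_dir, out_dir],  # identify marker genes in bins
        ["checkm", "qa", marker_file, out_dir],            # completeness and contamination
    ]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first non-zero exit status."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in lineage_workflow_commands("bins", "checkm_out", "checkm_out/lineage.ms"):
        print(" ".join(argv))
```

Building the argument lists separately from running them makes it easy to print, log, or inspect the exact commands before calling `run_all` on a machine where CheckM is installed.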

Reported Statistics

  • qa - description of statistics reported by qa command
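The two headline statistics from qa are completeness and contamination. The sketch below assumes the collocated-marker-set formulation described in the CheckM paper. Under that formulation, each marker set contributes the fraction of its markers found at least once toward completeness. Each copy of a marker beyond the first counts toward contamination, normalized by the set size. Both values are then averaged over all sets. This is a simplified illustration, not CheckM's code. It omits strain heterogeneity and the other qa statistics, and the function name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def completeness_contamination(
    marker_sets: List[Set[str]], hit_counts: Dict[str, int]
) -> Tuple[float, float]:
    """Estimate completeness and contamination (%) from collocated marker sets.

    marker_sets: non-empty sets of marker genes, each treated as one unit.
    hit_counts: number of copies of each marker found in the bin (absent = 0).
    """
    if not marker_sets:
        raise ValueError("at least one marker set is required")
    comp = cont = 0.0
    for ms in marker_sets:
        counts = [hit_counts.get(g, 0) for g in ms]
        # fraction of markers in this set present at least once
        comp += sum(1 for c in counts if c >= 1) / len(ms)
        # extra copies beyond the first, normalized by set size
        cont += sum(c - 1 for c in counts if c > 1) / len(ms)
    n = len(marker_sets)
    return 100.0 * comp / n, 100.0 * cont / n


# Example: set {a, b} is half present; set {c, d} is fully present, with c duplicated.
print(completeness_contamination([{"a", "b"}, {"c", "d"}], {"a": 1, "c": 2, "d": 1}))
```

In the example, completeness is (1/2 + 2/2) / 2 = 75%, and contamination is (0/2 + 1/2) / 2 = 25%.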

Plots

  • bin_qa_plot - bar plot of bin completeness, contamination, and strain heterogeneity
  • gc_plot - create GC histogram and delta-GC plot
  • coding_plot - create coding density (CD) histogram and delta-CD plot
  • tetra_plot - create tetranucleotide distance (TD) histogram and delta-TD plot
  • dist_plot - create image with GC, CD, and TD distribution plots together
  • nx_plot - create Nx-plots
  • len_plot - cumulative sequence length plot
  • len_hist - sequence length histogram
  • marker_plot - plot position of marker genes on sequences
  • par_plot - parallel coordinate plot of GC and coverage
  • gc_bias_plot - plot bin coverage as a function of GC
  • cov_pca - PCA plot of coverage profiles
  • tetra_pca - PCA plot of tetranucleotide signatures
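Several of these plots summarize sequence lengths. For nx_plot, the underlying statistic is the standard Nx value: the largest length L such that sequences of length at least L together cover at least x% of the total length (N50 when x = 50). The sketch below is a generic implementation of that definition, not CheckM's plotting code. The function name `nx` is hypothetical.

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: float) -> int:
    """Return the Nx value for a collection of sequence lengths (x=50 gives N50)."""
    sorted_lens = sorted(lengths, reverse=True)
    if not sorted_lens:
        raise ValueError("no sequence lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(sorted_lens) * x / 100.0
    running = 0
    for length in sorted_lens:
        running += length          # cumulative length of the longest sequences
        if running >= threshold:
            return length
    return sorted_lens[-1]         # only reached through floating-point rounding


contigs = [2, 10, 3, 8, 5]
print(nx(contigs, 50), nx(contigs, 90))
```

Evaluating `nx` for x = 1..100 gives the points of an Nx curve for a bin.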

Bin Exploration and Modification

  • unique - ensure no sequences are assigned to multiple bins
  • merge - identify bins with complementary sets of marker genes
  • bin_compare - compare two sets of bins (e.g., from alternative binning methods)
  • outliers - identify outliers in bins relative to reference distributions
  • modify - modify sequences in a bin
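The check described for unique amounts to finding sequence IDs that appear in more than one bin. The sketch below assumes that each bin has already been reduced to a list of sequence IDs, for example from its FASTA headers. It does not parse FASTA files or reproduce CheckM's output. The function name `shared_sequences` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def shared_sequences(bins: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each sequence ID assigned to more than one bin to the sorted bin IDs containing it."""
    owners: Dict[str, Set[str]] = defaultdict(set)
    for bin_id, seq_ids in bins.items():
        for seq_id in seq_ids:
            owners[seq_id].add(bin_id)
    return {seq: sorted(b) for seq, b in owners.items() if len(b) > 1}


bins = {"bin1": ["c1", "c2"], "bin2": ["c2", "c3"], "bin3": ["c4"]}
print(shared_sequences(bins))
```

An empty result means every sequence is assigned to at most one bin.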

Utility Commands

  • unbinned - identify unbinned sequences
  • coverage - calculate coverage of sequences
  • tetra - calculate tetranucleotide signature of sequences
  • profile - calculate percentage of reads mapped to each bin
  • join_tables - join tab-separated value tables containing bin information
  • ssu_finder - identify SSU (16S/18S) rRNAs in sequences
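A tetranucleotide signature, as computed by tetra, summarizes how often each 4-mer occurs in a sequence. The sketch below assumes a signature made of relative frequencies over the 136 canonical tetranucleotides. Each 4-mer is merged with its reverse complement, and windows containing non-ACGT characters are skipped. CheckM's exact normalization and output format may differ. The names `tetra_signature` and `CANONICAL_TETRAS` are hypothetical.

```python
from itertools import product
from typing import Dict

_COMP = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def _canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMP)[::-1])


# 256 possible 4-mers collapse to 136 canonical ones (16 are their own reverse complement)
CANONICAL_TETRAS = sorted({_canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetra_signature(seq: str) -> Dict[str, float]:
    """Relative frequency of each canonical tetranucleotide in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(CANONICAL_TETRAS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= _VALID:            # skip windows containing N or other symbols
            counts[_canonical(kmer)] += 1
            total += 1
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: c / total for k, c in counts.items()}


sig = tetra_signature("ACGTACGT")
print(len(sig), sig["ACGT"], sig["CGTA"], sig["GTAC"])
```

Merging reverse complements makes the signature independent of which strand a contig was assembled from.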