Livermore Metagenomics Analysis Toolkit

Taxonomic classification, content summarization and gene identification: an all-in-one metagenomic analysis toolkit


Overview

LMAT's main goal is to efficiently assign taxonomic labels to reads with reference representation, down to the species level, while maintaining accuracy in the presence of novel organisms. Scalable performance has been demonstrated on real and simulated data, showing accurate classification even with novel genomes in samples that include viruses, prokaryotes, fungi and protists.

LMAT has three related subcomponents (taxonomic profiling, content summarization and gene annotation) that can be run separately.

Quick installation

The quick installation procedure uses CMake to download, build and install all the required packages.

Required software

  • CMake 3
  • C/C++ compiler with OpenMP support (like gcc, clang, icc, xlc)
  • Recommended: python, for some tools
  • Optional: MPI, for use in building a Reference Database

Using redoall to build LMAT easily

redoall is a convenient wrapper that will direct the installation through CMake for typical compilers (GNU gcc, clang/LLVM, Intel C/C++ compilers and IBM XL compilers for Power 8 and 9):

usage: redoall.sh [profile] [compiler]

The 1st optional parameter chooses the build profile of CMake:

  • D for Debug
  • R for Release (this is the current default)
  • I for release with debug info (RelWithDebInfo)
  • M for release with minimum size (MinSizeRel)
  • clean for just cleaning, without building

The 2nd optional parameter selects the compiler family:

  • gnu for using GCC
  • intel for using Intel compilers
  • clang for using clang compilers
  • ibmpwr9 for compiling on Power 9 with IBM compilers
  • ibmpwr8 for compiling on Power 8 with IBM compilers
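
The two positional parameters can be combined freely. As a minimal sketch, the helper below only validates a profile/compiler pair against the values listed above and prints the matching command line. redoall_cmd is a hypothetical function, not part of LMAT, and the gnu default for the compiler is an assumption inferred from the no-argument example.

```shell
#!/bin/sh
# Hypothetical helper (not part of LMAT): validate the two positional
# arguments documented above and print the resulting redoall.sh command.
redoall_cmd() {
    profile="${1:-R}"     # R (Release) is the documented default
    compiler="${2:-gnu}"  # assumption: gnu, as in the no-argument example

    case "$profile" in
        D|R|I|M) ;;
        clean) echo "./redoall.sh clean"; return 0 ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac

    case "$compiler" in
        gnu|intel|clang|ibmpwr9|ibmpwr8) ;;
        *) echo "unknown compiler: $compiler" >&2; return 1 ;;
    esac

    echo "./redoall.sh $profile $compiler"
}

redoall_cmd I clang   # release with debug info, built with clang
redoall_cmd clean     # cleaning only
```

Because the parameters are positional, selecting a non-default compiler also requires giving the profile explicitly.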

Example for GNU gcc (release profile)

git clone https://github.com/LivGen/LMAT.git
cd LMAT
./redoall.sh

Example for Intel compilers (debug profile)

git clone https://github.com/LivGen/LMAT.git
cd LMAT
./redoall.sh D intel

Details

Post-processing with Recentrifuge

If you are analyzing more than one sample with LMAT, you can easily visualize and compare them using Recentrifuge: Robust comparative analysis and contamination removal for metagenomics.

With a score-oriented approach, Recentrifuge is especially useful in the case of low microbial biomass studies and when a more reliable detection of minority organisms is needed, like in clinical, environmental, and forensic applications. For further details, please check the PLOS CB article.

For usage and documentation, please see running Recentrifuge for LMAT in the Recentrifuge wiki.

LMAT and PERM: tuning the kernel

LMAT uses PERM, a C library for persistent heap management used with a dynamic-memory allocator, also developed at LLNL. For PERM (and therefore LMAT) to work under the right conditions, some kernel tuning is advisable:

  • Turn off periodic flush to file and dirty ratio flush:
echo 0 > /proc/sys/vm/dirty_writeback_centisecs
echo 100 > /proc/sys/vm/dirty_background_ratio
echo 100 > /proc/sys/vm/dirty_ratio
  • Turn off address space randomization:
echo 0 > /proc/sys/kernel/randomize_va_space 
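
Writing to /proc/sys requires root privileges, and values set this way do not persist across reboots, so it can be useful to verify the settings before a long run. The following is a read-only sketch that compares the current values with the ones recommended above. check_perm_tuning and its optional root-directory argument are hypothetical conveniences, not part of LMAT or PERM.

```shell
#!/bin/sh
# Read-only check: compare the current kernel settings with the values
# recommended above. The optional argument (a hypothetical convenience
# for testing) replaces /proc/sys as the root directory.
check_perm_tuning() {
    root="${1:-/proc/sys}"
    status=0
    while read -r path want; do
        # Fall back to a marker value if the file cannot be read
        have=$(cat "$root/$path" 2>/dev/null) || have="unreadable"
        if [ "$have" = "$want" ]; then
            echo "ok       $path = $have"
        else
            echo "MISMATCH $path = $have (recommended: $want)"
            status=1
        fi
    done <<EOF
vm/dirty_writeback_centisecs 0
vm/dirty_background_ratio 100
vm/dirty_ratio 100
kernel/randomize_va_space 0
EOF
    return $status
}

check_perm_tuning || echo "some settings differ from the recommendation"
```

The same settings can also be applied with sysctl, for example sysctl -w vm.dirty_ratio=100, which takes the path under /proc/sys with dots in place of slashes.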

 ==============================================
  : : :       ··        ··      ·   ·········· 
  : : :       ···      ···     · ·      ··     
  : : :       ·· ··  ·· ··    ·· ··     ··     
  : : ······· ··   ··   ··   ·······    ··     
  :  ······   ··        ··  ··     ··   ··     
    ·····     ··        ·· ··       ··  ··     
 ==============================================
   Livermore  Metagenomics  Analysis  Toolkit  
 ==============================================