Classification of nifH gene sequences using a Classification and Regression Tree (CART) from Frank et al., 2016.

Marine-Microbial-Ecology-Group-UCSC/nifH-phylogenetic-classifier

nifH-phylogenetic-classifier using a Classification and Regression Tree (CART)

This script phylogenetically classifies translated nifH gene (or amplicon) sequences using the Classification and Regression Tree (CART) from Frank et al., 2016.

The scripts directory includes Frank's original version for Python 2 as well as an updated version for Python 3. The updated version does not require you to specify the residue position in Azotobacter vinelandii NifH (protein) at which the multiple sequence alignment starts. Instead, the script calculates the start residue on the assumption that the first sequence in the alignment is NifH from Azotobacter vinelandii (WP_012698955.1). (A warning is issued if the first sequence does not appear to be from A. vinelandii.) We recommend that you use the updated script or the automated alternative described below.
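To illustrate the idea of deriving the start residue from the first aligned sequence, here is a minimal sketch. It is not the code used in the updated script, whose method is not described here. It assumes that the ungapped first row of the alignment is an exact substring of the reference protein. The function name `alignment_start_residue` and the toy sequences are hypothetical placeholders, not real NifH data.

```python
def alignment_start_residue(reference_protein, first_aligned_row, gap_chars="-."):
    """Return the 1-based position in reference_protein where the aligned row begins.

    Returns None if the ungapped row is empty or is not found in the reference,
    which is the situation in which a caller might warn that the first sequence
    does not look like the expected reference.
    """
    # Remove alignment gap characters to recover the underlying residues.
    ungapped = "".join(ch for ch in first_aligned_row if ch not in gap_chars).upper()
    if not ungapped:
        return None
    # Exact substring search (an assumption of this sketch).
    index = reference_protein.upper().find(ungapped)
    return None if index < 0 else index + 1


# Toy placeholder sequences, NOT real A. vinelandii NifH residues.
toy_reference = "MKTAYIAKQRQISFVKSHFSRQ"
toy_first_row = "--AYIA-KQRQ--"

start = alignment_start_residue(toy_reference, toy_first_row)
print("Alignment starts at reference residue:", start)
```

An exact substring match keeps the sketch short. A real alignment row may be trimmed or differ slightly from the reference, so a tolerant comparison would be needed in practice.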

The CART classifier requires Biopython, which must be installed before you run the script.
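One way to confirm that Biopython is visible to the Python interpreter you plan to use is the short check below. It is only a convenience sketch. The helper name `biopython_available` is hypothetical, and `pip install biopython` is one common installation route, not one prescribed by this repository.

```python
import importlib.util


def biopython_available():
    """Return True if the 'Bio' package (Biopython) can be found on the import path."""
    return importlib.util.find_spec("Bio") is not None


if biopython_available():
    import Bio
    print("Biopython version:", Bio.__version__)
else:
    print("Biopython not found; install it first, e.g. with: pip install biopython")
```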

You can verify the classifier with the included multiple sequence alignment:

python scripts/NifH_Clusters.py data/CART_Test_Atlantic.fasta

which outputs CART_Test_Atlantic_Clusters.fasta.
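To take a quick look at the result, you could list the record headers in the output file. The sketch below uses only the standard library. It assumes the output file is written to the current working directory, and it makes no assumption about how cluster assignments are encoded in the headers. The helper name `read_fasta_headers` is hypothetical.

```python
from pathlib import Path


def read_fasta_headers(path):
    """Return the header lines (without the leading '>') of a FASTA file."""
    headers = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


# Assumed location of the classifier output; adjust if it is written elsewhere.
output_path = Path("CART_Test_Atlantic_Clusters.fasta")

if output_path.exists():
    headers = read_fasta_headers(output_path)
    print(len(headers), "sequences in", output_path)
    for header in headers[:5]:
        print(header)
else:
    print(output_path, "not found; run the classifier first")
```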

Automated alternative

If you are working with nifH amplicons, e.g. ASVs from DADA2, then consider using our CART classifier in nifH_amplicons_DADA2 as the ancillary script NifHClustersFrank2016. This version predicts open reading frames (using FragGeneScan), performs a multiple sequence alignment (using MAFFT), and then runs the CART classifier. All of the required external tools (including Biopython) are provided by the miniconda environment that you create when you install nifH_amplicons_DADA2.
