A dual-use program for downloading and extracting genes from NCBI, and for building phylogenetic trees for many marker genes and merging the results into one tree.
git clone git@github.com:Ulthran/ShotgunUnifrac.git
To install the CorGE library for downloading, extracting, and merging genes,
cd ShotgunUnifrac/
pip install CorGE/
- Anaconda/miniconda (https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation)
- Snakemake (https://snakemake.readthedocs.io/en/stable/getting_started/installation.html)
- (For testing only) PyTest
- (For containerization only) Singularity
For the CorGE package,
pytest CorGE/tests
For the tree building Snakemake workflow,
pytest .tests/
To download and collect genomes for tree building,
CorGE collect_genomes --ncbi_species LIST_OF_TXIDS.txt --ncbi_accessions LIST_OF_ACCS.txt --local /path/to/local/db
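As a rough sketch of how the inputs for the command above could be prepared from Python: the helper names `write_id_list` and `build_collect_command` are hypothetical and not part of CorGE, the one-identifier-per-line file format is an assumption, and the taxon IDs and accession are placeholders. Only the flags shown in the command above are used.

```python
import shlex
from pathlib import Path


def write_id_list(path, ids):
    """Write one identifier per line (assumed format) and return the path."""
    path = Path(path)
    path.write_text("\n".join(str(i) for i in ids) + "\n")
    return path


def build_collect_command(species_file, accessions_file, local_db):
    """Assemble the collect_genomes call using only the flags shown above."""
    return [
        "CorGE", "collect_genomes",
        "--ncbi_species", str(species_file),
        "--ncbi_accessions", str(accessions_file),
        "--local", str(local_db),
    ]


if __name__ == "__main__":
    # Placeholder taxon IDs and accession; substitute your own.
    species = write_id_list("LIST_OF_TXIDS.txt", [562, 1280])
    accessions = write_id_list("LIST_OF_ACCS.txt", ["GCF_000005845.2"])
    cmd = build_collect_command(species, accessions, "/path/to/local/db")
    # Print the command for inspection; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Printing the command before running it makes it easy to confirm the file paths are the ones you intended.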
And then to filter out genes of interest and curate everything for tree building,
CorGE extract_genes
The default --file_type behavior is 'prot', so the flag can be left off, or set to 'nucl' if you want to build trees based on nucleotide sequences. Finally, to generate the tree, make sure you're in the directory with all the output from the previous step and run,
snakemake -c --use-conda --conda-prefix .snakemake/ --configfile /path/to/project/config.yml
This should output a file called RAxML_supermatrixRootedTree.final, which contains the final tree.
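A quick sanity check on the result might look like the sketch below. It assumes the output file is a plain Newick string (the usual RAxML output format) with no commas inside quoted labels or comments. The function names `count_newick_leaves` and `summarize_tree` are hypothetical and not part of this project.

```python
from pathlib import Path


def count_newick_leaves(newick):
    """Count leaves in a Newick string: a tree with n leaves has n - 1 commas."""
    text = newick.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string should end with ';'")
    return text.count(",") + 1


def summarize_tree(path="RAxML_supermatrixRootedTree.final"):
    """Return the number of leaves in the tree file, or raise if it is missing."""
    tree_file = Path(path)
    if not tree_file.is_file():
        raise FileNotFoundError(f"Expected tree file not found: {tree_file}")
    return count_newick_leaves(tree_file.read_text())


if __name__ == "__main__":
    print(f"Leaves in final tree: {summarize_tree()}")
```

If the leaf count does not match the number of genomes you collected, some genomes may have been dropped in an earlier step.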
A worked example is given in the docs.