If you found this software useful to your research please cite as follows:
Garber, AI and Miller, SR (2020) ParaHunter: identification of gene paralogs in genomes. GitHub repository: https://github.com/Arkadiy-Garber/ParaHunter.
ParaHunter uses MMseqs2 to cluster sequences from a single genomic dataset, based on a set of user-defined parameters. The clustered genes are then aligned using Muscle and PAL2NAL, and pairwise dS, dN, and dN/dS values are calculated using PAML.
git clone https://github.com/Arkadiy-Garber/ParaHunter.git
cd ParaHunter
bash setup.sh
source activate parahunt
ParaHunter.sh -a genomeOrfs_aa.faa -n genomeOrfs_nuc.ffn
ParaHunter.sh -a genomeOrfs_aa.faa -n genomeOrfs_nuc.ffn -m 0.35 -c 0.5
This software uses PAML, in addition to other tools like MMseqs2 and Muscle. If you would like to learn more about what is going on under the hood of this program, and some of the theoretical framework behind the evolutionary biology of gene divergence estimates, please see the following slideshow and associated video:
Slideshow | Video presentation
If you'd like to try out this program in a virtual terminal session with all dependencies and test dataset pre-loaded, hit the 'launch binder' button below, and follow the commands in 'Walkthrough'
(Initially forked from here. Thank you to the awesome binder team!)
You can also follow along in this linked video: Video presentation
These videos were made as part of the Bioinformatics Virtual Coordination Network :)
Enter the first direcotry
cd interspecies_example//
Align the FASTA amino acid file
muscle -in MNBX01000583.1_4.faa -out MNBX01000583.1_4.fa
Make a codon alignment from the peptide alignment and nucleotide sequence
pal2nal.pl MNBX01000583.1_4.fa MNBX01000583.1_4.ffn -output fasta > MNBX01000583.1_4.codonalign.fa
Check that the correct input/output files are in the codeml.ctl file
less codeml.ctl
Run codeml
codeml codeml.ctl
Enter the second directory
cd intraspecies_example/
Align the FASTA amino acid file
muscle -in NC_009928.1_232.faa -out NC_009928.1_232.fa
Make a codon alignment from the peptide alignment and nucleotide sequence
pal2nal.pl NC_009928.1_232.fa NC_009928.1_232.ffn -output fasta > NC_009928.1_232.codonalign.fa
Run codeml after checking out the input/output codeml.ctl file
codeml codeml.ctl
Print the ParaHunter help menu
Parahunter.sh -h
Run program on the test dataset
cd ParaHunter/
cd cyanothece/
ParaHunter.sh -a CyanothecePCC7425.faa -n CyanothecePCC7425.ffn -l ../codeml-2.ctl