Scripts for homology and orthology assessment from genomic sequences.


UPhO

UPhO finds orthologs, with and without inparalogs, from input gene family trees. Refer to Documentation.pdf for more detailed explanations of its usage, installation, and dependencies. Type UPhO.py -h for help.

The only input requirement for UPhO is a tree (or trees) in Newick format in which the leaves are named with a species identifier, a field separator, and a sequence identifier. By default, the field separator is the character "|", but custom delimiters can be defined. Example trees for testing UPhO are provided in the TestData folder.
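The leaf naming convention described above can be checked before running UPhO. The following is a minimal sketch, not part of UPhO itself: the function names `split_leaf` and `leaf_labels` are hypothetical, the regular expression only handles simple, unquoted Newick strings, and splitting at the first occurrence of the separator is an assumption about how labels are interpreted.

```python
import re


def split_leaf(label, sep="|"):
    """Split a leaf label into (species, sequence_id).

    Assumption: the label is split at the FIRST occurrence of the separator.
    """
    species, found, seq_id = label.partition(sep)
    if not found or not species or not seq_id:
        raise ValueError("leaf %r is not of the form species%ssequence" % (label, sep))
    return species, seq_id


def leaf_labels(newick):
    """Return the leaf labels of a simple, unquoted Newick string.

    A leaf label is taken to be any name that directly follows "(" or ",".
    Internal node labels follow ")" and are therefore skipped.
    """
    return [name.strip() for name in re.findall(r"[(,]\s*([^(),:;]+)", newick)]


# Made-up example tree using the default "|" separator.
tree = "((Homo|g1:0.1,Mus|g7:0.2):0.05,(Homo|g2:0.1,Danio|g3:0.3):0.1);"
species_per_leaf = [split_leaf(label)[0] for label in leaf_labels(tree)]
```

A label that lacks the separator raises a `ValueError`, which makes malformed leaf names easy to spot before the orthology evaluation step.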

Additional scripts are provided for a variety of tasks, including:

  • minreID.py Renames sequence identifiers, adding the species (OTU) name and the field delimiter character.
  • Blast_helper.sh Assists in all vs. all blastp searches.
  • BlastResultsCluster.py Clusters genes into gene families based on an e-value threshold and a minimum number of OTUs.
  • paMATRAX+.sh Wrapper of gnu-parallel mafft, trimAl and RAxML (or FastTree) for parallel estimation of phylogenetic trees.
  • UPhO.py The orthology evaluation tool.
  • UPhO_wt.py UPhO with an additional parameter to tolerate some (n) paralogs. May be useful in cases where a few spurious or misplaced sequences would otherwise discard a whole orthogroup. This feature could also be useful for rooting the orthobranch.
  • Get_fasta_from_Ref.py Creates FASTA files from lists of sequence identifiers.
  • Al2Phylo.py A simple script to prepare MSAs for phylogenetic inference, with sanitation and representative-sequence options.
  • Consensus.py Finds conserved regions in MSA. Not quite useful for this pipeline... I might move it somewhere else or repurpose it.
  • Alistats.py Writes a simple report (as TSV) from input alignments, including number of species, GC content, and gap content.
  • distOrth.py Functions for annotating the distribution of orthologs on a tree.
  • distOrth_interactive.py Interactive helper for distOrth.
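As an illustration of the renaming step that precedes the BLAST search, the sketch below prepends a species (OTU) name and the field delimiter to every FASTA header. It is not the code of minreID.py: the function name `tag_fasta_headers` is hypothetical, and keeping only the first word of each original header is an assumption made here for simplicity.

```python
def tag_fasta_headers(lines, species, sep="|"):
    """Yield FASTA lines with 'species + sep' prepended to each header.

    Assumption: only the first whitespace-delimited word of the original
    header is kept as the sequence identifier.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            yield ">%s%s%s" % (species, sep, fields[0])
        else:
            # Sequence lines are passed through unchanged.
            yield line


# Made-up records for illustration.
example = [">seq1 some description\n", "ACGT\n", ">seq2\n", "GGCC\n"]
renamed = list(tag_fasta_headers(example, "Homo"))
```

Headers renamed this way follow the species, separator, sequence identifier pattern that UPhO expects in the leaves of its input trees.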

Each script has (or should have) its own -help flag for details on its usage.

Disclaimer

This software is experimental, in active development, and comes without warranty. UPhO scripts were developed and tested using Python 2.7 on Linux (RHEL and Debian) and macOS. Versions of these scripts using Python 3 are being tested.

Citation

Ballesteros JA and Hormiga G. 2016. A new orthology assessment method for phylogenomic data: Unrooted Phylogenetic Orthology. Molecular Biology and Evolution, doi: 10.1093/molbev/msw069 [abstract](https://doi.org/10.1093/molbev/msw069)