Scripts for homology and orthology assessment from genomic sequences.


UPhO finds orthologs, with and without inparalogs, from input gene family trees. Refer to Documentation.pdf for detailed explanations of its usage, installation, and dependencies. Run UPhO with the -h flag for help.

The only input requirement for UPhO is a tree (or trees) in Newick format in which each leaf is named with a species identifier, a field separator, and a sequence identifier. By default, the field separator is the character "|", but custom delimiters can be defined. Example trees for testing UPhO are provided in the TestData folder.
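The leaf naming convention described above can be checked before running an analysis. The following is a minimal sketch, not part of UPhO itself: the function names (`leaf_labels`, `split_label`, `species_counts`) and the example tree are hypothetical, and the regular expression assumes simple, unquoted Newick labels in a tree with at least two leaves.

```python
import re


def leaf_labels(newick):
    """Return the leaf labels of a Newick string (unquoted labels only).

    A leaf label directly follows "(" or ",". Internal node labels and
    support values follow ")" and branch lengths follow ":", so they
    are not captured.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def split_label(label, delimiter="|"):
    """Split 'species|sequence' into its two fields, or raise ValueError."""
    species, sep, seq_id = label.partition(delimiter)
    if not sep or not species or not seq_id:
        raise ValueError("label %r lacks the expected %r-delimited fields"
                         % (label, delimiter))
    return species, seq_id


def species_counts(newick, delimiter="|"):
    """Count how many leaves each species contributes to the tree."""
    counts = {}
    for label in leaf_labels(newick):
        species, _ = split_label(label, delimiter)
        counts[species] = counts.get(species, 0) + 1
    return counts


# Hypothetical example tree; species and sequence names are made up.
example = "((Hsap|g1:0.1,Mmus|g7:0.2)0.95:0.05,(Hsap|g2:0.1,Dmel|x3:0.3):0.2);"
print(species_counts(example))
```

A species that appears more than once in the counts (here `Hsap`) indicates that the gene family tree contains paralogs or inparalogs for that species, which is the situation UPhO is designed to resolve.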

Additional scripts are provided for a variety of tasks, including:

  • Renames sequence identifiers, adding the species (OTU) name and the field delimiter character.
  • Assists in all vs. all blastp search.
  • Clusters genes into gene families based on an e-value threshold and a minimum number of OTUs.
  • Wrapper around GNU parallel, MAFFT, trimAl, and RAxML (or FastTree) for parallel estimation of phylogenetic trees.
  • The orthology evaluation tool.
  • UPhO with an additional parameter to tolerate some (n) paralogs. This may be useful in cases where a few spurious or misplaced sequences would cause a whole orthogroup to be discarded. It could also be useful for rooting the orthobranch.
  • Creates FASTA files from lists of sequence identifiers.
  • A simple script to prepare MSA for phylogenetic inference with sanitation and representative sequences options.
  • Finds conserved regions in MSA. Not quite useful for this pipeline... I might move it somewhere else or repurpose it.
  • Writes a simple report (as TSV) from input alignments, including number of species, GC content, and gap content.
  • Functions for annotating the distribution of orthologs on a tree.
  • Interactive helper for distOrth.
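To illustrate the kind of summary the alignment report item in the list above describes, here is a minimal sketch. It is not the repository's script: the function names (`read_fasta`, `alignment_report`) and the example alignment are hypothetical, only "-" is treated as a gap, and GC content is computed over non-gap characters, both of which are assumptions.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def alignment_report(text, delimiter="|"):
    """Summarize an alignment: sequences, species, GC and gap fractions."""
    records = read_fasta(text)
    # The species identifier is the field before the delimiter.
    species = set(name.split(delimiter)[0] for name in records)
    residues = "".join(records.values()).upper()
    gaps = residues.count("-")
    non_gap = len(residues) - gaps
    gc = residues.count("G") + residues.count("C")
    return {
        "sequences": len(records),
        "species": len(species),
        "gc_fraction": float(gc) / non_gap if non_gap else 0.0,
        "gap_fraction": float(gaps) / len(residues) if residues else 0.0,
    }


# Hypothetical three-sequence alignment from two species.
example_msa = """>Hsap|g1
ATGC--GC
>Mmus|g7
ATGCATGC
>Hsap|g2
AT--ATAT
"""

report = alignment_report(example_msa)
columns = ["sequences", "species", "gc_fraction", "gap_fraction"]
# Emit a header row and one data row, tab separated.
print("\t".join(columns))
print("\t".join(str(report[c]) for c in columns))
```

Running the same function over every alignment in a folder and writing one row per file would yield a TSV table comparable in spirit to the report described in the list.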

    Each script has (or should have) its own -help flag for details on its usage.


    This software is experimental, in active development, and comes without warranty. UPhO scripts were developed and tested using Python 2.7 on Linux (RHEL and Debian) and macOS. Versions of these scripts using Python 3 are being tested.


    Ballesteros JA and Hormiga G. 2016. A new orthology assessment method for phylogenomic data: Unrooted Phylogenetic Orthology. Molecular Biology and Evolution, doi: 10.1093/molbev/msw069 [abstract](