Entropy-based automatic partitioning of UCE alignments
The accuracy of phylogenetic inferences often depends on choosing an appropriate model of molecular evolution. In Tagliacollo & Lanfear (submitted), we evaluated the performance of two new partitioning methods for phylogenomic studies of UCEs, and concluded that automatic selection of partitions through the SWSC-EN method considerably improves model fit and parameter estimates. This repository contains scripts to run the SWSC-EN method on your own alignments. For more information about the method, please see the accompanying paper, and the repository for replicating the analyses in that paper.
In brief, this method uses entropy to attempt to split each UCE into three parts: a middle part (which is usually quite conserved) and two flanking regions (which are typically more variable). These parts can then be used as input to e.g. PartitionFinder2, which can optimise the partitioning scheme by joining together similar subsets (e.g. it is often the case that many of the central regions are better analysed together than separately).
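To make the idea of site-wise entropy concrete, here is a minimal sketch that computes Shannon entropy for each column of a toy alignment. It is an illustration only, not the code in SWSCEN.py: the function names, the choice to ignore anything other than A, C, G and T, and the toy sequences are all assumptions made for this example. How the method itself uses entropy to choose the split points is described in the accompanying paper.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of one alignment column.

    Assumption for this sketch: only A, C, G and T are counted;
    gaps and ambiguity codes are ignored.
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    total = len(bases)
    counts = Counter(bases)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(sequences):
    """Per-site entropies for a list of equal-length aligned sequences."""
    return [site_entropy("".join(col)) for col in zip(*sequences)]


# Hypothetical toy alignment: variable flanks around a conserved core.
toy = ["ACGGGGTA", "CAGGGGAT", "GTGGGGCC", "TGGGGGGA"]
entropies = alignment_entropies(toy)
```

In this toy case the four central columns are invariant and so have zero entropy, while the flanking columns are more variable, which is the pattern the method tries to exploit in real UCEs.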
Use Python 3.6.x or higher, with
Your input file should be a single .nex file with each UCE defined as a CHARSET (see
python SWSCEN.py input.nex output_folder
input.nex is the full file path to your input file and
output_folder is the full file path to where you want to store your output. If you leave the final argument out, the output will be put in the same folder as the
- Use the .cfg file as input to PartitionFinder2 to optimise the partitioning scheme.
More details on installing and running the script
1. Install Python and its dependencies
SWSC-EN needs Python 3.6.x or higher (but not 2.x!) and some additional libraries to run on personal computers. The simplest way to set this up is to install the Anaconda Python distribution, which can be downloaded from here:
Follow the link for the Python 3.6 graphical installer, then open it and follow the prompts. You need to make sure that you have version 4.4.0 or higher of the Anaconda Python distribution. In addition, install the following dependencies:
which you can do with the following commands after installing Anaconda Python 3.6:
conda install biopython
conda install numpy
conda install pathlib2
conda install tqdm
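Before running the script, it can help to confirm that the interpreter and libraries are in place. The sketch below is a hypothetical helper, not part of this repository. It assumes the import names of the packages installed above are `Bio` (for biopython), `numpy`, `pathlib2` and `tqdm`.

```python
import importlib.util
import sys

# Assumed import names for the conda packages listed above
# (the biopython package is imported as "Bio").
REQUIRED_MODULES = ["Bio", "numpy", "pathlib2", "tqdm"]


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_new_enough(version_info=sys.version_info, minimum=(3, 6)):
    """True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if not python_is_new_enough():
    print("Python 3.6 or higher is required; found", sys.version.split()[0])
missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
```

If nothing is printed, the Python version and the four libraries were found.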
2. Download SWSC-EN
Download the latest version of the SWSC-EN repository here
Double-click the .zip file, and it will automatically unzip. You will get a folder called 'PFinderUCE-SWSC-EN-master'
Move it to wherever you want to store PFinderUCE-SWSC-EN (e.g. in /Applications)
3. Run SWSC-EN
The instructions below describe how to run the
example_input/example_dataset.nex using the SWSC-EN method. This is a nexus DNA alignment with 15 concatenated UCEs.
Open Terminal (on Macs), or a command prompt (on Windows)
Type "python" followed by a space (remember, it needs to be Python 3.6.x or higher)
Drag and drop the
SWSCEN.py script onto your terminal or command prompt (the script is here
Type another space
Drag and drop the input file
example_input/example_dataset.nex onto your terminal (note, you need to provide the full filepath)
Hit Enter/Return to run PFinderUCE-SWSC-EN
SWSC-EN needs only a single input file: a concatenated nexus alignment (.nex) composed of UCE markers, including charsets that give the locations of the UCEs (example input is in the
/example_input folder of this repository).
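To check that an input file defines its UCEs as expected, the charsets can be listed with a few lines of Python. This is a rough sketch, not the parser used by SWSCEN.py. It assumes each charset is written on its own line in the simple form `charset name = start-end;`, and the UCE names and coordinates in the example text are invented for illustration.

```python
import re

# Matches lines such as "charset uce-1 = 1-320;" (case-insensitive).
# Assumption: one contiguous start-end range per charset.
CHARSET_RE = re.compile(
    r"^\s*charset\s+(\S+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;",
    re.IGNORECASE | re.MULTILINE,
)


def parse_charsets(nexus_text):
    """Return a list of (name, start, end) tuples, with 1-based inclusive ends."""
    return [
        (name, int(start), int(end))
        for name, start, end in CHARSET_RE.findall(nexus_text)
    ]


# Hypothetical sets block; real files will have their own names and ranges.
example = """#NEXUS
begin sets;
    charset uce-1 = 1-320;
    CHARSET uce-2 = 321-700;
end;
"""

for name, start, end in parse_charsets(example):
    print(name, start, end, "length:", end - start + 1)
```

Charsets written in other forms (for example with several ranges or codon positions) would need a more complete nexus parser.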
By default, the output is stored in the same folder as the input file. To change this, add another argument at the command line defining the output folder you'd like to use.
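The default described above can be pictured with a short sketch. This is not the script's own code. `resolve_output_folder` is a hypothetical helper, and the paths are made up, shown only to make the behaviour concrete.

```python
from pathlib import Path


def resolve_output_folder(input_path, output_folder=None):
    """Pick the output folder: the given one, else the input file's folder."""
    if output_folder is None:
        return Path(input_path).parent
    return Path(output_folder)


# Hypothetical paths for illustration.
default_folder = resolve_output_folder("/data/uce/input.nex")
custom_folder = resolve_output_folder("/data/uce/input.nex", "/results/swscen")
```

Here `default_folder` is the folder containing the input file, and `custom_folder` is the folder passed as the second argument.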
There are two outputs:
A PartitionFinder 2 configuration file (.cfg) to be used as the input file for PartitionFinder 2
A CSV file (.csv) with values of entropy (SWSC-EN) for each site of the UCEs.
Example output is provided in the