<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>BAli-Phy User's Guide v3.3</title><link rel="stylesheet" type="text/css" href="docbook.css"><meta name="generator" content="DocBook XSL Stylesheets V1.79.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="article"><div class="titlepage"><div><div><h2 class="title"><a name="idp1"></a><span class="application">BAli-Phy</span> User's Guide v3.3</h2></div><div><div class="author"><h3 class="author"><span class="firstname">Benjamin</span> <span class="surname">Redelings</span></h3></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="section"><a href="#intro">1. Introduction</a></span></dt><dt><span class="section"><a href="#installation">2. Installation</a></span></dt><dd><dl><dt><span class="section"><a href="#pre-requisites">2.1. Hardware requirements</a></span></dt><dt><span class="section"><a href="#upgrades">2.2. Upgrades</a></span></dt><dt><span class="section"><a href="#idp5">2.3. Install on MS Windows</a></span></dt><dt><span class="section"><a href="#idp10">2.4. Install on Mac OS X</a></span></dt><dt><span class="section"><a href="#idp15">2.5. Install on Linux</a></span></dt><dt><span class="section"><a href="#path">2.6. Add BAli-Phy to your <code class="envar">PATH</code></a></span></dt><dt><span class="section"><a href="#tests">2.7. Test the installed software</a></span></dt><dt><span class="section"><a href="#software_req">2.8. Install programs used for viewing the results</a></span></dt></dl></dd><dt><span class="section"><a href="#running">3. Running the program</a></span></dt><dd><dl><dt><span class="section"><a href="#idp22">3.1. Quick Start</a></span></dt><dt><span class="section"><a href="#idp23">3.2. Command line options</a></span></dt><dt><span class="section"><a href="#cluster">3.3. 
Running on computing clusters</a></span></dt><dt><span class="section"><a href="#idp27">3.4. Option files (Scripts)</a></span></dt></dl></dd><dt><span class="section"><a href="#input">4. Input</a></span></dt><dd><dl><dt><span class="section"><a href="#idp28">4.1. Sequence formats</a></span></dt><dt><span class="section"><a href="#idp31">4.2. Is my data set too large?</a></span></dt></dl></dd><dt><span class="section"><a href="#output">5. Output</a></span></dt><dd><dl><dt><span class="section"><a href="#idp32">5.1. Output directory</a></span></dt><dt><span class="section"><a href="#idp36">5.2. Output files</a></span></dt><dt><span class="section"><a href="#idp44">5.3. Summarizing the output</a></span></dt><dt><span class="section"><a href="#analysis">5.4. Summarizing the output - scripted</a></span></dt></dl></dd><dt><span class="section"><a href="#subst_models">6. Substitution models</a></span></dt><dd><dl><dt><span class="section"><a href="#dna_models">6.1. DNA and RNA models</a></span></dt><dt><span class="section"><a href="#protein_models">6.2. Protein models</a></span></dt><dt><span class="section"><a href="#doublet_models">6.3. Doublet models (RNA stems)</a></span></dt><dt><span class="section"><a href="#triplet_models">6.4. Triplet models</a></span></dt><dt><span class="section"><a href="#codon_models">6.5. Codon models</a></span></dt><dt><span class="section"><a href="#idp54">6.6. Heterogenous Rates across Sites</a></span></dt></dl></dd><dt><span class="section"><a href="#indel_models">7. Insertion/deletion models</a></span></dt><dt><span class="section"><a href="#functions">8. Models and Priors</a></span></dt><dd><dl><dt><span class="section"><a href="#idp55">8.1. Models and distributions are functions</a></span></dt><dt><span class="section"><a href="#idp56">8.2. Models and '<strong class="userinput"><code>+</code></strong>' notation</a></span></dt><dt><span class="section"><a href="#priors">8.3. 
Priors</a></span></dt><dt><span class="section"><a href="#default_values">8.4. Default values and default priors</a></span></dt><dt><span class="section"><a href="#types">8.5. Argument and result types</a></span></dt></dl></dd><dt><span class="section"><a href="#idp65">9. Partitioned data sets</a></span></dt><dd><dl><dt><span class="section"><a href="#idp60">9.1. Partitions</a></span></dt><dt><span class="section"><a href="#idp61">9.2. Unlinked models</a></span></dt><dt><span class="section"><a href="#idp62">9.3. Fixing the alignment in some partitions</a></span></dt><dt><span class="section"><a href="#idp63">9.4. Linked models</a></span></dt><dt><span class="section"><a href="#idp64">9.5. Linking models via the <strong class="userinput"><code>link</code></strong> command</a></span></dt></dl></dd><dt><span class="section"><a href="#mixing_and_convergence">10. Convergence and Mixing: Is it done yet?</a></span></dt><dd><dl><dt><span class="section"><a href="#idp66">10.1. Definition of Convergence</a></span></dt><dt><span class="section"><a href="#idp67">10.2. Definition of Mixing</a></span></dt><dt><span class="section"><a href="#idp70">10.3. Diagnostics: Variation in split frequencies across runs (ASDSF/MSDSF)</a></span></dt><dt><span class="section"><a href="#idp71">10.4. Diagnostics: Potential Scale Reduction Factors (PSRF)</a></span></dt><dt><span class="section"><a href="#idp74">10.5. Diagnostics: Effective sample sizes (ESS)</a></span></dt><dt><span class="section"><a href="#idp77">10.6. Diagnostics: Stabilization</a></span></dt></dl></dd><dt><span class="section"><a href="#alignment-utilities">11. Alignment utilities</a></span></dt><dd><dl><dt><span class="section"><a href="#idp78">11.1. alignment-info</a></span></dt><dt><span class="section"><a href="#idp79">11.2. alignment-cat</a></span></dt><dt><span class="section"><a href="#idp80">11.3. alignment-thin</a></span></dt><dt><span class="section"><a href="#idp81">11.4. 
alignment-draw</a></span></dt><dt><span class="section"><a href="#idp82">11.5. alignment-find</a></span></dt><dt><span class="section"><a href="#idp83">11.6. alignment-indices</a></span></dt><dt><span class="section"><a href="#idp84">11.7. alignment-chop-internal</a></span></dt></dl></dd><dt><span class="section"><a href="#tree-utilities">12. Tree utilities</a></span></dt><dd><dl><dt><span class="section"><a href="#idp85">12.1. trees-consensus</a></span></dt><dt><span class="section"><a href="#idp86">12.2. trees-bootstrap</a></span></dt><dt><span class="section"><a href="#idp87">12.3. trees-to-SRQ</a></span></dt></dl></dd><dt><span class="section"><a href="#compilation">13. Compiling <span class="application">BAli-Phy</span></a></span></dt><dd><dl><dt><span class="section"><a href="#idp91">13.1. Setup</a></span></dt><dt><span class="section"><a href="#quickstart">13.2. Clone, Configure, Compile</a></span></dt><dt><span class="section"><a href="#idp92">13.3. Options: compiler and linker flags</a></span></dt></dl></dd><dt><span class="section"><a href="#FAQ">14. Frequently Asked Questions (FAQ)</a></span></dt><dd><dl><dt><span class="section"><a href="#idp93">14.1. Input files</a></span></dt><dt><span class="section"><a href="#idp94">14.2. Running <span class="command"><strong>bali-phy</strong></span>.</a></span></dt><dt><span class="section"><a href="#idp95">14.3. Run-time error messages</a></span></dt><dt><span class="section"><a href="#idp96">14.4. Stopping <span class="command"><strong>bali-phy</strong></span>.</a></span></dt><dt><span class="section"><a href="#idp97">14.5. Running <span class="command"><strong>bp-analyze</strong></span>.</a></span></dt><dt><span class="section"><a href="#idp98">14.6. Interpreting the results.</a></span></dt><dt><span class="section"><a href="#idp99">14.7. How do I...</a></span></dt></dl></dd></dl></div>
  

  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="intro"></a>1. Introduction</h2></div></div></div>
    <p><span class="application">BAli-Phy</span> is a Unix command line program that is developed primarily on Linux.  <span class="application">BAli-Phy</span> also runs on Windows and Mac OS X, but it is not a GUI program and so you must run it in a terminal.  Therefore, you might want to keep a <a class="ulink" href="http://www.ee.surrey.ac.uk/Teaching/Unix" target="_top">Unix tutorial</a> or <a class="ulink" href="http://www.rain.org/~mkummel/unix.html" target="_top">Unix cheat sheet</a> handy while you work.
    </p>

    <p>In addition to the main <span class="command"><strong>bali-phy</strong></span> executable, <span class="application">BAli-Phy</span> comes with a collection of small command-line utilities such as <span class="command"><strong>alignment-cat</strong></span>, <span class="command"><strong>trees-consensus</strong></span>, etc.  These utilities can be used to process alignments, assemble data sets, and summarize the results of MCMC.
    </p>
  </div>

  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="installation"></a>2. Installation</h2></div></div></div>

  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="pre-requisites"></a>2.1. Hardware requirements</h3></div></div></div>
    
    <p>
      We typically run <span class="application">BAli-Phy</span> on workstations with at least 8 GB of RAM and 2 cores.  More cores allow you to run more MCMC chains at once, and more RAM allows you to analyze larger data sets.  However, it is often easier and faster to run <span class="application">BAli-Phy</span> on a (Linux) computing cluster, if you have one available.
    </p>

  </div>

  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="upgrades"></a>2.2. Upgrades</h3></div></div></div>
  <p>If you have previously installed bali-phy, you do not have to remove the old version before installing the new version.  Simply follow the installation instructions for the new version.  If you are manually adding the new version of bali-phy to your PATH, just make sure that the new version comes before the old version in the PATH, or remove the old version from the PATH.</p>

  <p>In order to remove an older version, simply delete the directory <code class="filename">bali-phy-<em class="replaceable"><code>oldversion</code></em></code>.  This will completely uninstall the old version from the system. BAli-Phy does not create hidden files that will remain after you remove its directory.</p>
  </div>
  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp5"></a>2.3. Install on MS Windows</h3></div></div></div>
  <p>First check that you have a 64-bit version of the Windows operating system installed. The executables available for download will only run on a 64-bit installation of Windows.  </p>
  <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp2"></a>2.3.1. Install a Unix command line: Cygwin (recommended)</h4></div></div></div>
  <p>Before you can use <span class="application">BAli-Phy</span> on Windows, you need to install a Unix command-line environment.  We recommend installing <a class="ulink" href="http://www.cygwin.com/install.html" target="_top">Cygwin</a>.  You may then access the Unix command-line environment by running the Cygwin shell (not the normal Windows command line).  The Cygwin shell mounts the <code class="filename">C:</code> drive on <code class="filename">/cygdrive/c/</code>, so you can access the directory <code class="filename">C:/Users/</code> as <code class="filename">/cygdrive/c/Users/</code> from within the Cygwin shell, for example.</p>
  <p>
    While running the Cygwin installer <a class="ulink" href="https://www.cygwin.com/setup-x86_64.exe" target="_top"><code class="filename">setup-x86_64.exe</code></a>, you will be given an opportunity to select additional packages.
	</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>From <span class="guilabel">Science</span>, select <span class="guilabel">R</span>.</p></li><li class="listitem"><p>From <span class="guilabel">Math</span>, select <span class="guilabel">gnuplot</span>.</p></li><li class="listitem"><p>From <span class="guilabel">Interpreters</span>, select <span class="guilabel">perl</span>.</p></li><li class="listitem"><p>From <span class="guilabel">Web</span>, select <span class="guilabel">wget</span>.</p></li><li class="listitem"><p>From <span class="guilabel">Editors</span>, select <span class="guilabel">nano</span>.</p></li></ul></div><p>
    You can re-run the installer to add packages that you did not add during the initial install.
  </p>
      <div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>BAli-Phy refers to Windows files using the normal
      <code class="filename">C:/</code> notation because it is compiled as a
      native Windows executable.
      The combination of native Windows executables (which expect <code class="filename">C:/</code>)
      and the Cygwin shell (which expects <code class="filename">/cygdrive/c/</code>) can be
      confusing.  If you supply Cygwin filenames beginning with
      <code class="filename">/cygdrive/</code> to native Windows executables like BAli-Phy, they
      may complain that the files cannot be found.
      </div>
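      <p>The path mismatch described in the note above can be worked around by translating a Cygwin-style path before handing it to a native executable.  Cygwin provides its own <span class="command"><strong>cygpath</strong></span> utility for this purpose; the sketch below is only a plain-shell approximation that handles the <code class="filename">/cygdrive/c/</code> style of prefix mentioned in this section.  The function name <code class="literal">to_windows_path</code> is hypothetical and is not part of BAli-Phy.</p>

```shell
# Hypothetical helper (not part of BAli-Phy): rewrite a Cygwin path such as
# /cygdrive/c/Users/me/25.fasta into the form C:/Users/me/25.fasta.
# Paths that do not start with /cygdrive/X/ (X = drive letter) are printed unchanged.
to_windows_path() {
    case "$1" in
        /cygdrive/[a-zA-Z]/*)
            rest=${1#/cygdrive/}        # e.g. c/Users/me/25.fasta
            drive=${rest%%/*}           # e.g. c
            remainder=${rest#*/}        # e.g. Users/me/25.fasta
            printf '%s:/%s\n' "$(printf '%s' "$drive" | tr 'a-z' 'A-Z')" "$remainder"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}
```

      <p>Under these assumptions, a call such as <code class="literal">bali-phy "$(to_windows_path /cygdrive/c/Users/me/25.fasta)"</code> would pass the <code class="filename">C:/</code> form of the (made-up) file name to the program.</p>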
      
  </div>
  <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp3"></a>2.3.2. Install a Unix command line: Msys2 (alternative)</h4></div></div></div>
  <p>You can optionally use <a class="ulink" href="http://www.msys2.org" target="_top">MSYS2</a> instead of Cygwin. Both MSYS2 and Cygwin can be installed at the same time.  After installing MSYS2, you may access the Unix command-line environment by running the MSYS2 shell (not the normal Windows command line).  The MSYS2 shell mounts the <code class="filename">C:</code> drive on <code class="filename">/c/</code>, so you can access the directory <code class="filename">C:/Users/</code> as <code class="filename">/c/Users/</code> from within the MSYS2 shell, for example.</p>
  <p>After installing MSYS2 you will need to install a few packages before you proceed.  Run the MSYS2 shell, and enter the command:
  </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>pacman -S perl tar</code></strong></pre><p>
  </p>

  </div>
  <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp4"></a>2.3.3. Install BAli-Phy executables from website</h4></div></div></div>
      <p>
After installing a Unix command line, use it to download and extract the bali-phy executables from the website:
      </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>mkdir -p ~/Applications</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>cd ~/Applications</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>wget http://www.bali-phy.org/files/bali-phy-3.3-win64.tar.gz</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>tar -zxf bali-phy-3.3-win64.tar.gz</code></strong></pre><p>
       Then check that the <span class="command"><strong>bali-phy</strong></span> executable runs:
      </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>~/Applications/bali-phy-3.3/bin/bali-phy --version</code></strong></pre><p>
      You still need to add it to your PATH as described in <a class="xref" href="#path" title="2.6. Add BAli-Phy to your PATH">Section 2.6, &#8220;Add BAli-Phy to your <code class="envar">PATH</code>&#8221;</a>.
    </p>
</div>
  </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp10"></a>2.4. Install on Mac OS X</h3></div></div></div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp6"></a>2.4.1. Install BAli-Phy using homebrew (recommended) </h4></div></div></div>
    <p>First install the <a class="ulink" href="https://developer.apple.com/xcode/" target="_top">Xcode</a> (version 6 or higher) command line tools:
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>xcode-select --install</code></strong></pre><p>

    Then install <a class="ulink" href="http://brew.sh/" target="_top">homebrew</a> and use homebrew to compile and install <span class="command"><strong>bali-phy</strong></span>:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>brew tap brewsci/bio</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>brew install bali-phy</code></strong></pre><p>
Check that the executable runs:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy --version</code></strong></pre><p>
If you install with homebrew, you don't need to do anything extra to put bali-phy in your PATH.
    </p>
    </div>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp7"></a>2.4.2. Install BAli-Phy using executables from website (alternative)</h4></div></div></div>
    <p>
      Open a window in the Terminal app to access the Unix command line.  Then download and extract the executables:
      </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>mkdir -p ~/Applications</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>cd ~/Applications</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>curl -O http://www.bali-phy.org/files/bali-phy-3.3-mac64.tar.gz</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>tar -zxf bali-phy-3.3-mac64.tar.gz</code></strong></pre><p>
      Check that the executable runs:
      </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>~/Applications/bali-phy-3.3/bin/bali-phy --version</code></strong></pre><p>
      You still need to add it to your PATH as described in <a class="xref" href="#path" title="2.6. Add BAli-Phy to your PATH">Section 2.6, &#8220;Add BAli-Phy to your <code class="envar">PATH</code>&#8221;</a>.
    </p>

    </div>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp8"></a>2.4.3. Install programs used by <span class="command"><strong>bp-analyze</strong></span> using homebrew</h4></div></div></div>
    <p>
      You can install <span class="application">gnuplot</span> via homebrew:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>brew install gnuplot</code></strong></pre><p> 
You can install <span class="application">R</span> via homebrew:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>brew tap caskroom/cask</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>brew cask install xquartz</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>brew install r</code></strong></pre><p>
However, note that this might conflict with R installed from other places, such as <a class="ulink" href="https://mran.microsoft.com/open/" target="_top">MRAN</a>.
</p>
    </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp9"></a>2.4.4. Install some of the programs used for viewing the results using homebrew</h4></div></div></div>
      <p>
	
	You can install Figtree with homebrew:
	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>brew tap caskroom/cask</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>brew cask install figtree</code></strong></pre><p>
	However, SeaView and Tracer don't have homebrew packages at the moment.
      </p>
    </div>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp15"></a>2.5. Install on Linux</h3></div></div></div>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp11"></a>2.5.1. Install BAli-Phy using <span class="command"><strong>apt-get</strong></span></h4></div></div></div>
    BAli-Phy is available on Ubuntu <a class="ulink" href="https://launchpad.net/ubuntu/+source/bali-phy/" target="_top">("Cosmic Cuttlefish" or later)</a>, and Debian (<a class="ulink" href="https://packages.debian.org/search?keywords=bali-phy&amp;searchon=names&amp;section=all" target="_top">testing and unstable</a>).
    <pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install bali-phy</code></strong></pre>
    Check that the executable runs:
    <pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy --version</code></strong></pre>
    If you install with <span class="command"><strong>apt-get</strong></span>, you don't need to do anything extra to put bali-phy in your PATH.
    </div>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp12"></a>2.5.2. Install BAli-Phy using executables from website (alternative)</h4></div></div></div>

    <p>First install <span class="command"><strong>wget</strong></span>.  If you have Debian or Ubuntu Linux, type:
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install wget</code></strong></pre><p>
    </p>
    <p>
      Then download and extract the executables:
      </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>mkdir -p ~/Applications</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>cd ~/Applications</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>wget http://www.bali-phy.org/files/bali-phy-3.3-linux64.tar.gz</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>tar -zxf bali-phy-3.3-linux64.tar.gz</code></strong></pre><p>
      Then check that the executable runs:
      </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>~/Applications/bali-phy-3.3/bin/bali-phy --version</code></strong></pre><p>
      You still need to add it to your PATH as described in <a class="xref" href="#path" title="2.6. Add BAli-Phy to your PATH">Section 2.6, &#8220;Add BAli-Phy to your <code class="envar">PATH</code>&#8221;</a>.
    </p>
    </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp13"></a>2.5.3. Install programs used by <span class="command"><strong>bp-analyze</strong></span></h4></div></div></div>
    <p>If you have Debian or Ubuntu Linux, you can install other recommended programs by typing:
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install gnuplot</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install r-base</code></strong></pre><p>
    </p>
    </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp14"></a>2.5.4. Install programs used to view the results</h4></div></div></div>
    <p>On Debian or Ubuntu Linux, you can install SeaView and FigTree by typing:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install seaview</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install figtree</code></strong></pre><p>
    However, there isn't a Debian or Ubuntu package for Tracer at the moment.
</p>
    </div>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="path"></a>2.6. Add BAli-Phy to your <code class="envar">PATH</code></h3></div></div></div>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp16"></a>2.6.1. Is bali-phy in your PATH already?</h4></div></div></div>
<p> First check if the executable is in your PATH.
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy --version</code></strong></pre><p>
If this shows version info, then <span class="command"><strong>bali-phy</strong></span> is already in your PATH and you can skip this section.  This should be true if you installed <span class="command"><strong>bali-phy</strong></span> using a package manager such as homebrew or apt, or if you've already added it to your PATH.</p>
<p>If bali-phy is not in your PATH, then you should see:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy --version</code></strong>
bali-phy: command not found.</pre><p>
In that case, continue with this section.
</p>
    </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp17"></a>2.6.2. Quick version</h4></div></div></div>
<p>Add <span class="command"><strong>bali-phy</strong></span> to your PATH, so that the shell knows where to find it.  This command only affects the terminal in which it is typed, and will not affect new terminals:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>export PATH=~/Applications/bali-phy-3.3/bin:$PATH</code></strong></pre><p>
To set the PATH automatically for new terminals, type:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>test -r ~/.bash_profile &amp;&amp; echo 'export PATH=~/Applications/bali-phy-3.3/bin:$PATH' &gt;&gt; ~/.bash_profile</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>echo 'export PATH=~/Applications/bali-phy-3.3/bin:$PATH' &gt;&gt; ~/.profile</code></strong></pre><p>
However, this will only affect new terminals after you log out and log back in.</p>
<p>
Now check that the executable runs:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy --version</code></strong></pre><p>
If it does, then your PATH is set up correctly, and you can probably skip the rest of this section. 
</p>
    
    </div>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp18"></a>2.6.3. I have a path?</h4></div></div></div>
      <p>
	If you installed <span class="application">BAli-Phy</span> to the directory
	<code class="filename">~/Applications</code>, then you can run
	bali-phy by typing <span class="command"><strong>~/Applications/bali-phy-3.3/bin/bali-phy</strong></span>.
	However, it would be much nicer to simply type
	<span class="command"><strong>bali-phy</strong></span> and let the computer find the
	executable for you.  This can be achieved by putting the directory
	that contains the <span class="application">BAli-Phy</span> executables into
	your "path".  	The "path" is a colon-separated list of directories that is
	searched to find program names that you type.  It is stored in an
	environment variable called <code class="envar">PATH</code>.
	</p>
      <p>
	Setting your <code class="envar">PATH</code> is also a pre-requisite for running
	the <span class="command"><strong>bp-analyze</strong></span> script to summarize your
	MCMC runs.
      </p>
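      <p>As a small illustration of the colon-separated list described above, the following sketch prints each directory of the <code class="envar">PATH</code> on its own line, in the order in which the shell searches them.  The function name <code class="literal">list_path_dirs</code> is made up for this example.</p>

```shell
# Hypothetical helper: print each directory of a colon-separated list
# (by default the current PATH) on its own line, in search order.
list_path_dirs() {
    printf '%s\n' "${1-$PATH}" | tr ':' '\n'
}
```

      <p>Calling <code class="literal">list_path_dirs</code> with no argument lists the current search path.  The standard shell command <code class="literal">command -v bali-phy</code> then reports which executable, if any, the shell would actually run.</p>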
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp19"></a>2.6.4. Examining your <code class="envar">PATH</code></h4></div></div></div>
      <p>
	You can examine the current value of
	this environment variable by typing:
	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>echo $PATH</code></strong></pre><p>
	We will assume that you extracted the bali-phy archive in
	<code class="filename">~/Applications</code> and so you want to add
	<code class="filename">$HOME/Applications/bali-phy-3.3/bin</code>
	to your <code class="envar">PATH</code>.  (If you installed to another directory,
	replace <code class="filename">$HOME/Applications/bali-phy-3.3/</code> with that directory.)
      </p>
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp20"></a>2.6.5. Adding BAli-Phy to your <code class="envar">PATH</code></h4></div></div></div>
      <p>The commands
	for doing this depend on what "shell" you are using.  Type
	<span class="command"><strong>echo $SHELL</strong></span> to find out. If your
	shell is <span class="command"><strong>sh</strong></span> or 
	<span class="command"><strong>bash</strong></span> then the command looks like this: 
	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>PATH=$HOME/Applications/bali-phy-3.3/bin:$PATH</code></strong></pre><p>
	If your shell is <span class="command"><strong>csh</strong></span> or
	<span class="command"><strong>tcsh</strong></span>, then the command looks like this:
	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>setenv PATH $HOME/Applications/bali-phy-3.3/bin:$PATH</code></strong></pre><p>
	Note that these commands only affect the window you are typing
	in, and their effect is lost when you close that window, log out, or reboot.
      </p>
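      <p>If such a command is run more than once in the same window, the same directory ends up in the <code class="envar">PATH</code> several times.  The following is a minimal sketch for <span class="command"><strong>sh</strong></span> or <span class="command"><strong>bash</strong></span> that prepends a directory only when it is not already listed.  The function name <code class="literal">path_prepend</code> is hypothetical and is not provided by BAli-Phy.</p>

```shell
# Hypothetical helper for sh/bash: prepend a directory to PATH
# unless it is already one of the colon-separated entries.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                    # already present: leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}
```

      <p>With this definition, <code class="literal">path_prepend "$HOME/Applications/bali-phy-3.3/bin"</code> has the same effect as the <span class="command"><strong>sh</strong></span> command above the first time it is run, and does nothing on later runs.</p>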
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp21"></a>2.6.6. Making the change stick</h4></div></div></div>
	<p>
	  To make this change survive when you log out or reboot, open
	  your shell configuration file in a text editor, and add the
	  command on a line by itself.  This will ensure that it is
	  run every time you log in.
	</p>

	<p>To find the right configuration file, look in your $HOME directory
	  for <code class="filename">.profile</code> (for the Bourne shell <span class="command"><strong>sh</strong></span>), 
	  <code class="filename">.bash_profile</code> (for BASH), or
	  <code class="filename">.login</code> (for tcsh).  You may have to
	  create the file if it is not present.  On Cygwin, you should
	  put the change in the file <code class="filename">.bashrc</code>.
	</p>

	<p>If you do not know which directory is your home
	directory, you can find its full name by typing:
	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>echo $HOME</code></strong></pre><p>
	</p>
      </div>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="tests"></a>2.7. Test the installed software</h3></div></div></div>
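    <p>To check that the software has been installed correctly and that the <code class="envar">PATH</code> has been set correctly, run the following commands.  The first command is deliberately run twice: each run is a separate MCMC chain that writes its output to its own directory (<code class="filename">25-1</code> and <code class="filename">25-2</code>):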
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy ~/Applications/bali-phy-3.3/share/doc/bali-phy/examples/sequences/5S-rRNA/25.fasta --iter=150</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>bali-phy ~/Applications/bali-phy-3.3/share/doc/bali-phy/examples/sequences/5S-rRNA/25.fasta --iter=150</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>bp-analyze 25-1 25-2</code></strong></pre><p>
    </p>
    <p>Afterwards, the directories <code class="filename">25-1</code> and <code class="filename">25-2</code> should each contain a file called <code class="filename">C1.log</code>.  You should be able to load these files in Tracer, although the chains will not really have converged yet.</p>
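    <p>A quick way to confirm that both test runs produced the expected log file is sketched below.  It relies only on what is stated above, namely that each run directory should contain <code class="filename">C1.log</code>.  The function name <code class="literal">check_run_dirs</code> is hypothetical.</p>

```shell
# Hypothetical helper: report whether each given run directory
# contains a C1.log file.  Returns non-zero if any is missing.
check_run_dirs() {
    status=0
    for dir in "$@"; do
        if [ -f "$dir/C1.log" ]; then
            echo "ok: $dir/C1.log"
        else
            echo "missing: $dir/C1.log"
            status=1
        fi
    done
    return "$status"
}
```

    <p>After the test commands have finished, running <code class="literal">check_run_dirs 25-1 25-2</code> in the directory where they were started should print one <code class="literal">ok</code> line per run.</p>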
    </div>


    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="software_req"></a>2.8. Install programs used for viewing the results</h3></div></div></div>

    <p>
      </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
	  <p>
	    <a class="ulink" href="http://tree.bio.ed.ac.uk/software/tracer/" target="_top">
	      Tracer
	    </a>
	    :  MCMC parameter/diagnostic viewer.
	  </p>
	</li><li class="listitem">
	  <p>
	    <a class="ulink" href="http://tree.bio.ed.ac.uk/software/figtree/" target="_top">
	      FigTree
	    </a>
	    : Phylogeny viewer.
	  </p>
	</li><li class="listitem">
	  <p>
	    <a class="ulink" href="http://pbil.univ-lyon1.fr/software/seaview.html" target="_top">
	      SeaView
	    </a>
	    : Alignment viewer.
	  </p>
	</li></ul></div><p>
    </p>

    </div>
    </div>


  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="running"></a>3. Running the program</h2></div></div></div>
    

    <p>Here are some examples and explanations of how to run <span class="command"><strong>bali-phy</strong></span>.  You can get an overview of command line options by running <span class="command"><strong>bali-phy --help</strong></span>.</p>
<p>We recommend running multiple chains in parallel for each analysis, because
</p><div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem">You can combine the samples, leading to faster run times.</li><li class="listitem">You can compare the runs to determine if the chains have converged.</li></ol></div><p>
 
This can be done simply by starting several instances of the program, and does not require using MPI or special command-line options.
</p>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp22"></a>3.1. Quick Start</h3></div></div></div>
      
      <p>The simplest way to run <span class="command"><strong>BAli-Phy</strong></span> is
	to type all the arguments on the command line:

	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file</code></em></code></strong></pre><p>

	Here <em class="replaceable"><code>sequence-file</code></em> is a FastA or PHYLIP
	file containing the sequences you wish to analyze.  The filename should end
	in <strong class="userinput"><code>.fasta</code></strong> or <strong class="userinput"><code>.phy</code></strong> to
	indicate which format it is using.</p>

      <p>In this simple example, <span class="command"><strong>bali-phy</strong></span> automatically detects whether <em class="replaceable"><code>sequence-file</code></em> contains DNA, RNA, or Amino-Acids and uses default values for several command line options.  Thus, if <em class="replaceable"><code>sequence-file</code></em> contains DNA, then this is equivalent to the more verbose command line
	</p><pre class="screen"><code class="prompt">%</code> bali-phy <em class="replaceable"><code>sequence-file</code></em> --alphabet DNA --smodel tn93 --imodel rs07</pre><p>  Here the substitution model is Tamura-Nei (tn93) and the insertion/deletion model is rs07.  If <em class="replaceable"><code>sequence-file</code></em> contains amino acids, then the defaults will be:</p><pre class="screen"><code class="prompt">%</code> bali-phy <em class="replaceable"><code>sequence-file</code></em> --alphabet Amino-Acids --smodel lg08 --imodel rs07</pre><p> 
      </p>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp23"></a>3.2. Command line options</h3></div></div></div>
      

      <p>You can specify a more complex substitution model as follows:

	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file</code></em> --smodel lg08+Rates.gamma+inv</code></strong></pre><p> </p>
      <p>You may specify an indel model of <strong class="userinput"><code>none</code></strong>
	to fix the alignment to its initial value, and ignore information in shared insertions or deletions.
  	</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file</code></em> --imodel none</code></strong></pre>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="cluster"></a>3.3. Running on computing clusters</h3></div></div></div>
    <p>
      Running <span class="command"><strong>bali-phy</strong></span> on a computing cluster is
      not necessary, but can speed up the analysis dramatically.
      This is because a cluster allows you to run several
      <span class="emphasis"><em>independent</em></span> MCMC chains simultaneously and
      pool the resulting samples.  You can run multiple chains
      simultaneously simply by starting several different instances of
      <span class="command"><strong>bali-phy</strong></span>.  Each instance of bali-phy runs
      only one chain and does not require using MPI or special
    command-line options.</p>   

      <p>This approach to parallel computation is sometimes more
      efficient than MCMCMC-based parallelism involving heated chains.
      It is equivalent to running MCMCMC with no temperature
      difference between chains, with the exception that it allows
      results from <span class="emphasis"><em>all</em></span> chains to be used, instead
      of just results from the single "cold" chain.  Thus, if you run
      10 independent chains in parallel, then you may gather samples
      10 times faster than a single chain.
      </p>
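      <p>As an illustration of starting several independent chains, the following shell sketch launches a given number of <span class="command"><strong>bali-phy</strong></span> instances in the background and waits for all of them to finish.  The function name <code class="function">run_chains</code> and the <code class="envar">BALI_PHY</code> variable are hypothetical names introduced for this example and are not part of <span class="application">BAli-Phy</span>.  On a real cluster you would normally submit each instance through the job scheduler instead, and it is worth checking afterwards that every instance wrote to its own output directory.</p>
<pre class="programlisting"># Hypothetical helper: start N independent bali-phy instances and wait for them.
# Usage: run_chains N sequence-file [options...]
run_chains() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # BALI_PHY is an assumed override for dry runs, e.g. BALI_PHY=echo
        "${BALI_PHY:-bali-phy}" "$@" &amp;
        i=$((i + 1))
    done
    wait    # returns once every background instance has exited
}</pre>
      <p>For example, <strong class="userinput"><code>run_chains 4 <em class="replaceable"><code>sequence-file</code></em></code></strong> would start four instances with identical arguments.</p>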
    </div>

   <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp27"></a>3.4. Option files (Scripts)</h3></div></div></div>
      
      <p>
	In addition to using the command line, you may also specify
	options in a file. Using an option file can be more convenient
	if you are going to run the same analysis many times, or if
	the number of options is large. Furthermore, the option file
	may contain comments and blank lines.  Option files are a good
	way to record what options you used in an analysis, and why.
      </p>

	<p>
	  An option file is specified with the command line option <strong class="userinput"><code>--config
	    <em class="replaceable"><code>file</code></em></code></strong> or <strong class="userinput"><code>-c<em class="replaceable"><code>file</code></em></code></strong>. If values
	    for an option are given both on the command line and
	    in an option file, then the command line value overrides
	    the value in the option file. 
	</p>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp24"></a>3.4.1. Syntax</h4></div></div></div>
	

	<p>Option files use the same option names as the command
	line.  However, the syntax is different:  each option is given
	  on its own line using the syntax "<strong class="userinput"><code>option =
	    value</code></strong>" instead of the syntax "<strong class="userinput"><code>--option
	    value</code></strong>".  If the option has no value then it is
	  given using the syntax  "<strong class="userinput"><code>option =
	    option</code></strong>".  
	</p>
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp25"></a>3.4.2. Example</h4></div></div></div>
	
	<p>
	  For example, consider the following
	  option file:
	  </p><pre class="programlisting"># sequence data for 3 genes/partitions
align = ITS1.fasta
align = 5.8S.fasta
align = ITS2.fasta

# linked substitution model for 1st and 3rd partition
smodel = 1,3:tn93+Rates.free[n=3]

# substitution model for 2nd partition
smodel = 2:tn93

# indel model for second partition
imodel = 2:none

# linked scale for 1st and 3rd partition
scale = 1,3:</pre><p>
          The <strong class="userinput"><code>align</code></strong> option indicates sequence files, and has no name on the command line.
	  Lines that begin with # are comments, and blank lines are
	  ignored.  This is thus equivalent to the rather long command line:
	  </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy ITS1.fasta 5.8S.fasta ITS2.fasta --smodel=1,3:tn93+Rates.free[n=3] --smodel=2:tn93 --imodel=2:none --scale=1,3:</code></strong></pre><p>
	</p>
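	<p>
	  The correspondence between the two syntaxes can be sketched as a small shell function.  This only illustrates the mapping described above; it is not how <span class="application">BAli-Phy</span> itself parses option files.  The function name <code class="function">optfile_to_args</code> is hypothetical, and the sketch assumes that every non-comment line has the form "<strong class="userinput"><code>option = value</code></strong>".
	</p>
<pre class="programlisting"># Hypothetical helper: print, one per line, the command-line arguments
# equivalent to an option file.
optfile_to_args() {
    # 1. drop comment lines   2. drop blank lines
    # 3. rewrite "option = value" as "--option=value"
    # 4. "align" has no name on the command line, so strip its prefix
    sed -e '/^[[:space:]]*#/d' \
        -e '/^[[:space:]]*$/d' \
        -e 's/^[[:space:]]*\([^=[:space:]]*\)[[:space:]]*=[[:space:]]*/--\1=/' \
        -e 's/^--align=//' \
        "$1"
}</pre>
	<p>
	  Applied to the option file above, this prints the three sequence file names followed by the four <strong class="userinput"><code>--smodel</code></strong>, <strong class="userinput"><code>--imodel</code></strong> and <strong class="userinput"><code>--scale</code></strong> arguments of the long command line.
	</p>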
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp26"></a>3.4.3. The configuration file</h4></div></div></div>
	
	<p>
	  The file <code class="filename">~/.bali-phy</code> is a special
	  option file called the <span class="emphasis"><em>configuration
	    file.</em></span>  If it exists, it is always loaded.
	  Options given on the command line or an option file 
	  override values given in <code class="filename">~/.bali-phy</code>. 
	</p>
      </div>
    </div>

  </div>

  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="input"></a>4. Input</h2></div></div></div>
    

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp28"></a>4.1. Sequence formats</h3></div></div></div>
      
      <p><span class="application">BAli-Phy</span> can read in sequences
	and alignments in both FastA and PHYLIP formats.  Filenames for
	FastA files should end in <strong class="userinput"><code>.fasta</code></strong>,
	<strong class="userinput"><code>.mpfa</code></strong>, <strong class="userinput"><code>.fna</code></strong>,
	<strong class="userinput"><code>.fas</code></strong>, <strong class="userinput"><code>.fsa</code></strong>, or
	<strong class="userinput"><code>.fa</code></strong>.  Filenames for PHYLIP files should
	end in <strong class="userinput"><code>.phy</code></strong>.  If one of these extensions
	is not used, then <span class="application">BAli-Phy</span> will
	attempt to guess which format is being used.
      </p>

    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp31"></a>4.2. Is my data set too large?</h3></div></div></div>
      

      <p>Large data sets run more slowly than small data
	sets. We recommend a conservative starting point with few taxa
	and short sequence lengths.  You can then increase the size of
	your data set until a balance between speed and size is
	reached.  The tool <span class="application">alignment-thin</span> described
        in <a class="xref" href="#alignment-utilities" title="11. Alignment utilities">Section 11, &#8220;Alignment utilities&#8221;</a> can be used to construct a smaller
        data set.</p> 

      <p>The number of MCMC samples that you need depends on whether you
	are primarily interested in obtaining a point estimate or in
	obtaining detailed measures of confidence and uncertainty.  For
	detailed measures of confidence and uncertainty you should
	obtain a minimum of 10,000 samples after the Markov chain
	converges.  For a point estimate, you don't need very many samples
	after convergence.  (But you may need many samples to be sure
	that you've converged!)
      </p>
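      <p>A quick way to compare a run against the 10,000-sample guideline is to count the samples that remain after burn-in.  The sketch below assumes a file with exactly one sample per line and no header line, which is how <code class="filename">C1.trees</code> is described in this guide; the function name <code class="function">samples_after_burnin</code> is hypothetical, and the burn-in value is whatever you have chosen after checking convergence.</p>
<pre class="programlisting"># Hypothetical helper: print how many samples remain in a one-sample-per-line
# file (such as C1.trees) after discarding the first SKIP samples as burn-in.
# Usage: samples_after_burnin FILE SKIP
samples_after_burnin() {
    awk -v skip="$2" 'END { print NR - skip }' "$1"
}</pre>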

      <p>See also <a class="xref" href="#cluster" title="3.3. Running on computing clusters">Section 3.3, &#8220;Running on computing clusters&#8221;</a>.</p>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp29"></a>4.2.1. Too many taxa?</h4></div></div></div>
	

	<p><span class="application">BAli-Phy</span> is quite CPU
	  intensive, and so we recommend using 150 or fewer taxa in order
	  to limit the time required to accumulate enough MCMC
	  samples.
	</p>
	<p>When designing an MCMC analysis, we recommend performing an initial analysis
	with a much smaller number of sequences.  This smaller analysis will run much faster, and
	allow discovering mistakes much more quickly.  Then, after you are sure that you
	are running the program correctly and have chosen the best model, you can ramp
	up the number of sequences towards your desired number.
	</p> 

      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp30"></a>4.2.2. Sequences too long?</h4></div></div></div>
	

	<p>Aligning just a pair of sequences takes <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>O</mi><mo stretchy="false">(</mo><msup><mi>L</mi> <mn>2</mn></msup><mo stretchy="false">)</mo></math> time
	  and memory, where <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>L</mi></math> represents the sequence length.  Therefore
	  sequences longer than (say) 1000 letters become increasingly
	  impractical.  However, you might try to see how long you can make your
	  sequences before you run out of memory, or the program
	  becomes too slow.</p>

	<p>For multi-gene analyses, two separate data partitions
	  (i.e. genes) of 500 letters will be twice as fast
	  to align as one data partition of 1000 letters. So, it may be possible
	  to analyze several genes as long as each gene individually
	  is not too long.</p> 
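	<p>To make the arithmetic behind this comparison concrete, the sketch below adds up the square of each partition length, which is the relative pairwise-alignment cost under the quadratic estimate above.  The function name <code class="function">pairwise_cost</code> is hypothetical, and the numbers are relative costs, not measured run times.</p>
<pre class="programlisting"># Hypothetical helper: sum of L*L over the given partition lengths.
pairwise_cost() {
    total=0
    for len in "$@"; do
        total=$((total + len * len))
    done
    echo "$total"
}

pairwise_cost 1000       # prints 1000000
pairwise_cost 500 500    # prints 500000</pre>
	<p>The second figure is half of the first, which is where the factor of two comes from.</p>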

	

	<p>Also, note that you can sometimes speed up the analysis
	  of protein-coding sequences by coding them as amino acids or codons, rather 
	  than nucleotides. This is because it decreases the sequence
	  length.
	</p> 
      </div>

    </div>

  </div>

  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="output"></a>5. Output</h2></div></div></div>
    

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp32"></a>5.1. Output directory</h3></div></div></div>
      
      <p><span class="application">BAli-Phy</span> creates a new
	directory to store its output files each time it is run.  By default, the
	directory name is the name of the sequence file, with a number
	added on the end to make it unique. <span class="application">BAli-Phy</span>
	first checks  if there is already a directory called
	<code class="filename"><em class="replaceable"><code>file</code></em>-1/</code>, and then moves on to
	<code class="filename"><em class="replaceable"><code>file</code></em>-2/</code>, etc. until it finds an
	unused directory name.</p> 
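      <p>The naming rule can be sketched as a short shell function.  This is an illustration of the search described above, not the code that <span class="application">BAli-Phy</span> runs; the function name <code class="function">next_output_dir</code> is hypothetical, and its argument is the base name used for the directory (for example <code class="filename">25</code> for the file <code class="filename">25.fasta</code> used in <a class="xref" href="#tests" title="2.7. Test the installed software">Section 2.7</a>).</p>
<pre class="programlisting"># Hypothetical helper: print the first name of the form BASE-N (N = 1, 2, ...)
# that does not exist yet in the current directory.
next_output_dir() {
    base=$1
    n=1
    while [ -e "$base-$n" ]; do
        n=$((n + 1))
    done
    echo "$base-$n"
}</pre>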
      
      <p>You can specify a different name to use instead of the
	sequence-file name by using the <strong class="userinput"><code>--name</code></strong> option.</p>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp36"></a>5.2. Output files</h3></div></div></div>
      
      <p><span class="application">BAli-Phy</span> writes the following output
	files inside the directory that it creates:</p>
      
      <div class="variablelist"><table border="0" class="variablelist"><colgroup><col align="left" valign="top"><col></colgroup><tbody><tr><td><p><span class="term">C1.out</span></p></td><td>
	    <p>Iteration numbers, probabilities, success probabilities for transition kernels, etc.</p>
	  </td></tr><tr><td><p><span class="term">C1.P<em class="replaceable"><code>p</code></em>.fastas</span></p></td><td>
	    <p>Sampled alignments for partition <em class="replaceable"><code>p</code></em> including ancestral sequences.</p>
	  </td></tr><tr><td><p><span class="term">C1.err</span></p></td><td>
	    <p>Log file for hopefully irrelevant error messages.</p>
	  </td></tr><tr><td><p><span class="term">C1.MAP</span></p></td><td>
	    <p>Successive estimates of the MAP alignment, tree and parameters.</p>
	  </td></tr><tr><td><p><span class="term">C1.log</span></p></td><td>
	    <p>Numeric parameters: indel and substitution rates, etc. </p>
	    <p>(<span class="emphasis"><em>One sample per line.</em></span>)</p>
	  </td></tr><tr><td><p><span class="term">C1.trees</span></p></td><td>
	    <p>Tree samples in Newick format.</p>
	    <p>(<span class="emphasis"><em>One sample per line.</em></span>)</p>
	  </td></tr><tr><td><p><span class="term">C1.run.json</span></p></td><td>
	    <p>JSON file containing information about the command line, models, hostname, start time, etc.</p>
	  </td></tr></tbody></table></div>
      
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp35"></a>5.2.1. Field names in <code class="filename">C1.log</code></h4></div></div></div>

      <p>This section explains the meaning of the various field names in the file <code class="filename">C1.log</code>.</p>


      <div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="idp33"></a>5.2.1.1. Computed parameter names</h5></div></div></div>
      <div class="variablelist"><table border="0" class="variablelist"><colgroup><col align="left" valign="top"><col></colgroup><tbody><tr><td><p><span class="term">prior</span></p></td><td>
	    <p>The log prior probability.  This includes the probability of the alignment, since the alignment is not observed.</p>
	  </td></tr><tr><td><p><span class="term">prior_A<em class="replaceable"><code>n</code></em></span></p></td><td>
	    <p>The log of the probability <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>Pr</mi><mo stretchy="false">(</mo><msub><mi>A</mi> <mi>n</mi></msub><mo stretchy="false">&#8739;</mo><mi>&#964;</mi><mo>,</mo><mi>T</mi><mo>,</mo><mi>&#923;</mi><mo stretchy="false">)</mo></math> of the alignment <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><msub><mi>A</mi> <mi>n</mi></msub></math> of the <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>n</mi></math>th partition, given the topology <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>&#964;</mi></math>, the branch lengths <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>T</mi></math>, and insertion-deletion process parameters <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>&#923;</mi></math>.  This log probability is the probabilistic equivalent of a gap penalty on the alignment <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><msub><mi>A</mi> <mi>n</mi></msub></math> given the scoring parameters <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>&#923;</mi></math>.</p>
	  </td></tr><tr><td><p><span class="term">likelihood</span></p></td><td>
	    <p>The log of the likelihood.  Conditional on the alignment, this is determined entirely by the substitution model, and ignores insertions and deletions.  This is the probabilistic equivalent of the mismatch penalty.</p>
	  </td></tr><tr><td><p><span class="term">posterior</span></p></td><td>
	    <p>The log of the posterior probability.  The posterior probability is the product of the prior and the likelihood.</p>
	  </td></tr><tr><td><p><span class="term">|A|</span></p></td><td>
	    <p>The total number of alignment columns across all partitions.</p>
	  </td></tr><tr><td><p><span class="term">#indels<em class="replaceable"><code>n</code></em></span></p></td><td>
	    <p>The number of indel events in partition <em class="replaceable"><code>n</code></em>, if we group adjacent indels that occur on the same branch.</p>
	  </td></tr><tr><td><p><span class="term">#indels</span></p></td><td>
	    <p>The total number of indel events across all partitions, if we group adjacent indels that occur on the same branch.</p>
	  </td></tr><tr><td><p><span class="term">|indels<em class="replaceable"><code>n</code></em>|</span></p></td><td>
	    <p>The length of indel events in partition <em class="replaceable"><code>n</code></em>, if we group adjacent indels that occur on the same branch.</p>
	  </td></tr><tr><td><p><span class="term">|indels|</span></p></td><td>
	    <p>The total length of indel events across all partitions, if we group adjacent indels that occur on the same branch.</p>
	  </td></tr><tr><td><p><span class="term">#substs<em class="replaceable"><code>n</code></em></span></p></td><td>
	    <p>The unweighted parsimony score for substitutions in partition <em class="replaceable"><code>n</code></em>.</p>
	  </td></tr><tr><td><p><span class="term">#substs</span></p></td><td>
	    <p>The total unweighted parsimony score for substitutions across all partitions.</p>
	  </td></tr><tr><td><p><span class="term">Scale<em class="replaceable"><code>n</code></em> * |T|</span></p></td><td>
	    <p>The branch lengths for partition group <em class="replaceable"><code>n</code></em>.</p>
	  </td></tr></tbody></table></div>
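      <p>Because these fields are logarithms, the product of the prior and the likelihood becomes a sum: the <code class="literal">posterior</code> field should equal <code class="literal">prior</code> plus <code class="literal">likelihood</code>.  The sketch below prints the difference for each sample, which should be zero up to rounding.  It assumes that <code class="filename">C1.log</code> is tab-separated with a single header line of field names; that layout is an assumption made for this example, and the function name <code class="function">check_posterior</code> is hypothetical.</p>
<pre class="programlisting"># Hypothetical check: for each sample, print prior + likelihood - posterior.
# Assumes a tab-separated file whose first line holds the field names.
check_posterior() {
    awk -F '\t' '
        NR == 1 {
            # remember which column each field name is in
            for (i = NF; i != 0; i--) col[$i] = i
            next
        }
        { print $col["prior"] + $col["likelihood"] - $col["posterior"] }
    ' "$1"
}</pre>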
      </div>

      <div class="section"><div class="titlepage"><div><div><h5 class="title"><a name="idp34"></a>5.2.1.2. Model parameter names</h5></div></div></div>

      <p>The prefixes "S<em class="replaceable"><code>n</code></em>/" and "I<em class="replaceable"><code>n</code></em>/" will be dropped if not necessary to disambiguate parameters with the same name in different sub-models.</p>

      <div class="variablelist"><table border="0" class="variablelist"><colgroup><col align="left" valign="top"><col></colgroup><tbody><tr><td><p><span class="term">Scale[<em class="replaceable"><code>n</code></em>]</span></p></td><td>
	    <p>The average number of substitutions per branch in scale group <em class="replaceable"><code>n</code></em>.  The <em class="replaceable"><code>n</code></em>th scale group applies to the <em class="replaceable"><code>n</code></em>th partition, unless multiple partitions are forced to have the same branch length scale using <strong class="userinput"><code>--scale</code></strong> or <strong class="userinput"><code>--link</code></strong>.</p>
	  </td></tr><tr><td><p><span class="term">S<em class="replaceable"><code>n</code></em>/<em class="replaceable"><code>name</code></em></span></p></td><td>
	    <p>Parameter <em class="replaceable"><code>name</code></em> in the <em class="replaceable"><code>n</code></em>th substitution model.</p>
	  </td></tr><tr><td><p><span class="term">I<em class="replaceable"><code>n</code></em>/<em class="replaceable"><code>name</code></em></span></p></td><td>
	    <p>Parameter <em class="replaceable"><code>name</code></em> in the <em class="replaceable"><code>n</code></em>th insertion/deletion model.</p>
	  </td></tr></tbody></table></div>
      </div>
      </div>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp44"></a>5.3. Summarizing the output</h3></div></div></div>
      

      <p>This section is primarily about extracting estimates from output files.  See <a class="xref" href="#mixing_and_convergence" title="10. Convergence and Mixing: Is it done yet?">Section 10, &#8220;Convergence and Mixing: Is it done yet?&#8221;</a> for methods of determining effective sample sizes, and for checking mixing and convergence.</p>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp37"></a>5.3.1. Finding the majority consensus tree</h4></div></div></div>
	
	<p>
To compute the majority consensus tree, do the following.  (The
program <a class="ulink" href="http://tree.bio.ed.ac.uk/software/figtree/" target="_top">FigTree</a>
allows you to view the resulting tree file graphically.)
</p><pre class="screen"><code class="prompt">%</code> trees-consensus <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees &gt; <code class="filename">c50.PP.tree</code></pre><p>
</p>

<p>By default, the first 10% of tree samples are skipped as burn-in (<strong class="userinput"><code>--skip=10%</code></strong> or <strong class="userinput"><code>-s 10%</code></strong>) and every generation is analyzed (<strong class="userinput"><code>--subsample=1</code></strong> or <strong class="userinput"><code>-x 1</code></strong>).  To discard the first 1000 tree samples and analyze every 10th sample:
</p><pre class="screen"><code class="prompt">%</code> trees-consensus -s 1000 -x 10 <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees &gt; <code class="filename">c50.PP.tree</code></pre><p>
By default, splits are included in the consensus tree if they have a
PP greater than 0.5.  You can specify a more stringent level
(e.g. 0.66) by adding the option
<strong class="userinput"><code>--consensus-PP=0.66</code></strong> as follows:
</p><pre class="screen"><code class="prompt">%</code> trees-consensus -s20% -x10 --consensus-PP=0.66 <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees &gt; <code class="filename">c66.PP.tree</code></pre><p>
You may also make the program write directly to the output file
(e.g. <code class="filename">c66.PP.tree</code>) by using the more general form
<strong class="userinput"><code>--consensus-PP=0.66:c66.PP.tree</code></strong>.  Leaving off 
the "<strong class="userinput"><code>:c66.PP.tree</code></strong>" part (as we did above) or specifying
"<strong class="userinput"><code>:-</code></strong>" sends the output to the standard output
(e.g. the terminal, if not redirected). 
</p><pre class="screen"><code class="prompt">%</code> trees-consensus -s20% -x10 <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees --consensus-PP=0.66:<code class="filename">c66.PP.tree</code></pre><p>
You can supply multiple levels and filenames separated by commas.
This is faster than running the program multiple times with different
consensus levels.
</p><pre class="screen"><code class="prompt">%</code> trees-consensus -s20% -x10 <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees --consensus-PP=0.5:<code class="filename">c50.PP.tree</code>,0.66:<code class="filename">c66.PP.tree</code></pre><p>
Finally, you may use the option <strong class="userinput"><code>--consensus=</code></strong>
instead of the option <strong class="userinput"><code>--consensus-PP=</code></strong> if you do
not wish the resulting tree to contain embedded posterior
probabilities on branches, as well as branch lengths.
</p><pre class="screen"><code class="prompt">%</code> trees-consensus -s20% -x10 <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees --consensus=0.5:<code class="filename">c50.PP.tree</code>,0.66:<code class="filename">c66.PP.tree</code></pre><p>
Both the <strong class="userinput"><code>--consensus=</code></strong> and 
<strong class="userinput"><code>--consensus-PP=</code></strong> options may be given simultaneously.
</p>

<p>
  See <strong class="userinput"><code>trees-consensus --help</code></strong> for a complete list of options.
</p>

      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp38"></a>5.3.2. Finding the greedy consensus tree</h4></div></div></div>
	
	<p>
	  The greedy consensus tree may be used instead of a majority-consensus tree when a fully resolved (e.g. bifurcating) tree is required.  When the topology has many tips and each topology may be sampled only once, the greedy consensus should be higher quality than the estimate of the MAP topology.  To obtain a fully resolved tree, the greedy consensus strategy starts with the majority consensus and then repeatedly adds the highest-supported remaining split that does not conflict with the splits already in the tree.</p>
	
	<p>To compute the <span class="emphasis"><em>greedy consensus</em></span> tree do:
</p><pre class="screen"><code class="prompt">%</code> trees-consensus --skip=<em class="replaceable"><code>burnin</code></em> <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees --greedy-consensus=<code class="filename">greedy.tree</code></pre><p>	
</p>

      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp39"></a>5.3.3. Finding the M.A.P. tree</h4></div></div></div>
	
	<p>
To compute the <span class="emphasis"><em>maximum a posteriori</em></span> tree do:
</p><pre class="screen"><code class="prompt">%</code> trees-consensus --skip=<em class="replaceable"><code>burnin</code></em> <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees --map-tree=<code class="filename">MAP.tree</code></pre><p>	
When the tree has many tips, each topology may be sampled only once, leading to low quality estimates of the MAP topology.  As a result, when you need a bifurcating tree you should probably use the greedy consensus instead.
</p>
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp40"></a>5.3.4. Checking topology convergence</h4></div></div></div>
	
<p>
</p><pre class="screen"><code class="prompt">%</code> trees-bootstrap <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees</pre><p>	
This command computes the effective sample size for the posterior probability of each split.  It also computes the Average Standard Deviation of Split Frequencies (ASDSF) between two or more independent runs.</p>

<p>See <a class="xref" href="#mixing_and_convergence" title="10. Convergence and Mixing: Is it done yet?">Section 10, &#8220;Convergence and Mixing: Is it done yet?&#8221;</a> for more information.
</p>  
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp41"></a>5.3.5. Summarizing numerical parameters</h4></div></div></div>
	
<p>
This command gives a median and credible interval, ESS, and a stabilization time:
</p><pre class="screen"><code class="prompt">%</code> statreport <em class="replaceable"><code>dir-1</code></em>/C1.log <em class="replaceable"><code>dir-2</code></em>/C1.log &gt; Report </pre><p>	
When multiple runs are analyzed, this command gives PSRF and joint ESS values. The program <a class="ulink" href="http://tree.bio.ed.ac.uk/software/tracer/" target="_top">Tracer</a> allows you to view the same summaries graphically.</p>

<p>See <a class="xref" href="#mixing_and_convergence" title="10. Convergence and Mixing: Is it done yet?">Section 10, &#8220;Convergence and Mixing: Is it done yet?&#8221;</a> for more information.
</p>
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp42"></a>5.3.6. Computing an alignment using Posterior Decoding</h4></div></div></div>
	
<p>
</p><pre class="screen"><code class="prompt">%</code> cut-range --skip=<em class="replaceable"><code>burn-in</code></em> &lt; C1.P<em class="replaceable"><code>p</code></em>.fastas | alignment-max &gt; P<em class="replaceable"><code>p</code></em>-max.fasta</pre><p>
You can use the program <a class="ulink" href="http://pbil.univ-lyon1.fr/software/seaview.html" target="_top">SeaView</a> to view the alignment graphically.
</p>
      </div>      

            

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp43"></a>5.3.7. Create an Au (Alignment Uncertainty) plot</h4></div></div></div>
	
<p>To annotate a specific alignment <em class="replaceable"><code>alignment</code></em>.fasta, choose a fully resolved tree estimate <em class="replaceable"><code>tree</code></em>:
</p><pre class="screen"><code class="prompt">%</code> cut-range --skip=<em class="replaceable"><code>burn-in</code></em> &lt; C1.P<em class="replaceable"><code>p</code></em>.fastas | alignment-gild <em class="replaceable"><code>alignment</code></em>.fasta <em class="replaceable"><code>tree</code></em>  &gt; <em class="replaceable"><code>alignment</code></em>-AU.prob
<code class="prompt">%</code> alignment-draw <em class="replaceable"><code>alignment</code></em>.fasta --AU <em class="replaceable"><code>alignment</code></em>-AU.prob &gt; <em class="replaceable"><code>alignment</code></em>-AU.html</pre><p>
The majority consensus tree is usually not fully resolved, so we recommend using the greedy consensus instead.
</p>
      </div>


    </div>


  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="analysis"></a>5.4. Summarizing the output - scripted</h3></div></div></div>
    
    
    <p>
      Instead of manually running each of the steps to analyze the
      output files, you may instead run the Perl script
      <span class="command"><strong>bp-analyze</strong></span> to execute these commands.  The
      script will create an HTML page
      <code class="filename">Results/index.html</code> that summarizes the
      posterior distribution.
    </p>

  <p>You may run <span class="command"><strong>bp-analyze</strong></span> inside the output directory, like this:
</p><pre class="screen"><code class="prompt">%</code> bp-analyze --burnin=<em class="replaceable"><code>iterations</code></em></pre><p>
      You may also run it with one or more output directories as
      arguments, like this:
</p><pre class="screen"><code class="prompt">%</code> bp-analyze --burnin=<em class="replaceable"><code>iterations</code></em> <em class="replaceable"><code>directory</code></em>-1/ <em class="replaceable"><code>directory</code></em>-2/</pre><p>
      In this case, output from multiple runs will be used to assess convergence and mixing, as well as to increase the precision of the estimates.
    </p>

<p> All the commands that are executed by <span class="command"><strong>bp-analyze</strong></span> will be logged to
      <code class="filename">Results/bp-analyze.log</code>. You can also see these
      commands as they are executed by supplying the <span class="command"><strong>--verbose</strong></span> option:
</p><pre class="screen"><code class="prompt">%</code> bp-analyze --burnin=<em class="replaceable"><code>iterations</code></em> --verbose</pre><p>
    </p>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp45"></a>5.4.1. Meaning of generated files</h4></div></div></div>
      
    <p>The <code class="filename">Results/</code> directory will contain
    the following useful files:</p>

      <div class="variablelist"><table border="0" class="variablelist"><colgroup><col align="left" valign="top"><col></colgroup><tbody><tr><td><p><span class="term">Report</span></p></td><td>
	    <p>A summary of numerical parameters: credible
	    intervals and mixing.</p>
	</td></tr><tr><td><p><span class="term">consensus</span></p></td><td>
	    <p>A summary of supported splits (clades). </p>
	</td></tr><tr><td><p><span class="term">c-levels.plot</span></p></td><td>
	    <p>The number of splits (clades) supported at each LOD level.</p>
	</td></tr><tr><td><p><span class="term">c50.tree</span></p></td><td>	<p>The majority consensus topology + branch lengths (Newick format)</p> 
	</td></tr><tr><td><p><span class="term">c50.PP.tree</span></p></td><td>
	<p>The majority consensus topology + branch lengths +
	Posterior Probabilities (Newick format)</p> 
	</td></tr><tr><td><p><span class="term">MAP.tree</span></p></td><td>
	    <p>An estimate of the MAP topology + branch lengths (Newick format)</p>
	</td></tr></tbody></table></div>
      <p> 
	The following files will be generated to summarize alignment uncertainty, unless the analysis uses a fixed alignment.

      </p>

      <div class="variablelist"><table border="0" class="variablelist"><colgroup><col align="left" valign="top"><col></colgroup><tbody><tr><td><p><span class="term">P<em class="replaceable"><code>p</code></em>-max.fasta</span></p></td><td>
	    <p>An estimate of the alignment for partition
	    <em class="replaceable"><code>p</code></em> using maximum posterior decoding.</p>
	</td></tr><tr><td><p><span class="term">P<em class="replaceable"><code>p</code></em>-max-AU.html</span></p></td><td>
	    <p>An AU plot of the maximum posterior decoding alignment for partition
	    <em class="replaceable"><code>p</code></em>  (AA/DNA color-scheme).</p>
	</td></tr></tbody></table></div>


      <p>The following files describe convergence and mixing:</p>


      <div class="variablelist"><table border="0" class="variablelist"><colgroup><col align="left" valign="top"><col></colgroup><tbody><tr><td><p><span class="term">partitions.bs</span></p></td><td>
	    <p>Confidence intervals on the support for partitions, generated
	      using a block bootstrap.</p>
	</td></tr><tr><td><p><span class="term">partitions.SRQ</span></p></td><td><p>A collection of
	      SRQ plots for the supported partitions.
	</p></td></tr><tr><td><p><span class="term">c50.SRQ</span></p></td><td><p>An
	      SRQ plot for the majority consensus tree.
	</p></td></tr></tbody></table></div>

      <p>The SRQ plots can be viewed by typing "<strong class="userinput"><code>plot
	  '<em class="replaceable"><code>file</code></em>' with lines</code></strong>" in
	<span class="application">gnuplot</span>.</p>

    </div>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp46"></a>5.4.2. <code class="filename">Mixing/partitions.bs</code>: partition mixing</h4></div></div></div>
	
	<p>
	  This file reports the quality of estimates of support for each
	  partition in terms of the posterior probability (PP) and
	  log-10 odds (LOD).  It also reports the auto-correlation time (ACT),
	  the effective sample size (Ne), the number of samples
	  that support (1) or do not support (0) the partition, and
	  the number of regenerations. 

	  Only partitions with PP &gt; 0.1 are shown by default.
	</p>
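	<p>Assuming the usual definition of log-10 odds, the two support measures are related by LOD = log<sub>10</sub>(PP / (1 - PP)).  The following Python sketch illustrates the conversion.  It is not part of BAli-Phy, and the function names are hypothetical.</p>
	<pre class="programlisting">import math

def pp_to_lod(pp):
    """Convert a posterior probability to log-10 odds."""
    return math.log10(pp / (1.0 - pp))

def lod_to_pp(lod):
    """Convert log-10 odds back to a posterior probability."""
    odds = 10.0 ** lod
    return odds / (1.0 + odds)

print(round(pp_to_lod(0.5), 3))   # even odds
print(round(pp_to_lod(0.1), 3))   # the default display threshold</pre>
	<p>Under this definition, the default display threshold of PP = 0.1 corresponds to a LOD of about -0.95.</p>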
      </div>


  </div>
  </div>


  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="subst_models"></a>6. Substitution models</h2></div></div></div>
    
      
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="dna_models"></a>6.1. DNA and RNA models</h3></div></div></div>
      
	
      <p>The default substitution model for DNA and RNA is tn93.</p>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp47"></a>6.1.1. Substitution rates</h4></div></div></div>
	
      <p>All the DNA models are special cases of the GTR model.  </p>
      <div class="informaltable">
	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>jc69</code></strong></td><td>0</td><td><p>Equal rates and equal base frequencies.</p>
	      <a class="ulink" href="https://doi.org/10.1016/B978-1-4832-3211-9.50009-7" target="_top">(Jukes and Cantor, 1969)</a>
	      </td></tr><tr><td><strong class="userinput"><code>k80</code></strong></td><td>1</td><td><p>Unequal transition &amp; transversion rates, equal base frequencies.</p>
	      <a class="ulink" href="https://doi.org/10.1007%2FBF01731581" target="_top">(Kimura, 1980)</a>
	      </td></tr><tr><td><strong class="userinput"><code>f81</code></strong></td><td>3</td><td><p>Equal exchangeabilities, unequal frequencies.</p>
	      <a class="ulink" href="https://doi.org/10.1007%2FBF01734359" target="_top">
		(Felsenstein, 1981)
	      </a>
	      </td></tr><tr><td><strong class="userinput"><code>hky85</code></strong></td><td>4</td><td><p>Unequal transition &amp; transversion rates, unequal base frequencies.</p>
	      <a class="ulink" href="https://doi.org/10.1007/BF02101694" target="_top">
		(Hasegawa, Kishino, and Yano, 1985)
	      </a>
	      </td></tr><tr><td><strong class="userinput"><code>tn93</code></strong></td><td>5</td><td>
		<p>Unequal rates for purine transitions, pyrimidine transitions, and transversions; unequal base frequencies.</p>
		<a class="ulink" href="https://doi.org/10.1093/oxfordjournals.molbev.a040023" target="_top">
		  (Tamura and Nei, 1993)
		</a>
	      </td></tr><tr><td><strong class="userinput"><code>gtr</code></strong></td><td>8</td><td><p>Unequal exchangeabilities, unequal frequencies.</p>
	      <a class="ulink" href="http://www.damtp.cam.ac.uk/user/st321/CV_&amp;_Publications_files/STpapers-pdf/T86.pdf" target="_top">
		(Tavare, 1986)
	      </a>
	      </td></tr></tbody></table>
      </div>
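      <p>The d.f. value for <strong class="userinput"><code>gtr</code></strong> can be reproduced by a simple count.  The Python sketch below assumes that one exchangeability is fixed by the overall rate scaling and that the frequencies sum to 1; this accounting is an assumption that happens to match the tables in this guide, and the function name is hypothetical.</p>
      <pre class="programlisting">def gtr_df(n_letters):
    """Free parameters of GTR on an alphabet with n_letters states."""
    # One exchangeability per unordered pair of letters,
    # minus one that is fixed by the overall rate scaling.
    exchangeabilities = n_letters * (n_letters - 1) // 2 - 1
    # Frequencies sum to 1, so the last one is determined by the others.
    frequencies = n_letters - 1
    return exchangeabilities + frequencies

print(gtr_df(4))    # nucleotides
print(gtr_df(20))   # amino acids
print(gtr_df(16))   # RNA doublets</pre>
      <p>With 4, 20, and 16 letters this count gives 8, 208, and 134, which are the d.f. values listed for <strong class="userinput"><code>gtr</code></strong> on nucleotides, amino acids, and doublets.</p>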
      </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="nucleotide-frequencies"></a>6.1.2. Frequencies</h4></div></div></div>
      
      <p>Frequencies are estimated by default.  Users can fix frequencies by setting the <strong class="userinput"><code>pi</code></strong> parameter to a constant value, if the model allows unequal frequencies.</p>
      <p>Constant frequencies are specified as a list of pairs that associates each letter with its frequency:</p>
      <pre class="programlisting">gtr[pi=List[Pair["A",0.1],Pair["C",0.2],Pair["T",0.3],Pair["G",0.4]]]</pre>
      <p>Frequencies can also be specified using functions:</p>

      <p>
	</p><pre class="programlisting">gtr[pi=Frequencies.uniform]</pre><p>
      </p>

      <div class="informaltable">
	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>Frequencies.uniform</code></strong></td><td>0</td><td>Equal frequencies</td></tr></tbody></table>
      </div>
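      <p>Long frequency lists are tedious to type by hand.  The following Python sketch builds the <strong class="userinput"><code>List[Pair[...]]</code></strong> expression from a dictionary.  The helper <code class="function">frequency_list</code> is hypothetical and not part of BAli-Phy, and the check that the frequencies sum to 1 is an assumption about what a sensible input looks like.</p>
      <pre class="programlisting">import math

def frequency_list(freqs, tolerance=1e-9):
    """Format a dict of letter frequencies as a List[Pair[...]] expression."""
    total = sum(freqs.values())
    if not math.isclose(total, 1.0, abs_tol=tolerance):
        raise ValueError("frequencies sum to %r, expected 1" % total)
    # Letters appear in the order in which they were inserted into the dict.
    pairs = ",".join('Pair["%s",%s]' % (letter, value)
                     for letter, value in freqs.items())
    return "List[" + pairs + "]"

pi = frequency_list({"A": 0.1, "C": 0.2, "T": 0.3, "G": 0.4})
print("gtr[pi=" + pi + "]")</pre>
      <p>The printed string can then be used as the substitution model expression.</p>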
    </div>


    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="protein_models"></a>6.2. Protein models</h3></div></div></div>
      

      <p>The default substitution model for proteins is lg08.</p>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp48"></a>6.2.1. Substitution rates</h4></div></div></div>
	
      <div class="informaltable">
	  
	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>jc69</code></strong></td><td>0</td><td><p>Equal rates and equal frequencies.</p>
	      <a class="ulink" href="https://doi.org/10.1016/B978-1-4832-3211-9.50009-7" target="_top">(Jukes and Cantor, 1969)</a></td></tr><tr><td><strong class="userinput"><code>f81</code></strong></td><td>19</td><td><p>Equal exchangeabilities, unequal frequencies.</p>
	      <a class="ulink" href="https://doi.org/10.1007%2FBF01734359" target="_top">
		(Felsenstein, 1981)
	      </a>
	      </td></tr><tr><td>
		<p><strong class="userinput"><code>jtt+f</code></strong></p>
	      </td><td>19</td><td>
		<p>Empirical exchange rates, all proteins.</p>
		<a class="ulink" href="https://doi.org/10.1007/BF02101694" target="_top">
		  (Jones, Taylor, Thornton 1992)
		</a>
	      </td></tr><tr><td>
		<p><strong class="userinput"><code>wag+f</code></strong></p>
	      </td><td>19</td><td>
		<p>Empirical exchange rates, all proteins.</p>
		<a class="ulink" href="https://doi.org/10.1093/oxfordjournals.molbev.a003851" target="_top">
		  Whelan and Goldman (2001)
		</a>
	      </td></tr><tr><td>
		<p><strong class="userinput"><code>lg08+f</code></strong></p>
	      </td><td>19</td><td>
		<p>Empirical exchange rates, all proteins.</p>
		<a class="ulink" href="https://doi.org/10.1093/molbev/msn067" target="_top">
		  <p>Le and Gascuel (2008)</p>
		</a>
	      </td></tr><tr><td>
		<p><strong class="userinput"><code>empirical[<em class="replaceable"><code>file</code></em>]+f</code></strong></p>
	      </td><td>19</td><td>
	      </td></tr><tr><td><strong class="userinput"><code>gtr</code></strong></td><td>208</td><td><p>Unequal exchangeabilities, unequal frequencies.</p>
	      <a class="ulink" href="http://www.damtp.cam.ac.uk/user/st321/CV_&amp;_Publications_files/STpapers-pdf/T86.pdf" target="_top">
		(Tavare, 1986)
	      </a>
	      </td></tr></tbody></table>
      </div>
      </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="amino-acid-frequencies"></a>6.2.2. Frequencies</h4></div></div></div>
      
      <p>Frequencies are estimated by default.  Users can fix frequencies by setting the <strong class="userinput"><code>pi</code></strong> parameter to a constant value, if the model allows unequal frequencies.</p>
      <p>Constant frequencies are specified as a list of pairs that associates each letter with its frequency:</p>
      <pre class="programlisting">wag+f[pi=List[Pair["A",0.047],Pair["R",0.19],...]]</pre>
      <p>Frequencies can also be specified using functions:</p>
      <p>
	</p><pre class="programlisting">wag+f[pi=Frequencies.uniform]</pre><p>
      </p>

      <div class="informaltable">
	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>Frequencies.uniform</code></strong></td><td>0</td><td>Equal frequencies</td></tr><tr><td><strong class="userinput"><code>wag_freq</code></strong></td><td>0</td><td>The constant amino-acid frequencies from the WAG paper.</td></tr><tr><td><strong class="userinput"><code>lg08_freq</code></strong></td><td>0</td><td>The constant amino-acid frequencies from the LG08 paper.</td></tr></tbody></table>
      </div>
	  
      <p>The <strong class="userinput"><code>+fe</code></strong> model is shorthand for <strong class="userinput"><code>+f[pi=Frequencies.uniform]</code></strong>:</p>
      <p>
	</p><pre class="programlisting">wag+fe</pre><p>
      </p>
    </div>


    </div>
      
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="doublet_models"></a>6.3. Doublet models (RNA stems)</h3></div></div></div>
      
      <p>The doublets alphabet consists of 16 RNA dinucleotides.  It is used to model RNA stems, where two nucleotides matched in the RNA secondary structure are highly correlated.</p>
      <p>The default substitution model for doublets is <strong class="userinput"><code>tn93_sym+x2_sym+f</code></strong>.</p>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="doublet-data"></a>6.3.1. Doublet data</h4></div></div></div>
	
	<p>As of version 3.4, BAli-Phy does not yet allow specifying which nucleotides are paired, either with a string like <strong class="userinput"><code>((.))</code></strong> or with a "pairs" file.  Instead, you must manually extract the paired nucleotides and put them in their own partition (for stems), and then manually extract each loop and put it in its own partition.</p>
	<p>The stems should be arranged so that paired nucleotides are adjacent. For example, suppose the sequence <strong class="userinput"><code>AGGCT</code></strong> was paired according to <strong class="userinput"><code>((.))</code></strong>.  Then the input file for the stems should contain a sequence of doublets that looks like <strong class="userinput"><code>ATGC</code></strong>, where <strong class="userinput"><code>AT</code></strong> is the first pair, and <strong class="userinput"><code>GC</code></strong> is the second pair.  Later versions of the software should allow extracting stems and loops from nucleotide sequences using parenthesis notation or a "pairs" file.
	</p>
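	<p>Until that is supported, the rearrangement can be scripted.  The following Python sketch pairs up positions from a parenthesis string and writes each pair as an adjacent doublet.  It is an illustration rather than a BAli-Phy tool: the function name is hypothetical, it handles a single sequence, and it assumes that the structure string is balanced and has the same length as the sequence.</p>
	<pre class="programlisting">def stem_doublets(sequence, structure):
    """Return the paired nucleotides as a string of adjacent doublets."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    open_positions = []   # stack of "(" positions that are not yet matched
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            j = open_positions.pop()
            pairs.append((j, i))
    pairs.sort()          # order the doublets by the position of their first letter
    return "".join(sequence[j] + sequence[i] for j, i in pairs)

print(stem_doublets("AGGCT", "((.))"))   # prints ATGC</pre>
	<p>For an alignment, the same structure string would be applied to every row so that the doublet columns stay aligned.</p>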
      </div>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp49"></a>6.3.2. Substitution rates</h4></div></div></div>
	
      <div class="informaltable">

	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code><em class="replaceable"><code>nuc_model</code></em>+x2_sym+f</code></strong></td><td>df(nuc_model)+15</td><td><p>This model has separate frequencies for each dinucleotide.</p>  <p>Both dinucleotide letters cannot change simultaneously, so double-changes must proceed through a possibly mismatched intermediate.</p>
	      </td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>nuc_model</code></em>+x2</code></strong></td><td>df(nuc_model)</td><td><p>Dinucleotide frequencies are the product of independent nucleotide frequencies.</p>
	      <p>This model should give the same likelihood as <em class="replaceable"><code>nuc_model</code></em>.</p>
	      </td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>RNA.m16a</code></em></code></strong></td><td>19</td><td>
		<p>This model has separate frequencies for each dinucleotide, and distinguishes between transitions and transversions between match states (including GU/UG).</p>
		<p>Simultaneous changes of both letters <span class="emphasis"><em>are</em></span> allowed, but only between match states.</p>
	      <a class="ulink" href="https://www.ncbi.nlm.nih.gov/pubmed/11139520" target="_top">
		(Savill et al., 2001)
	      </a>
	      </td></tr><tr><td><strong class="userinput"><code>x2x2[q1,q2]</code></strong></td><td>df(q1)+df(q2)</td><td><p>Doublet rate matrix constructed from a nucleotide rate matrix for each doublet position.</p>
	      </td></tr><tr><td><strong class="userinput"><code>gtr</code></strong></td><td>134</td><td><p>Unequal exchangeabilities, unequal frequencies.</p>
	      <p>It is unlikely that you would want to use this model, since it has so many parameters.</p>
	      <a class="ulink" href="http://www.damtp.cam.ac.uk/user/st321/CV_&amp;_Publications_files/STpapers-pdf/T86.pdf" target="_top">
		(Tavare, 1986)
	      </a>
	      </td></tr></tbody></table>
      </div>
      </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="doublet-frequencies"></a>6.3.3. Frequencies</h4></div></div></div>
      
      <p>Frequencies are estimated by default.  Users can fix frequencies by setting the <strong class="userinput"><code>pi</code></strong> parameter to a constant value, if the model allows unequal frequencies.</p>
      <p>Constant frequencies are specified as a list of pairs that associates each letter with its frequency.</p>
      <pre class="programlisting">hky85[pi=List[Pair["A",0.1],Pair["C",0.2],Pair["T",0.3],Pair["G",0.4]]]+x2

hky85_sym+x2_sym+f[pi=List[Pair["AA",0.01],Pair["AC",0.01],Pair["AG",0.01],Pair["AU",0.22],Pair["CA",0.01],Pair["CC",0.01],Pair["CG",0.22],Pair["CU",0.01],Pair["GA",0.01],Pair["GC",0.22],Pair["GG",0.01],Pair["GU",0.01],Pair["UA",0.22],Pair["UC",0.01],Pair["UG",0.01],Pair["UU",0.01]]]</pre>
      <p>Frequencies can also be specified using functions:</p>
      


      <div class="informaltable">
	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>Frequencies.uniform</code></strong></td><td>0</td><td>Equal frequencies on dinucleotides</td></tr><tr><td><strong class="userinput"><code>f2x4</code></strong></td><td>6</td><td>Constructs dinucleotide frequencies from independent nucleotide frequencies for each doublet position.</td></tr></tbody></table>
      </div>
    </div>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="doublet-branch-lengths"></a>6.3.4. Branch lengths</h4></div></div></div>
      
      <p>BAli-Phy interprets branch lengths for doublet models as 1/2 the number of substitutions per doublet.  Thus, they should be comparable to branch lengths under DNA/RNA nucleotide models.</p>
    </div>
      

  </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="triplet_models"></a>6.4. Triplet models</h3></div></div></div>
      
      <p>The triplets alphabet is similar to the codons alphabet, except that stop codons are not removed. Unlike the codons alphabet, the triplets alphabet has no knowledge of the genetic code.</p>
      <p>The default substitution model for triplets is tn93+x3.</p>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp50"></a>6.4.1. Substitution rates</h4></div></div></div>
	
      <div class="informaltable">

	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code><em class="replaceable"><code>nuc_model</code></em>+x3_sym+f</code></strong></td><td>df(nuc_model)+63</td><td><p>GY94-style rate matrix constructed from nucleotide exchangeability matrix.</p>
	      </td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>nuc_model</code></em>+x3</code></strong></td><td>df(nuc_model)</td><td><p>MG94-style rate matrix constructed from nucleotide rate matrix.</p>
	             <p>This model should give the same likelihood as <em class="replaceable"><code>nuc_model</code></em> on triplets, but not on codons.</p>
	      </td></tr><tr><td><strong class="userinput"><code>x3x3[q1,q2,q3]</code></strong></td><td>df(q1)+df(q2)+df(q3)</td><td><p>Triplet rate matrix constructed from a nucleotide rate matrix for each codon position.</p>
	      </td></tr></tbody></table>
      </div>
      </div>
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="triplet-frequencies"></a>6.4.2. Frequencies</h4></div></div></div>
      
      <p>Frequencies are estimated by default.  Users can fix frequencies by setting the <strong class="userinput"><code>pi</code></strong> parameter to a constant value, if the model allows unequal frequencies.</p>
      <p>Constant frequencies are specified as a list of pairs that associates each letter with its frequency.</p>
      <pre class="programlisting">hky85[pi=List[Pair["A",0.1],Pair["C",0.2],Pair["T",0.3],Pair["G",0.4]]]+x3</pre>
      <p>Frequencies can also be specified using functions:</p>
      <p>
	</p><pre class="programlisting">hky85_sym+x3_sym+f[pi=f1x4]            // nucleotide frequencies are estimated</pre><p>
      </p>


      <div class="informaltable">
	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>Frequencies.uniform</code></strong></td><td>0</td><td>Equal frequencies</td></tr><tr><td><strong class="userinput"><code>f1x4</code></strong></td><td>3</td><td>Constructs triplet frequencies from independent nucleotide frequencies.</td></tr><tr><td><strong class="userinput"><code>f3x4</code></strong></td><td>9</td><td>Constructs triplet frequencies from independent nucleotide frequencies for each codon position.</td></tr></tbody></table>
      </div>
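      <p>The product construction behind <strong class="userinput"><code>f1x4</code></strong> and <strong class="userinput"><code>f3x4</code></strong> can be sketched in a few lines of Python.  This is an illustration of the idea, not BAli-Phy's implementation; the Python function names merely echo the model names.  It covers the triplets alphabet only: on the codons alphabet the stop codons are excluded, so the products would presumably need to be renormalized.</p>
      <pre class="programlisting">from itertools import product

NUCLEOTIDES = "ACGT"

def f3x4(pos1, pos2, pos3):
    """Triplet frequencies as products of position-specific nucleotide frequencies."""
    return {a + b + c: pos1[a] * pos2[b] * pos3[c]
            for a, b, c in product(NUCLEOTIDES, repeat=3)}

def f1x4(nuc):
    """Special case: the same nucleotide frequencies at all three positions."""
    return f3x4(nuc, nuc, nuc)

nuc = {"A": 0.1, "C": 0.2, "G": 0.4, "T": 0.3}
triplets = f1x4(nuc)
print(len(triplets))               # number of triplets
print(round(triplets["ACG"], 6))   # 0.1 * 0.2 * 0.4</pre>
      <p>One set of four nucleotide frequencies has 3 free values and three sets have 9, which matches the d.f. column above.</p>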
	<p>The <strong class="userinput"><code>+fe</code></strong> model is shorthand for <strong class="userinput"><code>+f[pi=Frequencies.uniform]</code></strong>:</p>
	<p>
	  </p><pre class="programlisting">hky85_sym+x3_sym+fe</pre><p>
	</p>
    </div>

      <p>BAli-Phy interprets branch lengths for triplet models as 1/3 the number of substitutions per triplet.  Thus, they should be comparable to branch lengths under DNA/RNA nucleotide models.</p>
      

    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="codon_models"></a>6.5. Codon models</h3></div></div></div>
      

      <p>The default substitution model for codons is gy94.</p>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp51"></a>6.5.1. Substitution rates</h4></div></div></div>
	
	
	<div class="informaltable">
	  
	  <table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>gy94</code></strong></td><td>62</td><td><p>Model of dN/dS with a separate frequency for each codon.</p>
		<p>Rate for changing a nucleotide depends on neighboring nucleotides.</p>
		<a class="ulink" href="http://www.genetics.org/content/148/3/929.short" target="_top">
		  (Goldman and Yang, 1994)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>gy94[pi=f1x4]</code></strong></td><td>5</td><td><p>The GY94 model with codon frequencies constructed from nucleotide frequencies.</p>
		<a class="ulink" href="http://www.genetics.org/content/148/3/929.short" target="_top">
		  (Goldman and Yang, 1994)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>gy94[pi=f3x4]</code></strong></td><td>11</td><td><p>The GY94 model with codon frequencies constructed from nucleotide frequencies for each codon position.</p>
		<a class="ulink" href="http://www.genetics.org/content/148/3/929.short" target="_top">
		  (Goldman and Yang, 1994)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>gy94_ext[<em class="replaceable"><code>nuc_model</code></em>]</code></strong></td><td>df(<em class="replaceable"><code>nuc_model</code></em>)+61</td><td><p>GY94 model extended with a generic nucleotide exchangeability matrix.</p>
		<a class="ulink" href="http://www.genetics.org/content/148/3/929.short" target="_top">
		  (Goldman and Yang, 1994)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>mg94</code></strong></td><td>4</td><td>
		  <p>Model of dN/dS with f81 as the neutral model.</p>
		  <p>Rate for changing a nucleotide depends only on that nucleotide.</p>
		  <a class="ulink" href="https://academic.oup.com/mbe/article/11/5/715/1008710" target="_top">
		    (Muse and Gaut, 1994)
		  </a>
		</td></tr><tr><td><strong class="userinput"><code>mg94k</code></strong></td><td>5</td><td><p>Model of dN/dS with hky85 as the neutral model.</p>
		<a class="ulink" href="https://academic.oup.com/mbe/article/11/5/715/1008710" target="_top">
		  (Muse and Gaut, 1994)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>mg94_ext[<em class="replaceable"><code>nuc_model</code></em>]</code></strong></td><td>df(<em class="replaceable"><code>nuc_model</code></em>)+1</td><td><p>Model of dN/dS with <em class="replaceable"><code>nuc_model</code></em> as the neutral model.</p>
		<a class="ulink" href="https://academic.oup.com/mbe/article/11/5/715/1008710" target="_top">
		  (Muse and Gaut, 1994)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>fMutSel</code></strong></td><td>65</td><td><p>MG94-like model with fitnesses for each codon.</p>
		<a class="ulink" href="https://academic.oup.com/mbe/article/11/5/715/1008710" target="_top">
		  (Yang and Nielsen, 2008)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>fMutSel0</code></strong></td><td>24</td><td><p>MG94-like model with fitnesses for each amino-acid.</p>
		<a class="ulink" href="https://academic.oup.com/mbe/article/11/5/715/1008710" target="_top">
		  (Yang and Nielsen, 2008)
		</a>
		</td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>nuc_model</code></em>+x3_sym+f</code></strong></td><td>df(nuc_model)+60</td><td><p>GY94-style rate matrix constructed from nucleotide exchangeability matrix (dN/dS = 1).</p>
	               <p>This model should give the same likelihood as <em class="replaceable"><code>nuc_model</code></em> on codons only if the frequency of stop codons is zero.</p>
		</td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>nuc_model</code></em>+x3</code></strong></td><td>df(nuc_model)</td><td><p>MG94-style rate matrix constructed from nucleotide rate matrix (dN/dS = 1).</p>
		</td></tr><tr><td><strong class="userinput"><code>x3x3[q1,q2,q3]</code></strong></td><td>df(q1)+df(q2)+df(q3)</td><td><p>Triplet rate matrix constructed from a nucleotide rate matrix for each codon position (dN/dS = 1).</p>
		</td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>codon_model</code></em>+dNdS[omega]</code></strong></td><td>df(<em class="replaceable"><code>codon_model</code></em>)+1</td><td><p>Scales non-synonymous rates by <em class="replaceable"><code>omega</code></em>.</p>
		</td></tr></tbody></table>
	</div>


	<p>BAli-Phy interprets branch lengths for codon models as 1/3 of the number of substitutions per codon.  Thus, they should be comparable to branch lengths under DNA/RNA models.</p>
	<p>The <strong class="userinput"><code>x3</code></strong>, <strong class="userinput"><code>x3_sym</code></strong>, <strong class="userinput"><code>x3x3</code></strong>, and <strong class="userinput"><code>dNdS</code></strong> models
	can be used to build up codon models piecewise:
	</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><strong class="userinput"><code>mg94</code></strong> is equivalent to <strong class="userinput"><code>f81+x3+dNdS</code></strong>.</li><li class="listitem"><strong class="userinput"><code>gy94</code></strong> is equivalent to <strong class="userinput"><code>hky85_sym+x3_sym+f+dNdS</code></strong>.</li></ul></div><p>
	</p>
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="codon-frequencies"></a>6.5.2. Frequencies</h4></div></div></div>
      
      <p>Frequencies are estimated by default.  Users can fix frequencies by setting the <strong class="userinput"><code>pi</code></strong> parameter to a constant value, if the model allows unequal frequencies.</p>
      <p>Constant frequencies are specified as a list of pairs that associates each letter with its frequency.</p>
      <pre class="programlisting">gy94[pi=List[Pair["AAA",0.01],Pair["AAC",0.02],...]]
mg94[pi=List[Pair["A",0.1],Pair["C",0.2],Pair["T",0.3],Pair["G",0.4]]]
</pre>
      <p>Frequencies can also be specified using functions:</p>
      <p>
	</p><pre class="programlisting">gy94[pi=f1x4]              // nucleotide frequencies are estimated</pre><p>
      </p>


      <div class="informaltable">
	<table class="informaltable" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>Frequencies.uniform</code></strong></td><td>0</td><td>Equal frequencies</td></tr><tr><td><strong class="userinput"><code>f1x4</code></strong></td><td>3</td><td>Constructs codon frequencies from independent nucleotide frequencies.</td></tr><tr><td><strong class="userinput"><code>f3x4</code></strong></td><td>9</td><td>Constructs codon frequencies from independent nucleotide frequencies for each codon position.</td></tr></tbody></table>
      </div>
    </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="genetic-codes"></a>6.5.3. Genetic Codes</h4></div></div></div>
	
      
	<p>When using a codon-based substitution model like <strong class="userinput"><code>gy94</code></strong>, you may select the genetic code by specifying <strong class="userinput"><code>--alphabet Codons[,<em class="replaceable"><code>genetic-code</code></em>]</code></strong>.  Available genetic codes are <strong class="userinput"><code>standard</code></strong>, <strong class="userinput"><code>mt-vert</code></strong>, <strong class="userinput"><code>mt-invert</code></strong>, <strong class="userinput"><code>mt-yeast</code></strong>, <strong class="userinput"><code>mt-protozoan</code></strong>.</p>
	<p>If the genetic code is not specified, then the standard code is used:
</p><pre class="screen"><code class="prompt">%</code> bali-phy <em class="replaceable"><code>sequence-file</code></em> --smodel gy94 --alphabet Codons
<code class="prompt">%</code> bali-phy <em class="replaceable"><code>sequence-file</code></em> --smodel gy94 --alphabet Codons[RNA]</pre><p>
	These examples specify the vertebrate mitochondrial code:
	</p><pre class="screen"><code class="prompt">%</code> bali-phy <em class="replaceable"><code>sequence-file</code></em> --smodel gy94 --alphabet Codons[DNA,mt-vert]
<code class="prompt">%</code> bali-phy <em class="replaceable"><code>sequence-file</code></em> --smodel gy94 --alphabet Codons[,mt-vert]</pre><p>	
</p>
      </div>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp52"></a>6.5.4. Heterogeneous dN/dS and tests for positive selection</h4></div></div></div>
	
	<div class="informaltable">
	  <table class="informaltable" border="1"><colgroup><col><col><col></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td>
		  <p><strong class="userinput"><code>m1a</code></strong></p>
		</td><td>df(<em class="replaceable"><code>submodel</code></em>)+2</td><td>
		  <p>A mixture of conserved and neutral sites.</p>
		  <a class="ulink" href="https://doi.org/10.1534/genetics.104.031153" target="_top">
		    (Wong et al. 2004)
		  </a>
		</td></tr><tr><td>
		  <p><strong class="userinput"><code>m2a</code></strong></p>
		</td><td>df(<em class="replaceable"><code>submodel</code></em>)+4</td><td>
		  <p>A mixture of conserved, neutral, and positively-selected sites.</p>
		  <a class="ulink" href="https://doi.org/10.1534/genetics.104.031153" target="_top">
		    (Wong et al. 2004)
		  </a>
		</td></tr><tr><td><strong class="userinput"><code>m2a_test</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+4</td><td>
		  <p>A Bayesian test for positive selection that compares M2a with M1a.</p>
		  <a class="ulink" href="https://doi.org/10.1534/genetics.104.031153" target="_top">
		    (Wong et al. 2004)
		  </a>
		</td></tr><tr><td><strong class="userinput"><code>m3</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+2*<em class="replaceable"><code>n</code></em>-1</td><td><p>A free mixture of <em class="replaceable"><code>n</code></em> categories of conserved dN/dS values.</p>
		<a class="ulink" href="http://www.genetics.org/content/155/1/431.full" target="_top">
		  (Yang et al. 2000)
		</a>
		</td></tr><tr><td>
		  <p><strong class="userinput"><code>m3_test</code></strong></p>
		</td><td>df(<em class="replaceable"><code>submodel</code></em>)+2*<em class="replaceable"><code>n</code></em>+1</td><td>
		  <p>A Bayesian test for positive selection based on the M3 model extended with an extra category of either neutral or positively-selected sites.</p>
		</td></tr><tr><td><p><strong class="userinput"><code>m7</code></strong></p></td><td>df(<em class="replaceable"><code>submodel</code></em>)+2</td><td><p>The M7 model places a beta distribution on dN/dS.</p>
		<a class="ulink" href="http://www.genetics.org/content/155/1/431.full" target="_top">
		  (Yang et al. 2000)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>m8a</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+3</td><td><p>The M8a model adds a category of <span class="emphasis"><em>neutral</em></span> sites to the M7 model.</p>
		<a class="ulink" href="https://doi.org/10.1093/oxfordjournals.molbev.a004233" target="_top">
		  (Swanson et al. 2003)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>m8</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+4</td><td><p>The M8 model adds a category of <span class="emphasis"><em>positively-selected</em></span> sites to the M7 model.</p>
		<a class="ulink" href="http://www.genetics.org/content/155/1/431.full" target="_top">
		  (Yang et al. 2000)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>m8a_test</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+4</td><td><p>A Bayesian test for positive selection that compares the M8 to the M8a model.</p>
		<a class="ulink" href="https://doi.org/10.1093/oxfordjournals.molbev.a004233" target="_top">
		  (Swanson et al. 2003)
		</a>
		</td></tr><tr><td><strong class="userinput"><code>branch_site</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+4</td><td><p>A Bayesian test for positive selection on some (unknown) sites and some (known) branches.</p>
		<a class="ulink" href="https://doi.org/10.1093/oxfordjournals.molbev.a004233" target="_top">
		  (Zhang et al. 2005)
		</a>
		</td></tr></tbody></table>
	</div>
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp53"></a>6.5.5. The branch-site substitution model</h4></div></div></div>
      <p>In order to use the branch-site substitution model, the user needs to specify an unrooted tree topology and disable topology changes in order to keep the topology fixed:
    </p><pre class="screen"><code class="prompt">%</code> bali-phy <em class="replaceable"><code>alignment</code></em>.fasta -S branch_site -T <em class="replaceable"><code>tree</code></em>.tree --disable=topology</pre><p>
    The tree file should be in Newick format, with foreground branches labelled using NHX attributes. The NHX attribute must be applied to the branch, not the node, so it must occur after a colon.

</p><div class="example"><a name="idp1691"></a><p class="title"><b>Example 1. A tree with foreground branches</b></p><div class="example-contents">
(((A1, B1),(C1, D1)),((E1:[&amp;&amp;NHX:foreground=1], F1:[&amp;&amp;NHX:foreground=1]),(G1, H1)),(((A2, B2),(C2, D2)),((E2, F2),(G2, H2))));
</div></div><p><br class="example-break">
      </p>
      <p>The posterior probability of positive selection is the posterior mean of the posSelection parameter.  This may be computed using the statreport program with the <strong class="userinput"><code>--mean</code></strong> option. If this probability is extremely close to 1 or 0, you may wish to add the option <strong class="userinput"><code>--Rao-Blackwellize branch_site:posSelection</code></strong>, which reports the log-probability of positive selection at each iteration.  You can then exponentiate the reported values and average them (using R, for example) to obtain a more accurate estimate of the posterior probability of positive selection.
      </p>
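      <p>The averaging step described above can be sketched in Python as follows. This is only an illustration: the function name <code>posterior_prob_from_logs</code> and the sample values are hypothetical, and extracting the logged values from the BAli-Phy output is assumed to have been done already. The values are shifted by their maximum before exponentiating to avoid numerical underflow.</p>
<pre class="programlisting">import math

def posterior_prob_from_logs(log_probs):
    """Average exp(log p) over iterations, shifting by the maximum for stability."""
    if not log_probs:
        raise ValueError("need at least one sampled log-probability")
    shift = max(log_probs)
    if shift == float("-inf"):
        # Every sampled probability is exactly zero.
        return 0.0
    total = sum(math.exp(lp - shift) for lp in log_probs)
    return math.exp(shift) * total / len(log_probs)

# Hypothetical per-iteration log-probabilities of positive selection.
samples = [math.log(0.5), math.log(0.25), math.log(0.75)]
print(posterior_prob_from_logs(samples))</pre>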
      </div>

    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp54"></a>6.6. Heterogeneous Rates across Sites</h3></div></div></div>
    
    <p>
      Complex substitution models in <span class="application">BAli-Phy</span>
      are constructed as mixtures of reversible CTMC models that run at different rates (e.g. <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><msub><mi>&#915;</mi> <mn>4</mn></msub><mo>+</mo><mi>INV</mi></math>)
      or have different parameters (e.g. an M2a codon model).
    </p>

      <div class="informaltable">

      <table class="informaltable" border="1"><colgroup><col><col><col></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code><em class="replaceable"><code>submodel</code></em> + Rates.gamma</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+1</td><td><p>Site rates follow a discrete approximation to the Gamma distribution</p>
		  <a class="ulink" href="https://doi.org/10.1007/BF00160154" target="_top">
		    (Yang 1994)
		  </a>
	      </td></tr><tr><td>
		<p><strong class="userinput"><code><em class="replaceable"><code>submodel</code></em> + Rates.log_normal</code></strong></p>
	      </td><td>df(<em class="replaceable"><code>submodel</code></em>)+1</td><td><p>Site rates follow a discrete approximation to the log-normal distribution</p>
	      </td></tr><tr><td><p><strong class="userinput"><code><em class="replaceable"><code>submodel</code></em> + Rates.free</code></strong></p></td><td>df(<em class="replaceable"><code>submodel</code></em>)+2(<em class="replaceable"><code>n</code></em>-1)</td><td><p>Sites fall in one of <em class="replaceable"><code>n</code></em> categories. Each category has its own rate.</p>
		  <a class="ulink" href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1206396" target="_top">
		    (Yang 1995)
		  </a>
	      </td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>submodel</code></em> + multi_rate[<em class="replaceable"><code>dist</code></em>]</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+df(<em class="replaceable"><code>dist</code></em>)</td><td><p>Site rates follow a discrete approximation to the distribution <em class="replaceable"><code>dist</code></em>.</p></td></tr><tr><td><strong class="userinput"><code><em class="replaceable"><code>submodel</code></em> + inv</code></strong></td><td>df(<em class="replaceable"><code>submodel</code></em>)+1</td><td><p>Some fraction inv:p_inv of sites are invariable.</p>
	      </td></tr></tbody></table>
      </div>

    </div>
      </div>


    <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="indel_models"></a>7. Insertion/deletion models</h2></div></div></div>
      
    <p>Each of these models is a probability distribution on pairwise alignments.  The probability distribution on multiple sequence alignments <math xmlns="http://www.w3.org/1998/Math/MathML" display="inline"><mi>Pr</mi><mo stretchy="false">(</mo><mi>A</mi><mo stretchy="false">&#8739;</mo><mi>T</mi><mo>,</mo><mi>&#964;</mi><mo>,</mo><mi>&#923;</mi><mo stretchy="false">)</mo></math> is constructed by factoring the multiple sequence alignment into pairwise alignments along each branch of the tree, as described in Redelings and Suchard (2005).</p>

	<p>The default insertion/deletion model is <strong class="userinput"><code>rs07</code></strong>.</p>
	
    <div class="informaltable">
	  
	  <table class="informaltable" summary="Insertion-Deletion Models" border="1"><colgroup><col class="col1"><col class="col2"><col class="col3"></colgroup><thead><tr><th>Model</th><th>  d.f.  </th><th>Summary</th></tr></thead><tbody><tr><td><strong class="userinput"><code>rs05</code></strong></td><td>3</td><td>
		  <p>A symmetric insertion-deletion model with geometrically-distributed indel lengths.</p>
		  <p>Indels occur on all branches with the same probability, regardless of branch length.</p>
		  <a class="ulink" href="https://doi.org/10.1080/10635150590947041" target="_top">
		    (Redelings and Suchard, 2005)
		  </a>
		</td></tr><tr><td><strong class="userinput"><code>rs07</code></strong></td><td>2</td><td>
		  <p>A symmetric insertion-deletion model with geometrically-distributed indel lengths.</p>
		  <p>Longer branches have more indels.</p>
		  <a class="ulink" href="https://doi.org/10.1186/1471-2148-7-40" target="_top">
		    (Redelings and Suchard, 2007)
		  </a>
		</td></tr><tr><td>
		  <p><strong class="userinput"><code>none</code></strong></p>
		</td><td> </td><td>
		  <p>No indel model for the partition; indels are uninformative.</p>
		  <p>Fixed alignment for the partition.</p>
		</td></tr></tbody></table>
	</div>
	    
	<p>The user can specify priors and parameters for indel models (see <a class="xref" href="#functions" title="8. Models and Priors">Section 8, &#8220;Models and Priors&#8221;</a>):
</p><pre class="programlisting">rs07[log_rate~log_laplace[-4,0.707],mean_length=2]</pre><p>
	</p>

    </div>

  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="functions"></a>8. Models and Priors</h2></div></div></div>

  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp55"></a>8.1. Models and distributions are functions</h3></div></div></div>
  <p>Models, probability distributions, and functions are treated the same in BAli-Phy because all of them have parameters or arguments. Parameters have names in BAli-Phy.  Parameter values are specified using square brackets as follows:
  </p><pre class="programlisting">hky85[kappa=2]            // model
log[x=2]                  // function
normal[mean=0,sigma=1]    // probability distribution</pre><p>
It is possible to specify parameter values by position instead of by name:
  </p><pre class="programlisting">hky85[2]
log[2]
normal[0,1]</pre><p>
It is even possible to mix positional and named arguments, as long as all the positional arguments come before all the named arguments:
</p><pre class="programlisting">normal[0,sigma=1]   // OK
normal[mean=0,1]    // not OK</pre><p>

  The order and type of parameters for a function can be found with the <strong class="userinput"><code>help</code></strong> command.  For example,
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy help hky85</code></strong></pre><p>
A value must be given for each parameter, unless the parameter has a default value (See <a class="xref" href="#default_values" title="8.4. Default values and default priors">Section 8.4, &#8220;Default values and default priors&#8221;</a>).
  </p>
	<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3>
	  <p>
	    If you are using <span class="command"><strong>csh</strong></span>, <span class="command"><strong>tcsh</strong></span>, or <span class="command"><strong>zsh</strong></span> instead of the <span class="command"><strong>bash</strong></span> shell, then you need to put single quotes around terms with square brackets on the command line:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy file.fasta -S 'hky85[kappa=2]'</code></strong></pre><p>
	  </p>

	  <p>If you do not add quotes, you will get an error message like "<code class="computeroutput">bali-phy: No match.</code>" (for <span class="command"><strong>csh</strong></span>) or "<code class="computeroutput">zsh: no matches found: hky85[kappa=2]</code>" (for <span class="command"><strong>zsh</strong></span>).
	  </p>
	</div>

  </div>

  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp56"></a>8.2. Models and '<strong class="userinput"><code>+</code></strong>' notation</h3></div></div></div>
  <p>Models in the phylogenetics literature are often combined using <strong class="userinput"><code>+</code></strong>. For example, the model <strong class="userinput"><code>WAG+F+G4+I</code></strong> starts with the WAG amino-acid model and places several modifiers, like "+G4", on the right.</p>
  <p>BAli-Phy follows this convention by treating <strong class="userinput"><code>A+B</code></strong> as an abbreviation for <strong class="userinput"><code>B[submodel=A]</code></strong>.  When there are multiple '<strong class="userinput"><code>+</code></strong>' symbols they associate to the left, so that <strong class="userinput"><code>A+B+C</code></strong> is understood to mean <strong class="userinput"><code>(A+B)+C</code></strong>.  For example:
</p><pre class="programlisting">hky85+Rates.gamma        // rewritten to Rates.gamma[submodel=hky85]
hky85+inv                // rewritten to inv[submodel=hky85]
wag+f                    // rewritten to f[submodel=wag]
wag+f+Rates.gamma+inv    // rewritten to inv[submodel=Rates.gamma[submodel=f[submodel=wag]]]
</pre><p>
This allows a simple method for combining models, when one model is an argument to another model.
</p>
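<p>As an illustration of the rewriting rule above, here is a minimal Python sketch that desugars a <strong class="userinput"><code>+</code></strong> expression into nested <strong class="userinput"><code>submodel=</code></strong> arguments. It is not BAli-Phy's own parser, and the function names are hypothetical. It only splits on <strong class="userinput"><code>+</code></strong> symbols outside square brackets, and it assumes that a modifier which already has arguments receives <strong class="userinput"><code>submodel</code></strong> as an extra named argument at the end.</p>
<pre class="programlisting">def split_top_level(expr, sep="+"):
    """Split expr on sep, ignoring separators nested inside square brackets."""
    parts, depth, current = [], 0, []
    for ch in expr:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

def desugar_plus(expr):
    """Rewrite A+B+C into C[submodel=B[submodel=A]] (left-associative)."""
    terms = split_top_level(expr)
    result = terms[0]
    for modifier in terms[1:]:
        if modifier.endswith("]"):
            # Assumption: append submodel after the modifier's existing arguments.
            result = modifier[:-1] + ",submodel=" + result + "]"
        else:
            result = modifier + "[submodel=" + result + "]"
    return result

print(desugar_plus("wag+f+Rates.gamma+inv"))</pre>
<p>For the last example in the listing above, the sketch produces the same nested form as the documented rewrite. Expressions with a <strong class="userinput"><code>+</code></strong> inside square brackets are left untouched by this sketch.</p>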
  </div>

  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="priors"></a>8.3. Priors</h3></div></div></div>
  <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp57"></a>8.3.1. Specifying priors</h4></div></div></div>
<p>Priors on model parameters are specified by giving a random value. Random values can be obtained from distributions using the function <strong class="userinput"><code>sample</code></strong>.  For example, this places a log-normal prior on the parameter <strong class="userinput"><code>kappa</code></strong> of the <strong class="userinput"><code>hky85</code></strong> model:
</p><pre class="programlisting">hky85[kappa=sample[log_normal[1,1]]]</pre><p>
You can write <strong class="userinput"><code>~Dist</code></strong> as a shorthand for <strong class="userinput"><code>sample[Dist]</code></strong>:
</p><pre class="programlisting">hky85[kappa=~log_normal[1,1]]</pre><p>
The <strong class="userinput"><code>=~</code></strong> can be further shortened to just <strong class="userinput"><code>~</code></strong>:
</p><pre class="programlisting">hky85[kappa~log_normal[1,1]]</pre><p>
</p>
</div>

  <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp58"></a>8.3.2. Random function arguments</h4></div></div></div>
<p>It is also possible to use random values as inputs to other functions.  For example:
</p><pre class="programlisting">add[1,~exponential[10]]</pre><p>
In such cases the parameter value should be specified with <strong class="userinput"><code>=</code></strong>, as in the following example:
</p><pre class="programlisting">rs07[mean_length=add[1,~exponential[10]]]</pre><p>
</p>
  </div>
<div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp59"></a>8.3.3. Distributions are not random values</h4></div></div></div>
<p>Random values and distributions have different types.  For example, the
following is of type <strong class="userinput"><code>Distribution[Double]</code></strong>:
</p><pre class="programlisting">uniform[0,1]</pre><p>
In contrast, the following are both of type <strong class="userinput"><code>Double</code></strong>:
</p><pre class="programlisting">sample[uniform[0,1]]
~uniform[0,1]</pre><p>
This is important when passing distributions as arguments to other
distributions and functions.  For example, the distribution <strong class="userinput"><code>iid</code></strong> is used to generate a specific number of samples from another distribution.  Thus, it needs to receive a distribution as an argument:
</p><pre class="programlisting">~iid[4,normal[0,1]]      // OK    : 4 samples from the normal[0,1] distribution
~iid[4,~normal[0,1]]     // not OK: 4 samples from ... a random number?</pre><p>
(See <a class="xref" href="#types" title="8.5. Argument and result types">Section 8.5, &#8220;Argument and result types&#8221;</a>.)
</p>
  </div>
  </div>

  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="default_values"></a>8.4. Default values and default priors</h3></div></div></div>
  <p>
Some function arguments have default values.  For example, the <strong class="userinput"><code>Rates.gamma</code></strong> parameter <strong class="userinput"><code>n</code></strong> has a default value of 4.  Thus the following are equivalent:
</p><pre class="programlisting">hky85+Rates.gamma[n=4]+inv
hky85+Rates.gamma+inv</pre>

  <p>When the default value is random, then the argument has a default prior. For example, the <strong class="userinput"><code>kappa</code></strong> parameter of <strong class="userinput"><code>hky85</code></strong> has a default value of <strong class="userinput"><code>~log_normal[log[2],0.25]</code></strong>, so the following are equivalent:
</p><pre class="programlisting">hky85[kappa~log_normal[log[2],0.25]]
hky85</pre><p>
The <strong class="userinput"><code>help</code></strong> command can be used to determine the default value for a parameter, if there is one.</p>
  </div>

  <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="types"></a>8.5. Argument and result types</h3></div></div></div>
  <p>  Every function has a <span class="emphasis"><em>result type</em></span>, as well as an <span class="emphasis"><em>argument type</em></span> for each argument.  The argument type specifies what kind of arguments are acceptable, and the result type specifies what kind of result the function produces.  Types include <strong class="userinput"><code>Int</code></strong> for integers, <strong class="userinput"><code>Double</code></strong> for double-precision floating point numbers, and <strong class="userinput"><code>String</code></strong> for text strings.  Integer arguments are implicitly converted to <strong class="userinput"><code>Double</code></strong> when the argument type is <strong class="userinput"><code>Double</code></strong>.</p>

  <p>Some types contain parameters.  For example <strong class="userinput"><code>List[Int]</code></strong> indicates a list of integers and <strong class="userinput"><code>List[Double]</code></strong> indicates a list of real numbers.  In order to indicate a list of unknown type, we use a <span class="emphasis"><em>type variable</em></span> <strong class="userinput"><code>a</code></strong> and write <strong class="userinput"><code>List[a]</code></strong>.  Type variables always begin with a lower-case letter.  They are able to match any specific type, and their value is found by pattern-matching.  For example, the function <strong class="userinput"><code>add[x,y]</code></strong> takes two arguments of type <strong class="userinput"><code>a</code></strong> and has a result of type <strong class="userinput"><code>a</code></strong>.  Thus:
</p><pre class="programlisting">add[1,2]        // arguments are a=Int, so result is of type Int
add[1.0,2.0]    // arguments are a=Double, so result is of type Double</pre><p>
<strong class="userinput"><code>Pair[a,b]</code></strong> is a parameterized type that can be specialized to (for example) <strong class="userinput"><code>Pair[String,Double]</code></strong> and <strong class="userinput"><code>Pair[Int,Int]</code></strong>.
  </p>

  <p>Types for components of substitution models are often parameterized by the type of the alphabet.  For example, <strong class="userinput"><code>hky85</code></strong> has a result type of <strong class="userinput"><code>RevCTMC[a]</code></strong>, where <strong class="userinput"><code>a</code></strong> could be <strong class="userinput"><code>DNA</code></strong> or <strong class="userinput"><code>RNA</code></strong>.  The use of alphabet types in substitution models prevents combining substitution models with mismatched alphabets.
  </p>

  </div>
  </div>
    

    <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="idp65"></a>9. Partitioned data sets</h2></div></div></div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp60"></a>9.1. Partitions</h3></div></div></div>
    <p>You should analyze multiple genes under different evolutionary models by putting each one in its own data partition.  Placing different genes in different partitions means that their alignments vary independently.  It also prevents sequences in one gene from being aligned against sequences in another gene.</p>

    <p>Different partitions share the same tree topology and a common set of unscaled branch lengths.  However, branch lengths are scaled by a different factor in each partition, since some genes may evolve faster than others.</p>

    <p>To put different genes in different partitions, you should place the sequences from each partition in a different FASTA or Phylip file:
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em></code></strong></pre><p>
    The sequence names in files for all partitions should be the same. 
    </p>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp61"></a>9.2. Unlinked models</h3></div></div></div>
    <p>By default, each partition will have its own substitution model, insertion/deletion model, and scaled tree length.  For example, even if all partitions are assigned a <strong class="userinput"><code>tn93</code></strong> substitution model, their base frequencies will all be estimated independently.  When parameters are estimated separately for two partitions, we say that the parameters for those partitions are "unlinked".</p>

    <p>A substitution model or insertion-deletion model that is specified without qualification will apply to every partition.  However, each partition will receive its own copy of each model with unlinked parameter values:
      </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --smodel tn93 --imodel rs07</code></strong></pre>

<p>You can select partition-specific values for 4 options: <strong class="userinput"><code>--smodel</code></strong>, <strong class="userinput"><code>--imodel</code></strong>, <strong class="userinput"><code>--alphabet</code></strong>, and <strong class="userinput"><code>--scale</code></strong>.  For example, to specify different substitution models but the same alphabet:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --smodel 1:tn93 --smodel 2:gtr --alphabet DNA</code></strong></pre>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp62"></a>9.3. Fixing the alignment in some partitions</h3></div></div></div>
<p>You can fix the alignment and ignore insertion/deletion information in one partition, while allowing the alignment to vary and using insertion/deletion information in another partition:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --imodel 2:none</code></strong></pre><p>
Since alignments are estimated by default, the alignment will be estimated in the first partition, but fixed in the second partition.</p>
<p>Specifying <strong class="userinput"><code>-I none</code></strong> fixes the alignment in all partitions:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> -I none</code></strong></pre><p>
</p>
    </div>
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp63"></a>9.4. Linked models</h3></div></div></div>
<p>You can also specify that two partitions share a single copy of a single substitution model or indel model.  For example, if two partitions both have a <strong class="userinput"><code>tn93</code></strong> model, linking these models would force the partitions to have the same nucleotide frequencies and substitution rates.  Linking partitions reduces the number of parameters that need to be estimated, and also pools information between the partitions:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --smodel 1,2:tn93 --imodel 1,2:rs07</code></strong></pre><p>
By default each partition has a separate scale, but you can force groups of partitions to share a scale as follows:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --scale 1,2:</code></strong></pre><p>
</p>
    </div>
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp64"></a>9.5. Linking models via the <strong class="userinput"><code>link</code></strong> command</h3></div></div></div>

    <p>The <strong class="userinput"><code>--link</code></strong> command lets you specify a model for each partition separately and then choose afterwards which partitions to link.
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --smodel 1:tn93 --smodel 2:tn93 --link=1,2 -t</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --smodel tn93                   --link=1,2 -t</code></strong>   </pre><p>
    If the linked partitions are given different models, BAli-Phy will give an error and refuse to run:
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --smodel 1:tn93 --link=1,2 -t</code></strong>
bali-phy: Error! Partitions 1 and 2 cannot be linked because they have differing values 'tn93' and ''</pre>
    <p>You can also specify which of the 3 attributes "smodel", "imodel", and "scale" are being linked:
    </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>bali-phy <em class="replaceable"><code>sequence-file1</code></em> <em class="replaceable"><code>sequence-file2</code></em> --link=1,2:smodel,scale -t</code></strong>    // Don't link the indel model</pre><p>
    </p>
    </div>
    </div>

  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="mixing_and_convergence"></a>10. Convergence and Mixing: Is it done yet?</h2></div></div></div>
    

    <p>
      When using Markov chain Monte Carlo (MCMC) programs like
      <span class="application">MrBayes</span>, <span class="application">BEAST</span> or
      <span class="application">BAli-Phy</span>, it is hard to determine in
      advance how many iterations are required to give a good
      estimate. The number depends on the specific data set that is
      being examined. As a result, <span class="application">BAli-Phy</span>
      relies on the user to analyze the output of a running chain
      periodically in order to determine when enough samples have been
      obtained.  This section describes a number of techniques to
      diagnose when more samples must be taken.
    </p>

    <p>Some of the better diagnostics for lack of convergence rely on running at least 2 independent copies of the Markov chain (preferably 4-10) from different random starting points to see if the sampled posterior distributions for each chain are the same.  Unfortunately, when the distributions all seem to be the same, this doesn't <span class="emphasis"><em>prove</em></span> that they have all converged to the equilibrium distribution.  However, if the distributions are different then you can reject either convergence or good mixing.</p>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp66"></a>10.1. Definition of Convergence</h3></div></div></div>
      

      <p>Convergence refers to the tendency of a Markov chain
	to "forget" its starting value and become typical of its
	equilibrium distribution. Note that convergence is a property
	of the Markov chain itself, not of individual runs of the
	Markov chain.  Ideally a number of individual runs should be
	examined in order to determine how many initial iterations to
	discard as "burnin".
      </p>
    </div>
    
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp67"></a>10.2. Definition of Mixing</h3></div></div></div>
      
      <p>
	In MCMC, each sample is not fully independent of previous
	samples.  In fact, even after a Markov chain has converged,
	it can get "stuck" in one part of the parameter space for a
	long time, before jumping to an equally important part.  When
	this happens, each new sample contributes very little new
	information, and we need to obtain many more samples to get
	good precision on our parameter estimates.  In such a case, we say 
	that the chain isn't "mixing" well. 
      </p>
    </div>

    
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp70"></a>10.3. Diagnostics: Variation in split frequencies across runs (ASDSF/MSDSF)</h3></div></div></div>
      
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp68"></a>10.3.1. ASDSF and MSDSF</h4></div></div></div>
	
<p>
To calculate the ASDSF and MSDSF run:
</p><pre class="screen"><code class="prompt">%</code> trees-bootstrap <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees ... <em class="replaceable"><code>dir-n</code></em>/C1.trees &gt; partitions.bs</pre><p>	
For each split, the SDSF value is just the standard deviation across
runs of the Posterior Probabilities for that split.  By averaging the
resulting SDSF values across splits, we may obtain the ASDSF value
(Huelsenbeck and Ronquist 2001).  This is commonly considered
acceptable if it is &lt; 0.01.
</p>

<p>However, it is also useful to consider the maximum of the SDSF
  values (MSDSF).  This represents the range of variation in PP across
  the runs for the split with the most variation.
</p>
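<p>The definitions above can be sketched in a few lines of Python.  This is only an illustration of the arithmetic, not the code used by <span class="command"><strong>trees-bootstrap</strong></span>: the function name, the input layout (a dictionary from split label to one PP per run), and the choice of the population standard deviation are assumptions made for the example.</p>

```python
from statistics import pstdev


def split_frequency_diagnostics(pp_by_split):
    """Compute per-split SDSF values, their average (ASDSF) and maximum (MSDSF).

    pp_by_split maps a split label to a list holding the posterior
    probability of that split in each independent run (hypothetical layout).
    """
    # SDSF: standard deviation across runs of the PP for each split.
    # Assumption: population standard deviation; the real tool may normalize differently.
    sdsf = {split: pstdev(pps) for split, pps in pp_by_split.items()}
    asdsf = sum(sdsf.values()) / len(sdsf)  # average over splits
    msdsf = max(sdsf.values())              # worst-case split
    return sdsf, asdsf, msdsf


# Two runs that agree closely on one split and disagree on another.
example = {
    "AB|CDE": [0.98, 1.00],
    "ABC|DE": [0.40, 0.60],
}
sdsf, asdsf, msdsf = split_frequency_diagnostics(example)
print(f"ASDSF = {asdsf:.3f}, MSDSF = {msdsf:.3f}")
```

<p>In this made-up example the average is pulled down by the split on which the runs agree, while the maximum exposes the split on which they disagree, which is why it helps to look at both numbers.</p>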
      </div>
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp69"></a>10.3.2. Split-frequency comparison plot</h4></div></div></div>
	
	<p>To generate the split-frequency comparison plot, you must have R installed.  Locate the script <code class="filename">compare-runs.R</code>.  Then run:
</p><pre class="screen"><code class="prompt">%</code> trees-bootstrap <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees ... <em class="replaceable"><code>dir-n</code></em>/C1.trees --LOD-table=LOD-table &gt; partitions.bs 
<code class="prompt">%</code> R --slave --vanilla --args LOD-table compare-SF.pdf &lt; compare-runs.R</pre><p>
	  Following <a class="ulink" href="http://dx.doi.org/10.1080/10635150600812544" target="_top">Beiko et al (2006)</a>, this displays the variation in
	  estimates of split frequencies across runs.  Splits are
	  arranged on the x-axis in increasing order of 
	  Posterior Probability (PP), which is obtained by averaging over
	  runs.  We then plot a vertical bar from the minimum PP to the
	  maximum PP.
	</p>
	</div>
    </div>


    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp71"></a>10.4. Diagnostics: Potential Scale Reduction Factors (PSRF)</h3></div></div></div>
      
<p>
Potential Scale Reduction Factors check that different runs have
similar posterior distributions.  Only numerical variables may have a
PSRF. To calculate the PSRF for each
numerical parameter, you may run: 

</p><pre class="screen"><code class="prompt">%</code> statreport <em class="replaceable"><code>dir-1</code></em>/C1.log <em class="replaceable"><code>dir-2</code></em>/C1.log ... <em class="replaceable"><code>dir-n</code></em>/C1.log &gt; Report </pre><p>
The PSRF is a ratio of the width of the pooled distribution to the
average width of each distribution, and should ideally be 1.  The PSRF
is customarily considered to be small enough if it is less than 1.01.
</p>

<p>
We compute the PSRF based on the length of 80% credible intervals
(Brooks and Gelman 1998) and report the result as PSRF-80%CI.  For
integer-valued parameters, we avoid excessively large PSRF values by
subtracting 1 from the width of the pooled CI.
</p>
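<p>As a rough illustration of the interval-based PSRF described above, the following Python sketch divides the width of the pooled 80% interval by the average width of the per-run 80% intervals.  The function names and the quantile rule (linear interpolation between order statistics) are assumptions made for the example; the sketch also omits the integer-valued correction, so it should not be read as the exact computation performed by <span class="command"><strong>statreport</strong></span>.</p>

```python
def quantile(ordered, q):
    """Quantile of an already sorted list, using linear interpolation."""
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def ci_width(values, mass=0.80):
    """Width of the central interval containing about `mass` of the values."""
    ordered = sorted(values)
    tail = (1 - mass) / 2
    return quantile(ordered, 1 - tail) - quantile(ordered, tail)


def psrf_ci(runs, mass=0.80):
    """Pooled interval width divided by the mean per-run interval width.

    Assumes at least one run has a non-zero interval width.
    """
    pooled = [x for run in runs for x in run]
    mean_width = sum(ci_width(run, mass) for run in runs) / len(runs)
    return ci_width(pooled, mass) / mean_width


# Runs with the same distribution give a ratio near 1;
# runs stuck in different places give a much larger ratio.
agreeing = [list(range(11)), list(range(11))]
disagreeing = [list(range(11)), list(range(100, 111))]
print(psrf_ci(agreeing), psrf_ci(disagreeing))
```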

<p>
We also report a new PSRF that is more sensitive for integer
distributions.  For each individual distribution, we find the 80%
credible interval.  We divide the probability of that interval (which
may be more than 80%) by the probability of the same interval under the
pooled distribution.  The average of this measure over all
distributions gives us a PSRF that we report as PSRF-RCF.
</p>

<p>This convergence diagnostic gives a criterion for
detecting when a parameter value has stabilized at different
values in several independent runs, indicating a lack of
convergence. This situation might occur if different runs of
the Markov chain were trapped in different modes and failed to
adequately mix between modes.</p>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp74"></a>10.5. Diagnostics: Effective sample sizes (ESS)</h3></div></div></div>
      
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp72"></a>10.5.1. ESS for numerical values</h4></div></div></div>
	
      <p>To calculate the ESS values for numerical parameters, run:
</p><pre class="screen"><code class="prompt">%</code> statreport <em class="replaceable"><code>dir-1</code></em>/C1.log <em class="replaceable"><code>dir-2</code></em>/C1.log ... <em class="replaceable"><code>dir-n</code></em>/C1.log &gt; Report </pre><p>
      We calculate effective sample sizes based on integrated
      autocorrelation times.  This method has the nice property that
      simply duplicating every sample does not increase the ESS.
      </p>
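<p>The idea of an ESS based on the integrated autocorrelation time can be sketched as follows.  This is a minimal illustration, not the estimator implemented in <span class="command"><strong>statreport</strong></span>: the function names are hypothetical, and truncating the sum at the first non-positive autocorrelation is one common choice that is assumed here.</p>

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag (x must not be constant)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=None):
    """ESS = n / tau, where tau = 1 + 2 * (sum of autocorrelations)."""
    n = len(x)
    if max_lag is None:
        max_lag = n // 2
    tau = 1.0
    for lag in range(1, max_lag):
        rho = autocorrelation(x, lag)
        if rho <= 0:
            break  # assumed truncation rule: stop at the first non-positive term
        tau += 2 * rho
    return n / tau


# A chain that sits at 0 for half the run and at 1 for the other half.
stuck = [0] * 50 + [1] * 50
# The same chain with every sample written out twice.
doubled = [v for v in stuck for _ in range(2)]
print(effective_sample_size(stuck), effective_sample_size(doubled))
```

<p>In this example the duplicated series has twice as many samples, but its autocorrelation time roughly doubles as well, so the estimated ESS stays essentially the same.</p>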

      <p>The
      program <a class="ulink" href="http://evolve.zoo.ox.ac.uk/software/tracer/" target="_top">Tracer</a>
      also computes ESS values.</p>
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp73"></a>10.5.2. ESS for split frequencies</h4></div></div></div>
	
      <p>To calculate the split ESS values, run:
</p><pre class="screen"><code class="prompt">%</code> trees-bootstrap <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees ... <em class="replaceable"><code>dir-n</code></em>/C1.trees &gt; partitions.bs</pre><p>
      To compute the ESS for a split, we consider the presence or absence
      of a split in each iteration as a series of binary values.  We
      compute the integrated autocorrelation time for this binary
      sequence, which leads to an ESS.  This approach is similar to
      dividing the iterations into blocks and computing the ESS on the
      PP estimates in the blocks.  It is also similar to estimating
      the variance reduction under a block bootstrap.
      </p>
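<p>The block-based view mentioned above can be illustrated with a small Python sketch.  It treats the presence or absence of a split as a 0/1 series, estimates the PP in each block, and compares the variance of those block estimates with the variance expected from independent samples.  The function name, the fixed block size, and the cap at the number of samples are assumptions made for the example; this is not the integrated-autocorrelation estimator that <span class="command"><strong>trees-bootstrap</strong></span> reports.</p>

```python
def split_ess_by_blocks(present, block_size):
    """Rough ESS for a 0/1 series from the spread of block-wise PP estimates."""
    n_blocks = len(present) // block_size
    used = present[: n_blocks * block_size]  # drop an incomplete final block
    p = sum(used) / len(used)                # overall PP of the split
    block_pps = [
        sum(used[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]
    var_blocks = sum((b - p) ** 2 for b in block_pps) / n_blocks
    if var_blocks == 0:
        # All blocks agree exactly: no evidence of correlation at this block size.
        return float(len(used))
    # For independent samples Var(mean) = p(1-p)/ESS, and Var(mean) is
    # estimated here by var_blocks / n_blocks.
    ess = p * (1 - p) * n_blocks / var_blocks
    return min(float(len(used)), ess)


# A split that is absent for the first half of the run and present afterwards.
stuck = [0] * 50 + [1] * 50
print(split_ess_by_blocks(stuck, block_size=10))
```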
    </div>
</div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp77"></a>10.6. Diagnostics: Stabilization</h3></div></div></div>
      
      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp75"></a>10.6.1. Stabilization of numerical values</h4></div></div></div>
	
<p>To obtain estimates of the stabilization time for each
numerical  parameter, you may run:
</p><pre class="screen"><code class="prompt">%</code> statreport C1.log &gt; Report </pre><p>
Each series of values is counted as having stabilized after
the series crosses its upper and then lower 95% confidence bounds
twice (if the initial value is below the median) or crosses its lower
and then upper confidence bounds twice (if the initial value is above
the median). The confidence bounds are those based on its
equilibrium distribution as calculated from the last third of the
values in the sequence.</p>
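<p>One possible reading of this criterion is sketched below in Python.  The details are assumptions made for illustration: the bounds are taken as the 2.5% and 97.5% order statistics of the last third of the series, "crossing" a bound is taken to mean reaching or passing it, and the function name is hypothetical.  The actual rule in <span class="command"><strong>statreport</strong></span> may differ in these details.</p>

```python
def stabilization_index(series, cycles=2, level=0.95):
    """Index at which the series has crossed both bounds `cycles` times, or None."""
    n = len(series)
    tail = sorted(series[n - n // 3:])  # last third, used as the equilibrium sample
    tail_frac = (1 - level) / 2
    lower = tail[int(tail_frac * (len(tail) - 1))]
    upper = tail[int(round((1 - tail_frac) * (len(tail) - 1)))]
    median = tail[len(tail) // 2]

    # If the series starts below the median, look for the upper bound first;
    # otherwise look for the lower bound first.
    want_upper = series[0] < median
    remaining = 2 * cycles  # each cycle is one crossing of each bound
    for i, value in enumerate(series):
        reached = value >= upper if want_upper else value <= lower
        if reached:
            remaining -= 1
            if remaining == 0:
                return i
            want_upper = not want_upper
    return None


# A short climb from a poor starting value, followed by steady oscillation.
series = [-10, -8, -6, -4] + [0, 1, 2, 1, 0, -1, -2, -1] * 7
print(stabilization_index(series))
```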
      </div>

      <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp76"></a>10.6.2. Stabilization of tree topologies and tree distances</h4></div></div></div>
	
	<p>In addition to examining convergence diagnostics for continuous
	parameters, it is important to examine convergence diagnostics
	for the topology as well
	(<a class="ulink" href="http://dx.doi.org/10.1080/10635150600812544" target="_top">Beiko
	et al 2006</a>).  In theory, we recommend the web tool <a class="ulink" href="http://ceb.csit.fsu.edu/awty/" target="_top">Are We There Yet (AWTY)</a> (Wilgenbusch et al, 2004).  However, AWTY gives incorrect results if you upload plain NEWICK tree samples -- which is what BAli-Phy outputs.  Therefore, if you wish to use AWTY, you must convert the tree sample files to NEXUS before you upload them to AWTY in order to get correct results.</p>

<p>It is also possible to assess stabilization of tree topologies using tools distributed with <span class="application">bali-phy</span> by using commands like the following.  Here, sub-sampling and burnin do not apply to the equilibrium tree files. Also, note that you need to construct the equilibrium samples manually; we recommend that they contain at least 500 trees.  You might do this by sub-sampling using the <span class="application">BAli-Phy</span> tool <span class="command"><strong>sub-sample</strong></span>.</p>

<div class="orderedlist"><ol class="orderedlist" type="1"><li class="listitem"><p>To report the average distances within and between two tree samples:
</p><pre class="screen"><code class="prompt">%</code> trees-distances --skip=<em class="replaceable"><code>burnin</code></em> --subsample=<em class="replaceable"><code>factor</code></em> compare <em class="replaceable"><code>dir-1</code></em>/C1.trees <em class="replaceable"><code>dir-2</code></em>/C1.trees</pre><p>
</p></li><li class="listitem"><p>To compute the distance from each tree in C1.trees to all trees in equilibrium.trees, as a time series:
</p><pre class="screen"><code class="prompt">%</code> trees-distances --skip=<em class="replaceable"><code>burnin</code></em> --subsample=<em class="replaceable"><code>factor</code></em> convergence <code class="filename">C1.trees</code> <code class="filename">equilibrium.trees</code></pre><p>
</p></li><li class="listitem"><p>To assess when the above time series stabilizes:
</p><pre class="screen"><code class="prompt">%</code> trees-distances --skip=<em class="replaceable"><code>burnin</code></em> --subsample=<em class="replaceable"><code>factor</code></em> converged <code class="filename">C1.trees</code> <code class="filename">equilibrium.trees</code></pre><p>
The stabilization criterion is the same one described above for numerical values.
</p></li></ol></div>

<p>Note that the running time is proportional to the product of the numbers of trees in the two files.  Therefore, comparing two complete tree samples without sub-sampling may take a very long time.</p>

   </div>
	
    </div>

    
  </div>


  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="alignment-utilities"></a>11. Alignment utilities</h2></div></div></div>
    

    <p>Most of these tools will describe their options if given the "<strong class="userinput"><code>--help</code></strong>" argument on the command line.</p>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp78"></a>11.1. alignment-info</h3></div></div></div>
      
      <p>Show basic information about the alignment:</p>
<pre class="screen"><code class="prompt">%</code> alignment-info file.fasta
<code class="prompt">%</code> alignment-info file.fasta file.tree</pre>
    </div>
    
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp79"></a>11.2. alignment-cat</h3></div></div></div>
      
      <p>To select columns from an alignment:</p>
<pre class="screen"><code class="prompt">%</code> alignment-cat -c1-10,50-100,600- file.fasta &gt; result.fasta
<code class="prompt">%</code> alignment-cat -c5-250/3 file.fasta &gt; first_codon_position.fasta
<code class="prompt">%</code> alignment-cat -c6-250/3 file.fasta &gt; second_codon_position.fasta</pre>

  <p>To concatenate two or more alignments:</p>
<pre class="screen"><code class="prompt">%</code> alignment-cat file1.fasta file2.fasta &gt; all.fasta</pre>
    </div>
    
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp80"></a>11.3. alignment-thin</h3></div></div></div>
      
      <p>Remove columns without a minimum number of letters:</p> 
<pre class="screen"><code class="prompt">%</code> alignment-thin --min-letters=5 <em class="replaceable"><code>file</code></em>.fasta &gt; <em class="replaceable"><code>file</code></em>-thinned.fasta</pre>
      <p>Remove sequences:</p>
<pre class="screen"><code class="prompt">%</code> alignment-thin --remove=seq1,seq2 <em class="replaceable"><code>file</code></em>.fasta &gt; <em class="replaceable"><code>file</code></em>2.fasta</pre>
      <p>Remove short sequences:</p>
<pre class="screen"><code class="prompt">%</code> alignment-thin --longer-than=250 <em class="replaceable"><code>file</code></em>.fasta &gt; <em class="replaceable"><code>file</code></em>-long.fasta</pre>
      <p>Remove sequences while preserving sequence diversity:</p> 
<pre class="screen"><code class="prompt">%</code> alignment-thin --down-to=30 <em class="replaceable"><code>file</code></em>.fasta &gt; <em class="replaceable"><code>file</code></em>-30taxa.fasta
<code class="prompt">%</code> alignment-thin --down-to=30 <em class="replaceable"><code>file</code></em>.fasta --protect=seq1,seq2 &gt; <em class="replaceable"><code>file</code></em>-30taxa.fasta</pre>
      <p>Remove sequences that are missing conserved columns:</p> 
<pre class="screen"><code class="prompt">%</code> alignment-thin --remove-crazy=10 <em class="replaceable"><code>file</code></em>.fasta &gt; <em class="replaceable"><code>file</code></em>2.fasta</pre>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp81"></a>11.4. alignment-draw</h3></div></div></div>
      
      <p>Draw an alignment to HTML, optionally coloring residues by AU.</p>
<pre class="screen"><code class="prompt">%</code>  alignment-draw <em class="replaceable"><code>file</code></em>.fasta --show-ruler --color-scheme=DNA+contrast &gt; <em class="replaceable"><code>file</code></em>.html
<code class="prompt">%</code>  alignment-draw <em class="replaceable"><code>file</code></em>.fasta --show-ruler --AU=<em class="replaceable"><code>file</code></em>-AU.prob --color-scheme=DNA+contrast+fade+fade+fade+fade &gt; <em class="replaceable"><code>file</code></em>-AU.html</pre>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp82"></a>11.5. alignment-find</h3></div></div></div>
      
      <p>Find the last (or first) FastA alignment in a file.</p>
<pre class="screen"><code class="prompt">%</code> alignment-find --first &lt; <em class="replaceable"><code>file</code></em>.fastas &gt; first.fasta
<code class="prompt">%</code> alignment-find &lt; <em class="replaceable"><code>file</code></em>.fastas &gt; last.fasta</pre>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp83"></a>11.6. alignment-indices</h3></div></div></div>
      
      <p>Turn columns from a template alignment into alignment constraints:</p>
<pre class="screen"><code class="prompt">%</code> alignment-indices template.fasta &gt; constraints.txt
<code class="prompt">%</code> alignment-indices -c100-110,200,300- template.fasta &gt; constraints.txt</pre>

      <p>Each line in this file corresponds to one
	alignment column.</p>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp84"></a>11.7. alignment-chop-internal</h3></div></div></div>
      
      <p>Remove internal-node ancestral sequences from an alignment.  (This
	probably only works for alignments output by bali-phy.) </p>
<pre class="screen"><code class="prompt">%</code> alignment-chop-internal <em class="replaceable"><code>file</code></em>.fasta &gt; <em class="replaceable"><code>file</code></em>-chopped.fasta</pre>
    </div>

  </div>
  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="tree-utilities"></a>12. Tree utilities</h2></div></div></div>
    
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp85"></a>12.1. trees-consensus</h3></div></div></div>
      
      <p>This program analyzes the tree sample contained in
	<em class="replaceable"><code>file</code></em>.  It reports the MAP topology, the
	supported taxa partitions (including partial partitions), and the
	majority consensus topology.
      </p> 
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp86"></a>12.2. trees-bootstrap</h3></div></div></div>
      
      <p>Usage: trees-bootstrap <em class="replaceable"><code>file1</code></em>
	[<em class="replaceable"><code>file2</code></em> ... ] --predicates
	<em class="replaceable"><code>predicate-file</code></em> [OPTIONS] </p>
      <p>This program analyzes the tree samples contained in
	<em class="replaceable"><code>file1</code></em>, <em class="replaceable"><code>file2</code></em>,
	etc.  It gives the support of each tree sample for each predicate in
	<em class="replaceable"><code>predicate-file</code></em>, and reports a confidence
	interval based on the block bootstrap.
      </p> 

      <p>Each predicate is the intersection of a set of partitions, and
	is specified as a list of partitions or (multifurcating) trees, one
	per line.  Predicates are separated by blank lines.
      </p>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp87"></a>12.3. trees-to-SRQ</h3></div></div></div>
      

      <p>Usage: trees-to-SRQ <em class="replaceable"><code>predicate-file</code></em> [OPTIONS] <em class="replaceable"><code>trees-file</code></em> </p>

      <p>This program analyzes the tree samples contained in
	<em class="replaceable"><code>trees-file</code></em>.  It uses them to produce an
	SRQ plot for each predicate in
	<em class="replaceable"><code>predicate-file</code></em>.  Plots are produced in
	<span class="application">gnuplot</span> format, with one point per line
	and with plots separated by a blank line.</p>

      <p>If <strong class="userinput"><code>--mode sum </code></strong> is specified, then a "sum"
	plot is produced instead of an SRQ plot.  In this plot, the slope of
	the curve corresponds to the posterior probability of the event.  If the
	<strong class="userinput"><code>--invert</code></strong> option is used then the slope of the
	curve corresponds to the probability of the inverse event.  This is
	recommended if the probability of the event is near 1.0, because the
	sum plot does not distinguish variation in probabilities near 1.0 well.
      </p>
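<p>To see why the slope of a "sum" plot corresponds to the posterior probability, consider the following Python sketch.  It builds the cumulative count of iterations in which an event holds.  The function name and the exact choice of axes are assumptions made for illustration, so the output need not match what <span class="command"><strong>trees-to-SRQ</strong></span> prints.</p>

```python
def sum_plot_points(indicators, invert=False):
    """Return (iteration, cumulative count) points for a 'sum' plot.

    indicators holds one true/false value per sampled tree, saying whether
    the predicate holds for that tree.  With invert=True the inverse event
    is counted instead.
    """
    points = []
    total = 0
    for i, present in enumerate(indicators, start=1):
        hit = (not present) if invert else bool(present)
        if hit:
            total += 1
        points.append((i, total))
    return points


# The event holds in 3 of 4 samples, so the overall slope is 3/4.
indicators = [1, 1, 0, 1]
points = sum_plot_points(indicators)
last_iteration, last_total = points[-1]
print(points, last_total / last_iteration)
```

<p>Over any stretch of iterations, the rise of the curve divided by the number of iterations is the fraction of samples in which the event holds, so a change in slope part-way through the run suggests that the estimated probability has shifted.</p>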

    </div>

  </div>

    <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="compilation"></a>13. Compiling <span class="application">BAli-Phy</span></h2></div></div></div>
    

    <p>Compiling <span class="application">BAli-Phy</span> is intended to be a relatively painless process.  However, most people will want to use the pre-compiled binaries as described in the standard installation instructions at <a class="xref" href="#installation" title="2. Installation">Section 2, &#8220;Installation&#8221;</a> instead of compiling BAli-Phy themselves.  You might want to compile BAli-Phy yourself if you want to 
    </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">run BAli-Phy on a non-Intel CPU (such as ARM64 or Alpha).</li><li class="listitem">run BAli-Phy on Mac OS X versions older than 10.9</li><li class="listitem">enable the <span class="application">draw-tree</span> program on Windows or Mac (this requires the Cairo graphics library).</li><li class="listitem">modify the source code and submit a patch with new functionality.</li><li class="listitem">change the optimization options used to compile BAli-Phy in the pre-compiled binaries.</li><li class="listitem">compile with debugging options to find the cause of a bug, and maybe fix it.</li></ul></div><p>
    Otherwise, the pre-compiled binaries will be fine.
    </p>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp91"></a>13.1. Setup</h3></div></div></div>
      

    <p>In order to compile BAli-Phy, you need
    </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">a <a class="ulink" href="https://en.wikipedia.org/wiki/C%2B%2B14" target="_top">C++14</a> compiler</li><li class="listitem"><a class="ulink" href="http://mesonbuild.com" target="_top">meson</a> (version &gt;= 0.45)</li></ul></div><p>
    We recommend the GNU C++ Compiler (<a class="ulink" href="http://gcc.gnu.org" target="_top">GCC</a>) version 5.0 (or higher) or the <a class="ulink" href="http://clang.llvm.org" target="_top">Clang</a> compiler version 3.5.0 or higher. The <a class="ulink" href="http://www.cairographics.org/" target="_top">Cairo</a> graphics library is optional, but if it is missing, the <span class="command"><strong>draw-tree</strong></span> tool that is used to draw consensus trees won't be built. See also <a class="xref" href="#software_req" title="2.8. Install programs used for viewing the results">Section 2.8, &#8220;Install programs used for viewing the results&#8221;</a>. </p>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp88"></a>13.1.1. Linux</h4></div></div></div>
    <p>On Debian and Ubuntu, you can type:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install g++ git libcairo2-dev pandoc</code></strong></pre><p>

    </p>
<p>If your version of Debian or Ubuntu is recent enough to contain meson version 0.45 or higher, you can install meson with apt-get:</p>
<pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install meson</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>dpkg -s meson | grep Version</code></strong>
Version: 0.45.1-2
</pre>
<p>Otherwise you can install meson through pip3:</p>
<pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>sudo apt-get install python3 python3-pip python3-venv ninja-build</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>python3 -m venv meson</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>source meson/bin/activate</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>pip3 install meson</code></strong>
</pre>

    </div>

    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp89"></a>13.1.2. Mac</h4></div></div></div>
    <p>On Mac OS X, the simplest way to get a compiler is to install the <a class="ulink" href="https://developer.apple.com/xcode/" target="_top">Xcode</a> (version 6 or newer) command line tools, which come with <a class="ulink" href="http://clang.llvm.org" target="_top">clang</a>.
 </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>xcode-select --install</code></strong></pre><p>    To get the other tools, first install <a class="ulink" href="http://brew.sh/" target="_top">homebrew</a>, and then type:
</p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>brew install git meson cairo pandoc</code></strong></pre><p>
    </p>
    

    </div>
    
    <div class="section"><div class="titlepage"><div><div><h4 class="title"><a name="idp90"></a>13.1.3. Windows (native)</h4></div></div></div>
    <p>The <a class="ulink" href="http://www.msys2.org" target="_top">MSYS2</a> project provides a MINGW64 compiler that can create native Windows executables.  MSYS2 itself is actually non-native (it is derived from cygwin), and therefore the MSYS2 shell refers to drives as <code class="filename">/c/</code> instead of <code class="filename">C:/</code>.</p>
    <pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>pacman --needed --noconfirm -Sy pacman-mirrors</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>pacman -Sy</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>pacman -S mingw-w64-x86_64-ninja</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>pacman -S mingw-w64-x86_64-toolchain</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>pacman -S mingw-w64-python3-pip</code></strong>
<code class="prompt">%</code> <strong class="userinput"><code>PATH=/c/msys64/mingw64/bin:$PATH</code></strong> # Put the mingw64 executables into your path
<code class="prompt">%</code> <strong class="userinput"><code>pip3 install meson</code></strong></pre>
    <p>Keep in mind that MSYS2 keeps its (non-native) executables in <code class="filename">C:/msys64/usr/bin</code>, while it keeps the (native) MINGW executables in <code class="filename">C:/msys64/mingw64/bin</code>.  If you want to use the native MINGW executables, you need to make sure that <code class="filename">/c/msys64/mingw64/bin/</code> is in your PATH.  If you forget to put the MINGW executables in the path, some of the installed MINGW programs (such as pip3 above) will show up as missing when you try to run them.</p>
    </div>

    
    

	  
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="quickstart"></a>13.2. Clone, Configure, Compile</h3></div></div></div>
       
    <p>First check out the code using git:
</p><pre class="screen"><code class="prompt">%</code><strong class="userinput"><code> git clone https://github.com/bredelings/BAli-Phy.git</code></strong>
<code class="prompt">%</code><strong class="userinput"><code> cd BAli-Phy</code></strong>
<code class="prompt">%</code><strong class="userinput"><code> git submodule update --init</code></strong></pre><p>
</p>
<p>
Then run meson to configure the build process:
</p><pre class="screen"><code class="prompt">%</code><strong class="userinput"><code> meson build --prefix=$HOME/Applications/bali-phy-3.3/</code></strong></pre><p>
In the MSYS2 environment, the command is called <span class="command"><strong>meson.py</strong></span> instead of <span class="command"><strong>meson</strong></span>:
</p><pre class="screen"><code class="prompt">%</code><strong class="userinput"><code> meson.py build --prefix=$HOME/Applications/bali-phy-3.3/</code></strong></pre><p>
</p>
<p>
Finally, build and install the software:
</p><pre class="screen"><code class="prompt">%</code><strong class="userinput"><code> ninja -C build</code></strong>
<code class="prompt">%</code><strong class="userinput"><code> ninja -C build test</code></strong>
<code class="prompt">%</code><strong class="userinput"><code> ninja -C build install</code></strong>
</pre><p>
The command <span class="command"><strong>bali-phy</strong></span> and its associated tools should then be located in <code class="filename">~/Applications/bali-phy-3.3/bin/</code>. To install to another directory <em class="replaceable"><code>dir</code></em>, specify --prefix=<em class="replaceable"><code>dir</code></em> to <span class="command"><strong>meson</strong></span>.
      </p>

    </div>
      <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp92"></a>13.3. Options: compiler and linker flags</h3></div></div></div>
	
      <p>You can select the C++ compiler by setting the CXX variable.  A useful example of this is to use <span class="command"><strong>g++-7</strong></span> on systems where <span class="command"><strong>g++</strong></span> invokes a compiler that is too old:
 </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>CXX=g++-7 meson build --prefix=$HOME/Applications/bali-phy-3.3</code></strong></pre><p>
 You may also set compiler and linker options using the CPPFLAGS, CXXFLAGS, and LDFLAGS variables.  For example, you can instruct the compiler to use all the features of your chip, instead of producing generic code that will run anywhere:
 </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>CXXFLAGS="-mtune=native -march=native" meson build --prefix=$HOME/Applications/bali-phy-3.3</code></strong></pre><p>

 You can also set the CPPFLAGS and LDFLAGS variables to tell the compiler where to look for libraries, such as cairo:
	  </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib" meson build --prefix=$HOME/Applications/bali-phy-3.3</code></strong></pre><p>
 Another useful example of this is to produce an OS X executable that can run on older versions of OS X:
	  </p><pre class="screen"><code class="prompt">%</code> <strong class="userinput"><code>CXXFLAGS="-mmacosx-version-min=10.9" LDFLAGS="-mmacosx-version-min=10.9" meson build --prefix=$HOME/Applications/bali-phy-3.3</code></strong></pre><p>	</p>
      </div>

    </div>


  <div class="section"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="FAQ"></a>14. Frequently Asked Questions (FAQ)</h2></div></div></div>
    
    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp93"></a>14.1. Input files</h3></div></div></div>
      <div class="qandaset"><a name="idp2441"></a><dl><dt>14.1.1. <a href="#idp2442">Does BAli-Phy accept the wildcard characters "N" or "X"?  How does it treat them?</a></dt><dt>14.1.2. <a href="#idp2448">Does BAli-Phy accept "?" characters?</a></dt><dt>14.1.3. <a href="#idp2456">Does BAli-Phy accept the characters "R" and "Y", etc.?</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col align="left" width="1%"><col></colgroup><tbody>
	  <tr class="question"><td align="left" valign="top"><a name="idp2442"></a><a name="idp2443"></a><p><b>14.1.1.</b></p></td><td align="left" valign="top"><p>Does BAli-Phy accept the wildcard characters "N" or "X"?  How does it treat them?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Yes, BAli-Phy accepts the wildcard characters "N"
	    (for DNA) and "X" (for proteins).  These characters
	    indicate that some letter is present (as opposed to a
	    gap), but that you don't know <span class="emphasis"><em>which</em></span>
	    letter it is.  
	    </p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2448"></a><a name="idp2449"></a><p><b>14.1.2.</b></p></td><td align="left" valign="top"><p>Does BAli-Phy accept "?" characters?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>
	      No.  "?" characters are often used to indicate
	      <span class="emphasis"><em>either</em></span> letter presence (e.g. "N",
	      "X") <span class="emphasis"><em>or</em></span> absence (e.g. "-").
	      BAli-Phy will insist that you replace each "?" with
	      either "N"/"X" or "-" to indicate which one you mean. 
	    </p><p>(Most programs ignore indels and consider only
	    substitutions, and in that case "N" and "-" have the same
	    effect on the likelihood or parsimony score.  However,
	    since BAli-Phy takes indels into account, these two
	    alternatives are quite different.)
	    </p></td></tr>
	
	<tr class="question"><td align="left" valign="top"><a name="idp2456"></a><a name="idp2457"></a><p><b>14.1.3.</b></p></td><td align="left" valign="top"><p>Does BAli-Phy accept the characters "R" and "Y", etc.?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>
	      Yes.  BAli-Phy accepts the characters Y, R, W, S, K, M,
	      B, D, H, and V for DNA, RNA, and Codon alphabets.
	      BAli-Phy also accepts the characters B, Z, and J 
	      for amino acids.  These characters indicate partial
	      knowledge about a letter.  For example, R indicates
	      that a nucleotide is present, and is a puRine (A or
	      G). J indicates that an amino acid is present and is
	      either I or L.  
	    </p><p>
	      (Note that sequences sometimes contain such ambiguity
	      codes because the DNA that was sequenced contains
	      <span class="emphasis"><em>both</em></span> values.  This might occur when
	      sequencing a heterozygote or when sequencing pooled DNA
	      from several individuals.  However, the model in
	      BAli-Phy (and other phylogeny inference programs) is
	      that only one letter is correct, but we do not know
	      which one it is.  This is probably not problematic when
	      dealing with pooled sequences, but should be considered.)
	    </p></td></tr>
	</tbody></table></div>
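<p>The notion of "partial knowledge about a letter" in the answers above can be made concrete with a small lookup table.  The Python sketch below lists the standard IUPAC meanings of the DNA and amino-acid ambiguity codes mentioned in this section and returns the set of letters compatible with an observed character.  The table and function names are hypothetical, and this is only an illustration of the idea, not a description of how BAli-Phy stores alphabets internally.</p>

```python
# Standard IUPAC nucleotide ambiguity codes (DNA letters shown).
DNA_AMBIGUITY = {
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",  # wildcard: some letter is present, but which one is unknown
}

# Standard amino-acid ambiguity codes.
AMINO_AMBIGUITY = {"B": "DN", "Z": "EQ", "J": "IL"}


def compatible_letters(symbol, table=DNA_AMBIGUITY):
    """Return the set of letters that an observed character could stand for.

    A gap ("-") means that no letter is present, so it maps to the empty set.
    An ordinary letter stands only for itself.
    """
    symbol = symbol.upper()
    if symbol == "-":
        return set()
    return set(table.get(symbol, symbol))


print(compatible_letters("R"))                   # a purine: A or G
print(compatible_letters("J", AMINO_AMBIGUITY))  # I or L
print(compatible_letters("-"))                   # gap: no letter present
```

<p>The empty set returned for a gap reflects the distinction drawn above between "N" (a letter is present but unknown) and "-" (no letter is present).</p>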
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp94"></a>14.2. Running <span class="command"><strong>bali-phy</strong></span>.</h3></div></div></div>

      <div class="qandaset"><a name="idp2468"></a><dl><dt>14.2.1. <a href="#idp2469">Can I fix the alignment and ignore indel information, like MrBayes, BEAST, PhyloBayes and other MCMC programs?</a></dt><dt>14.2.2. <a href="#idp2476">Can I fix the tree topology, while allowing the alignment to vary?</a></dt><dt>14.2.3. <a href="#idp2483">Can I fix the tree topology and relative branch lengths, while allowing the alignment to vary?</a></dt><dt>14.2.4. <a href="#idp2491">Can I fix the tree topology and absolute branch lengths in all data partitions, while allowing the alignment to vary?</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col align="left" width="1%"><col></colgroup><tbody>
	  <tr class="question"><td align="left" valign="top"><a name="idp2469"></a><a name="idp2470"></a><p><b>14.2.1.</b></p></td><td align="left" valign="top"><p>Can I fix the alignment and ignore indel information, like MrBayes, BEAST, PhyloBayes and other MCMC programs?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Yes.  Add <strong class="userinput"><code>-Inone</code></strong> or <strong class="userinput"><code>--imodel=none</code></strong> on the command line.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2476"></a><a name="idp2477"></a><p><b>14.2.2.</b></p></td><td align="left" valign="top"><p>Can I fix the tree topology, while allowing the alignment to vary?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Yes.  Add <strong class="userinput"><code>--disable=topology --tree=<em class="replaceable"><code>treefile</code></em></code></strong> on the command line.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2483"></a><a name="idp2484"></a><p><b>14.2.3.</b></p></td><td align="left" valign="top"><p>Can I fix the tree topology and <span class="emphasis"><em>relative</em></span> branch lengths, while allowing the alignment to vary?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Yes.  Add <strong class="userinput"><code>--disable=tree --tree=<em class="replaceable"><code>treefile</code></em></code></strong> on the command line.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2491"></a><a name="idp2492"></a><p><b>14.2.4.</b></p></td><td align="left" valign="top"><p>Can I fix the tree topology and <span class="emphasis"><em>absolute</em></span> branch lengths <span class="emphasis"><em>in all data partitions</em></span>, while allowing the alignment to vary?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Yes.  Add <strong class="userinput"><code>--disable=tree --tree=<em class="replaceable"><code>treefile</code></em> --scale=1</code></strong> on the command line.</p></td></tr>
	</tbody></table></div>

    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp95"></a>14.3. Run-time error messages</h3></div></div></div>
      
      
      <div class="qandaset"><a name="idp2503"></a><dl><dt>14.3.1. <a href="#idp2504">I tried to use --smodel lg08+Rates.gamma[6] and I got an error message "bali-phy: No match."  What gives?</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col align="left" width="1%"><col></colgroup><tbody>
	  <tr class="question"><td align="left" valign="top"><a name="idp2504"></a><a name="idp2505"></a><p><b>14.3.1.</b></p></td><td align="left" valign="top"><p>I tried to use <strong class="userinput"><code>--smodel lg08+Rates.gamma[6]</code></strong> and I got an error message "bali-phy: No match."  What gives?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>You are probably using the C-shell as your command line shell.  It is trying to interpret <strong class="userinput"><code>lg08+Rates.gamma[6]</code></strong> as an array before running the command, and it is not succeeding.  Therefore, it doesn't even run <span class="command"><strong>bali-phy</strong></span>.</p><p>To avoid this, put quotes around the substitution model, like this: <strong class="userinput"><code>--smodel "lg08+Rates.gamma[6]"</code></strong>.  This will keep the C-shell from interfering with your command.
	    </p></td></tr>
	</tbody></table></div>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp96"></a>14.4. Stopping <span class="command"><strong>bali-phy</strong></span>.</h3></div></div></div>
      

      <div class="qandaset"><a name="idp2519"></a><dl><dt>14.4.1. <a href="#idp2520">Why is bali-phy still
	      running? How long will it take?</a></dt><dt>14.4.2. <a href="#idp2526">How do I stop a bali-phy
	  run on my personal computer?</a></dt><dt>14.4.3. <a href="#idp2544">How do I stop a bali-phy
	  run on a computing cluster?</a></dt><dt>14.4.4. <a href="#idp2553">So, how can I know when to stop it?</a></dt><dt>14.4.5. <a href="#idp2558">How can I tell when the chain has converged?</a></dt><dt>14.4.6. <a href="#idp2564">How can I check how many iterations the chain
	      has finished?</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col align="left" width="1%"><col></colgroup><tbody>
	  <tr class="question"><td align="left" valign="top"><a name="idp2520"></a><a name="idp2521"></a><p><b>14.4.1.</b></p></td><td align="left" valign="top"><p>Why is <span class="command"><strong>bali-phy</strong></span> still
	      running? How long will it take?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>It runs until you stop it.  Stop it when it's
	      done.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2526"></a><a name="idp2527"></a><p><b>14.4.2.</b></p></td><td align="left" valign="top"><p>How do I stop a <span class="command"><strong>bali-phy</strong></span>
	  run on my personal computer?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Simply kill the process -- there is no special
	    command to stop <span class="command"><strong>bali-phy</strong></span>. If you are
	    running it on your personal workstation, then you can use
	    the command <span class="command"><strong>kill</strong></span>.  To do that, you need
	    to find the PID (process ID) of the running program.  You
	    can find this by examining the beginning of the file
	    <code class="filename">C1.out</code>.  For 
	    example:
</p><pre class="screen"><code class="prompt">%</code> less 5d-1/C1.out
command: bali-phy 5d.fasta -I none --iter=10 --seed=0
start time: Wed Jul  4 17:13:25 2018

VERSION: 3.3-b1  [HEAD -&gt; logging, origin/logging commit 96a43e550]  (Jul 04 2018 16:25:09)
BUILD: Jul  4 2018 17:12:29
ARCH: linux x86_64
COMPILER: gcc 8.1.0 x86_64
directory: /home/bredelings/Work
subdirectory: 5d-675
hostname: telomere
<span class="emphasis"><em>PID: 18838</em></span>
...
</pre><p>
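If you would rather not page through the file, a one-line search can print the PID
directly.  This is only a sketch: it assumes the run directory is
<code class="filename">5d-1</code> and that the PID line begins with
<code class="literal">PID:</code>, as in the listing above.
</p><pre class="screen"><code class="prompt">%</code> grep '^PID:' 5d-1/C1.out
PID: 18838</pre><p>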
Here the PID is 18838.  Therefore you can type:
</p><pre class="screen"><code class="prompt">%</code> kill 18838</pre><p>
On some operating systems you can also type:
</p><pre class="screen"><code class="prompt">%</code> killall bali-phy</pre><p>
However, be aware that this will terminate <span class="emphasis"><em>all</em></span> of
your <span class="command"><strong>bali-phy</strong></span> runs on that computer.
	    </p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2544"></a><a name="idp2545"></a><p><b>14.4.3.</b></p></td><td align="left" valign="top"><p>How do I stop a <span class="command"><strong>bali-phy</strong></span>
	  run on a computing cluster?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Simply terminate the submitted job.  The specific command
	    to terminate a job depends on the queue manager that
	    is installed on your cluster.  Consult your cluster's
	    documentation, or ask its support staff how to delete
	    running jobs.
	    </p><p>As an example, if the SGE software is used
	    to submit jobs, then the command <span class="command"><strong>qstat</strong></span>
	    should list your jobs and their job ID numbers (which are
	    different from process ID numbers).  You can then use
	    the command <span class="command"><strong>qdel</strong></span> to delete jobs by ID
	    number.  The SGE documentation describes how to use these
	    commands. 
	    </p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2553"></a><a name="idp2554"></a><p><b>14.4.4.</b></p></td><td align="left" valign="top"><p>So, how can I know when to stop it?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>You can stop once the chain has converged and has run long enough to give
	      you &gt;1000 effectively independent samples.  </p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2558"></a><a name="idp2559"></a><p><b>14.4.5.</b></p></td><td align="left" valign="top"><p>How can I tell when the chain has converged?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>See <a class="xref" href="#mixing_and_convergence" title="10. Convergence and Mixing: Is it done yet?">Section 10, &#8220;Convergence and Mixing: Is it done yet?&#8221;</a>.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2564"></a><a name="idp2565"></a><p><b>14.4.6.</b></p></td><td align="left" valign="top"><p>How can I check how many iterations the chain
	      has finished?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Run <span class="command"><strong>wc -l C1.log</strong></span> inside the output
	      directory, and subtract 2.
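	    </p><p>For example, the subtraction can be done in one step with
	      <span class="command"><strong>awk</strong></span>.  This is a sketch that assumes, as stated above, that
	      <code class="filename">C1.log</code> contains two lines that do not correspond to
	      finished iterations; it prints the number of lines in the file minus 2.
	    </p><pre class="screen"><code class="prompt">%</code> awk 'END { print NR - 2 }' C1.log</pre><p>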
	    </p></td></tr>
	</tbody></table></div>
    </div>


    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp97"></a>14.5. Running <span class="command"><strong>bp-analyze</strong></span>.</h3></div></div></div>
      <div class="qandaset"><a name="idp2575"></a><dl><dt>14.5.1. <a href="#idp2576">Why does bp-analyze say "Program 'draw-tree' not found.  Tree pictures will not be generated"?</a></dt><dt>14.5.2. <a href="#idp2584">Why does bp-analyze say "Program 'gnuplot' not found.  Trace plots will not be generated"?</a></dt><dt>14.5.3. <a href="#idp2591">Why does bp-analyze say "Program 'R' not found.  Some mixing graphs will not be generated"?</a></dt><dt>14.5.4. <a href="#idp2598">Why is bp-analyze stopping early, or failing to generate some files?</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col align="left" width="1%"><col></colgroup><tbody>
	  <tr class="question"><td align="left" valign="top"><a name="idp2576"></a><a name="idp2577"></a><p><b>14.5.1.</b></p></td><td align="left" valign="top"><p>Why does <span class="command"><strong>bp-analyze</strong></span> say "Program 'draw-tree' not found.  Tree pictures will not be generated"?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>The program <span class="command"><strong>draw-tree</strong></span> is not distributed for your platform (Windows, Mac).  This is not a fatal error message; it just means that a pretty picture of the tree will not be generated automatically.  You can still view the tree with <span class="application">FigTree</span>, for example.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2584"></a><a name="idp2585"></a><p><b>14.5.2.</b></p></td><td align="left" valign="top"><p>Why does <span class="command"><strong>bp-analyze</strong></span> say "Program 'gnuplot' not found.  Trace plots will not be generated"?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>This is because you have not installed <span class="application">gnuplot</span>.  This is not a fatal error message; it just means that pictures of partition support and SRQ plots will not be generated automatically.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2591"></a><a name="idp2592"></a><p><b>14.5.3.</b></p></td><td align="left" valign="top"><p>Why does <span class="command"><strong>bp-analyze</strong></span> say "Program 'R' not found.  Some mixing graphs will not be generated"?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>This is because you have not installed <span class="application">R</span>.  This is not a fatal error message; it just means that a plot showing differences in clade probabilities between runs will not be generated.</p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2598"></a><a name="idp2599"></a><p><b>14.5.4.</b></p></td><td align="left" valign="top"><p>Why is <span class="command"><strong>bp-analyze</strong></span> stopping early, or failing to generate some files?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Look in the file <code class="filename">Results/bp-analyze.log</code>.  This should contain the actual commands that were run, along with the error messages from these commands.  These error messages should give you a hint as to what the problem might be.</p></td></tr>
	</tbody></table></div>
    </div>


    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp98"></a>14.6. Interpreting the results.</h3></div></div></div>
      

      <div class="qandaset"><a name="idp2608"></a><dl><dt>14.6.1. <a href="#idp2609">How do I compute the clade support?</a></dt><dt>14.6.2. <a href="#idp2614">How do I compute the split/bi-partition support?</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col align="left" width="1%"><col></colgroup><tbody>
	  <tr class="question"><td align="left" valign="top"><a name="idp2609"></a><a name="idp2610"></a><p><b>14.6.1.</b></p></td><td align="left" valign="top"><p>How do I compute the clade support?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>Actually, BAli-Phy uses unrooted trees, so it only estimates bi-partition support.  A bi-partition is a division of taxa into two groups, but it does not specify which group contains the root. </p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2614"></a><a name="idp2615"></a><p><b>14.6.2.</b></p></td><td align="left" valign="top"><p>How do I compute the split/bi-partition support?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>After you analyze the output (<a class="xref" href="#analysis" title="5.4. Summarizing the output - scripted">Section 5.4, &#8220;Summarizing the output - scripted&#8221;</a>), the partition support is indicated in
	      <code class="filename">Results/consensus</code> and in <code class="filename">Results/c50.PP.tree</code>. </p></td></tr>
	</tbody></table></div>
    </div>

    <div class="section"><div class="titlepage"><div><div><h3 class="title"><a name="idp99"></a>14.7. How do I...</h3></div></div></div>
      
      <div class="qandaset"><a name="idp2625"></a><dl><dt>14.7.1. <a href="#idp2626">How do I concatenate alignments?</a></dt><dt>14.7.2. <a href="#idp2635">How do I select columns from an alignment?</a></dt><dt>14.7.3. <a href="#idp2643">How do I create an alignment-constraint file
	      from an alignment?</a></dt></dl><table border="0" style="width: 100%;"><colgroup><col align="left" width="1%"><col></colgroup><tbody>
	  <tr class="question"><td align="left" valign="top"><a name="idp2626"></a><a name="idp2627"></a><p><b>14.7.1.</b></p></td><td align="left" valign="top"><p>How do I concatenate alignments?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>
 	    </p><pre class="screen"><code class="prompt">%</code> alignment-cat <em class="replaceable"><code>filename1.fasta</code></em> <em class="replaceable"><code>filename2.fasta</code></em> &gt; result.fasta</pre><p>
	      The alignments must have the same sequence names, but
	      the names need not be in the same order.
	    </p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2635"></a><a name="idp2636"></a><p><b>14.7.2.</b></p></td><td align="left" valign="top"><p>How do I select columns from an alignment?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>
	    </p><pre class="screen"><code class="prompt">%</code> alignment-cat -c1-10,50-100,600- <em class="replaceable"><code>filename.fasta</code></em> &gt; result.fasta</pre><p>
	    The resulting alignment will contain the selected columns
	    in the order you specified.
	  </p></td></tr>
	
	  <tr class="question"><td align="left" valign="top"><a name="idp2643"></a><a name="generating_constraint_files"></a><p><b>14.7.3.</b></p></td><td align="left" valign="top"><p>How do I create an alignment-constraint file
	      from an alignment?</p></td></tr>
	  <tr class="answer"><td align="left" valign="top"></td><td align="left" valign="top"><p>To constrain the alignment to match some alignment
	      file <em class="replaceable"><code>filename.fasta</code></em> in columns
	      100, 200-250, and 300, run:
	      </p><pre class="screen"><code class="prompt">%</code> alignment-indices -c100,200-250,300 <em class="replaceable"><code>filename.fasta</code></em> &gt; filename.constraint</pre><p>
	    </p></td></tr>	
	</tbody></table></div>
    </div>
  </div>

</div></body></html>