**partitionfinder 2 & RAxML v8.1.24**

*Last updated 03/09/2017*

This notebook will run your molecular data through a program called partitionfinder 2, which will simultaneously choose a partition scheme for the data and models of molecular evolution for the partitioned blocks. The result will be automatically run through the CIPRES REST API (CRA) recommended RAxML interface: the RAxML v8.1.24 tool (tool ID: raxmlhpc8_rest_xsede).  

- partitionfinder 2 manual: http://www.robertlanfear.com/partitionfinder/assets/Manual_v2.1.x.pdf
- RAxML manual: http://sco.h-its.org/exelixis/resource/download/NewManual.pdf

Note: To make the use of this notebook smoother, it is recommended that you place this notebook in your home directory. The working directory is the directory in which this notebook is found (working directory is denoted with a '.'). If the paths to directories/files presented in this notebook differ from your own, please change them accordingly.

**TO DO before running code in this notebook**

1) Make an account (if you have one, skip this step)
- Register for a CIPRES REST API account at https://www.phylo.org/restusers/login.action. 

2) Create an application (if you have one, skip this step)
- Once you have logged into your account, go to Developer > Application Management (https://www.phylo.org/restusers/developer). Create a new application by following the instructions listed. 
- **An application is just an easy way to name, organize, and keep track of your processes.**

3) Download the Anaconda **Python 2.7 version COMMAND LINE INSTALLER** from https://www.continuum.io/downloads, and follow the directions to install it (for example, on macOS: bash ./Anaconda2-4.3.0-MacOSX-x86_64.sh).

4) Create the configuration (.cfg) file for the partitionfinder 2 analysis as specified in this partitionfinder 2 tutorial (Step 3): http://www.robertlanfear.com/partitionfinder/tutorial/.

5) Place your input alignment file (PHYLIP format) and configuration file together in the same folder.
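
Before starting the analysis, it can help to confirm that the folder from step 5 really contains both files. The following is a minimal sketch, not part of partitionfinder 2: the function name `check_data_folder` and the file extensions it looks for (.phy/.phylip and .cfg) are assumptions, so adjust them to match your own file names.

```python
import os

def check_data_folder(data_dir):
    """Return (alignment_files, config_files) found directly inside data_dir.

    Hypothetical helper: the extensions below are assumptions about how
    the alignment and configuration files are named.
    """
    names = sorted(os.listdir(data_dir))
    alignment_files = [n for n in names if n.lower().endswith((".phy", ".phylip"))]
    config_files = [n for n in names if n.lower().endswith(".cfg")]
    return alignment_files, config_files
```

If either list comes back empty for your data folder, one of the two files is missing or has an unexpected extension.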

The following code checks if partitionfinder 2 (folder named partitionfinder) is in your working directory: if it isn't, partitionfinder 2 will be downloaded.

In [31]:
import os
have_pfinder = os.path.isdir("./partitionfinder/")
if not have_pfinder:
    !git clone https://github.com/brettc/partitionfinder.git
else:
    print("partitionfinder directory already exists, proceed...")

partitionfinder directory already exists, proceed...


The partitionfinder 2 script you run depends on the type of data you have. Refer to the following list and change the script name in the command accordingly; the command currently uses PartitionFinder.py (for DNA data):
- **DNA:** PartitionFinder.py
- **Protein:** PartitionFinderProtein.py
- **Morphological:** PartitionFinderMorphology.py

Replace the **<span style="color:red">PATH_TO_DATA</span>** field with the path to the directory containing your PHYLIP data file and configuration file.

In [None]:
!python ./partitionfinder/PartitionFinder.py --raxml PATH_TO_DATA
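
The choice of script by data type can also be written down as a small lookup. This is only a sketch: the helper name `pfinder_command` and the dictionary `PF_SCRIPTS` are hypothetical, and the script names are assumed to match the files in your cloned partitionfinder folder, so check them against your copy.

```python
import os

# Script for each data type; names assumed to match the cloned repository.
PF_SCRIPTS = {
    "DNA": "PartitionFinder.py",
    "protein": "PartitionFinderProtein.py",
    "morphology": "PartitionFinderMorphology.py",
}

def pfinder_command(data_type, data_dir, pf_dir="./partitionfinder"):
    """Build the command line (as a list) for one partitionfinder 2 run."""
    script = PF_SCRIPTS[data_type]  # raises KeyError for an unknown data type
    return ["python", os.path.join(pf_dir, script), "--raxml", data_dir]
```

The returned list can be passed to `subprocess.call` if you prefer not to use the `!` shell syntax of the notebook.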

The following lines import/install necessary packages.

In [None]:
# install the CIPRES REST API client library and Biopython
!pip install python_cipres
!pip install biopython

# provides support for both Python 2 and 3
from __future__ import print_function

import python_cipres.client as CipresClient
import requests

**BEFORE RUNNING NEXT LINE**
<br>Look at your Application Information.<br>
Replace the following fields with your own information:
 - **<span style="color:red">NAME_OF_YOUR_APPLICATION</span>** is the name of your application
 - **<span style="color:red">YOUR_APPLICATION_ID</span>** is the Application ID listed under your app

This should look similar to this example:

- **<span style="color:purple">appname = "demo"</span>**
- **<span style="color:purple">appid = "mydemo-8126CB900A964FA1AD14174512F9403C"</span>**

In [None]:
url = "https://cipresrest.sdsc.edu/cipresrest/v1"
appname = "NAME_OF_YOUR_APPLICATION"
appid = "YOUR_APPLICATION_ID"

**BEFORE RUNNING NEXT LINE**
<br>Replace the following fields with your own information:
 - **<span style="color:red">REPLACE_WITH_YOUR_USERNAME</span>** with your CRA account username
 - **<span style="color:red">REPLACE_WITH_YOUR_PASSWORD</span>** with your CRA account password

In [None]:
username = "REPLACE_WITH_YOUR_USERNAME"
password = "REPLACE_WITH_YOUR_PASSWORD"
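
If you would rather not store your password in the notebook, one option is to read the credentials from environment variables and prompt only when they are missing. This is a sketch under assumptions: the variable names `CIPRES_USERNAME` and `CIPRES_PASSWORD` and the helper `get_credentials` are hypothetical and not defined by CIPRES.

```python
import os
import getpass

def get_credentials(env=None, prompt=getpass.getpass):
    """Return (username, password) from the environment, prompting if unset.

    CIPRES_USERNAME and CIPRES_PASSWORD are hypothetical variable names.
    """
    env = os.environ if env is None else env
    # getpass hides what is typed; that is harmless for the username too
    username = env.get("CIPRES_USERNAME") or prompt("CIPRES username: ")
    password = env.get("CIPRES_PASSWORD") or prompt("CIPRES password: ")
    return username, password
```

Calling `username, password = get_credentials()` would then replace the two assignments above.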

Now we create a CipresClient to communicate with the CIPRES REST API:

In [None]:
client = CipresClient.Client(appname, appid, username, password, url)

**BEFORE RUNNING NEXT LINE**
<br>Set the variable 'inputFilename' to the path of your input alignment file (PHYLIP format), relative to the working directory.

In [1]:
inputFilename = "./PATH_TO_YOUR_FILE/YOUR_FILE_NAME"

The following code submits the job to CIPRES with the file specified in the above code. 

This is a basic submission. 
The parameters mean the following:
- **<span style="color:purple"> 'tool' : 'RAXMLHPC8_REST_XSEDE'</span>**
  - selects RAXMLHPC8_REST_XSEDE as the CIPRES tool you want to run
- **<span style="color:purple">'input.infile_': inputFilename</span>**
  - specifies the file to run (the file you specified earlier)
- **<span style="color:purple">'input.partition_': 'PATH_TO_DATA/analysis/start_tree/partitions.txt'</span>**
  - specifies the partition file output from partitionfinder 2 analysis
- **<span style="color:purple">'metadata.statusEmail' : 'true'</span>**
  - an email will be sent to user when job has finished
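
The parameters listed above fall into three dictionaries that are passed to submitJob in order: tool parameters, input files, and metadata. The sketch below builds them in one place; the helper name `build_submission` and its arguments are hypothetical, and the partition file location is the one used in this notebook.

```python
import os

def build_submission(input_file, data_dir, extra_vparams=None):
    """Return the three dictionaries used for a basic RAxML submission."""
    vparams = {"tool": "RAXMLHPC8_REST_XSEDE"}
    vparams.update(extra_vparams or {})  # optional extra tool parameters
    input_params = {
        "input.infile_": input_file,
        "input.partition_": os.path.join(
            data_dir, "analysis", "start_tree", "partitions.txt"),
    }
    metadata = {"metadata.statusEmail": "true"}
    return vparams, input_params, metadata
```

With a client already created, the result could be unpacked straight into the call, for example `client.submitJob(*build_submission(inputFilename, "PATH_TO_DATA"))`.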
     
**If you would like to add parameters, please refer to:**

1) the RAXMLHPC8 tool info page: http://www.phylo.org/index.php/rest/raxmlhpc8_rest_xsede.html
- This lists all the possible parameters you can run

2) the tool configuration helper, selecting the RAxML tool (RAXMLHPC8_REST_XSEDE) as your tool:
    https://www.phylo.org/restusers/docs/cipresXml
- This is an easy-to-use tool that generates a command based on the parameter specifications you wish to apply to your data in RAxML. 
- **Using the tool configuration helper:** Fill out the Simple/Advanced Parameters fields you are interested in, then press 'View'. This will give you a list of parameters. You can copy each of the parameters headed by 'vparam.' inside the curly braces after the 'tool' parameter (do not include the 'vparam.' header). Example:
 
         job = client.submitJob(
             {'tool' : 'RAXMLHPC8_REST_XSEDE', 
              'bootstrap_value_' : '100',
              'choose_bootstrap_' : 'x'},
             {'input.infile_': inputFilename, 
              'input.partition_': 'PATH_TO_DATA/analysis/start_tree/partitions.txt'},
             {'metadata.statusEmail': 'true'})
  
Please note that each parameter from the Tool Configuration Helper is listed in the following format (example):
- **<span style="color:purple">'vparam.parameter_name' = 'parameter_value'</span>**

You must convert each parameter to the following format before adding it to your submitJob function:

- **<span style="color:purple">'parameter_name' : 'parameter_value'</span>**
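
The conversion described above (drop the 'vparam.' header, turn '=' into a dictionary entry) can be automated. This is a minimal sketch; the helper name `vparams_to_dict` is hypothetical and the input is assumed to be the lines copied from the Tool Configuration Helper, one parameter per line.

```python
def vparams_to_dict(lines):
    """Convert "'vparam.name' = 'value'" lines into {'name': 'value'}."""
    params = {}
    for line in lines:
        name, sep, value = line.partition("=")
        name = name.strip().strip("'\"")
        value = value.strip().strip("'\"")
        # keep only lines that really are vparam assignments
        if sep and name.startswith("vparam."):
            params[name[len("vparam."):]] = value
    return params

# Example: merge converted parameters with the 'tool' parameter
copied = ["'vparam.bootstrap_value_' = '100'", "'vparam.choose_bootstrap_' = 'x'"]
vparams = {"tool": "RAXMLHPC8_REST_XSEDE"}
vparams.update(vparams_to_dict(copied))
```

The resulting `vparams` dictionary has the same shape as the first argument in the submitJob example shown earlier.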

**BEFORE RUNNING NEXT LINE**

Replace the **<span style="color:red">PATH_TO_DATA</span>** field with the path to the directory containing your PHYLIP data file and configuration file.

In [None]:
job = client.submitJob(
        {'tool' : 'RAXMLHPC8_REST_XSEDE'},
        {'input.infile_': inputFilename, 
         'input.partition_': 'PATH_TO_DATA/analysis/start_tree/partitions.txt'},
        {'metadata.statusEmail': 'true'})

Show job status information:

In [None]:
job.show(messages=True)

The job was submitted successfully if no errors are shown. To update (refresh) job status:

In [None]:
job.update()
job.show(messages=True)

The code below ensures the program will wait for your job to finish before proceeding.
<br>**Wait for the "Job ... finished" output to appear before running anything else:**

In [None]:
job.waitForCompletion()
print("Job %s finished.  isError() returns %s" % (job.jobHandle, job.isError()))
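
Waiting for completion amounts to a polling loop: refresh the job status, check whether it is done, sleep, and repeat. The sketch below illustrates that idea only and is not the library's implementation. It assumes the job object offers `update()` and `isDone()` methods; `wait_until_done` and `FakeJob` are hypothetical names, with `FakeJob` standing in for a real job so the loop can be tried without submitting anything.

```python
import time

def wait_until_done(job, poll_interval=60, max_polls=None, sleep=time.sleep):
    """Poll job until job.isDone() is true; return False if max_polls is hit."""
    polls = 0
    while not job.isDone():
        if max_polls is not None and polls >= max_polls:
            return False
        sleep(poll_interval)   # wait before asking the server again
        job.update()           # refresh the job status
        polls += 1
    return True

class FakeJob(object):
    """Stand-in job that reports done after a fixed number of updates."""
    def __init__(self, polls_needed):
        self.polls_left = polls_needed
    def update(self):
        self.polls_left -= 1
    def isDone(self):
        return self.polls_left <= 0
```

A `max_polls` limit is useful when you do not want a notebook cell to block indefinitely on a long-running job.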

To download your result files to a new directory (called RAxMLResults here):

In [None]:
downloadDir = "./RAxMLResults"
# create the results directory only if it does not exist yet
if not os.path.isdir(downloadDir):
    os.mkdir(downloadDir)
job.downloadResults(downloadDir)

To view all of your result output files, go to your working directory and find the results directory you named in the previous code (e.g. RAxMLResults). One of the results, RAxML_bestTree.result (./RAxMLResults/RAxML_bestTree.result), can be viewed as a raw Newick text file.
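
To see which files were downloaded without leaving the notebook, you can list the results directory. This is a small sketch using only the standard library; the helper name `list_results` is hypothetical.

```python
import os

def list_results(results_dir):
    """Return a sorted list of (file name, size in bytes) in results_dir."""
    entries = []
    for name in sorted(os.listdir(results_dir)):
        path = os.path.join(results_dir, name)
        if os.path.isfile(path):  # skip sub-directories
            entries.append((name, os.path.getsize(path)))
    return entries
```

For example, looping over `list_results("./RAxMLResults")` and printing each pair shows at a glance whether a best tree file was produced and whether any output file is empty.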

<p>Using the Biopython package (installed through this notebook), display your RAxML best tree result here:</p>

In [None]:
from Bio import Phylo
try:
    tree = Phylo.read("./RAxMLResults/RAxML_bestTree.result", "newick")
    Phylo.draw_ascii(tree)
except Exception:
    # no readable best tree: show the RAxML log instead
    print("Cannot print best tree because no best tree was created, refer to the following RAxML output: \n")
    with open('./RAxMLResults/STDOUT', 'r') as f:
        print(f.read())

Once you've downloaded a job's results you should delete the job to conserve space on CIPRES.  If you delete a job that is still running or is queued to run, it will be cancelled.

In [None]:
job.delete()