## 1. Find a tree to update

#### 1.a. Use the command `find_trees.py` with the name of your organism of interest

```
find_trees.py -t bufo -tb -o bufo.txt
# -t indicates the taxon name that you want to update
# -tb indicates that you want to find a tree that has an alignment available on TreeBASE
# -o indicates the name of the output file
```

```
OTT id 187219
Gathering references (slow)
.......
```


Note that the first line printed to the screen is the OTT id of our taxon of interest. Write it down; we will need it later for the `--search_taxon` flag in step 3!
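If you prefer not to copy the OTT id by hand, it can be pulled out of the captured screen output. The following is a minimal sketch, assuming the output was captured as text and that the first line has the `OTT id <number>` form shown above; `parse_ott_id` is a hypothetical helper, not part of Physcraper.

```python
import re


def parse_ott_id(output_text):
    """Return the OTT id printed by find_trees.py, or None if it is absent.

    Assumes a line of the form 'OTT id 187219', as in the output above.
    """
    match = re.search(r"^OTT id (\d+)\s*$", output_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Sample text copied from the screen output shown above.
sample = "OTT id 187219\nGathering references (slow)\n.......\n"
print(parse_ott_id(sample))
```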

#### 1.b. Explore the output file with `less`, `cat`, or `more`. In our case it is called `bufo.txt`:

```
less bufo.txt
```

```
Members of bufo present in the following studies in the OpenTree Phylesystem
Only returning studies with TreeBase links

Study pg_423 tree(s) tree2857
OpenTreeUrl: https://tree.opentreeoflife.org/curator/study/view/pg_423
Reference: Pyron, R.A., & Wiens J.J. 2011. A large-scale phylogeny of Amphibia including over 2800 species, and a revised classification of extant frogs, salamanders, and caecilians. Molecular Phylogenetics Evolution 61 (2): 543-583.
Data Deposit URL: http://purl.org/phylo/treebase/phylows/study/TB2:S11742
bufo.txt (END)
```


#### 1.c. Get the OpenTree study ID and tree ID of the tree you choose to update

We have only one study to choose from: it has study id `pg_423` and tree id `tree2857`. We will use these ids in the next step.
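When the output file lists many studies, the ids can be collected with a short script. This is a sketch under stated assumptions: it relies on the `Study <study_id> tree(s) <tree_id>` line format seen in `bufo.txt`, and it assumes that multiple tree ids, if present, are separated by commas or whitespace (the example only shows a single tree). `parse_study_lines` is a hypothetical helper.

```python
import re

# Matches lines such as: "Study pg_423 tree(s) tree2857"
STUDY_LINE = re.compile(r"^Study (\S+) tree\(s\) (.+)$")


def parse_study_lines(text):
    """Return a list of (study_id, [tree_ids]) tuples found in the text."""
    studies = []
    for line in text.splitlines():
        match = STUDY_LINE.match(line.strip())
        if match:
            study_id = match.group(1)
            # Assumption: tree ids are separated by commas and/or whitespace.
            tree_ids = [t for t in re.split(r"[,\s]+", match.group(2)) if t]
            studies.append((study_id, tree_ids))
    return studies


sample = (
    "Members of bufo present in the following studies in the OpenTree Phylesystem\n"
    "Only returning studies with TreeBase links\n"
    "\n"
    "Study pg_423 tree(s) tree2857\n"
    "OpenTreeUrl: https://tree.opentreeoflife.org/curator/study/view/pg_423\n"
)
print(parse_study_lines(sample))
```

To use it on the real file, read `bufo.txt` with `open("bufo.txt").read()` and pass the text to the function.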

## 2. Download the tree and alignment

#### 2.a. Use the command `physcraper_run.py` with the `--no_est` option to automatically download the tree and alignment only

```
time physcraper_run.py --study_id pg_423 --tree_id tree2857 --treebase --no_est --output /project/linked_dir/bufo_pg_423
# --study_id indicates the OpenTree study ID of the tree to update
# --tree_id indicates the OpenTree tree ID of the tree to update
# --treebase indicates to automatically download a corresponding alignment from TreeBASE
# --no_est indicates that you just want to download the tree and alignment
# --output indicates the name of the directory that will save results from the update analysis
```

```
No config file, using defaults
Configuration Settings
[blast]
Entrez.email = None
e_value_thresh = 1e-05
hitlist_size = 10
location = remote
localblastdb = None
num_threads = 4
delay = 90
[physcraper]
spp_threshold = 5
min_length = 0.8
max_length = 1.2

downloading best match alignment from treebase to pg_423.2/pg_423tree2857.aln
https://raw.githubusercontent.com/TreeBASE/supertreebase/master/data/treebase/S11742.xml
get_mrca_ott
```


#### 2.b. Check the downloaded alignment and tree

Manually verify that the tree and alignment correspond to the same organism, and that you have the alignment that you want to update.

#### 2.c. Solving `ERROR: Problems reading the input data from treebase`

This error is solved by manually downloading the alignment from TreeBASE.
Often, Physcraper's automatic alignment download does not work. For example, running the following for *Plasmodium*:

```
physcraper_run.py --study_id ot_766 --tree_id Tr85440 --treebase --no_est --output ot_766
```


we obtained the following error message:

```
ERROR: Problems reading the input data from treebase:
DENDROPY ERROR:'HTTP Error 404: Not Found'

It appears this may not be a DNA alignment - physcraper can only use DNA currently.
Investigate, and potentialy download alignment directly at
https://raw.githubusercontent.com/TreeBASE/supertreebase/master/data/treebase/S16424.xml
```

To do a full Physcraper run for this tree and alignment, we have to download the alignment manually from TreeBASE, as follows:

- The Physcraper error message gives us a URL that contains the information we need to download the alignment from TreeBASE. The TreeBASE study id is the alphanumeric string right before the `.xml` extension. In this example, it is `S16424`.

- Now, go to TreeBASE at www.treebase.org. Type or paste the TreeBASE study id (`S16424` in this example) in the TreeBASE search bar. Make sure to choose the search by study_id option.

- The TreeBASE search will take you to the study homepage. Go to the `Matrices` tab and pick a nucleic acid alignment.

- Download the "reconstructed nexus" file instead of the one marked as "original", and save it on your computer. If you are using a Physcraper Docker container, save it in the `linked_dir` folder.
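The first step of the list above, reading the TreeBASE study id off the URL, can be sketched in a few lines. This assumes the URL ends in `S<digits>.xml`, as it does in the error message; `treebase_study_id` is a hypothetical helper name.

```python
import re


def treebase_study_id(url):
    """Return the TreeBASE study id that precedes '.xml' in the URL, or None."""
    match = re.search(r"/(S\d+)\.xml$", url.strip())
    return match.group(1) if match else None


# URL copied from the Physcraper error message above.
url = "https://raw.githubusercontent.com/TreeBASE/supertreebase/master/data/treebase/S16424.xml"
print(treebase_study_id(url))
```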


#### 2.d. Solving `dendropy.dataio.nexusreader.NexusReaderError: Error parsing data source 'alignment-file.nex' ...`

Reconstructed nexus files sometimes contain characters that the nexus alignment reader of [DendroPy](https://dendropy.org/) does not expect.

To solve this issue, we have to manually remove the unexpected characters from the reconstructed nexus file.

For example, the following command for _Primates_ uses the reconstructed `M585.nex` file as downloaded from TreeBASE:

```
physcraper_run.py --study_id pg_2407 --tree_id tree5076 --search_taxon ott:913935 --alignment linked_dir/alignments/M585-reconstructed.nex --aln_schema nexus --output test
```

It returns the following error:

```
dendropy.dataio.nexusreader.NexusReaderError: Error parsing data source '/project/linked_dir/alignments/M585-reconstructed.nex' on line 164 at column 21: Expecting "=" after character set name "ambiguous", but instead found "("
```

Visual inspection of the `M585-reconstructed.nex` file shows additional `SETS` and `CODONS` blocks at the very end of the file. Removing everything after the end of the sequence block fixes the problem:

```
[!...]
GAA{AG}GCTGACATTGGCGTCGCTATGGGTATCGCCGGAAGTGACGTCAGTAAACAGGCGGCTGACATGATCCTATTGGATGACAACTTTGCTTCTATCGTAACGGGTGTCGAAGAAGGTCGACTTATCTTTGACAACTTGAAAAAATCCATCGCTTATACTCTGACCAGTAACATCCCCGAGATTACTCCATTCTTATTTTTCATCTTGGCTGACGTTCCACTGCCCCTCGGTACCGTCACCATCTTGTGTATTGATTTA----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
;
END;

[!Remove everything after this]

BEGIN SETS;
	CHARSET ambiguous (CHARACTERS = ATPaseNucleotideReduced1) =  1135-1161 1399-1437;
	CHARSET ambigand3s (CHARACTERS = ATPaseNucleotideReduced1) =  2697 2700 2703 2706 2709 2712 2715 2718 2721 2724 2727 2730 2733 2736 2739 2742 2745 2748 2751 2754 2757 2760 2763 2766 2769 2772 2775 2778 2781 2784 2787 2790 2793 2796 2799 2802 2805 2808 2811 2814 2817 2820 2823 
[!...]    
2640 2643 2646 2649 2652 2655 2658 2661 2664 2667 2670 2673 2676 2679 2682 2685 2688 2691 2694;
END;
BEGIN CODONS;
CODONPOSSET * CodonPositions (CHARACTERS = ATPaseNucleotideReduced1) = 
		1: 1-3007\3,
		2: 2-3008\3,
		3: 3-3009\3;
CODONPOSSET  CodonPositions (CHARACTERS = ATPaseNucleotideReduced1) = 
		1: 1-3007\3,
		2: 2-3008\3,
		3: 3-3009\3;
END;
```
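The manual clean-up can also be scripted. The sketch below drops the `SETS` and `CODONS` blocks and keeps everything else; it is an illustration rather than a general nexus parser, and it assumes that each `BEGIN ...;` and `END;` statement sits on its own line, as in the excerpt above. The function names are hypothetical.

```python
import re

BEGIN_RE = re.compile(r"^\s*BEGIN\s+(\w+)\s*;", re.IGNORECASE)
END_RE = re.compile(r"^\s*END(BLOCK)?\s*;", re.IGNORECASE)


def drop_nexus_blocks(text, names=("SETS", "CODONS")):
    """Return the nexus text without the blocks whose names are listed."""
    unwanted = {name.upper() for name in names}
    kept = []
    skipping = False
    for line in text.splitlines(keepends=True):
        if not skipping:
            begin = BEGIN_RE.match(line)
            if begin and begin.group(1).upper() in unwanted:
                skipping = True  # start of a block we want to remove
                continue
            kept.append(line)
        elif END_RE.match(line):
            skipping = False  # the unwanted block is closed; resume keeping
    return "".join(kept)


def clean_nexus_file(source_path, cleaned_path):
    """Write a copy of source_path without SETS and CODONS blocks."""
    with open(source_path) as handle:
        cleaned = drop_nexus_blocks(handle.read())
    with open(cleaned_path, "w") as handle:
        handle.write(cleaned)
```

Writing to a new file keeps the original download intact, so the result can be compared against it before running Physcraper.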

## 3. Run a full Physcraper update!

Now that we have downloaded the alignment, we can leave out the `--treebase` flag and give the path to our alignment with the `--alignment` flag. We also have to specify the format of our alignment (nexus or fasta) with the `--aln_schema` flag.

```
time physcraper_run.py --study_id pg_423 --tree_id tree2857 --search_taxon ott:187219 --alignment /project/linked_dir/bufo_pg_423/pg_423tree2857.aln --aln_schema nexus --bootstrap_reps 2 --output /project/linked_dir/bufo_pg_423
# --study_id indicates the OpenTree study ID of the tree to update
# --tree_id indicates the OpenTree tree ID of the tree to update
# --search_taxon indicates the taxon OTT ID that we want to update
# --alignment indicates the local path to the downloaded alignment
# --aln_schema indicates the format of the downloaded alignment
# --bootstrap_reps indicates the number of bootstrap reps to perform
# --output indicates the name of the directory that will save results from the update analysis
```

```
No config file, using defaults
Configuration Settings
[blast]
Entrez.email = None
e_value_thresh = 1e-05
hitlist_size = 10
location = remote
localblastdb = None
num_threads = 4
delay = 90
[physcraper]
spp_threshold = 5
min_length = 0.8
max_length = 1.2
```



This will take a while, especially because BLAST searches are time-consuming!
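If you want to launch several updates from a script, the command line can be assembled programmatically. This is a minimal sketch that only uses the flags shown in this tutorial; `build_physcraper_command` is a hypothetical helper, and the example values are the ones from the *Bufo* run.

```python
import shlex


def build_physcraper_command(study_id, tree_id, ott_id, alignment,
                             aln_schema, output_dir, bootstrap_reps=None):
    """Return the physcraper_run.py call as a list of arguments."""
    cmd = [
        "physcraper_run.py",
        "--study_id", study_id,
        "--tree_id", tree_id,
        "--search_taxon", f"ott:{ott_id}",
        "--alignment", alignment,
        "--aln_schema", aln_schema,
    ]
    if bootstrap_reps is not None:
        cmd += ["--bootstrap_reps", str(bootstrap_reps)]
    cmd += ["--output", output_dir]
    return cmd


cmd = build_physcraper_command(
    study_id="pg_423",
    tree_id="tree2857",
    ott_id=187219,
    alignment="/project/linked_dir/bufo_pg_423/pg_423tree2857.aln",
    aln_schema="nexus",
    output_dir="/project/linked_dir/bufo_pg_423",
    bootstrap_reps=2,
)
# Print the command for inspection; to execute it, the list could be
# passed to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```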