# Installing and running `Physcraper` with Docker

#### 1. Get the Physcraper Docker image from Docker Hub:

In [1]:
docker pull mctavishlab/physcraper

Using default tag: latest
latest: Pulling from mctavishlab/physcraper
Digest: sha256:24bd4f1d49a5ffca850a5b580f7bd4c8f21647334546f2b6683d68e78c0ba71b
Status: Image is up to date for mctavishlab/physcraper:latest
docker.io/mctavishlab/physcraper:latest
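The digest in the output above identifies the exact image that was pulled. As a hedged sketch, pulling by digest rather than by the `latest` tag keeps later runs on that same image. The variable and function names below are illustrative and not part of Physcraper, and the digest will differ once the image is updated.

```shell
IMAGE="mctavishlab/physcraper"
# Digest copied from the pull output above; replace it with the one you see.
DIGEST="sha256:24bd4f1d49a5ffca850a5b580f7bd4c8f21647334546f2b6683d68e78c0ba71b"

# Hypothetical helper: pull the image by digest instead of by tag.
pull_pinned() {
    docker pull "${IMAGE}@${DIGEST}"
}

# Usage: pull_pinned
```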


#### 2.a) Start a container:

In [2]:
docker run --name physcraperdocker -i -t mctavishlab/physcraper bash
# -i keeps STDIN open, which makes the container interactive.
# -t allocates a pseudo-terminal (TTY) for the session.
# --name specifies the container name.

docker: Error response from daemon: Conflict. The container name "/physcraperdocker" is already in use by container "58578efff28e90e0e75bb4ebc98a86bfe984ab9c721203d3d2bae64079e07e15". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.


_This error means that a container named `physcraperdocker` already exists on my computer, whether it is running or stopped._

_Instead of creating it again, we exit any open session and restart the existing container:_

#### 2.b) Exit a container:

In [3]:
exit

#### 2.c) Restart a container:

In [4]:
docker container restart physcraperdocker
docker exec -it physcraperdocker bash

physcraperdocker

root@58578efff28e:/project/physcraper# 
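The run, exit, and restart steps above can be folded into one helper that first checks the state of the container. This is a minimal sketch: the function name `enter_container` is made up for illustration, and it assumes the container was created with `bash` as its command, as in step 2.a.

```shell
# Hypothetical helper: open a shell in a named container, creating or
# starting the container first if needed.
enter_container() {
    name="$1"
    # `docker container inspect` exits non-zero if no such container exists.
    status="$(docker container inspect --format '{{.State.Status}}' "$name" 2>/dev/null)" || status="missing"
    case "$status" in
        running)
            docker exec -it "$name" bash
            ;;
        missing)
            docker run --name "$name" -i -t mctavishlab/physcraper bash
            ;;
        *)
            # Stopped, exited, or created: start it, then open a shell.
            docker start "$name" >/dev/null && docker exec -it "$name" bash
            ;;
    esac
}

# Usage: enter_container physcraperdocker
```

Checking the state first avoids the name conflict error, because `docker run --name` is only called when no container with that name exists.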

#### 3. Creating a "linked" container


We have to create a link between the container and our machine so that we can pass files between the two. To achieve this, we will create a "linked" physcraper docker container using the following command:

```
docker run --name physcraperdocker_link -i -t -v [/local/path/to/linked_dir]:/project/linked_dir  mctavishlab/physcraper  bash
```

Remember to remove the square brackets from the example path `[/local/path/to/linked_dir]` before running the command.

In my case, the `linked_dir` folder is in home, in the `collab-paper` directory, so I ran:

In [5]:
docker run --name physcraperdocker_link -i -t -v ~/collab-paper/linked_dir:/project/linked_dir mctavishlab/physcraper bash

physcraperdocker_link

root@6a1b02905a5d:/project/physcraper# 
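Docker treats a `-v` source that is not an absolute path as a named volume rather than a host folder, so it helps to resolve the path before passing it. The following is a sketch under that assumption. The function name `run_linked` is hypothetical, and creating the folder with `mkdir -p` is an added convenience rather than something the original command does.

```shell
# Hypothetical helper: create the local folder if needed, resolve it to an
# absolute path, and start a container with that folder mounted.
run_linked() {
    host_dir="$1"
    mkdir -p "$host_dir" || return 1
    # Resolve to an absolute path so Docker bind-mounts the host folder.
    abs_dir="$(cd "$host_dir" && pwd)" || return 1
    docker run --name physcraperdocker_link -i -t \
        -v "$abs_dir":/project/linked_dir \
        mctavishlab/physcraper bash
}

# Usage: run_linked ~/collab-paper/linked_dir
```

Files written to `/project/linked_dir` inside the container then appear in the local folder, and vice versa.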

#### 4. Check that dependencies and `Physcraper` are available:

In [6]:
which muscle

/usr/bin/muscle


In [7]:
which raxmlHPC

/usr/bin/raxmlHPC


In [8]:
python --version

Python 3.8.10


In [9]:
physcraper_run.py --help

usage: physcraper_run.py [-h] [-s STUDY_ID] [-t TREE_ID] [-a ALIGNMENT] [-as ALN_SCHEMA]
                         [-o OUTPUT] [-c CONFIGFILE] [-tb] [-re RELOAD_FILES]
                         [-tf TREE_FILE] [-tfs TREE_SCHEMA] [-ti TAXON_INFO] [-tag TAG]
                         [-st SEARCH_TAXON] [-spn SPECIES_NUMBER]
                         [-bl BLOCK_LIST [BLOCK_LIST ...]] [-tp TRIM_PERC]
                         [-rlmax RELATIVE_LENGTH_MAX] [-rlmin RELATIVE_LENGTH_MIN] [-r]
                         [-db BLAST_DB] [-u BLAST_URL] [-e EMAIL] [-ak API_KEY]
                         [-ev EVAL] [-hl HITLIST_LEN] [-nt NUM_THREADS] [-de DELAY]
                         [-no_est] [-bs BOOTSTRAP_REPS] [-tx TAXONOMY] [-v]
optional arguments:
-h, --help            show this help message and exit
-s STUDY_ID, --study_id STUDY_ID
                      OpenTree study id.
-t TREE_ID, --tree_id TREE_ID
                      OpenTree tree id.
-a ALIGNMENT, --alignment ALIGNMENT
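
The individual checks in this step can be bundled into one loop. This is a minimal sketch. `check_tools` is an illustrative name, and it only verifies that each program is on the `PATH`, not that it runs correctly.

```shell
# Hypothetical helper: report where each tool lives, and return non-zero
# if any tool is missing from the PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path="$(command -v "$tool")"; then
            printf '%-20s %s\n' "$tool" "$path"
        else
            printf '%-20s NOT FOUND\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, inside the container:
#   check_tools muscle raxmlHPC python physcraper_run.py
```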
      

#### 5. Test that `Physcraper` runs by starting an example:

First, copy the preloaded BLAST sequences:

In [10]:
cp -r docs/examples/pg_55_web pg_55_test
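
If `pg_55_test` already exists, `cp -r` copies the source folder into it instead of replacing it, which leaves the files one level deeper than the run expects. As a hedged sketch, a small guard avoids that. The name `copy_fresh` is hypothetical.

```shell
# Hypothetical helper: copy a folder only if the destination does not exist,
# so the copy never ends up nested inside an older one.
copy_fresh() {
    src="$1"
    dest="$2"
    if [ -e "$dest" ]; then
        echo "refusing to copy: $dest already exists" >&2
        return 1
    fi
    cp -r "$src" "$dest"
}

# Usage: copy_fresh docs/examples/pg_55_web pg_55_test
```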

Next, start a physcraper run with 2 to 10 bootstraps. To measure the running time, prefix the command with `time`.

As you can see at the end, a Physcraper run with 10 bootstraps took 395 minutes and 10.226 seconds, which is a bit more than 6 hours and 35 minutes!
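
The conversion from minutes to hours is plain integer arithmetic. The sketch below assumes the elapsed time is given in the `395m10.226s` form that bash's `time` prints on its `real` line. The function name `real_to_hms` is illustrative.

```shell
# Hypothetical helper: convert a value such as "395m10.226s" to hours,
# minutes, and seconds.
real_to_hms() {
    t="$1"
    mins="${t%%m*}"   # text before the "m": whole minutes
    secs="${t#*m}"    # text after the "m": seconds plus a trailing "s"
    secs="${secs%s}"  # drop the trailing "s"
    printf '%dh %dm %ss\n' "$((mins / 60))" "$((mins % 60))" "$secs"
}

real_to_hms "395m10.226s"   # prints: 6h 35m 10.226s
```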

In [11]:
time physcraper_run.py --study_id pg_55 --tree_id tree5864 --treebase --bootstrap_reps 2 --output pg_55_test

No config file, using defaults
Configuration Settings
[blast]
Entrez.email = None
e_value_thresh = 1e-05
hitlist_size = 10
location = remote
localblastdb = None
num_threads = 4
delay = 90
[physcraper]
spp_threshold = 5
min_length = 0.8
max_length = 1.2

Using alignment file found at pg_55_test/pg_55tree5864.aln.
get_mrca_ott
restricting blast search to taxon ncbi:3629 (ott:279960; Malvaceae)
Tree file written to /project/physcraper/pg_55_test/inputs_pg_55tree5864/taxonname.tre
31 taxa in alignment and tree
Blasting 'otu376438'
Blasting 'otu376443'
Blasting 'otu376431'
Blasting 'otu376442'
Blasting 'otu376429'
Blasting 'otu376441'
Blasting 'otu376453'
Blasting 'otu376426'
Blasting 'otu376434'
Blasting 'otu376451'
Blasting 'otu376433'
Blasting 'otu376425'
Blasting 'otu376448'
Blasting 'otu376449'
Blasting 'otu376432'
Blasting 'otu376447'
Blasting 'otu376437'
Blasting 'otu376445'
Blasting 'otu376454'
Blasting 'otu376446'
Blasting 'otu376435'
Blastin