# Reproducing Research Paper Results: TreeSwift

> Moshiri N (2020). "TreeSwift: a massively scalable Python package for trees." *SoftwareX*. 11:100436. [doi:10.1016/j.softx.2020.100436](https://doi.org/10.1016/j.softx.2020.100436)

# Setting Up the Environment

Before running any analyses, we first need to install the packages we are benchmarking, each pinned to the version listed below.

| Tool        | Version                                             |
| :---------: | :-----------------------------------------------:   |
| TreeSwift   | [1.1.45](https://pypi.org/project/treeswift/1.1.45) |
| Biopython   | [1.85](https://pypi.org/project/biopython/1.85)     |
| DendroPy    | [4.4.0](https://pypi.org/project/DendroPy/4.4.0)    |
| ETE Toolkit | [3.1.1](https://pypi.org/project/ete3/3.1.1)        |
| NetworkX    | [3.4.2](https://pypi.org/project/networkx/3.4.2)    |

In [8]:
# install tree packages using pip
!pip install -q treeswift==1.1.45
!pip install -q biopython==1.85
!pip install -q dendropy==4.4.0
!pip install -q ete3==3.1.1
!pip install -q networkx==3.4.2
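
Because the benchmark is only meaningful if the pinned versions are the ones actually in use, it can help to confirm what got installed. The following is a minimal sketch using the standard library's `importlib.metadata`. The `PINNED` dictionary and the `check_versions` helper are illustrative names, not part of the original notebook, and the versions are copied from the table above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the table above (PyPI distribution names).
PINNED = {
    "treeswift": "1.1.45",
    "biopython": "1.85",
    "DendroPy": "4.4.0",
    "ete3": "3.1.1",
    "networkx": "3.4.2",
}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# An empty dictionary means every pinned version is installed.
print(check_versions(PINNED) or "All pinned versions are installed.")
```

If the check reports a mismatch, rerunning the install cell (or restarting the runtime) before benchmarking is the safest course.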


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip

# Downloading the Data and Code

In [2]:
!git clone https://github.com/BrandonWerbel/TreeSwift.git

Cloning into 'TreeSwift'...
remote: Enumerating objects: 269, done.
remote: Counting objects: 100% (269/269), done.
remote: Compressing objects: 100% (162/162), done.
remote: Total 269 (delta 118), reused 252 (delta 105), pack-reused 0 (from 0)
Receiving objects: 100% (269/269), 14.04 MiB | 7.70 MiB/s, done.
Resolving deltas: 100% (118/118), done.


After running the cell above, if you open the "Files" tab on Google Colab (the icon in the left navigation bar that looks like a folder), you will see that we now have a folder called `TreeSwift`, which contains the cloned contents of the GitHub repository.
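
The same check can be done from a code cell instead of the "Files" tab. This is a small sketch, assuming the clone landed in `TreeSwift` under the current working directory and that the scripts live in `TreeSwift/scripts` (the path that appears in the benchmark command later on). The `list_files` helper is a hypothetical name introduced here.

```python
from pathlib import Path


def list_files(directory, pattern="*"):
    """Return sorted relative paths of files under `directory` matching `pattern`."""
    root = Path(directory)
    if not root.is_dir():
        return []  # nothing cloned here (yet)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob(pattern)
        if path.is_file()
    )


# Print the Python scripts in the cloned repository, if the clone exists.
for name in list_files("TreeSwift/scripts", "*.py"):
    print(name)
```

An empty listing suggests that the clone failed or that the notebook is running from a different working directory.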

# Installing Analysis Dependencies

The Python scripts in the [`scripts`](https://github.com/niemasd/TreeSwift-Paper/tree/master/scripts) folder of the aforementioned GitHub repository [depend on](https://github.com/niemasd/TreeSwift-Paper/blob/ca754155014cd116ce48fdc570b5b47f9bac4cd0/scripts/figures.py#L16-L20) three Python packages that are not part of the [Python Standard Library](https://docs.python.org/3/library/index.html) and must therefore be installed separately: [NumPy](https://numpy.org/), [Matplotlib](https://matplotlib.org/), and [seaborn](https://seaborn.pydata.org/).

These packages are simply used to visualize the results of the benchmark experiment (i.e., they are not integral to the benchmark experiment itself), so it isn't critical for us to use the exact same versions that were used in the original paper. If we *really* wanted to, we could perform the same exploration as we did with the tree packages to find the most recent version of these packages released before [July 30, 2019](https://github.com/niemasd/TreeSwift-Paper/commit/179f6b6a3c8de30dee305af8aff472c1636b1b7b), but for the sake of simplicity, we will use the most recent versions of each of these packages.
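
For readers who do want to pin the visualization packages, the lookup can be sketched with PyPI's JSON API, which lists upload timestamps for every release file. This is an illustration rather than the procedure used for the original paper: `latest_release_before` and `fetch_releases` are hypothetical helpers, and "most recent" is taken to mean the latest upload time before the cutoff, which is not always the highest version number.

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen

# Cutoff date taken from the commit linked above.
CUTOFF = datetime(2019, 7, 30, tzinfo=timezone.utc)


def latest_release_before(releases, cutoff):
    """Return the version with the latest first-upload time before `cutoff`."""
    best_version, best_time = None, None
    for ver, files in releases.items():
        times = [
            datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00"))
            for f in files
        ]
        if not times:
            continue  # release with no uploaded files
        released = min(times)  # first file uploaded for this version
        if released < cutoff and (best_time is None or released > best_time):
            best_version, best_time = ver, released
    return best_version


def fetch_releases(package):
    """Download the `releases` mapping for `package` from PyPI (needs network)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)["releases"]


# Example usage (requires network access):
# print(latest_release_before(fetch_releases("seaborn"), CUTOFF))
```

This sketch does not filter out pre-releases or yanked releases, so the result should be checked against the package's release history before pinning it.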

We can install all of the packages needed for visualizing the results of our experiment by running the following cell, which should take around 30 seconds to finish running.

In [3]:
# install packages needed for visualizing our results
!pip install -q numpy
!pip install -q matplotlib
!pip install -q seaborn


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


# Running Our Own Benchmark

In [10]:
!python TreeSwift/scripts/benchmark.py

=== Running task: height ===
- X = 100 leaves. dendropy.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz', 'dendropy', 'height']
.['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/d
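
The log above suggests that the benchmark repeatedly launches `time.py` as a separate process, passing a tree file, a tool name, and a task name. The repeated-subprocess pattern can be sketched as follows. This is a generic illustration, not the repository's `benchmark.py`: the `time_command` helper, its default of 10 repetitions, and the decision to measure wall-clock time in the driver are all assumptions.

```python
import subprocess
import sys
import time


def time_command(command, repetitions=10):
    """Run `command` `repetitions` times; return wall-clock seconds per run."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in command so the sketch runs anywhere. In the notebook, a command
# copied from the log above could be substituted, e.g.
# ['python3', 'TreeSwift/scripts/time.py', 'TreeSwift/data/tree_n100.tre.gz',
#  'dendropy', 'height'].
timings = time_command([sys.executable, "-c", "pass"], repetitions=3)
print(f"mean: {sum(timings) / len(timings):.4f} s over {len(timings)} runs")
```

Timing a whole subprocess includes interpreter start-up and import time, so numbers from this sketch are not directly comparable to timings measured inside a script.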

Now, let's create plots from our own benchmarking results! This cell should take around 30 seconds to finish running.

In [11]:
!python3 TreeSwift/scripts/figures.py data.pkl.gz

After running the cell above, if you open the "Files" tab on Google Colab (the icon in the left navigation bar that looks like a folder), you will see all of the PDFs produced by [`figures.py`](https://github.com/niemasd/TreeSwift-Paper/blob/master/scripts/figures.py).

*Note: You may need to close and reopen the "Files" tab to get the PDFs to appear after running the cell above.*
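
Beyond the PDFs, the raw results file passed to `figures.py` can be inspected directly. The sketch below assumes, based only on its extension, that `data.pkl.gz` is a gzip-compressed pickle. Its internal structure is not described here, so the code reports only the top-level type and keys. `load_results` and `top_level_summary` are hypothetical helper names.

```python
import gzip
import pickle
from pathlib import Path


def load_results(path):
    """Load a gzip-compressed pickle (assumed format of data.pkl.gz)."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)


def top_level_summary(results):
    """Map each top-level key to its value's type name, or return the type name."""
    if isinstance(results, dict):
        return {key: type(value).__name__ for key, value in results.items()}
    return type(results).__name__


# Only unpickle files you produced yourself: pickle can execute arbitrary code.
if Path("data.pkl.gz").exists():
    print(top_level_summary(load_results("data.pkl.gz")))
```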