homology
LunaSare committed Aug 10, 2021
1 parent 8a3394e commit 495e4e6
Showing 6 changed files with 60 additions and 32 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -13,6 +13,8 @@ tests/data/precooked/*
docs/example_scripts/output/*
tests/data/tmp/*
tests/debugging/*
tests/tmp
tests/fromfile
.Rproj.user
*Rproj
*Rhistory
2 changes: 1 addition & 1 deletion README.Rmd
@@ -26,7 +26,7 @@ IS_README <- TRUE
<p></p>


## Automated gene tree updating!
## Automated gene tree updating with the Open Tree of Life!


```{r child="docs/mds/intro.md"}
37 changes: 26 additions & 11 deletions README.md
@@ -16,17 +16,23 @@ Status](https://travis-ci.org/McTavishLab/physcraper.svg?branch=main)](https://t

</p>

## Automated gene tree updating\!

Use a phylogenetic tree, Open Tree of Life tools, and a single locus
alignment to automatically find and add homologous sequences to improve
and advance phylogenetic inference in a group.

Physcraper relies on [The Open Tree of Life
Project](https://tree.opentreeoflife.org/opentree/argus/opentree12.3@ott93302):
*Open Tree of Life, Benjamin Redelings, Luna Luisa Sanchez Reyes, Karen
A. Cranston, Jim Allman, Mark T. Holder, & Emily Jane McTavish. (2019).
Open Tree of Life Synthetic Tree (Version 12.3). Zenodo* doi:
## Automated gene tree updating with the Open Tree of Life\!

Use a phylogenetic tree and a DNA alignment to automatically find and
add nucleotide sequences from a genetic database, to reproducibly
improve and advance phylogenetic knowledge in a group.

Physcraper relies on
[taxonomic](https://tree.opentreeoflife.org/about/taxonomy-version/ott3.3)
and [phylogenetic](https://github.com/OpenTreeOfLife/phylesystem-1)
resources and [programmatic
tools](https://github.com/OpenTreeOfLife/germinator/wiki/Open-Tree-of-Life-Web-APIs)
from the [Open Tree of
Life](https://tree.opentreeoflife.org/opentree/argus/opentree12.3@ott93302)
project: <br/> *Open Tree of Life, Benjamin Redelings, Luna Luisa
Sanchez Reyes, Karen A. Cranston, Jim Allman, Mark T. Holder, & Emily
Jane McTavish. (2019). Open Tree of Life Synthetic Tree (Version 12.3).
Zenodo* doi:
[10.5281/zenodo.3937741](https://doi.org/10.5281/zenodo.3937741)

You are now on the code repository. Please refer to Physcraper’s
@@ -96,6 +102,15 @@ Physcraper requires the user to install:
analysis and post-analysis of large phylogenies.” Bioinformatics
30.9 (2014): 1312-1313.* doi:
[10.1093/bioinformatics/btu033](https://doi.org/10.1093/bioinformatics/btu033)
- [BLAST
+](https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download)
if using a local genetic database. Note that BLAST + is
automatically installed with a [Physcraper installation using
Anaconda](https://physcraper.readthedocs.io/en/stable/install.html#anaconda-virtual-environment).
<br> *Camacho, C., Coulouris, G., Avagyan, V. et al. BLAST+:
architecture and applications. BMC Bioinformatics 10, 421 (2009).*
doi:
[10.1186/1471-2105-10-421](https://doi.org/10.1186/1471-2105-10-421).

Physcraper relies on the following Python packages that are
automatically installed:
31 changes: 16 additions & 15 deletions docs/mds/description.md
@@ -7,33 +7,34 @@ These data are often appropriate for looking at phylogenetic relationships, and
have the advantage of being homologous to genetic sequences used to construct existing
trees.

If you have access to a single gene DNA alignment and a tree, Physcraper automates
adding new lineage samples into your tree by using [Open Tree of Life](#the-open-tree-of-life) tools coupled to the [Blast](https://blast.ncbi.nlm.nih.gov/Blast.cgi) algorithm to search for loci in [GenBank](https://www.ncbi.nlm.nih.gov/genbank/statistics/) that are likely to be homologous to sequences in the initial DNA alignment.
If you have access to a single gene or multilocus DNA alignment, and a phylogenetic tree, Physcraper automates
adding DNA sequences of new lineage samples into your tree by using [Open Tree of Life](#the-open-tree-of-life) tools to reconcile taxonomies, and the [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) algorithm to search for loci in the [GenBank](https://www.ncbi.nlm.nih.gov/genbank/statistics/) genetic database that are likely to be
homologous (both orthologs and paralogs, see FAQs) to sequences in the initial DNA alignment.

By using a starting alignment and tree, Physcraper takes advantage of DNA loci and homology hypotheses (ideally orthology) that
previous researchers have assessed, curated, and deemed appropriate for the phylogenetic scope.
The sequences added during the BLAST search are limited either to a user-specified taxon or
monophyletic group, or to the taxonomic scope of the ingroup of the starting tree.

These automated trees can provide a quick inference of potential phylogenetic relationships,
of problems in the taxonomic assignments of sequences, and of paralogy and orthology, and can flag areas of potential systematic interest.

<br/>

![](../img/schematic-final.svg)

Figure 1 from [Sanchez-Reyes et al. 2021](https://doi.org/10.1186/s12859-021-04274-6):
The Physcraper framework consists of 4 general steps. The methodology is extensively described in the [Implementation](https://physcraper.readthedocs.io/en/latest/implementation.html) section of this documentation.
The Physcraper framework consists of 4 general steps. The methodology is further described in the [Implementation](https://physcraper.readthedocs.io/en/latest/implementation.html) section of this documentation.

<br/>

By using a starting alignment and tree, Physcraper takes advantage of loci that
previous researchers have assessed and deemed appropriate for the phylogenetic scope.
The sequences added in the search are limited either to a user specified taxon or
monophyletic group, or within the taxonomic scope of the ingroup of the starting tree.

These automated trees can provide a quick inference or potential relationships,
of problems in the taxonomic assignments of sequences, and flag areas of potential systematic interest.

<br/>


## The Open Tree of Life

The Open Tree of Life ([OpenTree](https://tree.opentreeoflife.org/opentree/argus/opentree13.4@ott93302)) is a project that unites phylogenetic inferences and taxonomy
to provide a synthetic estimate of species relationships across the entire tree of life.
The Open Tree of Life (OpenTree) is a project that unites expert, peer-reviewed [phylogenetic inferences](https://github.com/OpenTreeOfLife/phylesystem-1) and
[taxonomy](https://tree.opentreeoflife.org/about/taxonomy-version/ott3.3)
to generate a [synthetic tree](https://tree.opentreeoflife.org/opentree/argus/opentree13.4@ott93302) estimate of species relationships across all life.

<br/>

@@ -50,7 +51,7 @@ Currently the tree comprises 2.3 million tips.
However, only around 90,000 of those taxa are represented by phylogenetic estimates -
the rest are placed in the tree based on their taxonomic names.

To achieve this, the OpenTree Taxonomy (OTT) constructs a reference taxonomy through
To achieve this, the OpenTree Taxonomy (OTT) constructs a reference taxonomy for taxonomic reconciliation, through
an algorithmic combination of several source taxonomies, such as:
- [Hibbett et al. 2007](https://doi.org/10.1016/j.mycres.2007.03.004),
- [SILVA](http://www.arb-silva.de/),
14 changes: 11 additions & 3 deletions docs/mds/faq.md
@@ -1,5 +1,14 @@
Frequently asked questions

## How does Physcraper handle paralogs?

Physcraper allows multiple tips to be mapped to the same taxon, with an added unique identifier that allows linking the sequence back to the original one on GenBank.
This means that newly added homologous DNA sequences can include both orthologs
and paralogs and that a phylogenetic analysis can be performed.
While we expect that most curated alignments are composed of orthologous sequences, the algorithm used to find new sequences is unable to distinguish paralogs from orthologs, so any paralogs present in the database are likely to be added to the dataset automatically.
Users should check the output phylogenetic trees to detect paralogs and filter them
appropriately if needed.
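
As a starting point for that check, the sketch below groups tip labels by taxon and reports the taxa that are represented by more than one tip. It is a minimal illustration, not part of Physcraper: the `Taxon_name_ACCESSION` label format, the function names, and the example accessions are all assumptions made for this example, and real output labels may be formatted differently.

```python
from collections import defaultdict


def tips_by_taxon(tip_labels, sep="_"):
    """Group tip labels by taxon name.

    Assumes (hypothetically) that each label is 'Taxon_name' + sep + identifier,
    so the identifier is everything after the last separator.
    """
    groups = defaultdict(list)
    for label in tip_labels:
        taxon, _, identifier = label.rpartition(sep)
        groups[taxon].append(identifier)
    return dict(groups)


def taxa_with_multiple_tips(tip_labels, sep="_"):
    """Return only the taxa that are mapped to more than one tip."""
    grouped = tips_by_taxon(tip_labels, sep)
    return {taxon: ids for taxon, ids in grouped.items() if len(ids) > 1}


# Made-up labels for illustration only.
tips = [
    "Homo_sapiens_AB000001",
    "Homo_sapiens_AB000002",
    "Pan_troglodytes_XY123456",
]
multi = taxa_with_multiple_tips(tips)
print(multi)
```

Taxa flagged this way are only candidates for closer inspection: several tips per taxon can reflect multiple orthologous samples as well as paralogs, so the tree itself still needs to be examined.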

## I have to learn to use OpenTree to use Physcraper, is the learning curve worth it?

We think that this decision depends on the goals of the user.
@@ -19,7 +28,7 @@ For this goal, having a minimum familiarity with the OpenTree tools is needed.
We realize that this might initially discourage some users, but we believe that the benefits brought by connecting taxonomic data with the OpenTree services will encourage users to familiarize themselves with the OpenTree services, and to adopt the use of Physcraper.


## How does Physcraper handle polytomies of starting trees?
## How does Physcraper handle polytomies in a starting tree?

The Physcraper starting tree is a phylogeny whose tip labels must have been standardized to the OpenTree Taxonomy (as described in the Introduction section:
[Mapping names to taxa](https://physcraper.readthedocs.io/en/latest/quick-start.html#updating-your-own-tree-and-alignment)).
@@ -45,8 +54,7 @@ around the early diverging branches,
the automatic rooting is problematic and can have multiple solutions.



## How does Physcraper use the starting alignment?

Physcraper uses the input DNA alignment (single or multiple marker) to mine the GenBank database with the goal of increasing the
Physcraper uses all unique DNA sequences in the input alignment to mine a genetic database using the BLAST algorithm, with the goal of increasing the
lineage sampling of the alignment within a given biological group.
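
To illustrate what "all unique DNA sequences" could mean in practice, the sketch below collapses an alignment to its distinct sequences before they would be used as search queries. It is an assumption-laden illustration rather than Physcraper's own code: the function name, the dictionary representation of the alignment, and the choice to compare sequences after removing gap characters and normalizing case are all made up for this example.

```python
def unique_sequences(alignment):
    """Return {label: ungapped_sequence} for each distinct sequence.

    `alignment` maps tip labels to aligned sequences. Sequences are compared
    after stripping '-' gaps and upper-casing (an assumption of this sketch);
    the first label seen for each distinct sequence is kept.
    """
    first_label = {}
    for label, seq in alignment.items():
        key = seq.replace("-", "").upper()
        if key not in first_label:
            first_label[key] = label
    return {label: key for key, label in first_label.items()}


# Made-up alignment for illustration only.
alignment = {
    "tip1": "ACGT--A",
    "tip2": "acgt--a",  # same sequence as tip1 once case is normalized
    "tip3": "ACGTTTA",
}
queries = unique_sequences(alignment)
print(queries)
```

Searching with only the distinct sequences avoids sending the same query more than once, which matters when a database search is the slowest step.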
6 changes: 4 additions & 2 deletions docs/mds/intro.md
@@ -1,3 +1,5 @@
Use a phylogenetic tree, Open Tree of Life tools, and a single locus alignment to automatically find and add homologous sequences to improve and advance phylogenetic inference in a group.
Use a phylogenetic tree and a DNA alignment to automatically find and add nucleotide sequences from a genetic database, to reproducibly improve and advance phylogenetic knowledge in a group.

Physcraper relies on [The Open Tree of Life Project](https://tree.opentreeoflife.org/opentree/argus/opentree12.3@ott93302): *Open Tree of Life, Benjamin Redelings, Luna Luisa Sanchez Reyes, Karen A. Cranston, Jim Allman, Mark T. Holder, & Emily Jane McTavish. (2019). Open Tree of Life Synthetic Tree (Version 12.3). Zenodo* doi: [10.5281/zenodo.3937741](https://doi.org/10.5281/zenodo.3937741)
Physcraper relies on [taxonomic](https://tree.opentreeoflife.org/about/taxonomy-version/ott3.3) and [phylogenetic](https://github.com/OpenTreeOfLife/phylesystem-1) resources and [programmatic tools](https://github.com/OpenTreeOfLife/germinator/wiki/Open-Tree-of-Life-Web-APIs) from the [Open Tree of Life](https://tree.opentreeoflife.org/opentree/argus/opentree12.3@ott93302) project:
<br/>
*Open Tree of Life, Benjamin Redelings, Luna Luisa Sanchez Reyes, Karen A. Cranston, Jim Allman, Mark T. Holder, & Emily Jane McTavish. (2019). Open Tree of Life Synthetic Tree (Version 12.3). Zenodo* doi: [10.5281/zenodo.3937741](https://doi.org/10.5281/zenodo.3937741)
