docs
snacktavish committed Jul 2, 2020
1 parent ebe2040 commit d669257
Showing 4 changed files with 76 additions and 20 deletions.
4 changes: 1 addition & 3 deletions README.md
@@ -9,7 +9,7 @@

<p></p>

## Automated gene tree updating!

Use a tree (from the literature, a synthetic tree from Open Tree of Life, or your own tree) and a single locus alignment to find and add homologous sequences to (hopefully) improve and advance phylogenetic inference in a group.

@@ -19,7 +19,5 @@ Please post an issue at https://github.com/McTavishLab/physcraper/issues or cont

This is the code repository; please refer to Physcraper's [documentation website](https://physcraper.readthedocs.io/en/latest/) for more details on how to install and run it!

Here, some emojis for you :bowtie: :sparkles: :notes:


:hamster: :palm_tree: :frog: :ear_of_rice: :panda_face: :tulip: :octopus: :blossom: :whale: :mushroom: :ant: :cactus: :fish: :maple_leaf: :water_buffalo: 🦠 :shell: :bug: :octocat:
2 changes: 1 addition & 1 deletion docs/mds/PhyscraperRun.md
@@ -118,7 +118,7 @@ To setup see doc/LocalDB.md

You can use your own BLAST database, for example one set up on an AWS server.

\

## Sequence filtering parameters

-tp TRIM_PERC, --trim_perc TRIM_PERC
88 changes: 73 additions & 15 deletions docs/mds/Tutorial.md
@@ -1,25 +1,67 @@

### Updating gene trees

If you have a single-gene alignment and a tree, you can automate adding homologous data to your tree by searching GenBank.

While genome-scale data are increasing rapidly, large quantities of gene-sequence data are still being uploaded to NCBI GenBank.

<img src="img/seq_data.png" alt="drawing" width="400"/>


These data are often appropriate for looking at phylogenetic relationships.

Physcraper uses BLAST to search for loci that are likely to be homologous to sequences in an existing alignment.

By using a starting tree and alignment, Physcraper takes advantage of loci that previous researchers have assessed and deemed appropriate for the phylogenetic scope.
The sequences added in the search are limited to a user-specified taxon or monophyletic group, or to the taxonomic scope of the ingroup of the starting tree.
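
To make the scope restriction concrete, here is a minimal Python sketch of filtering search hits to a taxonomic scope. It is not Physcraper's implementation: the `BlastHit` record, the `filter_hits` helper, the toy taxon IDs, and the identity/coverage cutoffs are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    # Hypothetical record; a real BLAST result carries many more fields.
    accession: str
    taxon_id: int
    pct_identity: float
    query_coverage: float  # fraction of the query covered by the hit (0-1)


def filter_hits(hits, allowed_taxa, min_identity=80.0, min_coverage=0.5):
    """Keep hits inside the taxonomic scope that also pass simple similarity cutoffs.

    The cutoff values are illustrative, not Physcraper defaults.
    """
    kept = []
    for hit in hits:
        if hit.taxon_id not in allowed_taxa:
            continue  # outside the ingroup / user-specified taxon
        if hit.pct_identity < min_identity or hit.query_coverage < min_coverage:
            continue  # too dissimilar to be a convincing homolog
        kept.append(hit)
    return kept


hits = [
    BlastHit("SEQ1", taxon_id=1, pct_identity=95.0, query_coverage=0.9),
    BlastHit("SEQ2", taxon_id=2, pct_identity=99.0, query_coverage=0.95),  # out of scope
    BlastHit("SEQ3", taxon_id=1, pct_identity=70.0, query_coverage=0.9),   # too dissimilar
]
print([h.accession for h in filter_hits(hits, allowed_taxa={1})])  # ['SEQ1']
```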

These automatically updated trees can provide a quick inference of potential relationships, reveal problems in the taxonomic assignments of sequences, and flag areas of potential systematic interest.


## The Open Tree of Life

The Open Tree of Life (https://opentreeoflife.github.io/) is a project that unites phylogenetic inferences and taxonomy to provide a synthetic estimate of species relationships across the entire tree of life.
![](img/otol_logo.png)


Open Tree of Life aims to construct a comprehensive, dynamic and digitally-available tree of life by synthesizing published phylogenetic trees along with taxonomic data.
Currently, the tree comprises 2.3 million tips.
However, only around 90,000 of those taxa are represented by phylogenetic estimates; the rest are placed in the tree based on their taxonomic names.
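
The size of that gap is easy to quantify; a quick calculation using the approximate figures quoted above:

```python
total_tips = 2_300_000  # tips in the synthetic tree (approximate, as quoted above)
phylo_tips = 90_000     # tips placed by phylogenetic estimates (approximate)

share = phylo_tips / total_tips
print(f"{share:.1%} placed by phylogenies, {1 - share:.1%} by taxonomy alone")
# 3.9% placed by phylogenies, 96.1% by taxonomy alone
```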

https://opentreeoflife.github.io/browse/



## Updating a tree from OpenTree of Life

The Open Tree of Life data store, [Phylesystem](https://academic.oup.com/bioinformatics/article/31/17/2794/183373), contains more than 4,500 phylogenetic trees from published studies.
The tips in these trees are mapped to a unified taxonomy, which makes these data searchable in a phylogenetically explicit way.
This is a great place to start when looking for existing estimates of phylogenetic relationships,
and for assessing regions of the tree of life that lack phylogenetic estimates.
A lot of sequence data has been generated but never incorporated into any phylogenetic estimate.
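
As an illustration of why a shared taxonomy makes trees searchable, the sketch below finds every study whose tips fall within a query taxon by walking a child-to-parent taxonomy. The taxonomy, study IDs, and helper names are hypothetical toy data, not the OpenTree API or real OTT identifiers.

```python
# Toy taxonomy as child -> parent links (names for illustration only, no real OTT ids).
PARENT = {
    "Theobroma": "Malvaceae",
    "Sterculia": "Malvaceae",
    "Malvaceae": "Malvales",
    "Quercus": "Fagaceae",
    "Fagaceae": "Fagales",
}

# Hypothetical studies and the taxa their tips were mapped to.
STUDY_TIPS = {
    "study_A": {"Theobroma", "Sterculia"},
    "study_B": {"Quercus"},
}


def is_within(taxon, ancestor):
    """Walk up the parent chain to test whether `taxon` falls within `ancestor`."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


def studies_with(ancestor):
    """Return studies with at least one tip inside `ancestor`."""
    return sorted(
        study for study, tips in STUDY_TIPS.items()
        if any(is_within(t, ancestor) for t in tips)
    )


print(studies_with("Malvaceae"))  # ['study_A']
print(studies_with("Fagales"))    # ['study_B']
```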


### Find a study with your taxon of interest


For this example, we'll use a tree that is already in the OpenTree of Life database.
Search OpenTree of Life for your taxon of interest, e.g. 'Malvaceae':

`$ find_trees.py --taxon_name "Malvaceae"`

This prints a list of studies to the screen. We will need an alignment to update (which OpenTree doesn't store), so let's just look at trees that have data stored in TreeBASE:

`$ find_trees.py --taxon_name "Malvaceae" --treebase`

There are a bunch of options!

Let's update Wilkie et al. (2006).
You can view the study on the OpenTree database: [Wilkie2006](https://tree.opentreeoflife.org/curator/study/view/pg_55)

To assess this further, let's update the tree!
While this study focused on the family "Sterculiaceae",
phylogenetic inference has suggested that this family is not [monophyletic](https://tree.opentreeoflife.org/opentree/argus/ottol@996482).

Let's take a look at how recent data affect our inferences of relationships, and whether there are sequence data for taxa that don't have any phylogenetic information available in the tree.
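
One simple way to frame that second question: compare the taxa represented by newly retrieved sequences against the taxa that already have phylogenetic information (for example, the tips of the starting tree). The function name and taxon labels below are hypothetical.

```python
def taxa_without_placement(new_sequence_taxa, taxa_with_phylo_info):
    """Taxa that have new sequence data but no existing phylogenetic placement."""
    return sorted(set(new_sequence_taxa) - set(taxa_with_phylo_info))


# Hypothetical labels for illustration.
starting_tree_taxa = {"Sterculia_apetala", "Theobroma_cacao"}
retrieved = ["Theobroma_cacao", "Cola_acuminata", "Brachychiton_populneus", "Cola_acuminata"]

print(taxa_without_placement(retrieved, starting_tree_taxa))
# ['Brachychiton_populneus', 'Cola_acuminata']
```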

### Run the auto update

@@ -35,7 +77,7 @@ The blast search part of updating trees takes a long time (for example, this ana

We have put example outputs from this command in `docs/examples/pg_55`, so that you can explore the outputs without waiting for the searches to complete.

### Output files

The analysis folder has several subdirectories.
Each folder is labeled with a 'tag', which by default is the alignment name but can be set in the `physcraper_run.py` arguments.
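
A sketch of the default tag behaviour described above, assuming the tag is the alignment file name with its extension removed (an assumption; check the `physcraper_run.py` help for the actual rule). `run_tag` is a hypothetical helper, not part of Physcraper.

```python
from pathlib import Path
from typing import Optional


def run_tag(alignment_path: str, tag: Optional[str] = None) -> str:
    """Hypothetical helper: use an explicit tag if given, else the alignment's file name.

    Assumes the extension is dropped; Physcraper's own rule may differ.
    """
    if tag:
        return tag
    return Path(alignment_path).stem


print(run_tag("tests/data/tiny_test_example/test.fas"))           # test
print(run_tag("tests/data/tiny_test_example/test.fas", "pg_55"))  # pg_55
```
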
@@ -56,19 +98,35 @@ The structure consists of:
- outputs
-- final tree and alignment

-- CSV file with information about each sequence



### Compare your new tree to existing relationships

A correctly rooted phylogeny is needed to compare taxonomic groups.
Rooting phylogenies can be tricky. While physcraper places a suggested root based on the taxonomic relationships in OpenTree,
this root can be unreliable, especially if taxonomy is a poor fit to true evolutionary relationships.
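
Why the root matters: whether a group of tips forms a clade depends on where the tree is rooted. The minimal sketch below represents rooted trees as nested Python tuples (a toy stand-in for a real tree library) and shows the same unrooted topology giving different monophyly answers under two roots. The helper names are hypothetical.

```python
def collect_clades(tree, clades):
    """Post-order walk of a nested-tuple tree; record the tip set below every node."""
    if isinstance(tree, str):  # a tip
        tips = frozenset([tree])
    else:
        tips = frozenset().union(*(collect_clades(child, clades) for child in tree))
    clades.append(tips)
    return tips


def is_monophyletic(tree, group):
    """True if some node of the rooted tree has exactly `group` as its descendant tips."""
    clades = []
    collect_clades(tree, clades)
    return frozenset(group) in clades


# Same unrooted topology, two different roots.
rooted_between_groups = (("A", "B"), ("C", ("D", "E")))
rooted_on_A = ("A", ("B", ("C", ("D", "E"))))

print(is_monophyletic(rooted_between_groups, {"A", "B"}))  # True
print(is_monophyletic(rooted_on_A, {"A", "B"}))            # False
print(is_monophyletic(rooted_on_A, {"C", "D", "E"}))       # True
```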

There is a simple tree comparison script, `tree_comparison.py`.

A detailed explanation of that script, and more ways to explore the data, are described in [DataExploration](./DataExploration.md).


`tree_comparison.py -d docs/examples/pg_55/ -og otu376420 otu376439 otu376452 -o pg_55_comparison`


## Using your own tree and alignment

You can upload your own tree to OpenTree to update it, and that way it will be included in the synthetic tree!
See [Submitting-phylogenies-to-Open-Tree-of-Life](https://github.com/OpenTreeOfLife/opentree/wiki/Submitting-phylogenies-to-Open-Tree-of-Life)

If you aren't ready to share your tree publicly, you can update it without posting to OpenTree.

You need an alignment (single locus) and a tree. The taxon labels in these two files should be the same.
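
Before running, it can help to confirm that the tree and alignment share the same labels. A minimal standard-library sketch, assuming unquoted Newick tip labels without spaces and FASTA headers whose first word is the label; the helper names are hypothetical, not Physcraper functions.

```python
import re


def fasta_labels(fasta_text):
    """Labels are taken as the first word of each '>' header line."""
    return {line[1:].split()[0] for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].strip()}


def newick_tip_labels(newick):
    """Tip labels follow '(' or ',' and end at ':', ',', ')' or ';' (unquoted labels only)."""
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def label_mismatches(fasta_text, newick):
    aln, tips = fasta_labels(fasta_text), newick_tip_labels(newick)
    return {"only_in_alignment": sorted(aln - tips), "only_in_tree": sorted(tips - aln)}


fasta = ">A\nACGT\n>B\nACGA\n>D\nACGG\n"
tree = "((A:0.1,B:0.2):0.3,C);"
print(label_mismatches(fasta, tree))
# {'only_in_alignment': ['D'], 'only_in_tree': ['C']}
```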

You also need a file linking the labels in your tree and alignment to a broader taxonomy. This can be easily generated via OpenTree's Bulk Taxonomic Name Resolution Service ([Bulk TNRS](https://tree.opentreeoflife.org/curator/tnrs/)).
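
You can also check that every label in your files has a taxonomic mapping. This sketch assumes, purely for illustration, a JSON object mapping each label to an OTT id (or `null` when unresolved); the taxon-info file Physcraper actually expects may be structured differently, and `unmapped_labels` is a hypothetical helper.

```python
import json


def unmapped_labels(labels, taxon_info_json):
    """Labels missing from the mapping, or mapped to null (unresolved)."""
    mapping = json.loads(taxon_info_json)
    return sorted(label for label in labels if mapping.get(label) is None)


# Hypothetical mapping: label -> OTT id (the ids here are made up).
info = '{"A": 111, "B": null, "D": 222}'
print(unmapped_labels({"A", "B", "C"}, info))  # ['B', 'C']
```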

Example names:

`physcraper_run.py -tf tests/data/tiny_test_example/test.tre -tfs newick -a tests/data/tiny_test_example/test.fas --taxon_info tests/data/tiny_test_example/main.json -as fasta -o owndata`
2 changes: 1 addition & 1 deletion docs/mds/intro.md
@@ -1,4 +1,4 @@
<img align="left" width="250" src="https://raw.githubusercontent.com/McTavishLab/physcraper/main/docs/physcraper.svg">

[![Build Status](https://travis-ci.org/McTavishLab/physcraper.svg?branch=main)](https://travis-ci.org/McTavishLab/physcraper)[![Documentation](https://readthedocs.org/projects/physcraper/badge/?version=latest&style=flat)](https://physcraper.readthedocs.io/en/latest/)[![codecov](https://codecov.io/gh/McTavishLab/physcraper/branch/main/graph/badge.svg)](https://codecov.io/gh/McTavishLab/physcraper)
