Update Documentation (concept and input_output) #2

Merged
merged 35 commits into from
Mar 25, 2021
98daa45
Update documentation (concept.rst)
Mar 9, 2021
68212ac
Update Documentation (input_output.rst)
Mar 9, 2021
bd5d3ef
Update usage.rst (file paths and descriptions)
Mar 9, 2021
1a53355
Apply suggestions for concept.rst
Cecilia-Sensalari Mar 17, 2021
1b797d4
Apply suggestions in input_output.txt
Cecilia-Sensalari Mar 17, 2021
a37c907
Apply suggestions in usage.rst
Cecilia-Sensalari Mar 17, 2021
b44ab6b
Merge master into docs
Mar 18, 2021
69f9fcb
Update concept.rst
Cecilia-Sensalari Mar 18, 2021
51ea14b
Merge branch 'master' into docs
Cecilia-Sensalari Mar 18, 2021
9f784ab
Delete .buildinfo
Cecilia-Sensalari Mar 18, 2021
8895d4c
Describe ks.tsv file format
Mar 19, 2021
72de507
Remove "qsub -b y"
Mar 19, 2021
8ac5d92
Add (E)LMM file format description; internal links
Mar 20, 2021
6a71e34
Add (E)LMM file figures
Mar 20, 2021
78bbc84
Rename header in ELMM TSV file
Mar 20, 2021
cac9706
Restructure mixture model section and add table
Mar 20, 2021
e8999dc
Reduce paralog analyses description in docs
Mar 21, 2021
cf82d99
Add Note block in input_output.rst about filenames
Mar 22, 2021
f76c6eb
Minor change in Note block
Mar 22, 2021
9fa0dc6
Rename headers in main.nf
Cecilia-Sensalari Mar 23, 2021
9b14e55
Apply suggestions to concept.rst
Cecilia-Sensalari Mar 24, 2021
6aca23c
Apply suggestions to input_output.rst
Cecilia-Sensalari Mar 24, 2021
5506420
Apply suggestions to usage.rst
Cecilia-Sensalari Mar 24, 2021
b48a844
Apply suggestions to paralogs_analyses.rst
Cecilia-Sensalari Mar 24, 2021
ba4a9d4
Nextflow header with "ksrates"
Mar 24, 2021
3f48511
Italics "ksrates" in FAQs header
Mar 24, 2021
8ef4f00
Add preprint DOI in "How to cite us" doc page
Mar 24, 2021
42b4075
Remove "distribution_peak_estimate"
Mar 24, 2021
26e5d04
Update input_output.rst with other suggestions
Mar 24, 2021
0343c1d
Update usage.rst with other suggestions
Mar 24, 2021
9e0000e
Update par_analys.rst with other suggestions
Mar 24, 2021
5f67a59
Update installation.rst with other suggestions
Mar 24, 2021
96e9069
Apply suggestions to How to cite us
Cecilia-Sensalari Mar 25, 2021
2534128
Reintroduce "distribution_peak_estimate" parameter
Mar 25, 2021
106deb9
Replace <focal species> with species (filename)
Cecilia-Sensalari Mar 25, 2021
39 changes: 21 additions & 18 deletions doc/source/concept.rst
@@ -4,43 +4,46 @@
Substitution rate-adjustment strategy in a nutshell
===================================================

``ksrates`` is a package for substitution rate-adjustment in mixed ortholog and paralog *K*:sub:`S` distributions.
To position ancient WGD events with respect to speciation events in a phylogeny, it is common practice to superimpose a paralog *K*:sub:`S` distribution for a species of interest with ortholog *K*:sub:`S` distributions between this species and other species to obtain a mixed plot.
However, when the lineages involved exhibit different substitution rates, the *K*:sub:`S` distributions are built on different *K*:sub:`S` scales and a direct comparison among them is likely to mislead the phylogenetic interpretation of WGD signatures or the divergence order.

Mixed *K*:sub:`S` distributions are one of the approaches applied to detect whole-genome duplications (WGDs) and to locate them in a phylogeny. A mixed plot is composed of ortholog *K*:sub:`S` distributions, representing divergence events, overlapped onto paralog *K*:sub:`S` distributions, representing the duplication history of a species genome. The relative positions of the ortholog peaks and the WGD peaks are informative about the order of the depicted evolutionary events, making it possible to place a WGD on a specific branch of the evolutionary history of the species.
``ksrates`` is an open-source tool offering a rate-adjustment strategy that brings all the distributions to a common *K*:sub:`S` scale by compensating for the synonymous substitution rate differences relative to one species. The final mixed plot produced by ``ksrates`` features adjustments in the position of the ortholog *K*:sub:`S` distributions that help clarify WGD placement in the context of the provided phylogenetic tree.

The reliability of a mixed plot can be jeopardized when the involved species differ markedly in substitution rate. Since the *K*:sub:`S` value of a homolog pair depends on the substitution rate of the species, different distributions end up being built on different *K*:sub:`S` scales. A direct overlap of distributions is therefore likely to lead to unreliable interpretations.

The *K*:sub:`S` rate-adjustment package offers an adjustment procedure that brings all the distributions to a common *K*:sub:`S` scale by compensating for the substitution rate differences relative to one "main" species.
The rate-adjusted mixed plot obtained through ``ksrates`` is composed of a) a single paralog distribution coming from the main species and b) one or more ortholog distributions between the main species and another species. The analysis is thus focused on the genome duplication history of the main species in the context of its evolutionary history with the other species.

The rate-adjustment is applied to all the ortholog distributions. For each ortholog distribution, principles from the relative rate test (RRT) are used to detect the relative rates between the main species and the other species. During the rate-adjustment, the ortholog *K*:sub:`S` peak is re-encoded as twice the relative rate of the main species, so that the age of the ortholog distribution is adapted to the *K*:sub:`S` scale of the paralog distribution. At the end, all ortholog distributions are seen from the perspective of the main species rate.
The rate-adjustment shifts the ortholog distribution peak horizontally: to the left if the main species is slower than the other species, or to the right if it is faster. The new arrangement of the divergence events can lead to a different and more reliable interpretation of WGD placement or of the order of the divergences themselves.
For more details about the rate-adjustment strategy, see [...].
For more details about the rate-adjustment strategy, see our `preprint <https://www.biorxiv.org/content/10.1101/2021.02.28.433234v1>`__.


.. _`explained_example`:

Explained example
=================

This example studies the phylogenetic placement of WGD signatures present in the oil palm (*Elaeis guineensis*) paralog distribution. The rate-adjustment pipeline needs an input phylogenetic tree and the sequence data of all involved species. The minimum input tree is composed of the focal species (palm), another species (rice) and their outgroup (asparagus): ``((palm, rice), asparagus)``.
.. The mixed plot will show the palm paralog distribution overlapped with the rate-adjusted ortholog distributions involving palm and the other species in the input tree.
This explained example studies the phylogenetic placement of WGD signatures present in the oil palm (*Elaeis guineensis*) paralog *K*:sub:`S` distribution in the context of a small monocot phylogeny composed of the species of interest (oil palm), *Oryza sativa* (rice) and their outgroup *Asparagus officinalis* (asparagus). This input tree is provided in Newick format: ``((palm, rice), asparagus)``.
From the perspective of oil palm history there are two divergence nodes (i.e. ortholog *K*:sub:`S` distributions) to be rate-adjusted, namely palm-rice and palm-asparagus.

The detection of substitution rate differences makes use of principles of the relative rate test (REF) and therefore requires an outgroup species.
The pipeline breaks down the tree into *trios* composed of the species pair of an ortholog distribution and the outgroup used for its rate-adjustment. The example tree gives only one trio, "palm, rice, asparagus", where the palm-rice divergence will be rate-adjusted with asparagus as outgroup.
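
The trio decomposition can be sketched in a few lines of Python. This is only an illustration and not the ``ksrates`` implementation: the nested-tuple tree representation and the helper names ``leaves``, ``sister_groups`` and ``trios`` are assumptions made for this example.

```python
def leaves(node):
    """Return the species names found below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


def sister_groups(tree, focal):
    """List the clades that split off from the focal lineage,
    ordered from the most recent to the most ancient divergence."""
    if focal not in leaves(tree):
        raise ValueError(f"{focal!r} is not in the tree")
    groups = []
    node = tree
    while not isinstance(node, str):
        on_path = [c for c in node if focal in leaves(c)]
        off_path = [c for c in node if focal not in leaves(c)]
        groups.append([leaf for c in off_path for leaf in leaves(c)])
        node = on_path[0]  # keep walking towards the focal species
    return groups[::-1]


def trios(tree, focal):
    """Enumerate (focal, sister, outgroup) trios: every species that
    diverged earlier than the sister can serve as its outgroup."""
    groups = sister_groups(tree, focal)
    result = []
    for i, sisters in enumerate(groups):
        outgroups = [s for g in groups[i + 1:] for s in g]
        for sister in sisters:
            for outgroup in outgroups:
                result.append((focal, sister, outgroup))
    return result


# Hypothetical nested-tuple encoding of the Newick tree from the example.
minimal_tree = (("palm", "rice"), "asparagus")
for trio in trios(minimal_tree, "palm"):
    print(trio)
```

With the three-species tree only the palm-rice pair has an outgroup available, so a single trio is listed; the palm-asparagus pair yields none.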

From the perspective of palm history there are two divergence events (i.e. ortholog distributions) in this tree, namely palm-rice and palm-asparagus. The pipeline breaks down the tree into *trios* composed of the species pair of an ortholog distribution and an outgroup used for its rate-adjustment. The example tree gives only one trio, "palm, rice, asparagus", where the palm-rice divergence is rate-adjusted with asparagus as outgroup. The palm-asparagus divergence instead has no outgroup in this tree and will be ignored; to avoid this, add another outgroup to the phylogeny, e.g. ``(((palm, rice), asparagus), spirodela)``. The user can also decide to perform multiple rate-adjustments for a divergence if the tree structure allows it: for example, in this latter tree palm-rice can be rate-adjusted with both asparagus and spirodela (*Spirodela polyrhiza*).
.. note ::
Palm-asparagus divergence has no outgroup in this tree and can't be adjusted; to be able to take it into account one should extend the phylogeny with one extra species that can function as their outgroup, e.g. *Spirodela polyrhiza*: ``(((palm, rice), asparagus), spirodela)``.
By default, if more than one outgroup is available for a species pair, multiple rate-adjustments are performed and the mean among them is taken as consensus. For example, in the extended tree palm-rice would be adjusted both with ``asparagus`` and ``spirodela`` outgroups.
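
A minimal sketch of the default consensus follows, assuming the consensus is a plain arithmetic mean of the per-outgroup results. The function name ``consensus_adjustment`` and the ``spirodela`` value are hypothetical and used only for illustration; they are not ``ksrates`` output.

```python
from statistics import mean

# Rate-adjusted palm-rice K_S estimates, one per outgroup.
# 0.73 is the asparagus-based value used in this example;
# the spirodela value is made up purely for illustration.
adjusted_by_outgroup = {
    "asparagus": 0.73,
    "spirodela": 0.77,  # hypothetical
}


def consensus_adjustment(adjusted_values):
    """Return the mean of the rate-adjusted estimates (the default consensus)."""
    values = list(adjusted_values)
    if not values:
        raise ValueError("at least one outgroup is needed")
    return mean(values)


consensus = consensus_adjustment(adjusted_by_outgroup.values())
print(f"consensus palm-rice K_S: {consensus:.2f}")
```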

Next, the pipeline breaks down each trio into the three species pairs it is composed of, which in this case are palm-rice, palm-asparagus and rice-asparagus. The ``wgd`` package then estimates the ortholog *K*:sub:`S` distribution for each of them. The ortholog distributions are simplified to a vertical line centered on their peak value (Figure 1).
The three ortholog *K*:sub:`S` distributions obtained from the palm-rice-asparagus trio are each approximated by their estimated mode (1.53 *K*:sub:`S` for palm-rice) with associated standard deviation (Figure 1; for more details please refer to Supplementary Materials, currently in preprint).

.. figure:: _images/ortholog_distribution_peak.svg
:align: center
:width: 350
:alt: The figure shows the bell-shaped ortholog KS distribution obtained for palm and rice approximated to a vertical line passing through the estimated mode (1.53 KS). A thin colored rectangular box behind this line highlights the associated standard deviation (0.01 KS).

The ortholog distribution for palm and rice is approximated to a vertical line passing through its estimated mode (1.53 *K*:sub:`S`).

Through principles of the relative rate test the ortholog *K*:sub:`S` estimate between palm and rice (1.53 *K*:sub:`S`) is decomposed into the two branch-specific *K*:sub:`S` contributions: palm contributes 0.365 and rice 1.17. The difference between them suggests that palm has a much lower substitution rate than rice.
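
The decomposition can be illustrated with the usual three-taxon additive formula, in which each branch contribution is derived from the three pairwise distances of the trio. This is a sketch under that assumption and not necessarily the exact computation performed by ``ksrates``. Only the palm-rice value (1.53) comes from this example; the two outgroup distances and the function name ``branch_contributions`` are hypothetical, chosen so that the result matches the contributions quoted above.

```python
def branch_contributions(k_focal_sister, k_focal_out, k_sister_out):
    """Split the focal-sister K_S into the two branch-specific parts,
    using the outgroup distances as an external reference."""
    k_focal = (k_focal_sister + k_focal_out - k_sister_out) / 2
    k_sister = (k_focal_sister + k_sister_out - k_focal_out) / 2
    return k_focal, k_sister


K_PALM_RICE = 1.53       # ortholog K_S mode from the example
K_PALM_ASPARAGUS = 0.90  # hypothetical value, for illustration only
K_RICE_ASPARAGUS = 1.70  # hypothetical value, for illustration only

k_palm, k_rice = branch_contributions(
    K_PALM_RICE, K_PALM_ASPARAGUS, K_RICE_ASPARAGUS
)
print(f"palm branch: {k_palm:.3f}, rice branch: {k_rice:.3f}")
```

By construction the two contributions add up to the palm-rice distance, so a slower focal branch always implies a proportionally longer sister branch.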

The ortholog distribution for palm and rice is approximated to its mode (1.53 *K*:sub:`S`).

The RRT uses the *K*:sub:`S` values of the three ortholog peaks to compute the relative rates of the divergent pair: palm has a relative rate of about 0.36 and rice of 1.17, so palm accumulates substitutions much more slowly than rice. Lastly, the rate-adjustment reinterprets the ortholog *K*:sub:`S` peak of palm-rice by encoding it as twice the relative rate of palm (*K*:sub:`S`' = 0.73). The ortholog peak has therefore been shifted substantially to the left, from 1.53 to 0.73 *K*:sub:`S` (Figure 2), and is now adapted to the slow scale of the palm paralog distribution. The shift has important consequences for the interpretation of the mixed plot concerning the older WGD signal around 0.9 *K*:sub:`S`.
The ortholog *K*:sub:`S` estimate of palm-rice is then adjusted by rescaling it as twice the branch contribution of palm (*K*:sub:`S` --> 0.365 + 0.365 = 0.73). The divergence line is thus shifted substantially to the left, from 1.53 to 0.73 *K*:sub:`S` (Figure 2), and is now adapted to the slower scale of the palm paralog distribution. Interestingly, the rate-adjusted mixed plot offers a new interpretation for the placement of the older WGD signature located around 0.9 *K*:sub:`S`, from being palm-specific to being shared with rice and potentially other monocots.
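
The rescaling step and its effect on the interpretation can be checked with a short calculation. The numbers are those of this example; the function names ``rate_adjust`` and ``wgd_predates_divergence`` are hypothetical helpers introduced only for this sketch.

```python
def rate_adjust(k_focal_branch):
    """Rescale an ortholog K_S estimate to the focal species scale:
    twice the focal branch-specific contribution."""
    return 2 * k_focal_branch


def wgd_predates_divergence(wgd_peak, divergence):
    """On a shared K_S scale, a larger K_S means an older event."""
    return wgd_peak > divergence


original_divergence = 1.53   # palm-rice ortholog K_S mode
palm_contribution = 0.365    # palm branch-specific K_S
older_wgd_peak = 0.9         # approximate position of the older WGD signal

adjusted_divergence = rate_adjust(palm_contribution)
shift = adjusted_divergence - original_divergence
direction = "left" if shift < 0 else "right"

shared_before = wgd_predates_divergence(older_wgd_peak, original_divergence)
shared_after = wgd_predates_divergence(older_wgd_peak, adjusted_divergence)

print(f"adjusted K_S: {adjusted_divergence:.2f} (shifted {direction})")
print(f"WGD older than palm-rice split before adjustment: {shared_before}")
print(f"WGD older than palm-rice split after adjustment:  {shared_after}")
```

Before adjustment the WGD peak lies to the left of the divergence line and reads as palm-specific; after adjustment it lies to the right and reads as shared with rice.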

.. figure:: _images/mixed_palm_corrected.svg
:align: center
:width: 800
:alt: In this figure the mixed plot shows the rate-adjusted ortholog KS distribution for oil palm and rice as a vertical line superimposed on the paralog KS distribution of oil palm. The vertical line has been shifted to the left and has crossed a WGD peak from its right side to its left side, as highlighted by an arrow.

The ortholog distribution peak (red line) has been shifted to the left after rate-adjustment, as highlighted by the red arrows starting from the original position and pointing at the new rate-adjusted position.
The ortholog *K*:sub:`S` estimate (red vertical line) has been shifted to the left after rate-adjustment, as highlighted by the red arrows starting from the original position and pointing at the new rate-adjusted position.