Switch to relative links for embedded figures
trvrb committed Jan 11, 2017
1 parent b11b8e6 commit 25d1faf
Showing 5 changed files with 56 additions and 58 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -24,5 +24,5 @@ Cross-species transmission (CST) has led to many devastating epidemics, but is s
Install Python packages with:

pip install -r requirements.txt
![](/figures/png/Fig3.png)

![](figures/png/Fig3.png)
20 changes: 10 additions & 10 deletions beast/README.md
@@ -1,29 +1,29 @@
## Constructing phylogenies for each portion of the lentiviral genome
To assess cross-species transmission, we first built a posterior distribution of ~2000-4000 trees for each segment of the genome [(as identified by GARD)](https://github.com/blab/siv-cst/tree/master/recombination). Each tip was labeled with the known host state; host state at internal nodes was inferred.
To assess cross-species transmission, we first built a posterior distribution of ~2000-4000 trees for each segment of the genome [(as identified by GARD)](../recombination/). Each tip was labeled with the known host state; host state at internal nodes was inferred.
#### To reproduce
XML files can be regenerated by running
`python path/to/siv-cst/scripts/beastSetup/empiricalTrees_makexml.py path/to/siv-cst/scripts/beastSetup/xmlTemplates/empiricalTrees_template.xml`
(just adjust chain length and sampling frequency in the template per Methods for each dataset) from within the
`path/to/siv-cst/data/select_your_dataset/segments` directory.
Run with BEAST v. 2.4.0 and dynamic BEAGLE scaling.

#### Results
Trees represent the maximum clade credibility trees for each segment of the main dataset alignment after the discrete trait analysis (below), color coded by host state. Notably, the topologies vary widely between trees, emphasizing the extent of recombination and the variable selective pressures experienced by each region.
![](https://github.com/blab/siv-cst/blob/master/figures/png/FigS5.png)

![](../figures/png/FigS5.png)

## Using discrete trait analysis to identify ancient cross-species transmissions
In phylogenetic trees of viral sequences, cross-species transmission appears as a mismatch between the host of a virus and the host of that virus’s ancestor. Heuristically, in the trees above this appears like a change in color between the tips and the internal nodes. To identify this pattern and estimate how frequently each pair of hosts has exchanged lentiviruses, we used the posterior distribution of phylogenies generated for each segment to estimate rates of host state transition between each pair of hosts.
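
The heuristic described above (a host mismatch between a node and its parent) can be sketched in a few lines. This is only an illustration on a single toy tree: the actual analysis estimates transition rates in BEAST over the whole posterior, and the function name, the `parent_of`/`host_of` dictionaries, and the node labels here are all hypothetical.

```python
from collections import Counter

def count_host_switches(parent_of, host_of):
    """Count (donor, recipient) host mismatches along the branches of one tree.

    parent_of maps each non-root node to its parent; host_of maps every node
    to its known (tips) or inferred (internal nodes) host state.
    """
    switches = Counter()
    for child, parent in parent_of.items():
        donor, recipient = host_of[parent], host_of[child]
        if donor != recipient:
            switches[(donor, recipient)] += 1
    return switches

# Toy tree: root -> n1 -> (tipA, tipB), root -> tipC
parent_of = {"n1": "root", "tipA": "n1", "tipB": "n1", "tipC": "root"}
host_of = {"root": "mona", "n1": "mona", "tipA": "talapoin",
           "tipB": "talapoin", "tipC": "mona"}
switches = count_host_switches(parent_of, host_of)
print(switches)
```

Counting mismatches on one tree ignores uncertainty in both topology and ancestral states, which is why the rates are estimated across the posterior instead.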

#### To reproduce
Run `python path/to/siv-cst/scripts/beastSetup/rates_makexml.py path/to/siv-cst/scripts/beastSetup/xmlTemplates/rates_mastertemplate.xml`
(again, just adjust the chain length and sampling for your dataset per the Methods) from within
`path/to/siv-cst/data/select_your_dataset/segments`.
Run with BEAST v.2.4.0 and dynamic BEAGLE scaling.

Parse results with `scripts/beastAnalysis/parse_matrix.py`. You may need to add the included `beastmatrix.py` to your `PYTHONPATH` environment variable.

#### Results
For the main dataset, we identify 14 novel cross-species transmission events with high certainty (Bayes factor >= 10, black arrows). We identify numerous other transmissions with 3 <= BF < 10 (shaded in gray). Arrow width corresponds to actual rate value averaged over posterior samples; opacity corresponds to Bayes Factor. Circle size for each tip corresponds to [network centrality scores](https://github.com/blab/siv-cst/blob/master/scripts/beastAnalysis/eigen.ipynb). The [host mitochondrial DNA maximum likelihood phylogeny](../data/hosts/host_mtdna.nexus) forms the outer circle. Raw text values for Bayes factors and actual rates can be seen for the main dataset [here](https://github.com/blab/siv-cst/tree/master/beast/main/discreteTraits/results) and visualized [here](https://github.com/blab/siv-cst/blob/master/figures/png/FigS4.png).
![](https://github.com/blab/siv-cst/blob/master/figures/png/Fig3.png)
For the main dataset, we identify 14 novel cross-species transmission events with high certainty (Bayes factor >= 10, black arrows). We identify numerous other transmissions with 3 <= BF < 10 (shaded in gray). Arrow width corresponds to actual rate value averaged over posterior samples; opacity corresponds to Bayes Factor. Circle size for each tip corresponds to [network centrality scores](../scripts/beastAnalysis/eigen.ipynb). The [host mitochondrial DNA maximum likelihood phylogeny](../data/hosts/host_mtdna.nexus) forms the outer circle. Raw text values for Bayes factors and actual rates can be seen for the main dataset [here](main/discreteTraits/results) and visualized [here](../figures/png/FigS4.png).

![](../figures/png/Fig3.png)
19 changes: 9 additions & 10 deletions figures/main-text/README.md
@@ -1,19 +1,18 @@
![F1](https://github.com/blab/siv-cst/blob/master/figures/png/Fig1.png)
![F1](../png/Fig1.png)
###Figure 1: There have been at least 13 interlineage recombination events among SIVs.
The SIV LANL compendium, slightly modified to reduce overrepresentation of HIV, was analyzed with GARD to identify the 13 recombination breakpoints across the genome (dashed lines in B; numbering according to the accepted HXB2 reference genome--accession K03455, illustrated). Two of these breakpoints were omitted from further analyses because they created extremely short fragments (< 500 bases; gray dashes in **B**). For each of the 11 remaining breakpoints used in further analyses, we split the compendium alignment along these breakpoints and built a maximum likelihood tree, displayed in **(A)**. Each viral sequence is color-coded by host species, and its phylogenetic position is traced between trees. Heuristically, straight, horizontal colored lines indicate congruent topological positions between trees (likely not a recombinant sequence); criss-crossing colored lines indicate incongruent topological positions between trees (likely a recombinant sequence).
![F2](https://github.com/blab/siv-cst/blob/master/figures/png/Fig2.png)

![F2](../png/Fig2.png)
###Figure 2: Cross-species transmissions are inferred from tree topologies; SIVcpz has mosaic origins.
A,B,C - Bayesian maximum clade credibility (mcc) trees are displayed for segments 2 (gag - A), 6 (int and vif - B), and 9 (env – C) of the main dataset (N=423). Tips are color coded by known host species; internal nodes and branches are colored by inferred host species, with saturation indicating the confidence of these assignments. Monophyletic clades of viruses from the same lineage are collapsed, with the triangle width proportional to the number of represented sequences. An example of likely cross-species transmission is starred in each tree, where the host state at the internal node (red / mona monkeys) is incongruent with the descendent tips' known host state (green / talapoin monkeys), providing evidence for a transmission from mona monkeys to talapoin monkeys. Another example of cross-species transmission of a recombinant virus among African green monkeys is marked with a dagger.
D - The genome map of SIVcpz, with breakpoints used for the discrete trait analysis, is color coded and labeled by the most likely ancestral host for each segment of the genome.
![F3](https://github.com/blab/siv-cst/blob/master/figures/png/Fig3.png)



![F3](../png/Fig3.png)
###Figure 3: Most lentiviruses are the product of ancient cross-species transmissions.
The phylogeny of the host species' mitochondrial genomes forms the outer circle. Arrows represent transmission events inferred by the model with Bayes' factor (BF) >= 3.0; black arrows have BF >= 10, with opacity of gray arrows scaled for BF between 3.0 and 10.0. Width of the arrow indicates the rate of transmission (actual rates = rates * indicators). Circle sizes represent network centrality scores for each host. Transmissions from chimps to humans; chimps to gorillas; gorillas to humans; sooty mangabeys to humans; sabaeus to tantalus; and vervets to baboons have been previously documented. To our knowledge, all other transmissions illustrated are novel identifications.
![F4](https://github.com/blab/siv-cst/blob/master/figures/png/Fig4.png)

![F4](../png/Fig4.png)
###Figure 4: Cross-species transmission is driven by exposure and constrained by host genetic distance.
For each pair of host species, we (A) calculated the log ratio of their average body masses and (B) found the patristic genetic distance between them (from a maximum-likelihood tree of mtDNA). To investigate the association of these predictors with cross-species transmission, we treated transmission as a binary variable: 0 if the Bayes factor for the transmission (as inferred by the discrete traits model) was < 3.0, and 1 for a Bayes factor >= 3.0. Each plot shows raw predictor data in gray; the quintiles of the predictor data in green; and the logistic regression and 95% CI in blue.
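
A minimal sketch of the binarization and logistic fit described in this caption, assuming a single predictor and plain gradient ascent. The function names and the toy data are illustrative only; they are not the repository's code or results, and no confidence interval is computed here.

```python
import math

def binarize(bayes_factors, threshold=3.0):
    """1 if the Bayes factor supports a transmission (BF >= threshold), else 0."""
    return [1 if bf >= threshold else 0 for bf in bayes_factors]

def fit_logistic(x, y, steps=5000, learning_rate=0.1):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1

# Toy data (made up): a predictor value and a Bayes factor for each host pair.
distance = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
bayes_factor = [12.0, 5.0, 1.0, 4.0, 0.5, 3.2, 1.0, 0.2]
transmitted = binarize(bayes_factor)
intercept, slope = fit_logistic(distance, transmitted)
print(transmitted, intercept, slope)
```

In practice a statistics library would be the natural choice for the fit and its interval; the loop above only shows what is being estimated.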

55 changes: 27 additions & 28 deletions figures/supplement/README.md
@@ -1,59 +1,58 @@
![S1](https://github.com/blab/siv-cst/blob/master/figures/png/FigS1.png)
![S1](../png/FigS1.png)
###Figure S1: Extensive divergence makes sitewise measures of genetic linkage ineffective
For pairs of biallelic sites (ignoring rare variants), R^2 was used to estimate how strongly the allele in one site predicts the allele in the second site, with values of 0 indicating no linkage and 1 indicating perfect linkage. The mean value of R^2 was 0.044, indicating very low levels of linkage overall.
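
For reference, the R^2 statistic for one pair of biallelic sites can be sketched as below, assuming each site is coded as 0/1 per sequence. How gaps and rare variants were filtered is not shown; the function name and the coding are assumptions for illustration.

```python
def r_squared(site_a, site_b):
    """Squared correlation between two biallelic sites coded as 0/1 per sequence."""
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                       # frequency of allele 1 at site A
    p_b = sum(site_b) / n                       # frequency of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b                        # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom

print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly linked sites
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # independent sites
```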


----------
![S2](https://github.com/blab/siv-cst/blob/master/figures/png/FigS2.png)
![S2](../png/FigS2.png)
###Figure S2: No evidence of linkage between nonadjacent segments of the SIV genome.
The alignment used for GARD analyses (LANL compendium with HIV overrepresentation reduced) was split along the breakpoints identified by GARD to yield the 12 genomic segments, and a maximum likelihood tree was constructed for each. The number of steps required to turn one tree topology into another was assessed for each pair of trees with the Rooted Subtree-Prune-and-Regraft (rSPR) package. Segment pairs with similar topologies have lower scores than segments with less similar topologies.



----------
![S3](https://github.com/blab/siv-cst/blob/master/figures/png/FigS3.png)
![S3](../png/FigS3.png)
###Figure S3 Distribution of the number of sequences per host included in analyses
A: All available high-quality lentivirus sequences were randomly subsampled up to 25 sequences per host for the main dataset. We included the 24 hosts with at least 5 sequences available in this dataset. B: For the supplemental dataset, we randomly subsampled up to 40 sequences per host, and included the 15 hosts with at least 16 sequences available in this dataset. For both datasets, a small number of additional sequences were permitted for the few hosts that are infected by multiple viral lineages in order to represent the full breadth of known genetic diversity of lentiviruses in each host population.
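
The subsampling rule for the main dataset (at most 25 sequences per host, hosts with fewer than 5 dropped) could look roughly like this. It is a sketch under assumptions: the `(sequence_id, host)` record format, the function name, and the fixed seed are hypothetical, and the extra sequences allowed for hosts with multiple viral lineages are not modeled.

```python
import random
from collections import defaultdict

def subsample_by_host(records, max_per_host=25, min_per_host=5, seed=0):
    """Randomly keep up to max_per_host sequences for each host with enough data.

    records is an iterable of (sequence_id, host) pairs.
    """
    by_host = defaultdict(list)
    for seq_id, host in records:
        by_host[host].append(seq_id)

    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = {}
    for host, ids in by_host.items():
        if len(ids) < min_per_host:
            continue  # too few sequences: host is excluded
        kept[host] = rng.sample(ids, min(len(ids), max_per_host))
    return kept

# Toy input: 30 sequences from host A, 3 from host B, 7 from host C.
records = ([(f"A{i}", "A") for i in range(30)]
           + [(f"B{i}", "B") for i in range(3)]
           + [(f"C{i}", "C") for i in range(7)])
kept = subsample_by_host(records)
print({host: len(ids) for host, ids in kept.items()})
```

For the supplemental dataset the same sketch would be called with `max_per_host=40` and `min_per_host=16`.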



----------
![S4](https://github.com/blab/siv-cst/blob/master/figures/png/FigS4.png)
![S4](../png/FigS4.png)
###Figure S4: Actual rates and Bayes factors for main dataset discrete trait analyses
Values for the asymmetric transition rates between hosts, as estimated by the CTMC, were calculated as rate * indicator (element-wise for each state logged). We report the average posterior values above. Bayes factors represent a ratio of the posterior odds / prior odds that a given actual rate is non-zero. Because each of the 12 segments contributes to the likelihood, but they have not evolved independently, we divide all Bayes factors by 12 and report the adjusted values above (and throughout the text).
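
The arithmetic in this caption can be sketched as follows for a single host pair. The prior inclusion probability is left as a parameter because its value is not stated here; the function names and the toy samples are assumptions for illustration.

```python
def mean_actual_rate(rates, indicators):
    """Average of rate * indicator over the logged posterior samples."""
    actual = [r * i for r, i in zip(rates, indicators)]
    return sum(actual) / len(actual)

def adjusted_bayes_factor(indicators, prior_prob, n_segments=12):
    """Posterior odds / prior odds that the rate is non-zero, divided by n_segments.

    prior_prob is the prior probability that the indicator equals 1 (an input
    here, not a value taken from the analysis).
    """
    posterior_prob = sum(indicators) / len(indicators)
    if posterior_prob == 1.0:
        return float("inf")  # indicator on in every sample
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds / n_segments

# Toy posterior samples for one host pair.
rates = [0.5, 1.5, 2.0, 4.0]
indicators = [1, 1, 1, 0]
print(mean_actual_rate(rates, indicators))
print(adjusted_bayes_factor(indicators, prior_prob=0.2))
```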



----------
![S5](https://github.com/blab/siv-cst/blob/master/figures/png/FigS5.png)
![S5](../png/FigS5.png)
###Figure S5: Maximum clade credibility trees for each of the 12 GARD-identified genomic segments of the lentiviral genome
Tips are color coded by known host state; branches and internal nodes are color coded by inferred host state, with color saturation indicating the confidence of these assignments. Monophyletic clades of viruses from the same lineage are collapsed, with the triangle width proportional to the number of represented sequences.



----------
![S6](https://github.com/blab/siv-cst/blob/master/figures/png/FigS6.png)
![S6](../png/FigS6.png)
###Figure S6: Most lentiviruses are the product of ancient cross-species transmissions (supplemental dataset).
The phylogeny of the host species' mitochondrial genomes forms the outer circle (gray: not included in supplemental dataset). Arrows with filled arrowheads represent transmission events inferred by the model with Bayes' factor (BF) >= 3.0; black arrows have BF >= 10, with opacity of gray arrows scaled for BF between 3.0 and 10.0. Transmissions with 2.0 <= BF < 3.0 have open arrowheads (see discussion). Width of the arrow indicates the rate of transmission (actual rates = rates * indicators). Circle sizes represent network centrality scores for each host. Transmissions from chimps to humans; chimps to gorillas; gorillas to humans; sooty mangabeys to humans; sabaeus to tantalus; and vervets to baboons have been previously documented. To our knowledge, all other transmissions illustrated are novel identifications.



----------
![S7](https://github.com/blab/siv-cst/blob/master/figures/png/FigS7.png)
![S7](../png/FigS7.png)
###Figure S7: Actual rates and Bayes factors for supplemental dataset discrete trait analyses
Values for the asymmetric transition rates between hosts, as estimated by the CTMC, were calculated as rate * indicator (element-wise for each state logged). We report the average posterior values above. Bayes factors represent a ratio of the posterior odds / prior odds that a given actual rate is non-zero. Because each of the 12 segments contributes to the likelihood, but they have not evolved independently, we divide all Bayes factors by 12 and report the adjusted values above (and throughout the text).



----------
![S8](https://github.com/blab/siv-cst/blob/master/figures/png/FigS8.png)
![S8](../png/FigS8.png)
###Figure S8: Maximum clade credibility trees for each of the 12 GARD-identified genomic segments of the lentiviral genome (supplemental dataset)
Tips are color coded by known host state; branches and internal nodes are color coded by inferred host state, with color saturation indicating the confidence of these assignments. Monophyletic clades of viruses from the same lineage are collapsed, with the triangle width proportional to the number of represented sequences.


----------
![S9](https://github.com/blab/siv-cst/blob/master/figures/png/FigS9.png)
![S9](../png/FigS9.png)
###Figure S9: Comparison of Main and Supplemental Dataset Discrete Trait Analysis Results

Each datapoint represents one of the 210 possible transmissions between each pair of the 15 hosts present in both datasets. The black dashed line shows y=x; the linear regression and 95% CI are shown in gray.
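
The count of 210 follows from treating transmissions as directed (donor, recipient) pairs, consistent with the asymmetric rates used elsewhere: 15 × 14 = 210. A quick check, with placeholder host names:

```python
from itertools import permutations

hosts = [f"host_{i}" for i in range(1, 16)]    # 15 placeholder host names
directed_pairs = list(permutations(hosts, 2))  # ordered (donor, recipient) pairs
print(len(directed_pairs))  # 210
```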

16 changes: 8 additions & 8 deletions recombination/README.md
@@ -1,11 +1,11 @@
##Recombination complicates the natural history of lentiviruses

Lentiviruses notoriously recombine, which effectively means that different portions of the lentiviral genome have different evolutionary--and phylogenetic--histories. Thus, in order to reconstruct the history of cross-species transmission among primate lentiviruses, we had to first assess the extent of recombination _between_ SIV lineages. We used GARD to look for evidence of recombination breakpoints across a [modified version](./lanlCompendium15_lessHIV.fasta) of the LANL compendium alignment of HIV and SIVs.

To ease computational intensity, we ran GARD on 3kb regions of the genome (with 1kb overlaps on either end). We repeated this with the windows offset in order to control for proximity to region edges. Alignment coordinates listed. All analyses used the 012234 nucleotide model and 3 bins of site variation.

Window|Start|End|N breakpoints found
---|---|---|---
---|---|---|---
A|1|3000|1
B|2000|5000|2
C|4000|7000|3
@@ -19,9 +19,9 @@ J|5500|8500|2
K|7500|10500|2
L|9500|12500|3
M|11500|end|2
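
A rough sketch of how overlapping 3kb windows with a 2kb step (1kb overlap at each end) might be generated. The alignment length of 13500 is assumed from the last consensus coordinate, the 1500 offset is inferred from the visible rows J to M, and the function is hypothetical. It is not meant to reconstruct the rows hidden in this diff.

```python
def tiled_windows(alignment_length, width=3000, step=2000, offset=0):
    """Return (start, end) windows covering the alignment with fixed overlap."""
    windows = []
    start = offset
    while start < alignment_length:
        end = min(start + width, alignment_length)
        windows.append((max(start, 1), end))  # coordinates are 1-based
        if end == alignment_length:
            break
        start += step
    return windows

first_pass = tiled_windows(13500)
offset_pass = tiled_windows(13500, offset=1500)
print(first_pass[:3])
print(offset_pass)
```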

__Consensus breakpoint coordinates:__ 1, 2317, 2987, 3593, 4888, 5824, _6320 (omitted)_, 7474, 8136, 9084, 9734, 10716, _11068 (omitted)_, 11766, 13500

We then split the [full dataset alignments](../data/) along these breakpoints (all alignments held the compendium fixed and discarded insertions, so the coordinates are the same) and built maximum likelihood trees with RAxML for each segment. Tracing each taxon across segments allows us to visualize how its phylogenetic placement varies across breakpoints.
![Recombination summary](https://github.com/blab/siv-cst/blob/master/figures/png/Fig1.png)

![Recombination summary](../figures/png/Fig1.png)
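
Splitting an alignment along the retained consensus breakpoints can be sketched as below. The coordinate convention is an assumption (each breakpoint is taken as the first column of a new segment and the final coordinate as the last column), and the function names and toy alignment are hypothetical. With the two omitted breakpoints removed, the 13 remaining coordinates give 12 segments.

```python
# Consensus coordinates from the list above, without the two omitted breakpoints.
BREAKPOINTS = [1, 2317, 2987, 3593, 4888, 5824, 7474,
               8136, 9084, 9734, 10716, 11766, 13500]

def segment_bounds(breakpoints):
    """Pair consecutive coordinates into (start, end) segment boundaries."""
    return list(zip(breakpoints[:-1], breakpoints[1:]))

def split_alignment(sequences, breakpoints):
    """Split aligned sequences (name -> string) into one dict per segment."""
    bounds = segment_bounds(breakpoints)
    segments = []
    for index, (start, end) in enumerate(bounds):
        # Assumed convention: 1-based, start inclusive; the end column belongs
        # to the next segment except for the final segment.
        stop = end if index == len(bounds) - 1 else end - 1
        segments.append({name: seq[start - 1:stop] for name, seq in sequences.items()})
    return segments

toy_alignment = {"seq1": "A" * 13500, "seq2": "C" * 13500}
segments = split_alignment(toy_alignment, BREAKPOINTS)
print(len(segments), [len(s["seq1"]) for s in segments])
```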
