Merged branch master into master
2 parents cfc0164 + 4f91a45 commit b5ff9af47b5c699ef29db6ce9b2313eea8f98a0c @sidneymbell sidneymbell committed Jan 11, 2017
Showing with 75 additions and 7 deletions.
  1. +1 −7 figures/README.md
  2. +15 −0 figures/main-text/README.md
  3. +59 −0 figures/supplement/README.md
@@ -1,7 +1 @@
-# Figures
-
-## Supplement
-
-### [Figure S3: data_distrib](supplement/data_distrib.ipynb)
-
-![](png/FigS3_data_distrib.png)
+# Figures: [Main text](main-text/), [Supplement](supplement/)
@@ -0,0 +1,15 @@
+![F1](https://github.com/blab/siv-cst/blob/master/figures/png/Fig1.png)
+### Figure 1: There have been at least 13 interlineage recombination events among SIVs.
+The SIV LANL compendium, slightly modified to reduce overrepresentation of HIV, was analyzed with GARD to identify 13 recombination breakpoints across the genome (dashed lines in **B**; numbering follows the accepted HXB2 reference genome, accession K03455, illustrated). Two of these breakpoints were omitted from further analyses because they created extremely short fragments (< 500 bases; gray dashes in **B**). We split the compendium alignment along the 11 remaining breakpoints and built a maximum likelihood tree for each resulting segment, displayed in **A**. Each viral sequence is color-coded by host species, and its phylogenetic position is traced between trees. Heuristically, straight, horizontal colored lines indicate congruent topological positions between trees (likely not a recombinant sequence); criss-crossing colored lines indicate incongruent topological positions between trees (likely a recombinant sequence).
+
+![F2](https://github.com/blab/siv-cst/blob/master/figures/png/Fig2.png)
+### Figure 2:
+
+![F3](https://github.com/blab/siv-cst/blob/master/figures/png/Fig3.png)
+### Figure 3: Most lentiviruses are the product of ancient cross-species transmissions.
+The phylogeny of the host species' mitochondrial genomes forms the outer circle. Arrows represent transmission events inferred by the model with Bayes factor (BF) >= 3.0; black arrows have BF >= 10, and the opacity of gray arrows is scaled for BF between 3.0 and 10.0. Width of the arrow indicates the rate of transmission (actual rates = rates * indicators). Circle sizes represent network centrality scores for each host. Transmissions from chimps to humans; chimps to gorillas; gorillas to humans; sooty mangabeys to humans; sabaeus to tantalus; and vervets to baboons have been previously documented. To our knowledge, all other transmissions illustrated are novel identifications.
+
+![F4](https://github.com/blab/siv-cst/blob/master/figures/png/Fig4.png)
+### Figure 4: Cross-species transmission is driven by exposure and constrained by host genetic distance.
+For each pair of host species, we (A) calculated the log ratio of their average body masses and (B) found the patristic genetic distance between them (from a maximum-likelihood tree of mtDNA). To investigate the association of these predictors with cross-species transmission, we treated transmission as a binary variable: 0 if the Bayes factor for the transmission (as inferred by the discrete traits model) was < 3.0, and 1 for a Bayes factor >= 3.0. Each plot shows raw predictor data in gray; the quintiles of the predictor data in green; and the logistic regression and 95% CI in blue.
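The two steps described in the Figure 4 caption (computing a per-pair predictor and binarizing the transmission outcome) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the direction of the mass ratio, the use of the natural log, and the example values are all assumptions.

```python
import math

def log_mass_ratio(mass_a, mass_b):
    """Log ratio of two average body masses (natural log; direction a/b is an assumption)."""
    return math.log(mass_a / mass_b)

def transmission_indicator(bayes_factor, threshold=3.0):
    """Binary outcome: 1 if the Bayes factor is >= threshold, else 0."""
    return 1 if bayes_factor >= threshold else 0

# Hypothetical host pairs: (donor, recipient, donor mass kg, recipient mass kg, Bayes factor)
pairs = [
    ("host_a", "host_b", 45.0, 5.0, 12.4),
    ("host_b", "host_c", 5.0, 6.0, 1.2),
]

# One row per host pair: predictor value and binary outcome,
# ready to pass to any logistic regression routine.
rows = [
    (donor, recipient, log_mass_ratio(m1, m2), transmission_indicator(bf))
    for donor, recipient, m1, m2, bf in pairs
]
```

The resulting rows pair each predictor value with a 0/1 outcome, which is the input shape a logistic regression expects.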
+
@@ -0,0 +1,59 @@
+![S1](https://github.com/blab/siv-cst/blob/master/figures/png/FigS1.png)
+### Figure S1: Extensive divergence makes sitewise measures of genetic linkage ineffective
+For pairs of biallelic sites (ignoring rare variants), R^2 was used to estimate how strongly the allele in one site predicts the allele in the second site, with values of 0 indicating no linkage and 1 indicating perfect linkage. The mean value of R^2 was 0.044, indicating very low levels of linkage overall.
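As an illustration of the linkage statistic described in the Figure S1 caption, the sketch below computes the standard r^2 measure for one pair of biallelic alignment columns. It assumes the caption's R^2 is the usual linkage-disequilibrium r^2; the function name and inputs are hypothetical, and filtering of rare variants and gaps is left out.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites, each given as a string of alleles
    (one character per sequence, same sequence order in both strings)."""
    if len(site_a) != len(site_b):
        raise ValueError("sites must cover the same sequences")
    alleles_a = sorted(set(site_a))
    alleles_b = sorted(set(site_b))
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        raise ValueError("both sites must be biallelic")

    n = len(site_a)
    a, b = alleles_a[0], alleles_b[0]  # arbitrary reference allele at each site
    p_a = site_a.count(a) / n
    p_b = site_b.count(b) / n
    # Frequency of sequences carrying both reference alleles
    p_ab = sum(1 for x, y in zip(site_a, site_b) if x == a and y == b) / n

    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
```

Under this definition, two columns whose alleles always co-occur give 1, and two columns that vary independently give 0.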
+
+
+----------
+![S2](https://github.com/blab/siv-cst/blob/master/figures/png/FigS2.png)
+### Figure S2: No evidence of linkage between nonadjacent segments of the SIV genome.
+The alignment used for GARD analyses (LANL compendium with HIV overrepresentation reduced) was split along the breakpoints identified by GARD to yield the 12 genomic segments, and a maximum likelihood tree was constructed for each. The number of steps required to turn one tree topology into another was assessed for each pair of trees with the Rooted Subtree-Prune-and-Regraft (rSPR) package. Segment pairs with similar topologies have lower scores than segments with less similar topologies.
+
+
+
+----------
+![S3](https://github.com/blab/siv-cst/blob/master/figures/png/FigS3.png)
+### Figure S3: Distribution of the number of sequences per host included in analyses
+A: All available high-quality lentivirus sequences were randomly subsampled up to 25 sequences per host for the main dataset. We included the 24 hosts with at least 5 sequences available in this dataset. B: For the supplemental dataset, we randomly subsampled up to 40 sequences per host, and included the 15 hosts with at least 16 sequences available in this dataset. For both datasets, a small number of additional sequences were permitted for the few hosts that are infected by multiple viral lineages in order to represent the full breadth of known genetic diversity of lentiviruses in each host population.
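The subsampling rule in the Figure S3 caption can be sketched roughly as below. This is an assumption-laden illustration: the function and parameter names are hypothetical, the fixed seed is only for reproducibility, and the caption's exception for hosts infected by multiple viral lineages is not modeled.

```python
import random

def subsample_by_host(sequences_by_host, max_per_host=25, min_per_host=5, seed=0):
    """Keep hosts with at least `min_per_host` sequences and randomly
    draw at most `max_per_host` sequences from each of them."""
    rng = random.Random(seed)
    subsampled = {}
    for host, seqs in sorted(sequences_by_host.items()):
        if len(seqs) < min_per_host:
            continue  # too few sequences: host is excluded
        k = min(len(seqs), max_per_host)
        subsampled[host] = rng.sample(seqs, k)
    return subsampled
```

With the defaults this mirrors the main dataset (up to 25 per host, at least 5 required); passing `max_per_host=40, min_per_host=16` mirrors the supplemental dataset.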
+
+
+
+----------
+![S4](https://github.com/blab/siv-cst/blob/master/figures/png/FigS4.png)
+### Figure S4: Actual rates and Bayes factors for main dataset discrete trait analyses
+Values for the asymmetric transition rates between hosts, as estimated by the CTMC, were calculated as rate * indicator (element-wise for each state logged). We report the average posterior values above. Bayes factors represent the ratio of posterior odds to prior odds that a given actual rate is non-zero. Because each of the 12 segments contributes to the likelihood but the segments have not evolved independently, we divide all Bayes factors by 12 and report the adjusted values above (and throughout the text).
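The calculation described in the Figure S4 caption can be sketched for a single host-to-host transition as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the inputs stand in for per-state logged rate and indicator samples, and the prior probability that the indicator is on (`prior_prob`) must be supplied by the reader because the caption does not state it.

```python
import math

def actual_rates(rates, indicators):
    """Element-wise rate * indicator for each logged state."""
    return [r * i for r, i in zip(rates, indicators)]

def adjusted_bayes_factor(indicators, prior_prob, n_segments=12):
    """Posterior odds / prior odds that the rate is non-zero, divided by
    the number of non-independent segments (12 in the caption)."""
    posterior_prob = sum(indicators) / len(indicators)
    if posterior_prob >= 1.0:
        return math.inf  # indicator on in every sample: odds are unbounded
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds / n_segments

# Hypothetical logged samples for one transition
rates = [2.0, 4.0, 6.0, 8.0]
indicators = [1, 1, 1, 0]

samples = actual_rates(rates, indicators)
mean_actual_rate = sum(samples) / len(samples)
bf = adjusted_bayes_factor(indicators, prior_prob=0.5)
```

Dividing by the segment count is a conservative adjustment: it lowers every Bayes factor by the same factor, so the ranking of transitions is unchanged.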
+ 
+
+
+----------
+![S5](https://github.com/blab/siv-cst/blob/master/figures/png/FigS5.png)
+### Figure S5: Maximum clade credibility trees for each of the 12 GARD-identified genomic segments of the lentiviral genome
+Tips are color coded by known host state; branches and internal nodes are color coded by inferred host state, with color saturation indicating the confidence of these assignments. Monophyletic clades of viruses from the same lineage are collapsed, with the triangle width proportional to the number of represented sequences.
+
+
+
+----------
+![S6](https://github.com/blab/siv-cst/blob/master/figures/png/FigS6.png)
+### Figure S6: Most lentiviruses are the product of ancient cross-species transmissions (supplemental dataset).
+The phylogeny of the host species' mitochondrial genomes forms the outer circle (gray: not included in supplemental dataset). Arrows with filled arrowheads represent transmission events inferred by the model with Bayes factor (BF) >= 3.0; black arrows have BF >= 10, and the opacity of gray arrows is scaled for BF between 3.0 and 10.0. Transmissions with 2.0 <= BF < 3.0 have open arrowheads (see discussion). Width of the arrow indicates the rate of transmission (actual rates = rates * indicators). Circle sizes represent network centrality scores for each host. Transmissions from chimps to humans; chimps to gorillas; gorillas to humans; sooty mangabeys to humans; sabaeus to tantalus; and vervets to baboons have been previously documented. To our knowledge, all other transmissions illustrated are novel identifications.
+ 
+
+
+----------
+![S7](https://github.com/blab/siv-cst/blob/master/figures/png/FigS7.png)
+### Figure S7: Actual rates and Bayes factors for supplemental dataset discrete trait analyses
+Values for the asymmetric transition rates between hosts, as estimated by the CTMC, were calculated as rate * indicator (element-wise for each state logged). We report the average posterior values above. Bayes factors represent the ratio of posterior odds to prior odds that a given actual rate is non-zero. Because each of the 12 segments contributes to the likelihood but the segments have not evolved independently, we divide all Bayes factors by 12 and report the adjusted values above (and throughout the text).
+
+
+
+----------
+![S8](https://github.com/blab/siv-cst/blob/master/figures/png/FigS8.png)
+### Figure S8: Maximum clade credibility trees for each of the 12 GARD-identified genomic segments of the lentiviral genome (supplemental dataset)
+Tips are color coded by known host state; branches and internal nodes are color coded by inferred host state, with color saturation indicating the confidence of these assignments. Monophyletic clades of viruses from the same lineage are collapsed, with the triangle width proportional to the number of represented sequences.
+ 
+
+----------
+![S9](https://github.com/blab/siv-cst/blob/master/figures/png/FigS9.png)
+### Figure S9: Comparison of Main and Supplemental Dataset Discrete Trait Analysis Results
+
+Each datapoint represents one of the 210 possible directed transmissions between pairs of the 15 hosts present in both datasets. The black dashed line shows y=x; the linear regression and 95% CI are shown in gray.
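The count of 210 in the Figure S9 caption follows from treating transmissions as directed: each of the 15 hosts can transmit to each of the other 14. A quick check, using placeholder host names that are not from the source:

```python
from itertools import permutations

# Placeholder names for the 15 hosts shared by both datasets
hosts = [f"host_{i}" for i in range(1, 16)]

# Ordered (donor, recipient) pairs, excluding self-transmission
directed_pairs = list(permutations(hosts, 2))
n_pairs = len(directed_pairs)  # 15 * 14
```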
+
