
RCandy: an R package for visualising homologous recombination events in bacterial genomes

License

MIT (LICENSE.md); the repository also contains a LICENSE file whose license type GitHub could not identify.

ChrispinChaguza/RCandy


RCandy

RCandy plots a phylogenetic tree in the context of strain metadata and recombination events identified by Gubbins (Croucher et al. 2015, Nucleic Acids Res. PMID: 25414349) and BRATNextGen (Marttinen et al. 2012, Nucleic Acids Res. PMID: 22064866).

Installation

You can install RCandy from GitHub with devtools:

install.packages("devtools")
devtools::install_github("ChrispinChaguza/RCandy", build_vignettes = FALSE)

Note that R version 3.6 or later is required to install the package.

  • Required version of R:
    • R (>= 3.6)
  • Required dependencies:
    • ape
    • dplyr
    • graphics
    • grDevices
    • magrittr
    • phytools
    • shape
    • stats
    • data.table
    • stringr
    • tibble
    • tidyr
    • utils
    • viridis
  • Optional packages (may be required to build the vignettes):
    • knitr
    • rmarkdown
    • markdown

Load RCandy and the other packages used in the examples below.

library(RCandy)
library(ape)
library(tidyr)

Load sample data

In this example, we load sample data for Streptococcus pneumoniae sequence type (ST) 320. These data were generated using genomes described in Gladstone RA et al. EBioMedicine. 2019 May;43:338-346. doi: 10.1016/j.ebiom.2019.04.021. Epub 2019 Apr 16. PMID: 31003929; PMCID: PMC6557916.

Note that the metadata file is optional.

tree.file <- system.file("extdata", "ST320.final_tree.tre", package = "RCandy", mustWork = TRUE)
metadata.file <- system.file("extdata", "ST320.tsv", package = "RCandy", mustWork = TRUE)
gubbins.gff <- system.file("extdata", "ST320.recombination_predictions.gff", package = "RCandy", 
    mustWork = TRUE)
ref.genome.gff <- system.file("extdata", "Hungary19A-6.gff", package = "RCandy", 
    mustWork = TRUE)

Running RCandy

The simplest way to run RCandy is to plot the phylogenetic tree alongside the taxon metadata. Here we select the Source and Country columns of the metadata file. We strongly recommend that the first column of the metadata file contains taxon names that match the tip names in the phylogenetic tree. We also ladderize the tree and root it at its midpoint.

By default, the columns in the metadata file are assumed to be separated by tabs ("\t"). For a file with comma-separated values, pass taxon.metadata.delimiter = "," instead.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"))
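
Because metadata rows are matched to tree tips by taxon name, it can help to check that match before plotting. The sketch below uses a small, made-up tree and comma-separated metadata (all taxon names, columns and values are hypothetical, not taken from the ST320 data) and reads the metadata with base R's read.table(), where sep = "," is only analogous to RCandy's taxon.metadata.delimiter = ",". It is a pre-flight check, not part of RCandy.

```r
library(ape)

# Hypothetical four-taxon tree and comma-separated metadata (illustration only)
tree <- ape::read.tree(text = "((isolate1:1,isolate2:1):1,(isolate3:1,isolate4:1):1);")
metadata.csv <- "ID,Country,Source
isolate1,CountryA,Carriage
isolate2,CountryA,Disease
isolate3,CountryB,Carriage
isolate5,CountryB,Disease"

# sep = "," here is analogous to taxon.metadata.delimiter = "," in RCandyVis
metadata <- read.table(text = metadata.csv, header = TRUE, sep = ",",
                       stringsAsFactors = FALSE)

# Compare the taxon names in the first metadata column with the tree tips
ids <- metadata[[1]]
missing.from.metadata <- setdiff(tree$tip.label, ids)  # tips without metadata
missing.from.tree <- setdiff(ids, tree$tip.label)      # rows without a matching tip
missing.from.metadata  # "isolate4"
missing.from.tree      # "isolate5"
```

Any names reported here would not be matched between the tree and the metadata, so it is worth fixing them in the metadata file first.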

If the first column of the metadata file does not contain the taxon names, specify the column that does using the taxon.id.column option.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID")
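
If it is unclear which column holds the taxon names, one way to find the right value for taxon.id.column is to look for the column whose values include every tip label. The helper find.taxon.id.column() below is hypothetical (it is not part of RCandy), and the data frame and tip labels are made up.

```r
# Hypothetical helper (not part of RCandy): return the names of metadata
# columns whose values include every tip label of the tree
find.taxon.id.column <- function(metadata, tip.labels) {
  covers.all <- vapply(metadata, function(column) {
    all(tip.labels %in% as.character(column))
  }, logical(1))
  names(metadata)[covers.all]
}

# Made-up metadata in which the taxon names sit in the second column, "ID"
metadata <- data.frame(
  Country = c("CountryA", "CountryA", "CountryB"),
  ID = c("isolate1", "isolate2", "isolate3"),
  Source = c("Carriage", "Disease", "Carriage"),
  stringsAsFactors = FALSE
)
tip.labels <- c("isolate3", "isolate1", "isolate2")

find.taxon.id.column(metadata, tip.labels)  # "ID"
```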

Next, we load the tree, the metadata file, the reference genome and the recombination events generated by Gubbins. By default, output from Gubbins is assumed (recom.input.type = "Gubbins"); if the recombination events were generated by BRATNextGen, specify recom.input.type = "BRATNextGen".

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff)
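
To show what kind of information the recombination file carries, the sketch below parses two made-up lines laid out like a Gubbins recombination_predictions.gff file: tab-separated GFF columns with the start and end of each event in columns 4 and 5, and the affected taxa listed in a taxa="..." attribute. This layout is an assumption based on typical Gubbins output, and the parsing is only an illustration. RCandy reads the file itself, so this is not its actual code.

```r
# Two made-up lines in the assumed Gubbins GFF layout (tab-separated)
gff.lines <- c(
  paste("SEQUENCE", "GUBBINS", "CDS", "31000", "35500", "0.000", ".", "0",
        'node="Node_3->isolate1";taxa=" isolate1";snp_count="12";', sep = "\t"),
  paste("SEQUENCE", "GUBBINS", "CDS", "52000", "58000", "0.000", ".", "0",
        'node="Node_1->Node_2";taxa=" isolate2 isolate3";snp_count="30";', sep = "\t")
)

# Split each line into its nine GFF columns
fields <- strsplit(gff.lines, "\t", fixed = TRUE)
events <- data.frame(
  start = as.integer(vapply(fields, `[`, character(1), 4)),
  end = as.integer(vapply(fields, `[`, character(1), 5)),
  attributes = vapply(fields, `[`, character(1), 9),
  stringsAsFactors = FALSE
)

# Pull the space-separated taxon names out of the taxa="..." attribute
taxa.string <- sub('.*taxa="([^"]*)".*', "\\1", events$attributes)
event.taxa <- strsplit(trimws(taxa.string), " +")

events[, c("start", "end")]
event.taxa
```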

We can restrict the plot to recombination events in a specific region of the genome using genome.start and genome.end, and show the gene labels from the reference genome annotation file using show.gene.label. Gene labels are turned off by default.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE)
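
Restricting the view to a genome region amounts to keeping the events that overlap the window. The sketch below shows that interval test on a made-up event table. It assumes inclusive coordinates and assumes that events straddling a window boundary are kept and clipped to it. How RCandy itself handles such boundary events is not stated here.

```r
# Made-up recombination events (start/end positions in the reference genome)
events <- data.frame(
  start = c(10000, 29000, 45000, 59000, 70000),
  end = c(15000, 31000, 47000, 65000, 72000)
)
genome.start <- 30000
genome.end <- 60000

# Two intervals overlap when each one starts before the other ends (inclusive)
overlaps <- events$start <= genome.end & events$end >= genome.start
which(overlaps)  # 2 3 4

# Clip the kept events to the window so they can be drawn inside it
clipped <- data.frame(
  start = pmax(events$start[overlaps], genome.start),
  end = pmin(events$end[overlaps], genome.end)
)
clipped
```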

We can also colour the tips of the phylogenetic tree by a metadata column using color.tree.tips.by.column. Here we colour the tips by Country.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country")
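
Colouring tips by a metadata column boils down to giving each distinct value a colour and looking up the value for each tip. The base-R sketch below illustrates that idea. grDevices::hcl.colors() with its "viridis" palette stands in for whatever palette RCandy uses internally, and the tip names and countries are made up.

```r
# Made-up tip labels and their Country values
tip.labels <- c("isolate1", "isolate2", "isolate3", "isolate4")
country <- c(isolate1 = "CountryA", isolate2 = "CountryB",
             isolate3 = "CountryA", isolate4 = "CountryC")

# One colour per distinct value (sorted), then one lookup per tip
levels.found <- sort(unique(country))
palette <- setNames(grDevices::hcl.colors(length(levels.found), "viridis"),
                    levels.found)
tip.colours <- palette[country[tip.labels]]
names(tip.colours) <- tip.labels
tip.colours
```

Tips that share a country share a colour, which is what makes clusters of isolates from the same country stand out on the tree.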

Another option, although it can be slow, is to map a discrete character onto the internal nodes of the phylogenetic tree using ancestral character reconstruction with the ace function in ape (the trait.for.ancestral.reconstr option). Below we use the presence/absence pattern of the mefA gene as the discrete trait. Warning: ancestral character reconstruction may take a while.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country", 
    trait.for.ancestral.reconstr = "mefA")
#> Tips will be coloured by trait.for.ancestral.reconstr option and color.tree.tips.by.column will be ignored

Notice that although we asked for the tips to be coloured by Country, the discrete trait used for ancestral reconstruction overrides this option, as the message above indicates. When the trait for ancestral reconstruction contains only one value, no reconstruction is performed.
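
The reconstruction relies on ape::ace(). The sketch below calls it directly on a made-up four-tip tree with a presence/absence trait, and mirrors the behaviour described above by skipping the reconstruction when the trait has a single value. The helper name reconstruct.trait() and the equal-rates model (model = "ER") are assumptions for illustration. RCandy's exact settings are not documented here.

```r
library(ape)

# Hypothetical helper: reconstruct a discrete trait at the internal nodes,
# returning NULL when the trait has only one observed value
reconstruct.trait <- function(tree, trait) {
  if (length(unique(trait)) < 2) {
    return(NULL)  # a single value leaves nothing to reconstruct
  }
  fit <- ape::ace(trait, tree, type = "discrete", model = "ER")
  fit$lik.anc  # one row per internal node, one column per state
}

tree <- ape::read.tree(text = "((isolate1:1,isolate2:1):1,(isolate3:1,isolate4:1):1);")
# Made-up mefA presence/absence, named by tip label
mefA <- c(isolate1 = "present", isolate2 = "present",
          isolate3 = "absent", isolate4 = "absent")

node.probs <- reconstruct.trait(tree, mefA)
round(node.probs, 3)

# A trait with a single value is skipped
single.value <- c(isolate1 = "absent", isolate2 = "absent",
                  isolate3 = "absent", isolate4 = "absent")
is.null(reconstruct.trait(tree, single.value))
```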

We can also customise the recombination diagram by showing its border and the genome tracks using the show.rec.plot.border and show.rec.plot.tracks options, respectively.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country", 
    show.rec.plot.border = TRUE, show.rec.plot.tracks = TRUE)

We can also adjust the transparency of the recombination diagram's background using the rec.plot.bg.transparency option.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country", 
    show.rec.plot.border = TRUE, show.rec.plot.tracks = TRUE, rec.plot.bg.transparency = 0.15)

What if we need to see the names of the taxa in which specific recombination events occurred? We can display the tip labels of the phylogenetic tree using the show.tip.label option.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country", 
    rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, show.tip.label = TRUE)

We can also change the viridis colour palette used to represent the metadata columns using color.pallette. There are five options: "viridis", "inferno", "magma", "cividis" and "plasma". Below we use "viridis" instead of the default ("inferno").

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country", 
    rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, color.pallette = "viridis")
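
The five palette names match the option values of the viridis package, which RCandy depends on. To preview the colours each palette would give to, say, three metadata categories, they can be generated directly with viridis::viridis() and its option argument. This is only a preview and is not RCandy's internal colour assignment.

```r
library(viridis)

palette.names <- c("viridis", "inferno", "magma", "cividis", "plasma")
n.categories <- 3  # e.g. three countries in a metadata column

# One row of hex colours per palette
previews <- t(vapply(palette.names, function(option) {
  viridis::viridis(n.categories, option = option)
}, character(n.categories)))
previews
```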

What if we want to change the angle of the gene labels? This can be done with the gene.label.angle option.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country", 
    rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, color.pallette = "viridis", 
    gene.label.angle = 90)

Similarly, the angle of the metadata column labels can be adjusted using metadata.column.label.angle.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    genome.start = 30000, genome.end = 60000, show.gene.label = TRUE, color.tree.tips.by.column = "Country", 
    rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, color.pallette = "viridis", 
    gene.label.angle = 90, metadata.column.label.angle = 45)

Sometimes we may want to see recombination events in specific taxa only. We can specify the taxon names to include in the figure using subtree.tips. Below we pass a vector containing the first 50 taxon names in the full phylogenetic tree. Note that the tree object itself, rather than the file name, is passed to tree.file.name.

tree1 <- ape::read.tree(tree.file)
subtree.taxa <- tree1$tip.label[1:50]

RCandyVis(tree.file.name = tree1, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    show.gene.label = FALSE, color.tree.tips.by.column = "Country", rec.plot.bg.transparency = 0.15, 
    show.rec.plot.tracks = TRUE, subtree.tips = subtree.taxa)
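
Conceptually, restricting the figure to subtree.tips means pruning the tree down to those taxa. The sketch below performs that pruning with ape::drop.tip() on a random 100-tip tree generated by ape::rtree(). This README does not describe how RCandy prunes the tree internally, so treat this only as an illustration of the idea.

```r
library(ape)

set.seed(320)
full.tree <- ape::rtree(100)               # random tree with tips t1 ... t100
subtree.taxa <- full.tree$tip.label[1:50]  # same selection as in the example above

# Drop every tip that is not in the requested subset
sub.tree <- ape::drop.tip(full.tree, setdiff(full.tree$tip.label, subtree.taxa))
ape::Ntip(sub.tree)  # 50
```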

Sometimes we may want to know how many recombination events were detected in each genome. Below we turn on the show.rec.freq.per.genome option, which plots the number of recombination events identified in each genome. Warning: generating the recombination frequency plot may take a while.

tree1 <- ape::read.tree(tree.file)
subtree.taxa <- tree1$tip.label[1:50]

RCandyVis(tree.file.name = tree1, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    show.gene.label = FALSE, color.tree.tips.by.column = "Country", rec.plot.bg.transparency = 0.15, 
    show.rec.plot.tracks = TRUE, show.rec.freq.per.genome = TRUE, subtree.tips = subtree.taxa)
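
A per-genome count amounts to tallying, for each taxon, the recombination events whose list of affected taxa includes it. The base-R sketch below uses a made-up list of taxa per event. In a Gubbins GFF these names come from the taxa attribute. Crediting an event on an internal branch to every descendant taxon, as done here, is an assumption.

```r
# Made-up taxa affected by each of four recombination events
event.taxa <- list(
  c("isolate1"),
  c("isolate2", "isolate3"),
  c("isolate1", "isolate2", "isolate3"),
  c("isolate3")
)
all.taxa <- c("isolate1", "isolate2", "isolate3", "isolate4")

# Count events per taxon, keeping taxa with no events as zero
events.per.genome <- table(factor(unlist(event.taxa), levels = all.taxa))
events.per.genome
```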

Sometimes we may want to identify recombination hotspots, that is, genomic regions containing many unique but overlapping recombination events. Below we turn on the show.rec.freq.per.base option, which plots the recombination frequency along the genome. Note that this feature may be slow even when the number of recombination events is small.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    show.gene.label = FALSE, color.tree.tips.by.column = "Country", rec.plot.bg.transparency = 0.15, 
    show.rec.plot.tracks = TRUE, show.rec.freq.per.base = TRUE)
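
Per-base frequency is the number of recombination events that cover each position, and hotspots are the positions where this count peaks. One efficient way to compute it is a difference array: add 1 at each event's start, subtract 1 just after its end, and take a cumulative sum. This is shown as a sketch of the technique, not RCandy's actual implementation. Coordinates are assumed to be 1-based and inclusive, and the genome length and events are made up.

```r
# Made-up events on a 20-bp genome (1-based, inclusive coordinates)
genome.length <- 20
starts <- c(3, 5, 12)
ends <- c(8, 10, 15)

# Difference array with one spare slot for events ending at the last base
delta <- integer(genome.length + 1)
for (i in seq_along(starts)) {
  delta[starts[i]] <- delta[starts[i]] + 1L
  delta[ends[i] + 1] <- delta[ends[i] + 1] - 1L
}
per.base <- cumsum(delta)[seq_len(genome.length)]
per.base

# Positions covered by the most events (a crude "hotspot" call)
which(per.base == max(per.base))  # 5 6 7 8
```

The cost is linear in the genome length plus the number of events, rather than the genome length times the number of events that a position-by-position check would need.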

Sometimes a user may want to specify custom colours for the metadata columns. This can be done with the taxon.metadata.columns.colors option, which takes a vector of metadata column names, each column containing a custom colour for every isolate, as shown in the example below.

RCandyVis(tree.file.name = tree.file, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata.file, taxon.metadata.columns = c("ermB", "mefA", 
        "cat"), taxon.metadata.columns.colors = c("ermB_color", "mefA_color", "cat_color"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.gff, ref.genome.name = ref.genome.gff, 
    show.gene.label = FALSE, color.tree.tips.by.column = "Country", rec.plot.bg.transparency = 0.15, 
    show.rec.plot.tracks = TRUE)
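
If the metadata file does not yet contain colour columns, they could be added before the file is written, for example by mapping presence/absence calls to colours. The sketch below assumes that genes are coded as "yes"/"no" and that RCandy accepts any colour name R understands. Both are assumptions, so check how the genes are coded in your own metadata.

```r
# Made-up metadata with presence/absence calls for two resistance genes
metadata <- data.frame(
  ID = c("isolate1", "isolate2", "isolate3"),
  ermB = c("yes", "no", "yes"),
  mefA = c("no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Add a <gene>_color column for each gene column
presence.colours <- c(yes = "firebrick", no = "grey85")
for (gene in c("ermB", "mefA")) {
  metadata[[paste0(gene, "_color")]] <- unname(presence.colours[metadata[[gene]]])
}
metadata

# Write the file tab-separated, matching RCandy's default delimiter
out.file <- tempfile(fileext = ".tsv")
write.table(metadata, out.file, sep = "\t", quote = FALSE, row.names = FALSE)
```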

Another useful feature is the ability to plot preloaded data objects instead of files. In the example below we use an example dataset containing preloaded objects for the phylogenetic tree, metadata, recombination events (GFF) and reference genome (GFF).

data("RCandy")

tree <- RCandy$tree
metadata <- RCandy$metadata
gubbins.GFF <- RCandy$gubbins.GFF
refgenome.GFF <- RCandy$refgenome.GFF

Then we plot the recombination events using the same options as above, but passing the preloaded objects instead of the file paths.

RCandyVis(tree.file.name = tree, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.GFF, ref.genome.name = refgenome.GFF, 
    color.tree.tips.by.column = "Country", rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, 
    color.pallette = "inferno", gene.label.angle = 90)

The data objects used above can also be created in the correct format from the example files. First, we get the paths to the files bundled with the package.

tree.file <- system.file("extdata", "ST320.final_tree.tre", package = "RCandy", mustWork = TRUE)
metadata.file <- system.file("extdata", "ST320.tsv", package = "RCandy", mustWork = TRUE)
gubbins.gff <- system.file("extdata", "ST320.recombination_predictions.gff", package = "RCandy", 
    mustWork = TRUE)
ref.genome.gff <- system.file("extdata", "Hungary19A-6.gff", package = "RCandy", 
    mustWork = TRUE)

Then we read the phylogenetic tree (path stored in tree.file), the metadata (metadata.file), the recombination events in GFF format (gubbins.gff) and the reference genome annotation in GFF format (ref.genome.gff) into R objects.

tree <- ape::read.tree(tree.file)
metadata <- tidyr::as_tibble(read.table(metadata.file, header = TRUE, sep = "\t", comment.char = "?"))
gubbins.GFF <- load.gubbins.GFF(gubbins.gff)
refgenome.GFF <- load.genome.GFF(ref.genome.gff)

Let's draw the recombination plot again, this time from the objects we just created, to check that the data were loaded correctly.

RCandyVis(tree.file.name = tree, midpoint.root = TRUE, ladderize.tree.right = TRUE, 
    taxon.metadata.file = metadata, taxon.metadata.columns = c("Source", "Country"), 
    taxon.id.column = "ID", gubbins.gff.file = gubbins.GFF, ref.genome.name = refgenome.GFF, 
    color.tree.tips.by.column = "Country", rec.plot.bg.transparency = 0.15, show.rec.plot.tracks = TRUE, 
    color.pallette = "inferno", gene.label.angle = 90)

Other options

There are many other options for customising the plots, including:

  • show.fig.legend, to hide the figure legend;
  • tree.tip.label.cex, to increase the font size of the tip (taxon) labels;
  • show.rec.freq.per.genome, to show or hide the number of recombination events identified in each genome;
  • show.metadata.columns, to hide the metadata columns;
  • save.to.this.file, to save the plot to a PDF file (or other formats, including PNG, JPG and SVG) for use in publications, either directly or after editing in Inkscape or Adobe Illustrator;
  • plot.height and plot.width, to adjust the height and width of the plot.

Some suggestions

We recommend using readseq to convert the reference genome annotation to the GFF format. Other tools can also be used but readseq is recommended to get the best results.

Another similar tool for interactive visualisation of recombination events is Phandango. If you are only interested in visualising the phylogenetic tree and associated metadata without recombination events, ggtree and microreact offer many useful features.

Session info

sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] tidyr_1.1.2  ape_5.4-1    RCandy_1.0.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] phangorn_2.5.5          shape_1.4.5             gtools_3.8.2           
#>  [4] tidyselect_1.1.0        xfun_0.19               purrr_0.3.4            
#>  [7] lattice_0.20-41         phytools_0.7-70         colorspace_2.0-0       
#> [10] vctrs_0.3.6             generics_0.1.0          expm_0.999-6           
#> [13] htmltools_0.5.1.1       viridisLite_0.3.0       yaml_2.2.1             
#> [16] rlang_0.4.10            pillar_1.4.7            glue_1.4.2             
#> [19] lifecycle_1.0.0         stringr_1.4.0           munsell_0.5.0          
#> [22] combinat_0.0-8          gtable_0.3.0            coda_0.19-4            
#> [25] evaluate_0.14           knitr_1.30              parallel_4.0.3         
#> [28] Rcpp_1.0.6              scales_1.1.1            formatR_1.7            
#> [31] plotrix_3.8-1           clusterGeneration_1.3.7 scatterplot3d_0.3-41   
#> [34] tmvnsim_1.0-2           gridExtra_2.3           fastmatch_1.1-0        
#> [37] mnormt_2.0.2            ggplot2_3.3.3           digest_0.6.27          
#> [40] stringi_1.5.3           dplyr_1.0.2             numDeriv_2016.8-1.1    
#> [43] grid_4.0.3              quadprog_1.5-8          tools_4.0.3            
#> [46] magrittr_2.0.1          maps_3.3.0              tibble_3.0.6           
#> [49] crayon_1.4.1            pkgconfig_2.0.3         MASS_7.3-53            
#> [52] ellipsis_0.3.1          Matrix_1.2-18           data.table_1.13.2      
#> [55] rmarkdown_2.5           viridis_0.5.1           R6_2.5.0               
#> [58] igraph_1.2.6            nlme_3.1-150            compiler_4.0.3

References

  • Croucher NJ, Page AJ, Connor TR, Delaney AJ, Keane JA, Bentley SD, Parkhill J, Harris SR. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 2015 Feb 18;43(3):e15. doi: 10.1093/nar/gku1196. Epub 2014 Nov 20. PMID: 25414349; PMCID: PMC4330336. https://pubmed.ncbi.nlm.nih.gov/25414349/.

  • Marttinen P, Hanage WP, Croucher NJ, Connor TR, Harris SR, Bentley SD, Corander J. Detection of recombination events in bacterial genomes from large population samples. Nucleic Acids Res. 2012 Jan;40(1):e6. doi: 10.1093/nar/gkr928. Epub 2011 Nov 7. PMID: 22064866; PMCID: PMC3245952. https://pubmed.ncbi.nlm.nih.gov/22064866/.

  • Chaguza C, Tonkin-Hill G, Lo SW, Hadfield J, Croucher NJ, Harris SR, Bentley SD. RCandy: an R package for visualising homologous recombinations in bacterial genomes. Bioinformatics. 2021 Dec 2;38(5):1450–1. doi: 10.1093/bioinformatics/btab814. Epub ahead of print. PMID: 34864895; PMCID: PMC8826011. https://pubmed.ncbi.nlm.nih.gov/34864895/.
