In [1]:
# ggtree v3.2.0
library(ggtree)
# ggnewscale for multiple independent scales in the same plot
library(ggnewscale)
# tidyverse 1.3.1
# ✔ ggplot2 3.3.5     ✔ purrr   0.3.4
# ✔ tibble  3.1.6     ✔ dplyr   1.0.7
# ✔ tidyr   1.1.4     ✔ stringr 1.4.0
# ✔ readr   2.1.1     ✔ forcats 0.5.1
library(tidyverse)

ggtree v3.2.0  For help: https://yulab-smu.top/treedata-book/

If you use ggtree in published research, please cite the most appropriate paper(s):

1. Guangchuang Yu. Using ggtree to visualize data on tree-like structures. Current Protocols in Bioinformatics. 2020, 69:e96. doi:10.1002/cpbi.96
2. Guangchuang Yu, Tommy Tsan-Yuk Lam, Huachen Zhu, Yi Guan. Two methods for mapping and visualizing associated data on phylogeny using ggtree. Molecular Biology and Evolution. 2018, 35(12):3041-3043. doi:10.1093/molbev/msy194
3. Guangchuang Yu, David Smith, Huachen Zhu, Yi Guan, Tommy Tsan-Yuk Lam. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution. 2017, 8(1):28-36. doi:10.1111/2041-210X.12628



── Attaching packages ──────────────────────────────────────────

# Tree viz for all Delta deer and related public seqs

In [2]:
subtree = read.tree('tree-pruned.newick')

In [3]:
subtree


Phylogenetic tree with 21 tips and 20 internal nodes.

Tip labels:
  QC-4205, QC-4204, Canada/QC-L00415260001B/2021, QC-4249, QC-4055, OL718610.1, ...

Rooted; includes branch lengths.

In [4]:
mb_deer_names = "
QC-4249
QC-4204
QC-4055
QC-4205
"
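# strsplit() returns a list; take its first element to get a plain character vector of tip labels
mb_deer_names = strsplit(trimws(mb_deer_names), '\n')[[1]]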

In [5]:
mb_deer_names

In [6]:
subtree2 = groupClade(subtree, MRCA(subtree, mb_deer_names))

In [7]:
df = read_tsv('2023-03-30-QC-WTD-SARS-CoV-2-NCBI-GISAID-metadata.tsv')

Rows: 703 Columns: 9

── Column specification ────────────────────────────────────────
Delimiter: "\t"
chr (9): sample, host, date, country, province_state, region, location, host...


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



In [8]:
hmd_sub = as.data.frame(sapply(df, as.character))

In [9]:
# reassigns the host column to itself, so hmd_sub is unchanged by this cell
hmd_sub[['host']] = hmd_sub$host

In [10]:
hmd_sub

sample,host,date,country,province_state,region,location,host_country,other_sample_id
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
QC-4055,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
QC-4204,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
QC-4205,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
QC-4249,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
MN908947.3,Human,2022-01-09,China,Wuhan,Wuhan,CHN-HB,Human|China,
Canada/BC-BCCDC-195041/2021,Human,2021-08-30,Canada,British Columbia,,CAN-BC,Human|Canada,
Canada/AB-ABPHL-32268/2021,Human,2021-08-29,Canada,Alberta,,CAN-AB,Human|Canada,
USA/MT-FYR-545/2021,Human,2021-10-04,USA,Montana,,USA-MT,Human|USA,
USA/MT-MTPHL-3869051/2021,Human,2021-08-31,USA,Montana,Missoula County,USA-MT,Human|USA,
Canada/AB-ABPHL-29354/2021,Human,2021-08-19,Canada,Alberta,,CAN-AB,Human|Canada,


In [11]:
dfaa = read_tsv('aa-matrix.tsv')

Rows: 97 Columns: 63

── Column specification ────────────────────────────────────────
Delimiter: "\t"
chr (63): sample, ORF1a:M85-, ORF1a:V1143F, ORF1a:A1306S, ORF1a:Q1784H, ORF1...


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



In [12]:
rownames(dfaa) = dfaa$sample

“Setting row names on a tibble is deprecated.”
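
The deprecation warning above comes from assigning row names directly on a tibble. One way to avoid it, shown here as a minimal sketch on toy data (the `toy` and `toy_heatmap` names and the mutation column names are illustrative assumptions, not part of this notebook), is `tibble::column_to_rownames()`, which moves a column into the row names and returns a plain data frame:

```r
library(tibble)

# hypothetical stand-in for the amino acid matrix; values are gene labels or '-'
toy <- tibble(
  sample = c("s1", "s2"),
  `S:D614G` = c("S", "-"),
  `N:D63G` = c("N", "N")
)

# move the sample column into the row names; the result is a data.frame, not a tibble
toy_heatmap <- column_to_rownames(toy, var = "sample")

rownames(toy_heatmap)
colnames(toy_heatmap)
```

The result has the shape `gheatmap` expects: row names that match the tip labels, and one column per heatmap column.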


In [13]:
rownames(dfaa)

In [14]:
dfaa_heatmap = as.data.frame(dfaa[,colnames(dfaa)[2:ncol(dfaa)]])

In [15]:
rownames(dfaa_heatmap) = dfaa$sample

In [16]:
rownames(hmd_sub) = hmd_sub$sample

# Subtree with AA mutation annotation

- colours are set manually with `scale_fill_manual`
- gene labels are ordered using `breaks` arg for `scale_fill_manual`
- margins are adjusted with `theme`
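
The second bullet depends on how `scale_fill_manual` pairs colours with levels. The following is a minimal sketch on a toy data frame, not the notebook's matrix (the `toy`, `gene_colours`, `legend_order` and `toy_plot` names are illustrative assumptions). A named vector of colours makes the pairing explicit, and `breaks` sets the legend order in place of the alphabetical default:

```r
library(ggplot2)

# hypothetical toy data: one tile per gene label
toy <- data.frame(x = 1:3, y = 1, gene = c("S", "N", "ORF1a"))

# named vector: each colour is tied to a gene by name, not by position
gene_colours <- c(ORF1a = "#1b9e77", S = "#7570b3", N = "#377eb8")

# breaks controls which levels appear in the legend and in what order
legend_order <- c("ORF1a", "S", "N")

toy_plot <- ggplot(toy, aes(x, y, fill = gene)) +
  geom_tile() +
  scale_fill_manual(values = gene_colours,
                    breaks = legend_order,
                    guide = guide_legend(title = "Gene"))
```

The plotting cell later in this notebook instead passes unnamed `values` listed in the same order as `breaks`, so the two vectors there have to be kept in step by hand.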

In [17]:
unique(as.character(as.matrix(dfaa_heatmap)))

In [18]:
hmd_sub['Country'] = hmd_sub['country']
hmd_sub['State/Province'] = hmd_sub['province_state']

In [19]:
df

sample,host,date,country,province_state,region,location,host_country,other_sample_id
<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
QC-4055,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
QC-4204,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
QC-4205,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
QC-4249,White-tailed deer,2021-11-06,Canada,Quebec,,CAN-QC,White-tailed deer|Canada,
MN908947.3,Human,2022-01-09,China,Wuhan,Wuhan,CHN-HB,Human|China,
Canada/BC-BCCDC-195041/2021,Human,2021-08-30,Canada,British Columbia,,CAN-BC,Human|Canada,
Canada/AB-ABPHL-32268/2021,Human,2021-08-29,Canada,Alberta,,CAN-AB,Human|Canada,
USA/MT-FYR-545/2021,Human,2021-10-04,USA,Montana,,USA-MT,Human|USA,
USA/MT-MTPHL-3869051/2021,Human,2021-08-31,USA,Montana,Missoula County,USA-MT,Human|USA,
Canada/AB-ABPHL-29354/2021,Human,2021-08-19,Canada,Alberta,,CAN-AB,Human|Canada,


In [20]:
dark_gray = '#555555'
label_size = 2.0

p = ggtree(subtree2, color=dark_gray, size=0.3) %<+% df
p = p + scale_color_manual(values=c(dark_gray, 'darkgreen'), guide='none')
p = p + new_scale_color()
# p = p + geom_tiplab(align=T, size=1.5, linetype='dotted', show.legend=F, linesize = 0.1)

p = p + geom_tiplab(aes(color=host), align=T, size=label_size, linetype='dotted', show.legend=T, linesize = 0.1)
p = p + geom_tiplab(geom='text', 
                    aes(label=location, color=host), 
                    align=T, 
                    size=label_size,
                    linetype=NA,
                    show.legend=F,
                    linesize = 0.1,
                    offset=0.00025)
p = p + scale_color_manual(values=c(
    dark_gray,
    'darkred'
), guide='none')
p = p + new_scale_color()
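# note: geom_tippoint() has no align, linetype or linesize parameters; they are ignored with a warning
p = p + geom_tippoint(aes(color=host), align=T, size=0.8, linetype='dotted', show.legend=F, linesize = 0.1)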
p = p + scale_color_manual(values=c(
    'transparent',
    'darkred'
), guide='none')


# offset and offset_step need to be set manually and adjusted to the plot size so that
# plot elements are placed correctly beside the tree
offset = 0.00032
offset_step = 0.0002

hm_col_fontsize = 0

# p = p + new_scale_fill()

# p = gheatmap(p, hmd_sub['State/Province'], font.size=2,
#              colnames_offset_y=1, hjust=0.5, colnames_angle=0, colnames_position="top",
#              show.legend=F,
#              width=0.025, 
#              offset=offset, 
#              colnames=T, 
#              color=NULL)

# p = p + scale_fill_brewer(palette = 'Set3', na.translate=F, guide=guide_legend(title='Province/State'))

# offset = offset + offset_step
p = p + new_scale_fill()

p = gheatmap(p, dfaa_heatmap,
             width=1,
             offset=offset, 
             colnames=T,
             colnames_angle=45,
             colnames_position='top',
             font.size=1.5,
             hjust=0,
             colnames_offset_y=-0.4,
             colnames_offset_x=-0.000002,
             color='lightgray'
            )

# expand y-axis by a little bit so that AA mutation matrix y-axis labels don't get cut off
p = p + scale_y_continuous(expand = c(0.07,0))
# manually set gene colours and order of genes for the legend; default is alphabetical

# ['*No Coverage',
#  '-',
#  'E',
#  'N',
#  'ORF1a',
#  'ORF1b',
#  'ORF3a',
#  'ORF7a',
#  'ORF8',
#  'S']

p = p + scale_fill_manual(breaks=c(
'ORF1a',
'ORF1b',
'S',
'ORF3a',
'E',
'M',
'ORF6',
'ORF7a',
'ORF7b',
'ORF8',
'N',
'ORF9b',
'*No Coverage',
'-'
), 
values=c(
'#1b9e77', # ORF1a
'#d95f02', # ORF1b
'#7570b3', # S
'#04ccc2',  # ORF3a
'red', # E
'#e7298a', # M
'#66a61e', # ORF6
'#e6ab02', # ORF7a
'#cc1e04',  # ORF7b
'#a6761d', # ORF8
'#377eb8', # N
'purple',  # ORF9b
'gray', # *No Coverage
'white' # - (no mutation)
), na.translate=F, guide=guide_legend(title='Gene'))


p = p + geom_treescale(y=1.25, offset=0.25, width=0.0001, fontsize=2, linesize=0.25, color=dark_gray)

p = p + theme(plot.margin=margin(0, -0.8, 0, -1, "cm"), legend.margin=margin(-1,0,0,0,'cm'),
              plot.background=element_rect(fill = "white", color='white'),
              panel.background=element_rect(fill='white', color='white'),
              legend.key.size = unit(0.3, 'cm'), 
              legend.text=element_text(size=7),
              legend.position='bottom',
              legend.justification=c(0.9,0)
             )


# saving to PDF and viewing with evince for high res, hot-reloading viz of figure
filename_base = 'tree-pruned-21-taxa'
ggsave(plot = p, filename = paste0(filename_base, '.pdf'), height = 6.2, width = 11)
ggsave(plot = p, filename = paste0(filename_base, '.png'), height = 6.2, width = 11, dpi = 600)

“Ignoring unknown parameters: align, linetype, linesize”
Scale for 'y' is already present. Adding another scale for 'y', which will
replace the existing scale.

Scale for 'fill' is already present. Adding another scale for 'fill', which
will replace the existing scale.
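
The plotting cell above sets the heatmap `offset` by hand. Because `gheatmap` offsets are in the same units as the tree's x-axis (branch length), a starting value can be derived from the tree's depth. This is a minimal sketch on a toy tree; the `offset_fraction` value and all variable names are assumptions for illustration, and the result would still need manual tuning because tip labels take up space that branch lengths do not account for:

```r
library(ape)

# hypothetical toy tree with branch lengths on the same scale as the notebook's tree
toy_tree <- read.tree(text = "((a:0.0001,b:0.0002):0.0001,c:0.0003);")

# root-to-node distance for every node; the maximum is the tree's width on the x-axis
tree_width <- max(node.depth.edgelength(toy_tree))

# assumed starting point: leave room equal to the full tree width for tip labels
offset_fraction <- 1.0
toy_offset <- offset_fraction * tree_width

tree_width
toy_offset
```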

