The data/
directory contains a single file, all_variant_counts.csv
.
This file contains the 7-day rolling average for the number of genomes
from a given variant sequenced by ESR. Data in the most recent weeks
will be updated as new cases reported in this time are added to the
sequencing dataset Data begins at the start of the August 2021 Delta
outbreak.
It should be noted that the cases contributing to this data are not a random or representative sample of all cases. Information on the demographic samplig of cases is provided in the June 2023 Covid Genomics Insight report pdf.
Changes to the set of lineages tracked in this repository are described
in changes.md
In addition the data/archived
subdirectory contians no-longer updated
files
-
rbd_levels.csv
, which includes information on the number of key mutations observed in the receptor binding domain of the spike protein. Higher values of this measure are associated with increased immune evasiveness. This data was part of ESR’s reporting in late 2022 but is no longer useful. -
community_variant_counts.csv
andborder_variant_counts.csv
break down the lineage data into those cases associated with community-acquired infections and border-associated infections respectively. Border and community cases can no longer be reliably differentiated, so this data is no longer updated. -
May_23_variant_classes.csv
, which includes variant classified following reporting criteria used prior to May 2023 (inducing BQ.1.1, XBF and XBC which are no longer tracked separately)
A brief example of how the data might be used, using the R language.
library(lubridate)
library(tidyr)
library(ggplot2)
library(forcats)
voc <- read.csv("data/all_variant_counts.csv")
voc$report_date <- ymd(voc$report_date)
tidy_voc <- pivot_longer(voc, !report_date, names_to = "variant", values_to = "n_genomes")
stack_order <- c("Delta",
"BA.1",
"BA.2",
"BA.5",
"BA.4",
"BA.2.75",
"CH.1.1",
"FK.1.1",
"XBB",
"XBB.1.5",
"XBB.1.16",
"EG.5",
"HK.3",
"BA.2.86",
"JN.1",
"JN.1.4",
"Recombinant",
"XBC"
)
tidy_voc$variant <- fct_relevel(tidy_voc$variant, stack_order)
voc_pal <-c(Delta = "#3182bd",
BA.1 = "#bd0026",
BA.2 = "#fd8d3c",
BA.4 = "#B14B02",
BA.5 = "#FD3A6B",
BA.2.75 = "#fecc5c",
CH.1.1 = "#bd0026",
FK.1.1 = "#E81634",
XBB = "#b6db3a",
XBB.1.5 = "#5CCC34",
XBB.1.16 = "#9CC33C",
EG.5 = "#599945",
HK.3 = "#3e5836",
BA.2.86 = "#B399D4",
JN.1 = "#6f4f97",
JN.1.4 = "#422f5a",
Recombinant = "#57badb",
XBC = "#9AD6EA"
)
ggplot(tidy_voc, aes(report_date, n_genomes, fill=variant)) +
geom_area(position = 'fill', colour='black', size=0.3) +
scale_fill_manual(values=voc_pal)+
theme_bw(base_size = 16)
library(lubridate)
recent <- filter(tidy_voc, report_date >= today() - months(3), n_genomes > 0)
recent_pal <- voc_pal[ as.character(unique(recent$variant)) ]
ggplot(recent, aes(report_date, n_genomes, fill=variant)) +
geom_area(position = 'fill', colour='black', size=0.3) +
scale_fill_manual(values=recent_pal)+
theme_bw(base_size = 16)
The data is released under a CC-BY 4.0 international license. You are free to copy, distribute or adapt this data as long as you acknowledge ESR.