# Variants of concern in Canada
*April 21, 2022*

This week, a graphic was requested showing variants and their rise and fall over time. This data exists on the Public Health Agency of Canada's website [here](https://health-infobase.canada.ca/covid-19/epidemiological-summary-covid-19-cases.html#VOC).

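We start by importing pandas, along with datawrappergraphics, which we'll use at the end to send the data to our Datawrapper chart.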

In [1]:
import pandas as pd
import datawrappergraphics

Then read in the data directly from the source and take a quick peek at the structure.

In [2]:
raw = pd.read_csv("https://health-infobase.canada.ca/src/data/covidLive/covid19-epiSummary-variants.csv")

raw.head()

Unnamed: 0,Variant Grouping,_Identifier,Lineage Grouped,%CT Count of Sample #,Collection (week)
0,Alpha,Alpha,B.1.1.7,0.001,2020-04-05
1,Alpha,Alpha,B.1.1.7,0.001,2020-04-19
2,Alpha,Alpha,B.1.1.7,0.008,2020-05-10
3,Alpha,Alpha,B.1.1.7,0.008,2020-05-17
4,Alpha,Alpha,B.1.1.7,0.002,2020-05-31


Now, we'll pivot this so we have a nice series to plot.

In [28]:
pivot = raw.pivot_table(columns=["_Identifier"], index="Collection (week)", values="%CT Count of Sample #", aggfunc="sum")

pivot.head()

Collection (week),Alpha,BA.1,BA.2,BA.3,BA.4,BA.5,Beta,Delta,Eta,Gamma,Other,Recombinants
2019-12-29,,,,,,,,,,,1.003,
2020-02-23,,,,,,,,,,,1.0,
2020-03-01,,,,,,,,,,,1.0,
2020-03-08,,,,,,,,,,,1.002,
2020-03-15,,,,,,,,,,,0.996,
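
To make the pivot step concrete, here is a minimal sketch on a made-up frame that reuses the source's column names. The `toy` data and its values are invented for illustration and are not taken from the PHAC file. It shows that `aggfunc="sum"` adds up every lineage that shares an `_Identifier` within a week, and that weeks with no rows for a variant come out as missing values.

```python
import pandas as pd

# Toy long-format data with the same column names as the source file.
# The values are invented for illustration only.
toy = pd.DataFrame({
    "_Identifier": ["Alpha", "Alpha", "Other", "Other"],
    "Lineage Grouped": ["B.1.1.7", "Q.1", "Other", "Other"],
    "%CT Count of Sample #": [0.25, 0.25, 0.5, 1.0],
    "Collection (week)": ["2021-01-03", "2021-01-03", "2021-01-03", "2021-01-10"],
})

# One row per week, one column per variant, shares summed within each cell.
toy_pivot = toy.pivot_table(
    columns=["_Identifier"],
    index="Collection (week)",
    values="%CT Count of Sample #",
    aggfunc="sum",
)

print(toy_pivot)
```

The two Alpha lineages in the first week collapse into a single Alpha value, which is the behaviour we rely on for the real data.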


For most of these variants, we just want to include data for the variant as is (the `_Identifier` column). But for Omicron, we want more detailed data. Let's break out Omicron using a separate analysis. First, we'll get just our Omicron rows from the raw dataset.

In [12]:
omicron = raw[raw["_Identifier"].isin(["BA.1", "BA.2", "BA.3", "BA.4", "BA.5"])]

Let's see what values Omicron is grouped into.

In [13]:
omicron["Lineage Grouped"].unique()

array(['BA.1.1', 'BA.1.1.16', 'BA.1.15', 'Other BA.1', 'BA.2',
       'BA.2.12.1', 'BA.2.3', 'Other BA.2', 'BA.3', 'BA.3.1', 'BA.4',
       'BA.5', 'BA.5.1'], dtype=object)

There are 13 lineage groups here, spread across five Omicron sublineages (BA.1 through BA.5), including catch-all "Other BA.1" and "Other BA.2" categories.

In [14]:
omicron.loc[:, "Lineage Grouped"] = (omicron
                              .loc[:, "Lineage Grouped"]
                              )

omicron.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  omicron.loc[:, "Lineage Grouped"] = (omicron


Unnamed: 0,Variant Grouping,_Identifier,Lineage Grouped,%CT Count of Sample #,Collection (week)
512,Omicron,BA.1,BA.1.1,0.0,2021-01-17
513,Omicron,BA.1,BA.1.1,0.001,2021-11-21
514,Omicron,BA.1,BA.1.1,0.006,2021-11-28
515,Omicron,BA.1,BA.1.1,0.06,2021-12-05
516,Omicron,BA.1,BA.1.1,0.12,2021-12-12
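
The warning above appears because `omicron` was created by filtering `raw`, so pandas cannot tell whether the assignment is meant to change `raw` as well. One common way to avoid it, sketched here on an invented frame, is to take an explicit `.copy()` when filtering. The names `toy` and `subset` and the relabelling step are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Invented stand-in for the raw data.
toy = pd.DataFrame({
    "_Identifier": ["Alpha", "BA.1", "BA.2"],
    "Lineage Grouped": ["B.1.1.7", "BA.1.1", "BA.2.3"],
})

# .copy() makes the filtered frame independent of `toy`,
# so later assignments clearly apply to `subset` only.
subset = toy[toy["_Identifier"].isin(["BA.1", "BA.2"])].copy()

# Hypothetical clean-up step: prefix the lineage labels.
subset["Lineage Grouped"] = "Omicron - " + subset["Lineage Grouped"]

print(subset)
print(toy)  # the original frame is left untouched
```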


Now, we'll use groupby and sum to consolidate our Omicron values by lineage and week, then pivot so each lineage gets its own column, just like we did for all the other variants above. We'll join onto the other table shortly.

In [15]:
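omicron = (omicron
           # Sum only the share column so the text columns are left out of the aggregation.
           .groupby(["Lineage Grouped", "Collection (week)"])["%CT Count of Sample #"].sum()
           .reset_index()
           .pivot(index="Collection (week)", columns="Lineage Grouped", values="%CT Count of Sample #")
           )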
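
As a sanity check on the approach, the groupby, sum and pivot chain should give the same table as a single `pivot_table` call with `aggfunc="sum"`. Here is a small sketch on invented numbers. The `toy` frame and both result names are hypothetical.

```python
import pandas as pd

# Invented data: two BA.1.1 rows in the same week, so there is something to sum.
toy = pd.DataFrame({
    "Lineage Grouped": ["BA.1.1", "BA.1.1", "BA.2"],
    "Collection (week)": ["2022-01-02", "2022-01-02", "2022-01-02"],
    "%CT Count of Sample #": [0.25, 0.5, 0.125],
})

# Route 1: aggregate first, then reshape.
via_groupby = (toy
    .groupby(["Lineage Grouped", "Collection (week)"])["%CT Count of Sample #"].sum()
    .reset_index()
    .pivot(index="Collection (week)", columns="Lineage Grouped", values="%CT Count of Sample #")
)

# Route 2: aggregate and reshape in one call.
via_pivot_table = toy.pivot_table(
    index="Collection (week)",
    columns="Lineage Grouped",
    values="%CT Count of Sample #",
    aggfunc="sum",
)

print(via_groupby)
print(via_pivot_table)
```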

We're also going to rename these so they're a little cleaner, before we join onto the other table.

In [16]:
omicron.columns = "Omicron - " + omicron.columns
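
For reference, joining the renamed breakdown onto the main table could look something like the sketch below. It assumes both tables share the same weekly index. The frames `toy_pivot` and `toy_omicron`, their values, and the choice to drop the aggregate `BA.2` column before joining are all assumptions made for illustration.

```python
import pandas as pd

# Invented stand-ins for `pivot` and `omicron`, indexed by collection week.
weeks = pd.Index(["2022-05-01", "2022-05-08"], name="Collection (week)")
toy_pivot = pd.DataFrame({"Delta": [0.0, 0.0], "BA.2": [0.97, 0.98]}, index=weeks)
toy_omicron = pd.DataFrame(
    {"BA.2.3": [0.25, 0.25], "Other BA.2": [0.72, 0.73]}, index=weeks
)

# Same renaming as in the cell above.
toy_omicron.columns = "Omicron - " + toy_omicron.columns

# Drop the aggregate column (assumption), then align the breakdown on the week index.
combined = toy_pivot.drop(columns=["BA.2"]).join(toy_omicron, how="left")

print(combined.columns.tolist())
```

A left join keeps every week from the main table, and weeks with no breakdown data would come through as missing values.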

Now, we'll drop a couple of variants we don't care too much about: Eta and Beta.

In [29]:
pivot = (pivot
         .drop(columns=["Eta", "Beta"])
         )

We'll also multiply by 100 so we get real percentages and can visualize them more clearly, and keep only the weeks from the start of 2021 onward.

In [30]:
pivot = pivot * 100
pivot = pivot[pivot.index >= "2021-01-01"]

pivot

Collection (week),Alpha,BA.1,BA.2,BA.3,BA.4,BA.5,Delta,Gamma,Other,Recombinants
2021-01-03,1.4,,,,,,,,96.7,
2021-01-10,2.8,,,,,,,0.1,95.8,
2021-01-17,5.2,0.0,,,,,,,93.8,
2021-01-24,7.8,,,,,,0.0,,90.5,
2021-01-31,14.8,,,,,,,0.3,84.8,
...,...,...,...,...,...,...,...,...,...,...
2022-05-01,,2.6,97.2,0.1,0.2,0.1,,,0.0,
2022-05-08,,1.6,97.6,0.1,0.5,0.2,,,,
2022-05-15,,0.9,97.0,0.1,0.8,1.1,,,0.0,
2022-05-22,,1.1,92.7,0.0,3.1,3.0,,,,
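
A note on the date filter: the index holds dates as strings, and comparing them to `"2021-01-01"` works because ISO-formatted dates sort the same way alphabetically as they do chronologically. The sketch below, on invented values, shows the string comparison next to an equivalent comparison on a real datetime index. The names `toy`, `by_string` and `by_datetime` are hypothetical.

```python
import pandas as pd

# Invented shares, indexed by ISO-formatted week strings.
toy = pd.DataFrame(
    {"Other": [0.996, 0.967, 0.958]},
    index=pd.Index(["2020-03-15", "2021-01-03", "2021-01-10"], name="Collection (week)"),
)

# Convert shares to percentages, as in the cell above.
toy = toy * 100

# String comparison: valid only because the dates are in YYYY-MM-DD form.
by_string = toy[toy.index >= "2021-01-01"]

# Datetime comparison: does not depend on the string format.
toy_dt = toy.copy()
toy_dt.index = pd.to_datetime(toy_dt.index)
by_datetime = toy_dt[toy_dt.index >= pd.Timestamp("2021-01-01")]

print(by_string)
print(by_datetime)
```

If the source ever switched to another date format, the string comparison would stop being reliable, while the datetime version would keep working once the dates were parsed.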


Let's send the data over to our Datawrapper chart for the public-facing version.

In [31]:
datawrappergraphics.Chart("jJRa0").data(pivot).show()

INFO:root:SUCCESS: Data added to chart.
INFO:root:SUCCESS: Metadata updated.


\-30\-