# CDPH variant totals by type

By [Matt Stiles](https://www.latimes.com/people/matt-stiles)

Downloads [variant totals](https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/COVID-Variants.aspx) published by the California Department of Public Health.

## Import

Format code cells automatically with [black](https://pypi.org/project/nb-black/).

In [1]:
%load_ext lab_black

Import dependencies.

In [2]:
import os
import pytz
from datetime import datetime
import requests
from bs4 import BeautifulSoup
import pandas as pd
import lxml  # not called directly; pandas.read_html uses lxml as its default parser

In [3]:
tz = pytz.timezone("America/Los_Angeles")

In [4]:
today = datetime.now(tz).date()
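
Why the date is taken in California time rather than from the machine clock: a scraper running on a server set to UTC would roll over to the next day in the late afternoon or evening, Pacific time. The sketch below illustrates this with a fixed instant. It swaps in the standard library's `zoneinfo` (Python 3.9+) for `pytz`, and the helper name `california_date` is hypothetical, not part of this notebook.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

CALIFORNIA = ZoneInfo("America/Los_Angeles")


def california_date(moment):
    """Return the calendar date in California for a timezone-aware datetime."""
    return moment.astimezone(CALIFORNIA).date()


# Illustrative instant: 03:00 UTC on April 18 is still 8 p.m. on April 17 in California.
late_evening = datetime(2021, 4, 18, 3, 0, tzinfo=timezone.utc)

utc_stamp = late_evening.date()
california_stamp = california_date(late_evening)
print(utc_stamp, california_stamp)  # 2021-04-18 2021-04-17
```

With `pytz`, as used in this notebook, `datetime.now(tz).date()` gives the same California date for the current moment.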

## Scrape

### Set the URL for CDPH's variants summary page

In [5]:
url = "https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/COVID-Variants.aspx"

### Read the data on the page

In [6]:
# page = pd.read_html(url, header=0)

In [7]:
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop here if the page did not load

### Get the 'known variants of concern in California' table

In [8]:
df1 = pd.read_html(response.text, attrs={"class": "ms-rteTable-default"}, header=0)[0]
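
How the selection works: `pd.read_html` returns a list of every table matching `attrs`, in page order, so `[0]` is the first matching table and `[1]` the second. The sketch below shows this on a small inline HTML fragment instead of the live page. The fragment and the `page-layout` class are made up for illustration; only the `ms-rteTable-default` class and the column headers come from this notebook. It wraps the HTML in `StringIO` because recent pandas releases warn when given a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Made-up page fragment; the data rows are borrowed from the outputs in this notebook.
SAMPLE_HTML = """
<table class="ms-rteTable-default">
  <tr><th>Variant</th><th>Number of Cases Caused by Variant</th></tr>
  <tr><td>B.1.1.7</td><td>1937</td></tr>
  <tr><td>P.1</td><td>166</td></tr>
</table>
<table class="page-layout">
  <tr><th>Menu</th></tr>
  <tr><td>Home</td></tr>
</table>
<table class="ms-rteTable-default">
  <tr><th>Variant</th><th>Number of Cases Caused by Variant</th></tr>
  <tr><td>B.1.526</td><td>78</td></tr>
</table>
"""

# Only tables whose class matches are returned; the layout table is skipped.
tables = pd.read_html(
    StringIO(SAMPLE_HTML),
    attrs={"class": "ms-rteTable-default"},
    header=0,
)

# Page order decides the index: first match, then second match.
concern, interest = tables
print(len(tables))
```

Because the index depends on page order, a layout change on the CDPH page could silently swap or shift the tables, so checking the first rows with `head()` after each scrape is worthwhile.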

In [9]:
df1["update_date"] = today

In [10]:
df1.rename(
    columns={
        "Variant": "variant_name",
        "Number of Cases Caused by Variant": "cases_caused_by_variant",
    },
    inplace=True,
)

In [11]:
df1.head()

Unnamed: 0,variant_name,cases_caused_by_variant,update_date
0,B.1.1.7,1937,2021-04-17
1,B.1.351,27,2021-04-17
2,P.1,166,2021-04-17
3,B.1.427,4416,2021-04-17
4,B.1.429,9074,2021-04-17


### Get the 'known variants of interest in California' table

In [12]:
df2 = pd.read_html(response.text, attrs={"class": "ms-rteTable-default"}, header=0)[1]

In [13]:
df2["update_date"] = today

In [14]:
df2.rename(
    columns={
        "Variant": "variant_name",
        "Number of Cases Caused by Variant": "cases_caused_by_variant",
    },
    inplace=True,
)

In [15]:
df2.head()

Unnamed: 0,variant_name,cases_caused_by_variant,update_date
0,B.1.526,78,2021-04-17
1,B.1.525,5,2021-04-17
2,P.2,33,2021-04-17
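
The add-a-date and rename steps run twice above, once per table. One way to keep the two tables consistent is a small shared helper, sketched below. The names `tidy_variant_table` and `COLUMN_NAMES` are hypothetical and not part of this notebook; the stand-in DataFrame replaces a scraped table so the example runs without the network.

```python
from datetime import date

import pandas as pd

COLUMN_NAMES = {
    "Variant": "variant_name",
    "Number of Cases Caused by Variant": "cases_caused_by_variant",
}


def tidy_variant_table(raw, update_date):
    """Return a renamed copy of a scraped table with an update_date column."""
    tidy = raw.rename(columns=COLUMN_NAMES)  # rename returns a new DataFrame
    tidy["update_date"] = update_date
    return tidy


# Stand-in for one scraped table; the rows are copied from the output above.
raw_interest = pd.DataFrame(
    {
        "Variant": ["B.1.526", "B.1.525", "P.2"],
        "Number of Cases Caused by Variant": [78, 5, 33],
    }
)

tidy_interest = tidy_variant_table(raw_interest, date(2021, 4, 17))
print(tidy_interest.columns.tolist())
```

In the notebook this would be called once per table, for example `tidy_variant_table(df1, today)`, leaving the scraped frame unchanged.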


## Export

Save each table as a CSV file datestamped with the current date in California.

In [16]:
data_dir = os.path.join(os.path.abspath(""), "data")
os.makedirs(data_dir, exist_ok=True)  # to_csv fails if the folder is missing

In [17]:
df1.to_csv(
    os.path.join(data_dir, f"variants_of_concern_ca_{today}_.csv"),
    index=False,
)

In [18]:
df2.to_csv(
    os.path.join(data_dir, f"variants_of_interest_ca_{today}_.csv"),
    index=False,
)