# CDC variants proportions by state

By [Matt Stiles](https://www.latimes.com/people/matt-stiles)

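Downloads variant totals and proportions from a [Tableau dashboard](https://covid.cdc.gov/covid-data-tracker/#variant-proportions) published by the U.S. Centers for Disease Control and Prevention. The cells in this notebook read the variant case tables from the California Department of Public Health's variants summary page.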

## Import

Code formatting with [black](https://pypi.org/project/nb-black/).

In [1]:
%load_ext lab_black

Import dependencies.

In [2]:
import os
import pytz
from datetime import datetime

In [3]:
import pandas as pd
from tableauscraper import TableauScraper as TS

In [4]:
# !pipenv install tableauscraper=='0.1.10'

In [5]:
tz = pytz.timezone("America/Los_Angeles")

In [6]:
today = datetime.now(tz).date()
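
The timezone matters because the calendar date depends on where the clock is read. The following is a minimal sketch of that effect, not part of the notebook itself: it uses the standard library's `zoneinfo` as a stand-in for `pytz`, and the instant shown is an arbitrary one chosen for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard-library stand-in for pytz (Python 3.9+)

# An arbitrary instant picked for illustration: 3 a.m. UTC on June 1, 2021.
utc_moment = datetime(2021, 6, 1, 3, 0, tzinfo=timezone.utc)

# The same instant expressed in California time falls on the previous evening.
la_moment = utc_moment.astimezone(ZoneInfo("America/Los_Angeles"))

utc_date = utc_moment.date()
la_date = la_moment.date()
print(utc_date, la_date)
```

A server running on UTC would stamp a file with the first date, while a reader in California would expect the second, which is why `today` is computed with an explicit timezone.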

## Scrape

### Set the URL for CDPH's variants summary page

In [7]:
url = "https://www.cdph.ca.gov/Programs/CID/DCDC/Pages/COVID-19/COVID-Variants.aspx"

### Read the tables on the page

In [8]:
page = pd.read_html(url, header=0)
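
`pd.read_html` returns a list of DataFrames, one per HTML table it finds, so selecting tables by position assumes the page layout never changes. A hedged sketch of a guard that could sit between the read and the positional lookups is shown here. The helper name `pick_table` is hypothetical, and the sample tables are made-up placeholders rather than real CDPH figures.

```python
import pandas as pd


def pick_table(tables, position, required_columns):
    """Return tables[position] after checking it exists and has the expected columns."""
    if position >= len(tables):
        raise ValueError(
            f"expected at least {position + 1} tables, found {len(tables)}"
        )
    table = tables[position]
    missing = [col for col in required_columns if col not in table.columns]
    if missing:
        raise ValueError(f"table {position} is missing columns: {missing}")
    return table


# Stand-in for the list that pd.read_html returns; the values are invented.
sample_tables = [
    pd.DataFrame({"Variant": ["Example A"], "Number of Cases Caused by Variant": [10]}),
    pd.DataFrame({"Variant": ["Example B"], "Number of Cases Caused by Variant": [5]}),
]

expected = ["Variant", "Number of Cases Caused by Variant"]
concern = pick_table(sample_tables, 0, expected)
interest = pick_table(sample_tables, 1, expected)
```

Failing loudly here is a design choice: a reordered or renamed table on the source page would otherwise produce a CSV with the wrong contents and no warning.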

### Get the 'known variants of concern in California' table

In [9]:
df1 = page[0]

In [10]:
df1["update_date"] = today

In [11]:
df1.rename(
    columns={
        "Variant": "variant_name",
        "Number of Cases Caused by Variant": "cases_caused_by_variant",
    },
    inplace=True,
)

### Get the 'known variants of interest in California' table

In [12]:
df2 = page[1]

In [13]:
df2["update_date"] = today

In [14]:
df2.rename(
    columns={
        "Variant": "variant_name",
        "Number of Cases Caused by Variant": "cases_caused_by_variant",
    },
    inplace=True,
)
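
Both tables go through the same two steps: stamp the update date and rename the columns. One way to avoid repeating the cells is a small helper, sketched here under the assumption that both tables share the same source column names. The function name `tidy_variant_table` and the sample row are hypothetical.

```python
from datetime import date

import pandas as pd

# Source column names mapped to the snake_case names used in the exported CSVs.
COLUMN_NAMES = {
    "Variant": "variant_name",
    "Number of Cases Caused by Variant": "cases_caused_by_variant",
}


def tidy_variant_table(table, update_date):
    """Return a renamed copy of a scraped table with an update_date column added."""
    return table.rename(columns=COLUMN_NAMES).assign(update_date=update_date)


# Made-up placeholder row, not real case data.
raw = pd.DataFrame({"Variant": ["Example A"], "Number of Cases Caused by Variant": [10]})
tidy = tidy_variant_table(raw, date(2021, 6, 1))
print(list(tidy.columns))
```

Unlike `rename(..., inplace=True)`, this version returns a new DataFrame and leaves the scraped table untouched, which makes it easier to rerun a cell without side effects.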

## Export

Save each table as a CSV whose filename carries the current date in California time.

In [15]:
data_dir = os.path.join(os.path.abspath(""), "data")
os.makedirs(data_dir, exist_ok=True)  # create the folder if it does not exist yet

In [16]:
df1.to_csv(
    os.path.join(data_dir, f"variants_of_concern_ca_{today}_.csv"),
    index=False,
)

In [17]:
df2.to_csv(
    os.path.join(data_dir, f"variants_of_interest_ca_{today}_.csv"),
    index=False,
)
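
The two export cells differ only in the filename stem, so they could be folded into one function. This is a minimal sketch rather than the notebook's own method: `export_table` is a hypothetical helper, the sample row is invented, and the demo writes to a temporary directory instead of the real `data` folder.

```python
import os
import tempfile
from datetime import date

import pandas as pd


def export_table(table, data_dir, stem, stamp):
    """Write table to <data_dir>/<stem>_<stamp>_.csv, creating data_dir if needed."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, f"{stem}_{stamp}_.csv")
    table.to_csv(path, index=False)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Made-up placeholder row, not real case data.
    sample = pd.DataFrame(
        {"variant_name": ["Example A"], "cases_caused_by_variant": [10]}
    )
    written = export_table(
        sample, os.path.join(tmp, "data"), "variants_of_concern_ca", date(2021, 6, 1)
    )
    written_name = os.path.basename(written)
    round_trip = pd.read_csv(written)  # read back to confirm the file is usable
```

The filename pattern, including the trailing underscore before `.csv`, mirrors the cells above so that existing files and any downstream readers keep matching.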