# CDC variant proportions by state

By [Matt Stiles](https://www.latimes.com/people/matt-stiles)

Downloads variant totals and proportions from a [Tableau dashboard](https://covid.cdc.gov/covid-data-tracker/#variant-proportions) published by the U.S. Centers for Disease Control and Prevention.

## Import

Format code with [black](https://pypi.org/project/nb-black/).

In [1]:
%load_ext lab_black

Import dependencies.

In [2]:
import os
import pytz
from datetime import datetime

In [3]:
import pandas as pd
from tableauscraper import TableauScraper as TS

In [4]:
# !pipenv install tableauscraper=='0.1.10'

In [5]:
tz = pytz.timezone("America/Los_Angeles")

In [6]:
today = datetime.now(tz).date()
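
Taking the date in a West Coast timezone matters because a job that runs in the evening Pacific time would otherwise pick up the next day's UTC date. The sketch below illustrates this with one fixed instant. It assumes a fixed UTC-7 offset as a stand-in for Pacific Daylight Time so that it needs only the standard library; the notebook itself uses `pytz`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-7 offset stands in for Pacific Daylight Time.
# The notebook uses pytz.timezone("America/Los_Angeles") instead.
pacific_daylight = timezone(timedelta(hours=-7))

# 03:00 UTC on July 1 is still the evening of June 30 on the West Coast.
instant = datetime(2021, 7, 1, 3, 0, tzinfo=timezone.utc)

utc_date = instant.date()
pacific_date = instant.astimezone(pacific_daylight).date()

print(utc_date, pacific_date)
```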

## Scrape

Set the URL of the public Tableau view and load it.

In [9]:
url = "https://public.tableau.com/views/WeightedStateVariantTable/StateVBMTable"

In [10]:
ts = TS()
ts.loads(url)

In [None]:
workbook = ts.getWorkbook()

In [None]:
target = "State Proportions"

In [None]:
sheet = next(w for w in workbook.worksheets if w.name == target)
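
A bare `next(...)` raises `StopIteration` if no worksheet has the target name, which can be hard to diagnose when a dashboard is renamed. A minimal sketch of a more descriptive lookup follows. The `Sheet` tuple and the `find_sheet` helper are hypothetical stand-ins for illustration, and the only attribute assumed on a worksheet is `name`.

```python
from collections import namedtuple

# Hypothetical stand-in for worksheet objects; only `name` is assumed.
Sheet = namedtuple("Sheet", ["name", "data"])
worksheets = [
    Sheet("State Map", "map rows"),
    Sheet("State Proportions", "table rows"),
]


def find_sheet(sheets, target):
    """Return the first sheet named `target`, or raise a descriptive error."""
    match = next((s for s in sheets if s.name == target), None)
    if match is None:
        available = ", ".join(s.name for s in sheets)
        raise KeyError(f"No worksheet named {target!r}; available: {available}")
    return match


sheet = find_sheet(worksheets, "State Proportions")
print(sheet.data)
```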

In [None]:
src = sheet.data

In [None]:
df = src[["State-value", "Measure Names-alias", "Measure Values-alias"]].copy()

In [None]:
df.rename(
    columns={
        "State-value": "state",
        "Measure Names-alias": "variable",
        "Measure Values-alias": "value",
    },
    inplace=True,
)

In [None]:
df.value = df.value.str.replace(",", "", regex=False).str.replace("%", "", regex=False)

In [None]:
df.value = pd.to_numeric(df.value)
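
The two cleaning cells can be checked on a small frame. The values below are made up to mimic formatted dashboard text and are not real CDC figures. Note that percentages stay in percentage points (45.6, not 0.456) after the `%` sign is removed.

```python
import pandas as pd

# Made-up strings that mimic formatted dashboard values.
sample = pd.DataFrame({"value": ["1,234", "45.6%", "7"]})

# Strip thousands separators and percent signs as literal characters.
cleaned = (
    sample["value"]
    .str.replace(",", "", regex=False)
    .str.replace("%", "", regex=False)
)

# Convert the remaining digit strings to numbers.
sample["value"] = pd.to_numeric(cleaned)

print(sample["value"].tolist())
```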

In [None]:
df_pivot = df.pivot_table(
    values="value", index="state", columns="variable"
).reset_index()
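
The pivot turns one row per state and measure into one row per state with a column per measure. Here is a small sketch with made-up numbers; the measure names are illustrative only.

```python
import pandas as pd

# Made-up long-format rows: one row per state and measure.
long_df = pd.DataFrame(
    {
        "state": ["CA", "CA", "TX", "TX"],
        "variable": ["B.1.1.7", "Total Sequences", "B.1.1.7", "Total Sequences"],
        "value": [50.0, 1000.0, 60.0, 800.0],
    }
)

# Reshape to one row per state, with one column per measure.
wide_df = long_df.pivot_table(
    values="value", index="state", columns="variable"
).reset_index()

print(wide_df)
```

`pivot_table` aggregates with the mean by default, so if a state and measure pair ever appeared more than once, the values would be averaged silently rather than raising an error.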

In [None]:
df_pivot.columns = (
    df_pivot.columns.str.lower()
    .str.replace(".", "", regex=False)
    .str.replace(" ", "_", regex=False)
    .str.replace("/", "_", regex=False)
)
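
To see what the column cleanup produces, the same chain of replacements can be applied to a few illustrative headers. The names below are examples chosen for this sketch, not a list taken from the dashboard.

```python
import pandas as pd

# Illustrative column headers containing dots, spaces and slashes.
columns = pd.Index(["state", "B.1.1.7", "B.1.427/B.1.429", "Total Sequences"])

# Lowercase, drop dots, and turn spaces and slashes into underscores.
normalized = (
    columns.str.lower()
    .str.replace(".", "", regex=False)
    .str.replace(" ", "_", regex=False)
    .str.replace("/", "_", regex=False)
)

print(list(normalized))
```

Passing `regex=False` matters for the dot: treated as a regular expression, `.` would match every character.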

In [None]:
df_pivot["update_date"] = today

In [None]:
df_pivot.head()

## Export

Save the data as a CSV with a date stamp in California time.

In [None]:
data_dir = os.path.join(os.path.abspath(""), "data")
os.makedirs(data_dir, exist_ok=True)  # create the folder if it is missing

In [None]:
df_pivot.to_csv(
    os.path.join(data_dir, f"variants_cdc_proportions_timeseries_{today}_.csv"),
    index=False,
)
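
A `date` object formatted inside an f-string renders in ISO form (`YYYY-MM-DD`), which keeps the datestamped files sortable by name. The sketch below writes a made-up frame to a temporary folder and reads it back. The file name `example_...csv` and the fixed date are assumptions for illustration only.

```python
import os
import tempfile
from datetime import date

import pandas as pd

# Made-up data and a fixed date so the file name is predictable.
frame = pd.DataFrame({"state": ["CA"], "total_sequences": [1000.0]})
stamp = date(2021, 6, 30)

with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "data")
    os.makedirs(out_dir, exist_ok=True)  # to_csv does not create folders

    # The date renders as 2021-06-30 inside the f-string.
    path = os.path.join(out_dir, f"example_{stamp}.csv")
    frame.to_csv(path, index=False)

    # Read the file back to confirm the contents survived the round trip.
    round_trip = pd.read_csv(path)

print(os.path.basename(path))
```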