# CDC variant proportions by state

By [Matt Stiles](https://www.latimes.com/people/matt-stiles)

Downloads variant totals and proportions from a [Tableau dashboard](https://covid.cdc.gov/covid-data-tracker/#variant-proportions) published by the U.S. Centers for Disease Control and Prevention.

## Import

Format code with [black](https://pypi.org/project/nb-black/) via the `lab_black` extension.

In [3]:
%load_ext lab_black

Import dependencies.

In [4]:
import os
import pytz
from datetime import datetime

In [5]:
import pandas as pd
from tableauscraper import TableauScraper as TS

In [6]:
# !pipenv install tableauscraper=='0.1.10'

In [7]:
tz = pytz.timezone("America/Los_Angeles")

In [8]:
today = datetime.now(tz).date()
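
Why the time zone is pinned can be seen in a short sketch. It uses the standard library's `zoneinfo` as a stand-in for `pytz`, and a fixed, made-up UTC timestamp (`utc_moment`) so the result is reproducible; neither appears in the notebook itself.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical instant: early morning UTC on April 18, 2021
utc_moment = datetime(2021, 4, 18, 2, 0, tzinfo=timezone.utc)

# The same instant expressed in Pacific time is still the evening of April 17
pacific_moment = utc_moment.astimezone(ZoneInfo("America/Los_Angeles"))

utc_date = utc_moment.date()
pacific_date = pacific_moment.date()
```

A scrape run late in the evening in California would therefore be stamped with the local date rather than the next day's UTC date.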

## Scrape

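Set the URL. The value below points to a California open data workbook (`LHJVaccineEquityPerformance`) rather than the CDC dashboard, and loading it raises the error shown; replace it with the Tableau URL behind the CDC variant proportions page.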

In [11]:
url = "https://public.tableau.com/profile/ca.open.data#!/vizhome/LHJVaccineEquityPerformance/MapView"

In [12]:
ts = TS()
ts.loads(url)

AttributeError: 'NoneType' object has no attribute 'text'

In [9]:
ws = ts.getWorksheet("State Proportions")

In [10]:
workbook = ts.getWorkbook()

In [11]:
target = "State Proportions"

In [12]:
sheet = next(w for w in workbook.worksheets if w.name == target)

In [13]:
src = sheet.data

In [14]:
df = src[["State-value", "Measure Names-alias", "Measure Values-alias"]].copy()

In [15]:
df.rename(
    columns={
        "State-value": "state",
        "Measure Names-alias": "variable",
        "Measure Values-alias": "value",
    },
    inplace=True,
)

In [16]:
df.value = df.value.str.replace(",", "", regex=False).str.replace("%", "", regex=False)

In [17]:
df.value = pd.to_numeric(df.value)
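
The two cleaning steps above can be sketched on a few sample strings. The `raw` series is hypothetical and only mimics how a dashboard might display percentages and counts.

```python
import pandas as pd

# Hypothetical display strings: percentages and a count with a thousands separator
raw = pd.Series(["14.1%", "6,919", "0.3%"])

# Strip commas and percent signs as literal characters, not regex patterns
cleaned = raw.str.replace(",", "", regex=False).str.replace("%", "", regex=False)

# Convert the cleaned strings to floats
numeric = pd.to_numeric(cleaned)
```

Note that percentages and counts end up in the same numeric column here; they are only separated once the data is pivoted by measure name.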

In [18]:
df_pivot = df.pivot_table(
    values="value", index="state", columns="variable"
).reset_index()
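
As a rough illustration of the reshape, the sketch below pivots a small long-format frame. The frame `long_df` is hypothetical, and the measure labels `B.1.1.7` and `P.1` are assumed spellings, not confirmed from the dashboard.

```python
import pandas as pd

# Hypothetical long-format rows: one row per state and measure
long_df = pd.DataFrame(
    {
        "state": ["Arizona", "Arizona", "California", "California"],
        "variable": ["B.1.1.7", "P.1", "B.1.1.7", "P.1"],
        "value": [14.1, 0.7, 15.9, 1.6],
    }
)

# One row per state, one column per measure.
# pivot_table averages duplicate state/measure pairs by default.
wide_df = long_df.pivot_table(
    values="value", index="state", columns="variable"
).reset_index()
```

Because `pivot_table` aggregates with the mean by default, any duplicated state and measure pair in the source would be averaged silently rather than raising an error.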

In [19]:
df_pivot.columns = (
    df_pivot.columns.str.lower()
    .str.replace(".", "", regex=False)
    .str.replace(" ", "_", regex=False)
    .str.replace("/", "_", regex=False)
)
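
The column cleanup can be checked in isolation. The labels in `labels` below are assumed examples of what the dashboard's measure names might look like; only the cleaned names appear in the notebook's output.

```python
import pandas as pd

# Hypothetical raw column labels
labels = pd.Index(["state", "B.1.1.7", "B.1.427/B.1.429", "Other lineages"])

# Lowercase, drop dots, and turn spaces and slashes into underscores
normalized = (
    labels.str.lower()
    .str.replace(".", "", regex=False)
    .str.replace(" ", "_", regex=False)
    .str.replace("/", "_", regex=False)
)
```

Passing `regex=False` matters for the dot: treated as a regular expression, `.` would match every character.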

In [20]:
df_pivot["update_date"] = today

In [21]:
df_pivot.head()

| | state | b117 | b1351 | b1427_b1429 | other_lineages | p1 | total_available_sequences | update_date |
|---|---|---|---|---|---|---|---|---|
| 0 | Arizona | 14.1 | NaN | 36.0 | 49.2 | 0.7 | 411.0 | 2021-04-17 |
| 1 | California | 15.9 | 0.3 | 53.8 | 28.4 | 1.6 | 6919.0 | 2021-04-17 |
| 2 | Colorado | 29.1 | 0.1 | 28.1 | 42.0 | 0.8 | 915.0 | 2021-04-17 |
| 3 | Connecticut | 29.2 | 0.8 | 7.5 | 61.8 | 0.8 | 651.0 | 2021-04-17 |
| 4 | Florida | 52.2 | 0.3 | 7.5 | 37.6 | 2.4 | 6093.0 | 2021-04-17 |


## Export

Save the data as a CSV file datestamped in California (Pacific) time.

In [22]:
data_dir = os.path.join(os.path.abspath(""), "data")

In [23]:
df_pivot.to_csv(
    os.path.join(data_dir, f"variants_cdc_proportions_timeseries_{today}_.csv"),
    index=False,
)
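
`to_csv` does not create missing folders, so the export assumes a `data` directory already exists next to the notebook. A minimal sketch of a more defensive version follows; it uses a temporary folder and a made-up one-row frame, and `sample`, `stamp`, and `base_dir` are hypothetical stand-ins for `df_pivot`, `today`, and the notebook's working directory.

```python
import os
import tempfile
from datetime import date

import pandas as pd

# Hypothetical one-row frame and date standing in for df_pivot and today
sample = pd.DataFrame({"state": ["California"], "b117": [15.9]})
stamp = date(2021, 4, 17)

# A temporary folder stands in for the notebook's working directory
base_dir = tempfile.mkdtemp()
data_dir = os.path.join(base_dir, "data")
os.makedirs(data_dir, exist_ok=True)  # no error if the folder already exists

# Same file name pattern as the export cell above
out_path = os.path.join(
    data_dir, f"variants_cdc_proportions_timeseries_{stamp}_.csv"
)
sample.to_csv(out_path, index=False)

# Read the file back to confirm the round trip
restored = pd.read_csv(out_path)
```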