# vaccine-doses-administered

By [Ryan Murphy](https://www.latimes.com/people/ryan-murphy)

Downloads the number of vaccine doses administered by county and statewide from a Tableau dashboard published by the California Department of Public Health.

## Import

Code formatting with [black](https://pypi.org/project/nb-black/).

In [27]:
%load_ext lab_black


In [28]:
import os
import re
import pytz
import json
import requests
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

## Scrape

Define the dashboard URL, the sheet to query, and the column positions and keys that hold the county labels and dose counts.

In [29]:
host_url = "https://public.tableau.com"
path = "/interactive/views/COVID-19VaccineDashboardPublic/Vaccine"
url = f"{host_url}{path}"

sheet_id = "County Map"
value_index = 4
value_key = "aliasIndices"
label_index = 1
label_key = "aliasIndices"

Download the embed so we can scrape it and find the VizQL root path and session ID needed to build our query URL.

In [30]:
response = requests.get(url, params={":embed": "y", ":showVizHome": "no"})

In [31]:
soup = BeautifulSoup(response.text, "html.parser")

In [32]:
context = json.loads(soup.find("textarea", {"id": "tsConfigContainer"}).text)

In [33]:
data_url = f'{host_url}{context["vizql_root"]}/bootstrapSession/sessions/{context["sessionid"]}'
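
As a minimal sketch of the step above, the snippet below builds a bootstrap URL from a made-up config string. The `sample_*` names and the values inside the JSON are hypothetical stand-ins, not what the dashboard actually returns.

```python
import json

# Hypothetical stand-in for the JSON text inside the tsConfigContainer textarea
sample_config_text = (
    '{"vizql_root": "/vizql/w/ExampleWorkbook/v/ExampleView", '
    '"sessionid": "ABC123-0:0"}'
)
sample_context = json.loads(sample_config_text)

sample_host = "https://public.tableau.com"
# Join the host, the VizQL root path and the session ID into one URL
sample_data_url = (
    f'{sample_host}{sample_context["vizql_root"]}'
    f'/bootstrapSession/sessions/{sample_context["sessionid"]}'
)
print(sample_data_url)
```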

Then download the raw data, clean it up, and turn it into usable dictionaries.

In [34]:
response = requests.post(data_url, data={"sheet_id": sheet_id})
raw_text = response.text
# The response is a series of JSON chunks, each prefixed with a number and a semicolon
json_pieces = [json.loads(d) for d in re.split(r"\d{2,10};(?={.+})", raw_text) if len(d)]
root = next(d for d in json_pieces if "secondaryInfo" in d)
data = root["secondaryInfo"]["presModelMap"]
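
To illustrate the splitting step, here is a small sketch that runs the same kind of regular expression over a made-up payload. The helper name `split_bootstrap_chunks` and the sample text are assumptions for illustration; a real response is far larger.

```python
import json
import re


def split_bootstrap_chunks(raw_text):
    """Split a number-prefixed payload into a list of parsed JSON objects."""
    # Split on each "<digits>;" prefix that is followed by a JSON object
    pieces = re.split(r"\d{2,10};(?=\{.+\})", raw_text)
    # The split leaves an empty string before the first prefix, so skip it
    return [json.loads(piece) for piece in pieces if piece]


# Hypothetical two-chunk payload in the same "<number>;<json>" shape
sample_raw_text = '21;{"sheetName": "demo"}39;{"secondaryInfo": {"presModelMap": {}}}'
sample_pieces = split_bootstrap_chunks(sample_raw_text)
sample_root = next(p for p in sample_pieces if "secondaryInfo" in p)
```

Wrapping the split in a function like this would also let the county and statewide requests share one parser.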

Build our value lookup, which maps each data type to its list of values.

In [35]:
value_columns = data["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"][
    "dataSegments"
]["0"]["dataColumns"]
lookup = {d["dataType"]: d["dataValues"] for d in value_columns}

Pull out the columns of indexes so we can run them against our lookup.

In [36]:
pres_model_map = data["vizData"]["presModelHolder"]["genPresModelMapPresModel"][
    "presModelMap"
]

columns = pres_model_map[sheet_id]["presModelHolder"]["genVizDataPresModel"][
    "paneColumnsData"
]["paneColumnsList"][0]["vizPaneColumns"]

Using our variables from above, pull out the lists of indexes we need.

In [37]:
values_column = columns[value_index][value_key]

In [38]:
labels_column = columns[label_index][label_key]

Run each one through our lookup.

In [39]:
values = [lookup["integer"][idx] for idx in values_column]

In [40]:
labels = [lookup["cstring"][label] for label in labels_column]
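
The lookup step can be sketched with toy data. Each column holds positions into a shared list of values for its data type, so decoding is a plain index operation. All names and numbers below are made up for illustration.

```python
# Hypothetical lookup: one list of values per data type
sample_lookup = {
    "integer": [1200, 45000, 310],
    "cstring": ["Alameda", "Los Angeles", "Yolo"],
}

# Hypothetical columns of indexes pointing into the lists above
sample_values_column = [1, 2, 0]
sample_labels_column = [1, 2, 0]

# Swap each index for the value stored at that position
decoded_values = [sample_lookup["integer"][idx] for idx in sample_values_column]
decoded_labels = [sample_lookup["cstring"][idx] for idx in sample_labels_column]
```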

`zip` the labels and values together, sort the pairs by county name, and convert each pair to a `dict`.

In [41]:
data = [
    {"county": label, "doses_administered": value}
    for label, value in (sorted(zip(labels, values), key=lambda d: d[0]))
]
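
A small sketch of the pairing and sorting step, using made-up county figures:

```python
# Hypothetical decoded columns, in the order the dashboard returned them
sample_labels = ["Yolo", "Alameda", "Los Angeles"]
sample_values = [310, 1200, 45000]

# Pair each label with its value, then sort the pairs by county name
sample_rows = [
    {"county": label, "doses_administered": value}
    for label, value in sorted(zip(sample_labels, sample_values), key=lambda pair: pair[0])
]
```

A list of dictionaries in this shape can be passed straight to `pd.DataFrame`, with one row per county.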

In [42]:
county_df = pd.DataFrame(data)

Get the statewide totals by requesting each summary sheet and taking the first value from its data dictionary.

In [43]:
statewide_total_sheet_names = [
    "Total Doses Admin",
    "Total Doses Delivered",
    "Total Doses Delivered CDC",
    "Total Doses Shipped",
    "Total Doses Shipped CDC",
    "Last Updated Date",
]

In [44]:
values = []
for sheet in statewide_total_sheet_names:
    totals_response = requests.post(data_url, data={"sheet_id": sheet})
    totals_raw_text = totals_response.text
    totals_json_pieces = [
        json.loads(d) for d in re.split(r"\d{2,10};(?={.+})", totals_raw_text) if len(d)
    ]
    totals_root = next(d for d in totals_json_pieces if "secondaryInfo" in d)
    totals_data = totals_root["secondaryInfo"]["presModelMap"]
    val = totals_data["dataDictionary"]["presModelHolder"][
        "genDataDictionaryPresModel"
    ]["dataSegments"]["0"]["dataColumns"][0]["dataValues"][0]
    values.append(val)

In [45]:
statewide_totals_dict = dict(zip(statewide_total_sheet_names, values))

In [46]:
statewide_totals_df = pd.DataFrame(statewide_totals_dict, index=[0])

In [47]:
statewide_totals_clean = statewide_totals_df.rename(
    columns={
        "Total Doses Admin": "total_doses_administered",
        "Total Doses Delivered": "total_doses_delivered",
        "Total Doses Delivered CDC": "total_doses_delivered_cdc",
        "Total Doses Shipped": "total_doses_shipped",
        "Total Doses Shipped CDC": "total_doses_shipped_cdc",
        "Last Updated Date": "last_updated",
    }
)

In [48]:
statewide_totals_clean["last_updated"] = pd.to_datetime(
    statewide_totals_clean["last_updated"]
)

## Export

Datestamp the files with today's date in Pacific time and write them out.

In [49]:
tz = pytz.timezone("America/Los_Angeles")

In [50]:
today = datetime.now(tz).date()

In [51]:
data_dir = os.path.join(os.path.abspath(""), "data")

In [52]:
county_df.to_csv(os.path.join(data_dir, "by-county", f"{today}.csv"), index=False)

In [53]:
statewide_totals_clean.to_csv(
    os.path.join(data_dir, "statewide", f"{today}.csv"), index=False
)
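
The export cells assume the `data/by-county` and `data/statewide` folders already exist. A minimal sketch of creating them first is below. It writes into a temporary directory with a fixed example date, both of which are stand-ins rather than the notebook's real paths.

```python
import os
import tempfile
from datetime import date

# Hypothetical base folder so the sketch does not touch the real data directory
sample_base_dir = tempfile.mkdtemp()
sample_data_dir = os.path.join(sample_base_dir, "data")
sample_today = date(2021, 1, 15)  # fixed example date, not the real run date

# Create each output folder if it is missing; exist_ok avoids an error on reruns
for subdir in ("by-county", "statewide"):
    os.makedirs(os.path.join(sample_data_dir, subdir), exist_ok=True)

# A date formats as YYYY-MM-DD inside an f-string
sample_out_path = os.path.join(sample_data_dir, "by-county", f"{sample_today}.csv")
with open(sample_out_path, "w") as f:
    f.write("county,doses_administered\n")
```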