# vaccine-doses-on-hand

By [Sean Greene](https://www.latimes.com/people/sean-greene)

Downloads the number of vaccine doses on hand from a Tableau dashboard published by the California Department of Public Health.

## Import

Code formatting with [black](https://pypi.org/project/nb-black/).

In [1]:
%load_ext lab_black

Import dependencies.

In [2]:
import os
import re
import pytz
import json
import requests
import pandas as pd
from datetime import datetime
from bs4 import BeautifulSoup

## Scrape

Set the dashboard URL.

In [3]:
host = "https://public.tableau.com"

In [4]:
path = "/views/COVID-19VaccineProviderDashboardPublic/PublicVaccineProviderDashboard"

In [5]:
url = f"{host}{path}"

Set the custom variables we need to manipulate the dashboard.

In [6]:
config = dict(
    sheet_id="Main Vaccine Data Check (6)",
    value_index=3,
    value_key="aliasIndices",
    type_index=2,
    type_key="aliasIndices",
    label_index=1,
    label_key="aliasIndices",
)

Request the embedded version of the page.

In [7]:
response = requests.get(url, params={":embed": "y", ":showVizHome": "no"})

Parse the HTML.

In [8]:
soup = BeautifulSoup(response.text, "html.parser")

Zero in on the `textarea` that holds the session settings we need to build the data URL. Its contents are JSON.

In [9]:
json_string = soup.find("textarea", {"id": "tsConfigContainer"}).text

Parse it.

In [10]:
context = json.loads(json_string)

Build the data URL from the VizQL root and the session ID.

In [11]:
data_url = (
    f'{host}{context["vizql_root"]}/bootstrapSession/sessions/{context["sessionid"]}'
)
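
To make the URL assembly concrete, here is a minimal sketch that runs the same steps on a made-up configuration string. The `vizql_root` and `sessionid` values are invented for illustration; the real ones come from the `tsConfigContainer` element.

```python
import json

host = "https://public.tableau.com"

# Hypothetical stand-in for the JSON found in the tsConfigContainer textarea.
sample_json_string = json.dumps(
    {"vizql_root": "/vizql/w/ExampleWorkbook/v/ExampleView", "sessionid": "ABC123-0:0"}
)

sample_context = json.loads(sample_json_string)

# Same f-string pattern as the notebook cell above.
sample_data_url = (
    f'{host}{sample_context["vizql_root"]}'
    f'/bootstrapSession/sessions/{sample_context["sessionid"]}'
)
print(sample_data_url)
```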

Then download the raw data, clean it up, and turn it into usable dictionaries.

In [12]:
response = requests.post(data_url, data={"sheet_id": config["sheet_id"]})

In [13]:
raw_text = response.text

In [14]:
json_pieces = [json.loads(d) for d in re.split(r"\d{2,10};(?={.+})", raw_text) if d]
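
The split is easier to follow on a small sample. This sketch assumes, as the regular expression implies, that the response is a run of JSON objects, each preceded by a number and a semicolon. The payload below is fabricated for illustration and is far simpler than a real response.

```python
import json
import re

# Two made-up JSON chunks standing in for the real payload.
chunks = [
    json.dumps({"sheetName": "demo"}),
    json.dumps({"secondaryInfo": {"presModelMap": {}}}),
]

# Prefix each chunk with its length and a semicolon, mimicking the shape the regex expects.
sample_raw_text = "".join(f"{len(c)};{c}" for c in chunks)

# Split on each numeric prefix that is followed by a JSON object,
# dropping the empty string that precedes the first prefix.
sample_pieces = [
    json.loads(d) for d in re.split(r"\d{2,10};(?={.+})", sample_raw_text) if d
]

# Pick out the chunk that carries the data, as the notebook does.
sample_root = next(d for d in sample_pieces if "secondaryInfo" in d)
print(sample_root)
```

The raw-string prefix (`r"..."`) keeps Python from treating `\d` as a string escape before the pattern reaches the `re` module.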

In [15]:
root = next(d for d in json_pieces if "secondaryInfo" in d)

In [16]:
data = root["secondaryInfo"]["presModelMap"]

Build our value lookup.

In [17]:
value_columns = data["dataDictionary"]["presModelHolder"]["genDataDictionaryPresModel"][
    "dataSegments"
]["0"]["dataColumns"]

In [18]:
lookup = {d["dataType"]: d["dataValues"] for d in value_columns}
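
One property of this dictionary comprehension is worth knowing: it keeps one list per `dataType`, so if two columns shared a type, the later one would replace the earlier. The sketch below shows that behavior on invented columns. Whether the dashboard ever sends duplicate types is not something this notebook establishes.

```python
# Invented columns in the same shape as the notebook's value_columns.
sample_columns = [
    {"dataType": "integer", "dataValues": [1, 2, 3]},
    {"dataType": "cstring", "dataValues": ["County A", "1,200"]},
    {"dataType": "cstring", "dataValues": ["County B", "350"]},
]

sample_lookup = {d["dataType"]: d["dataValues"] for d in sample_columns}

# Only the last "cstring" column survives.
print(sample_lookup["cstring"])
```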

Pull out the columns of indexes so we can run them against our lookup.

In [30]:
pres_model_map = data["vizData"]["presModelHolder"]["genPresModelMapPresModel"][
    "presModelMap"
]

In [31]:
columns = pres_model_map[config["sheet_id"]]["presModelHolder"]["genVizDataPresModel"][
    "paneColumnsData"
]["paneColumnsList"][0]["vizPaneColumns"]
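
Long chains of keys like these fail with a bare `KeyError` when the dashboard's structure changes. As one possible safeguard, the sketch below uses a hypothetical helper, `dig`, that reports how far it got before a key was missing. It is not part of the original notebook, and the sample dictionary is invented.

```python
def dig(obj, *keys):
    """Walk nested dicts and lists, naming the path reached if a step fails."""
    path = []
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError) as err:
            reached = " -> ".join(map(str, path)) or "<root>"
            raise KeyError(f"{key!r} not found after {reached}") from err
        path.append(key)
    return obj


# Invented structure loosely shaped like the notebook's nested response.
sample = {"paneColumnsData": {"paneColumnsList": [{"vizPaneColumns": ["a", "b"]}]}}

sample_columns = dig(sample, "paneColumnsData", "paneColumnsList", 0, "vizPaneColumns")
print(sample_columns)
```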

Using our variables from above, pull out the lists of indexes we need.

In [32]:
labels_column = columns[config["label_index"]][config["label_key"]]

In [33]:
types_column = columns[config["type_index"]][config["type_key"]]

In [34]:
values_column = columns[config["value_index"]][config["value_key"]]

In [35]:
# Convert each alias index to a zero-based position in the lookup (-1 becomes 0, -2 becomes 1).
values_column_b = [abs(val) - 1 for val in values_column]
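
To see what the `abs(val) - 1` step does, the sketch below applies it to invented indices. It assumes, as the notebook's arithmetic implies, that the values arrive as negative one-based positions; the sample lists are made up.

```python
# Invented string dictionary and alias indices.
sample_cstring = ["1,200", "350", "County A"]
sample_values_column = [-1, -2]

# -1 becomes position 0, -2 becomes position 1.
sample_positions = [abs(val) - 1 for val in sample_values_column]
sample_values = [sample_cstring[i] for i in sample_positions]
print(sample_values)
```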

Run each one through our lookup.

In [36]:
labels = [lookup["cstring"][label] for label in labels_column]

In [37]:
types = [lookup["cstring"][type_] for type_ in types_column]

In [38]:
values_start = len(labels) + 4

In [39]:
values_end = len(labels)

In [40]:
values = [lookup["cstring"][value] for value in values_column_b]

In [41]:
assert len(labels) == len(values), "Labels and values lists have different lengths"

`zip` the labels and values together, sort the pairs by agency name, and convert each pair into a `dict`.

In [42]:
data = [
    {"agency": label, "doses_on_hand": value}
    for label, value in (sorted(zip(labels, values), key=lambda d: d[0]))
]
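
The length check above matters because `zip` stops at the shorter input without raising an error. A small sketch with invented agencies shows both the pairing and that silent truncation.

```python
sample_labels = ["County B", "County A", "County C"]
sample_values = ["350", "1,200"]  # One value short on purpose.

sample_data = [
    {"agency": label, "doses_on_hand": value}
    for label, value in sorted(zip(sample_labels, sample_values), key=lambda d: d[0])
]

# "County C" is dropped silently because it has no matching value.
print(sample_data)
```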

## Transform

Convert the records into a pandas DataFrame.

In [43]:
df = pd.DataFrame(data)

Convert the dose counts from strings to integers.

In [44]:
def safeint(s):
    """
    Strip thousands separators from the provided string and return it as an integer.
    """
    s = s.replace(",", "")
    return int(s)

In [45]:
df["doses_on_hand"] = df["doses_on_hand"].apply(safeint)
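
For reference, here is how the conversion behaves on a few inputs. The tolerant variant, `safeint_or_none`, is a hypothetical addition for dashboards that might return blanks; nothing in this notebook indicates that the current data needs it.

```python
def safeint(s):
    """Strip thousands separators from a string and return it as an integer."""
    return int(s.replace(",", ""))


def safeint_or_none(s):
    """Hypothetical variant that returns None instead of raising on bad input."""
    try:
        return safeint(s)
    except (AttributeError, ValueError):
        return None


print(safeint("1,234,567"))
print(safeint_or_none(""))
```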

## Export

Save out the data as a CSV that's datestamped to California time.

In [46]:
tz = pytz.timezone("America/Los_Angeles")

In [47]:
today = datetime.now(tz).date()

In [48]:
data_dir = os.path.join(os.path.abspath(""), "data")

In [49]:
df.sort_values("agency").to_csv(os.path.join(data_dir, f"{today}.csv"), index=False)
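
The export assumes the `data` directory already exists. If that is not guaranteed, a small helper can create it first. The sketch below uses a hypothetical function, `dated_csv_path`, and writes to a temporary directory with a fixed date so that it runs anywhere.

```python
import os
import tempfile
from datetime import date


def dated_csv_path(data_dir, day):
    """Create data_dir if needed and return the path of that day's CSV."""
    os.makedirs(data_dir, exist_ok=True)
    return os.path.join(data_dir, f"{day}.csv")


with tempfile.TemporaryDirectory() as tmp:
    sample_path = dated_csv_path(os.path.join(tmp, "data"), date(2021, 3, 1))
    sample_dir_exists = os.path.isdir(os.path.dirname(sample_path))

print(os.path.basename(sample_path))
```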