Code formatting with [black](https://pypi.org/project/nb-black/).

Add our `utils` directory to Python's module search path (`sys.path`, not the shell's `$PATH`) so we can import Python files from sibling directories.

In [27]:
import os
import sys
import pathlib

In [28]:
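# Absolute path of the current working directory (the notebook's folder when launched from Jupyter)
this_dir = pathlib.Path(os.path.abspath(""))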
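The cells above compute `this_dir` but never modify `sys.path`. A minimal sketch of that step, assuming the `utils` folder sits next to this notebook's folder (i.e. at `this_dir.parent / "utils"`, a guess about the repository layout), could look like the following; `add_to_sys_path` is a hypothetical helper name.

```python
import pathlib
import sys


def add_to_sys_path(directory: pathlib.Path) -> bool:
    """Prepend `directory` to sys.path unless it is already there.

    Returns True if the path was added, False if it was already present.
    """
    path_str = str(directory.resolve())
    if path_str in sys.path:
        return False
    sys.path.insert(0, path_str)
    return True


# Assumption: utils/ is a sibling of the notebook's directory
this_dir = pathlib.Path.cwd()
add_to_sys_path(this_dir.parent / "utils")
```

Prepending gives modules in `utils` priority over installed packages with the same name; `sys.path.append` is the gentler choice if shadowing is a concern. The membership check keeps repeated cell runs from stacking duplicate entries.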

In [29]:
data_dir = this_dir / "data"

In [30]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

Retrieve the page.

In [31]:
url = "https://www.ice.gov/coronavirus#tab1"

In [32]:
page = requests.get(url, timeout=30)
page.raise_for_status()  # stop early on HTTP errors instead of parsing an error page

Parse it.

In [33]:
soup = BeautifulSoup(page.content, "html.parser")

Get the timestamp.

In [34]:
content = soup.find("caption")

In [35]:
date_string = content.find("div", {"style": "color: #676767; padding:10px;"}).text

In [36]:
# The date is the last whitespace-separated token of the timestamp text
date = pd.to_datetime(date_string.split()[-1])
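Taking the last token only works while the page keeps the date at the end of the timestamp text. A small guard, sketched below with a hypothetical helper `parse_trailing_date` and a made-up sample string, turns an empty or unparseable caption into a clear error instead of a confusing failure further down.

```python
import pandas as pd


def parse_trailing_date(text: str) -> pd.Timestamp:
    """Parse the last whitespace-separated token of `text` as a date."""
    tokens = text.split()
    if not tokens:
        raise ValueError("timestamp text is empty")
    try:
        return pd.to_datetime(tokens[-1])
    except ValueError as exc:
        # pandas/dateutil parse errors are ValueError subclasses
        raise ValueError(f"could not parse a date from {tokens[-1]!r}") from exc


# Hypothetical caption text, not taken from the live page
print(parse_trailing_date("Last updated 5/12/2020"))
```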

Collect all the tables on the page.

In [37]:
table_list = soup.find_all("table")

Select the case-count table: the second `<table>` on the page (index 1).

In [38]:
tbody = table_list[1].tbody

In [39]:
all_info = tbody.find_all("tr")

In [40]:
def get_text(x):
    return x.text

In [41]:
def get_vals(cells):
    return [get_text(cell) for cell in cells]

In [42]:
# One list of cell texts per table row
ice_list = [get_vals(row.find_all("td")) for row in all_info]
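The loop above turns each `<tr>` into a list of the text of its `<td>` cells. To make that behavior concrete without BeautifulSoup, here is a dependency-free sketch that swaps in the standard library's `html.parser` (not what the notebook uses). It assumes well-formed markup with explicit `</td>` tags, and `TdRowParser` and `extract_rows` are hypothetical names. A header row made only of `<th>` cells yields an empty list, just as `find_all("td")` would.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TdRowParser(HTMLParser):
    """Collect the text of <td> cells, one list per <tr>.

    Mirrors row.find_all("td") + cell.text: a row made only of <th>
    cells comes out as an empty list. Assumes explicit </td> tags.
    """

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._cell: list[str] | None = None  # text chunks of the open <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell))
            self._cell = None

    def handle_data(self, data):
        # Text (including text of nested tags) counts only inside an open <td>
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html: str) -> list[list[str]]:
    """Return the <td> texts of every <tr> in `html`."""
    parser = TdRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup: one header row, two data rows
sample = (
    "<table><tbody>"
    "<tr><th>Facility</th><th>Deaths</th></tr>"
    "<tr><td>Facility A</td><td>0</td></tr>"
    "<tr><td>Facility B</td><td>1</td></tr>"
    "</tbody></table>"
)
print(extract_rows(sample))
```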

In [43]:
ice_df = pd.DataFrame(ice_list)

In [44]:
ice_df = ice_df.rename(
    columns={
        0: "custody_aor_facility",
        1: "confirmed_isolation_or_monitoring",
        2: "deaths",
        3: "total_confirmed",
    }
)

Add the last updated date.

In [45]:
ice_df["date"] = date  # already a pandas Timestamp

Drop header rows, which have no value in the `deaths` column.

In [46]:
ice_df.dropna(subset=["deaths"], inplace=True)
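Why does dropping rows with a missing `deaths` value remove the headers? `pd.DataFrame` pads shorter row lists with missing values, so a row with no `<td>` cells is empty in every column. The sketch below reproduces this on made-up rows; `rows_to_frame` is a hypothetical helper that bundles the construction, renaming, and filtering steps.

```python
import pandas as pd

# Column mapping copied from the rename cell above
COLUMNS = {
    0: "custody_aor_facility",
    1: "confirmed_isolation_or_monitoring",
    2: "deaths",
    3: "total_confirmed",
}


def rows_to_frame(rows: list) -> pd.DataFrame:
    """Build a frame from ragged row lists and drop rows with no `deaths` value."""
    df = pd.DataFrame(rows).rename(columns=COLUMNS)
    # Shorter rows were padded with missing values, so header rows drop out here
    return df.dropna(subset=["deaths"]).reset_index(drop=True)


# Hypothetical rows: an empty header row followed by two data rows
rows = [[], ["Facility A", "0", "0", "0"], ["Facility B", "2", "0", "3"]]
print(rows_to_frame(rows))
```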

Read in the existing raw data so new entries can be appended.

In [56]:
existing = pd.read_csv(data_dir / "timeseries.csv", parse_dates=["date"])

In [57]:
# Append today's snapshot only if its date isn't already in the file
if not existing["date"].eq(date).any():
    existing = pd.concat([existing, ice_df], ignore_index=True)
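The date check is what makes re-running the notebook safe: a snapshot is appended only when its date is absent from the stored history. Pulled out as a hypothetical helper `append_snapshot` and exercised on made-up frames, the logic looks like this.

```python
import pandas as pd


def append_snapshot(
    existing: pd.DataFrame, snapshot: pd.DataFrame, date
) -> pd.DataFrame:
    """Return `existing` with `snapshot` appended, unless rows for `date` exist."""
    if existing["date"].eq(pd.Timestamp(date)).any():
        return existing  # already recorded: leave the history untouched
    return pd.concat([existing, snapshot], ignore_index=True)


# Hypothetical data: one stored day, then a new day's snapshot
existing = pd.DataFrame(
    {"custody_aor_facility": ["Facility A"], "date": pd.to_datetime(["2020-05-01"])}
)
snapshot = pd.DataFrame(
    {"custody_aor_facility": ["Facility A"], "date": pd.to_datetime(["2020-05-02"])}
)

once = append_snapshot(existing, snapshot, "2020-05-02")
twice = append_snapshot(once, snapshot, "2020-05-02")  # re-run: no duplicate rows
print(len(once), len(twice))
```

Comparing against a single `Timestamp` with `Series.eq(...).any()` avoids testing one array for membership in another, whose result depends on how the arrays are compared rather than on whether the date is present.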

Write out the combined data. If the snapshot was already present, this rewrites the existing rows with nothing added.

In [61]:
existing.to_csv(data_dir / "timeseries.csv", index=False)