Format code cells with [black](https://pypi.org/project/nb-black/) via the `lab_black` extension.

In [1]:
%load_ext lab_black

Import the standard-library modules used to locate this notebook's directory and the `data` directory beneath it.

In [2]:
import os
import sys
import pathlib

In [3]:
this_dir = pathlib.Path(os.path.abspath(""))

In [4]:
data_dir = this_dir / "data"

In [5]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

Retrieve the page.

In [6]:
url = "https://www.ice.gov/coronavirus#tab1"

In [7]:
page = requests.get(url)
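
A bare `requests.get` call has no timeout and does not raise on an HTTP error status, so a failed request can pass an error page on to the parser. A minimal sketch of a stricter fetch follows. The helper name `fetch_html`, its 30-second default, and the `session` parameter are assumptions for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=30, session=None):
    """Return the raw bytes of `url`, raising on HTTP errors or timeouts."""
    # `session` is a hypothetical hook: pass a requests.Session to reuse
    # connections, or leave it as None to use the module-level functions.
    http = session if session is not None else requests
    response = http.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.content
```

The returned bytes can be passed to `BeautifulSoup(..., "html.parser")` in the same way as `page.content`.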

Parse it.

In [8]:
soup = BeautifulSoup(page.content, "html.parser")

Get the timestamp.

In [9]:
content = soup.find("caption")

In [10]:
date_string = content.find("div", {"style": "color: #676767; padding:10px;"}).text

In [11]:
date = pd.to_datetime(date_string.split()[-1])
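
The timestamp lookup depends on an exact inline `style` string and on the date being the last whitespace-separated token of the text. The sketch below runs the same steps on made-up markup and fails loudly if the `<div>` is missing. The sample HTML, its wording, and its date format are assumptions. The live page may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical caption markup; the real page's wording and date format may differ.
sample_html = """
<table>
  <caption>
    Sample caption
    <div style="color: #676767; padding:10px;">Updated 05/01/2020</div>
  </caption>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")
caption = sample_soup.find("caption")
stamp = caption.find("div", {"style": "color: #676767; padding:10px;"})

if stamp is None:
    # Without this check, a markup change surfaces as an AttributeError on `.text`.
    raise ValueError("Timestamp <div> not found; the page markup may have changed.")

# The last whitespace-separated token is assumed to be the date.
sample_date = pd.to_datetime(stamp.text.split()[-1])
```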

Focus in on the tables.

In [13]:
table_list = soup.find_all("table")

Find the right table.

In [14]:
tbody = table_list[1].tbody
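
Picking the table by position (`table_list[1]`) breaks if the page gains or loses a table. One alternative, sketched here on made-up markup, is to pick the table by the text of its header cells. The helper `find_table_by_header` and the header text are hypothetical. The real page's headers have not been checked.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two tables; only the second has the header we want.
sample_html = """
<table><thead><tr><th>Something else</th></tr></thead><tbody></tbody></table>
<table>
  <thead><tr><th>Custody/AOR/Facility</th><th>Deaths</th></tr></thead>
  <tbody><tr><td>Example facility</td><td>0</td></tr></tbody>
</table>
"""


def find_table_by_header(soup, header_text):
    """Return the first table with a <th> containing `header_text`, else None."""
    for table in soup.find_all("table"):
        headers = [th.get_text(strip=True) for th in table.find_all("th")]
        if any(header_text in header for header in headers):
            return table
    return None


sample_soup = BeautifulSoup(sample_html, "html.parser")
table = find_table_by_header(sample_soup, "Custody")
```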

In [15]:
all_info = tbody.find_all("tr")

In [16]:
def get_text(x):
    return x.text

In [17]:
def get_vals(a):
    # Iterate over the cells directly instead of indexing by position.
    return [get_text(cell) for cell in a]

In [18]:
# One list of cell texts per table row; rows without <td> cells yield an empty list.
ice_list = [get_vals(row.find_all("td")) for row in all_info]

In [19]:
ice_df = pd.DataFrame(ice_list)

In [20]:
ice_df = ice_df.rename(
    columns={
        0: "custody_aor_facility",
        1: "confirmed_isolation_or_monitoring",
        2: "deaths",
        3: "total_confirmed",
    }
)

Add the last updated date.

In [21]:
ice_df["date"] = pd.to_datetime(date)

Drop header rows, which have no `deaths` cell and therefore show up as missing values.

In [22]:
ice_df.dropna(subset=["deaths"], inplace=True)
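
Why `dropna` removes the header rows is easier to see on a toy table. Rows with fewer `<td>` cells than a data row are padded with missing values when the list of lists becomes a DataFrame, so their `deaths` column is empty. The markup below is made up for illustration and is not copied from the live page.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical rows: a <th>-only header, a one-cell section heading, and a data row.
sample_html = """
<table><tbody>
  <tr><th>Facility</th><th>Confirmed</th><th>Deaths</th><th>Total</th></tr>
  <tr><td colspan="4">Example field office</td></tr>
  <tr><td>Example facility</td><td>2</td><td>0</td><td>5</td></tr>
</tbody></table>
"""

rows = BeautifulSoup(sample_html, "html.parser").find_all("tr")
# One empty list, one single-item list, and one four-item list.
cells = [[td.text for td in row.find_all("td")] for row in rows]

sample_df = pd.DataFrame(cells).rename(
    columns={
        0: "custody_aor_facility",
        1: "confirmed_isolation_or_monitoring",
        2: "deaths",
        3: "total_confirmed",
    }
)

# Only the full data row has a value in "deaths", so only it survives.
kept = sample_df.dropna(subset=["deaths"])
```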

Read in the existing raw data so new entries can be appended.

In [23]:
existing = pd.read_csv(data_dir / "timeseries.csv", parse_dates=["date"])

In [24]:
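# Append only when the scraped date is not already in the stored data.
# `isin` avoids relying on `in` between two arrays, which is fragile.
if not existing["date"].isin(ice_df["date"]).any():
    existing = pd.concat([existing, ice_df])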
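
Both branches of the append step can be checked on toy data. The frames and the helper `append_if_new` below are hypothetical stand-ins for the stored time series and a fresh scrape. This is a sketch of the check, not the notebook's own code.

```python
import pandas as pd

# Hypothetical stand-ins for the stored time series and two fresh scrapes.
stored = pd.DataFrame(
    {
        "custody_aor_facility": ["Example facility"],
        "deaths": ["0"],
        "date": pd.to_datetime(["2020-05-01"]),
    }
)
scraped_same_day = stored.copy()
scraped_next_day = stored.assign(date=pd.to_datetime("2020-05-02"))


def append_if_new(existing, new):
    """Append `new` only when none of its dates already appear in `existing`."""
    if existing["date"].isin(new["date"]).any():
        return existing
    return pd.concat([existing, new], ignore_index=True)


unchanged = append_if_new(stored, scraped_same_day)
extended = append_if_new(stored, scraped_next_day)
```

Running the scrape twice on the same day leaves the data unchanged, while a new date adds rows.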

Write out the combined data. If the scraped date was already present, the file is rewritten unchanged.

In [25]:
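# index=False keeps the row index out of the file, so rereading it
# does not pick up an extra "Unnamed: 0" column.
existing.to_csv(data_dir / "timeseries.csv", index=False)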
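
When a CSV is read back and rewritten on every run, the row index is worth checking. By default `to_csv` writes the index as an unnamed first column, and `read_csv` then loads it as a regular column named `Unnamed: 0`. A small in-memory round trip, using a made-up frame and a hypothetical `round_trip` helper, shows the difference.

```python
import io

import pandas as pd

# Made-up frame standing in for the time series.
frame = pd.DataFrame(
    {
        "deaths": ["0", "1"],
        "date": pd.to_datetime(["2020-05-01", "2020-05-01"]),
    }
)


def round_trip(df, **to_csv_kwargs):
    """Write `df` to an in-memory CSV and read it straight back."""
    buffer = io.StringIO()
    df.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer, parse_dates=["date"])


with_index = round_trip(frame)
without_index = round_trip(frame, index=False)

print(list(with_index.columns))     # includes "Unnamed: 0"
print(list(without_index.columns))  # only the original columns
```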