## Configuration
_Initial steps to get the notebook ready to play nice with our repository. Do not delete this section._

Code formatting with [black](https://pypi.org/project/nb-black/).

In [1]:
%load_ext lab_black

In [2]:
import os
import pytz
import glob
import pathlib

this_dir = pathlib.Path(os.path.abspath(""))
data_dir = this_dir / "data"

In [3]:
import requests
import pandas as pd
from datetime import datetime

## Download

Retrieve the county's case counts from its ArcGIS feature service

In [4]:
url = "https://services3.arcgis.com/JmPiYilyU1x5zuxM/arcgis/rest/services/CoronavirusCases_current/FeatureServer/0/query?f=json&where=name%20%3D%20%27Siskiyou%20County%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&resultOffset=0&resultRecordCount=50&resultType=standard&cacheHint=true"
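
The query string is hard to audit in its percent-encoded form. As a minimal sketch, the standard library's `urllib.parse` can split the URL above into readable parameters; the `params` name is introduced here purely for illustration.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://services3.arcgis.com/JmPiYilyU1x5zuxM/arcgis/rest/services/CoronavirusCases_current/FeatureServer/0/query?f=json&where=name%20%3D%20%27Siskiyou%20County%27&returnGeometry=false&spatialRel=esriSpatialRelIntersects&outFields=*&resultOffset=0&resultRecordCount=50&resultType=standard&cacheHint=true"

# parse_qs percent-decodes each value and returns lists; keep the single value per key
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key, value in params.items():
    print(f"{key}: {value}")
```

Decoded this way, the `where` clause reads `name = 'Siskiyou County'`, which is the filter that limits the response to a single record.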

In [5]:
r = requests.get(url)

In [6]:
data = r.json()

## Parse

In [7]:
d = data["features"][0]["attributes"]
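
Indexing `data["features"][0]` fails with an unhelpful `KeyError` or `IndexError` if the service returns nothing usable. The sketch below wraps that lookup in a hypothetical helper, `first_attributes`, which is not part of the notebook. It assumes a failed query may carry an `error` object in the JSON body; that assumption should be checked against the service before relying on it.

```python
def first_attributes(payload):
    """Return the attributes of the first feature, or raise ValueError."""
    # Assumption: a failed query carries an "error" object in the JSON body
    if "error" in payload:
        raise ValueError(f"Service returned an error: {payload['error']}")
    features = payload.get("features") or []
    if not features:
        raise ValueError("Response contains no features")
    return features[0]["attributes"]


# Made-up payload shaped like the response parsed above
sample = {"features": [{"attributes": {"casesregion1": 10, "casesregion2": 20}}]}
attributes = first_attributes(sample)
```

With a helper like this, the cell above could become `d = first_attributes(data)` and fail with a descriptive message.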

In [8]:
df = pd.DataFrame(d.items(), columns=["area", "confirmed_cases"])
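
To see what this reshaping produces, here is a small sketch with made-up attribute values. The names `sample_attributes` and `sample_df` are illustrative only, and the real field names come from the service.

```python
import pandas as pd

# Made-up attribute values; the real field names come from the service
sample_attributes = {"name": "Siskiyou County", "casesregion1": 10, "casesregion2": 20}

# Each (key, value) pair becomes one row of a two-column frame
sample_df = pd.DataFrame(
    list(sample_attributes.items()), columns=["area", "confirmed_cases"]
)
print(sample_df)
```

Every attribute lands in `confirmed_cases` regardless of its type, so the frame needs filtering down to the case-count rows before it is usable.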

Map the rows we want to their region names

In [9]:
m = {
    "casesregion1": "North",
    "casesregion2": "South",
    "casesregion3": "East",
    "casesregion4": "West",
}

In [10]:
# Reuse the mapping's keys so the two lists cannot drift apart
regions = list(m)

In [11]:
# Copy the filtered rows so later column assignments do not raise SettingWithCopyWarning
trim_df = df[df["area"].isin(regions)].copy()

In [12]:
trim_df["area"] = trim_df["area"].map(m)

Get timestamp

In [13]:
timestamp = data["features"][0]["attributes"]["EditDate_1599078092734"]

In [14]:
timestamp = datetime.fromtimestamp((timestamp / 1000))

In [15]:
latest_date = pd.to_datetime(timestamp).date()
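
`datetime.fromtimestamp` called without a `tz` argument uses the time zone of the machine running the notebook, so the resulting date can differ between a laptop in California and a server set to UTC. The following sketch makes the time zone explicit. The helper `epoch_ms_to_date` and the sample timestamp are hypothetical, and a fixed UTC-7 offset stands in for Pacific Daylight Time to keep the example free of extra dependencies.

```python
from datetime import datetime, timedelta, timezone


def epoch_ms_to_date(ms, tz=timezone.utc):
    """Convert epoch milliseconds to a calendar date in the given time zone."""
    return datetime.fromtimestamp(ms / 1000, tz=tz).date()


# Hypothetical timestamp: 2020-09-03 02:00:00 UTC
sample_ms = 1599098400000
# Fixed UTC-7 offset standing in for Pacific Daylight Time
pacific_daylight = timezone(timedelta(hours=-7))

utc_date = epoch_ms_to_date(sample_ms)
pacific_date = epoch_ms_to_date(sample_ms, pacific_daylight)
print(utc_date, pacific_date)
```

The same instant falls on different calendar days in the two zones. In the notebook, passing `pytz.timezone("America/Los_Angeles")` as `tz` would likely be the closer fit, since that zone is already used in the Export section.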

In [16]:
# assign returns a new DataFrame, which avoids SettingWithCopyWarning
trim_df = trim_df.assign(county_date=latest_date)


Pack in a county name for export

In [17]:
trim_df.insert(0, "county", "Siskiyou")

## Vet

In [18]:
assert len(trim_df) >= 4, "Siskiyou County's scraper is missing rows"

In [19]:
assert len(trim_df) <= 4, "Siskiyou County's scraper has more rows than before"
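
`assert` statements are skipped when Python runs with the `-O` flag, so a scraper that must always be vetted may prefer an explicit check. As a sketch, the hypothetical helper `check_row_count` below folds both tests into one function and reports the counts it saw. The expected value of 4 mirrors the four regions in the mapping above.

```python
def check_row_count(n_rows, expected=4, label="Siskiyou County's scraper"):
    """Raise ValueError when the row count differs from the expected number."""
    if n_rows < expected:
        raise ValueError(
            f"{label} is missing rows: got {n_rows}, expected {expected}"
        )
    if n_rows > expected:
        raise ValueError(
            f"{label} has more rows than before: got {n_rows}, expected {expected}"
        )


check_row_count(4)  # passes silently
```

In the notebook this would be called as `check_row_count(len(trim_df))`.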

## Export

Set date

In [20]:
tz = pytz.timezone("America/Los_Angeles")

In [21]:
today = datetime.now(tz).date()

In [22]:
slug = "siskiyou"

In [23]:
trim_df.to_csv(data_dir / slug / f"{today}.csv", index=False)
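
`to_csv` does not create missing directories, so the export fails on a fresh checkout where `data/siskiyou` does not exist yet. A minimal sketch of guarding against that, using a temporary directory in place of the notebook's `data_dir`:

```python
import pathlib
import tempfile

# A temporary directory stands in for the notebook's data_dir
with tempfile.TemporaryDirectory() as tmp:
    out_dir = pathlib.Path(tmp) / "data" / "siskiyou"
    # parents=True creates missing ancestors; exist_ok=True makes reruns safe
    out_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)  # second call is a no-op
    created = out_dir.is_dir()
```

In the notebook the equivalent call would be `(data_dir / slug).mkdir(parents=True, exist_ok=True)`, placed before the export.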

## Combine

In [24]:
csv_list = [
    i
    for i in glob.glob(str(data_dir / slug / "*.csv"))
    if not str(i).endswith("timeseries.csv")
]
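
Since `pathlib` is already imported, the same listing can be sketched without converting paths to strings. The helper name `daily_csvs` is hypothetical. Unlike the cell above, it excludes only a file named exactly `timeseries.csv`, and it sorts the result because glob order is not guaranteed.

```python
import pathlib
import tempfile


def daily_csvs(folder):
    """Return the folder's CSV paths, sorted, excluding timeseries.csv."""
    return sorted(
        p for p in pathlib.Path(folder).glob("*.csv") if p.name != "timeseries.csv"
    )


# Build a throwaway folder with made-up file names to exercise the helper
with tempfile.TemporaryDirectory() as tmp:
    for name in ("2020-09-02.csv", "2020-09-01.csv", "timeseries.csv", "notes.txt"):
        (pathlib.Path(tmp) / name).touch()
    found = [p.name for p in daily_csvs(tmp)]

print(found)
```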

In [25]:
df_list = []
for csv in csv_list:
    if "manual" in csv:
        df = pd.read_csv(csv, parse_dates=["date"])
    else:
        # Path.stem drops the directory and extension on any operating system
        file_date = pathlib.Path(csv).stem
        df = pd.read_csv(csv, parse_dates=["county_date"])
        df["date"] = file_date
    df_list.append(df)
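
The loop trusts that every non-manual file name is a date. Because the Export step names files `{today}.csv`, the stem should be in `YYYY-MM-DD` form, and a sketch like the following could validate that before the value is written into the `date` column. The helper `date_from_filename` is hypothetical.

```python
import pathlib
from datetime import date


def date_from_filename(path):
    """Parse a YYYY-MM-DD file stem into a date, raising ValueError otherwise."""
    return date.fromisoformat(pathlib.Path(path).stem)


parsed = date_from_filename("data/siskiyou/2020-09-02.csv")
print(parsed)
```

A stray file such as `notes.csv` would then raise a `ValueError` instead of silently adding a non-date value to the time series.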

In [26]:
df = pd.concat(df_list).sort_values(["date", "area"])

In [27]:
df.to_csv(data_dir / slug / "timeseries.csv", index=False)