Description
Andrew found the following problem in the covid_hosp endpoint:
Alaska has a fairly light caseload. However, you wouldn't know it from the Epidata response:
In [5]: Epidata.covid_hosp("ak", Epidata.range(20210117,20210117))["epidata"][0]["previous_day_admission_adult_covid_confirmed"]
Out[5]: 1715
Indeed, when we check the source file from HHS for the same date, we see a much smaller number:
In [24]: df = pd.read_csv("/home/andrew/Downloads/reported_hospital_utilization_timeseries_20210118_1604.csv")
In [25]: df.loc[(df.state=="AK") & (df.date=="2021-01-17"), "previous_day_admission_adult_covid_confirmed"]
Out[25]:
53 5.0
Name: previous_day_admission_adult_covid_confirmed, dtype: float64
Suspiciously, the value 6 columns away (inpatient_beds) matches the 1715 figure:
hospital_onset_covid 9.0
hospital_onset_covid_coverage 24
inpatient_beds 1715.0
inpatient_beds_coverage 24
inpatient_beds_used 895.0
inpatient_beds_used_coverage 24
inpatient_beds_used_covid 58.0
inpatient_beds_used_covid_coverage 24
previous_day_admission_adult_covid_confirmed 5.0
previous_day_admission_adult_covid_confirmed_coverage 24
previous_day_admission_adult_covid_suspected 1.0
It seems this problem was introduced when we added the 6 critical staffing columns to the database on 13 January 2021 to comply with HHS data file format version 2.4.
Recommended implementation:
The current code assumes a fixed column order in the database. However, this is not compatible with the ALTER TABLE ADD COLUMN migration we did on 13 January, which did not include positioning information. There are two ways to fix this:
- do another migration to put the columns in the expected order
- make the code robust to varying column order
We'd like to go with the second option: making the code robust to varying column order.
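To illustrate the failure mode, here is a minimal sketch using Python's built-in sqlite3 module rather than our production database. The table layout, the staffing_shortage column, and the expected_order list are all made up for the demonstration; only the 1715 and 5 values mirror the Alaska row above. It is meant to show how a positional insert can shift values after a migration that appends a column, not to reproduce our actual code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified table; the real schema has many more columns.
conn.execute("CREATE TABLE hosp (state TEXT, inpatient_beds REAL, admissions REAL)")

# Migration without positioning information: the new column lands at the end.
conn.execute("ALTER TABLE hosp ADD COLUMN staffing_shortage REAL")

# The code, however, assumes the new column sits in the middle (file order).
expected_order = ["state", "staffing_shortage", "inpatient_beds", "admissions"]
row = {
    "state": "AK",
    "staffing_shortage": 3.0,
    "inpatient_beds": 1715.0,
    "admissions": 5.0,
}
values = [row[name] for name in expected_order]

# Positional insert: values are matched to the table's real column order,
# so inpatient_beds ends up in the admissions column.
conn.execute("INSERT INTO hosp VALUES (?, ?, ?, ?)", values)
positional = conn.execute("SELECT admissions FROM hosp").fetchone()[0]

# Named insert: the same values land in the right columns regardless of order.
conn.execute("DELETE FROM hosp")
column_list = ", ".join(expected_order)
conn.execute(f"INSERT INTO hosp ({column_list}) VALUES (?, ?, ?, ?)", values)
named = conn.execute("SELECT admissions FROM hosp").fetchone()[0]

print(positional, named)
```

With the positional insert the admissions column receives the bed count; naming the columns in the INSERT statement avoids the shift without touching the table layout.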
To that end, we will:
- convert the ORDERED_CSV_COLUMNS definition in state_timeseries/database.py to a mapping from csv column header to a tuple of sql column name and data type
- use the mapping to convert csv-named columns to database-named columns
- modify the database insertion routines to accept column names
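The three steps above could look roughly like the following sketch. Everything here is an assumption for illustration: the name CSV_TO_SQL, the helpers to_sql_row and insert_row, the shortened SQL column name, the converter functions, and the use of sqlite3 in place of the real database. Only the csv column headers are taken from the HHS file shown earlier.

```python
import sqlite3


def int_date(text):
    """Convert 'YYYY-MM-DD' to an integer like 20210117 (assumed storage format)."""
    return int(text.replace("-", ""))


# Hypothetical mapping: csv column header -> (sql column name, type converter).
CSV_TO_SQL = {
    "state": ("state", str),
    "date": ("date", int_date),
    "inpatient_beds": ("inpatient_beds", float),
    # The shortened sql name below is invented for this sketch.
    "previous_day_admission_adult_covid_confirmed": ("prev_day_adm_adult_confirmed", float),
}


def to_sql_row(csv_row):
    """Rename csv-named fields to sql-named fields and convert their types."""
    return {
        sql_name: convert(csv_row[csv_name])
        for csv_name, (sql_name, convert) in CSV_TO_SQL.items()
    }


def insert_row(conn, table, sql_row):
    """Insert by explicit column name, so table column order no longer matters."""
    columns = list(sql_row)
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
        [sql_row[name] for name in columns],
    )


conn = sqlite3.connect(":memory:")
# Columns deliberately declared in a different order than the mapping.
conn.execute(
    "CREATE TABLE covid_hosp ("
    "prev_day_adm_adult_confirmed REAL, state TEXT, inpatient_beds REAL, date INTEGER)"
)

csv_row = {
    "state": "AK",
    "date": "2021-01-17",
    "inpatient_beds": "1715.0",
    "previous_day_admission_adult_covid_confirmed": "5.0",
}
insert_row(conn, "covid_hosp", to_sql_row(csv_row))

result = conn.execute(
    "SELECT prev_day_adm_adult_confirmed, inpatient_beds, date FROM covid_hosp"
).fetchone()
print(result)
```

Because the INSERT statement is built from the mapping's sql column names, a later ALTER TABLE ADD COLUMN without positioning would not shift any values. The table name is interpolated into the SQL string, so in this sketch it must come from trusted code, never from user input.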