
covid_hosp is inserting data under the wrong columns #387

@krivard

Andrew found the following problem in the covid_hosp endpoint:

Alaska has a fairly light caseload. However, you wouldn't know it from the Epidata response:

In [5]: Epidata.covid_hosp("ak", Epidata.range(20210117,20210117))["epidata"][0]["previous_day_admission_adult_covid_confirmed"]
Out[5]: 1715

When we check the HHS source file for the same date, we see a much smaller number:

In [24]: df = pd.read_csv("/home/andrew/Downloads/reported_hospital_utilization_timeseries_20210118_1604.csv")
In [25]: df.loc[(df.state=="AK") & (df.date=="2021-01-17"), "previous_day_admission_adult_covid_confirmed"]
Out[25]: 
53    5.0
Name: previous_day_admission_adult_covid_confirmed, dtype: float64

Suspiciously, the value 6 columns earlier in the same source row (inpatient_beds) matches the 1715 figure:

hospital_onset_covid                                                                   9.0
hospital_onset_covid_coverage                                                           24
inpatient_beds                                                                      1715.0
inpatient_beds_coverage                                                                 24
inpatient_beds_used                                                                  895.0
inpatient_beds_used_coverage                                                            24
inpatient_beds_used_covid                                                             58.0
inpatient_beds_used_covid_coverage                                                      24
previous_day_admission_adult_covid_confirmed                                           5.0
previous_day_admission_adult_covid_confirmed_coverage                                   24
previous_day_admission_adult_covid_suspected                                           1.0

It seems this problem was introduced when we added the 6 critical staffing columns to the database on 13 January 2021 to comply with HHS data file format version 2.4.


Recommended implementation:

The current code assumes a fixed column order in the database. However, this is not compatible with the ALTER TABLE ADD COLUMN migration we did on 13 January: it did not include positioning information, so the new columns were appended to the end of the table rather than placed where the code expects them. There are two ways to fix this:

  1. do another migration to put the columns in the expected order
  2. make the code robust to varying column order

We'd like to go with (2).
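
The difference between the two approaches can be reproduced in a few lines. This is only a sketch: it uses an in-memory SQLite table as a stand-in for the real database, and the table name, the shortened column names, and the position of the new column in the file are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical original table: columns in the same order as the old file.
cur.execute("CREATE TABLE hosp (inpatient_beds REAL, admissions REAL)")

# Migration without positioning information: the new column goes last.
cur.execute("ALTER TABLE hosp ADD COLUMN staffing_shortage REAL")

# Assumption for this demo: the new file format puts the new column first.
csv_columns = ["staffing_shortage", "inpatient_beds", "admissions"]
csv_row = [2.0, 1715.0, 5.0]

# Positional insert: values are bound in table order, not file order.
cur.execute("INSERT INTO hosp VALUES (?, ?, ?)", csv_row)

# Named insert: each value is bound to the column it belongs to.
names = ", ".join(csv_columns)
marks = ", ".join("?" for _ in csv_columns)
cur.execute(f"INSERT INTO hosp ({names}) VALUES ({marks})", csv_row)

positional, named = cur.execute(
    "SELECT inpatient_beds, admissions, staffing_shortage "
    "FROM hosp ORDER BY rowid"
).fetchall()
print(positional)  # (2.0, 1715.0, 5.0): admissions wrongly holds the bed count
print(named)       # (1715.0, 5.0, 2.0)
```

The named insert keeps working whatever order the table's columns are in, which is the property option (2) is after.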

To that end, we will:

  • convert the ORDERED_CSV_COLUMNS definition in state_timeseries/database.py to a mapping from csv column header to a tuple of sql column name and data type
  • use the mapping to convert csv-named columns to database-named columns
  • modify the database insertion routines to accept column names
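
A rough sketch of how these three steps could fit together. Everything here is hypothetical: `CSV_TO_SQL`, `to_sql_row`, `insert_row`, the table name, the four-column excerpt, and the date conversion are illustrative and not the actual contents of state_timeseries/database.py. SQLite stands in for the real database, so the placeholder syntax may differ from the driver the project uses.

```python
import sqlite3

# Hypothetical excerpt: CSV header -> (SQL column name, converter).
CSV_TO_SQL = {
    "state": ("state", str),
    # Assumption: dates are stored as YYYYMMDD integers.
    "date": ("date", lambda s: int(s.replace("-", ""))),
    "inpatient_beds": ("inpatient_beds", float),
    "previous_day_admission_adult_covid_confirmed": (
        "previous_day_admission_adult_covid_confirmed", float),
}

def to_sql_row(csv_row):
    """Convert a dict keyed by CSV header into a dict keyed by SQL column."""
    sql_row = {}
    for csv_name, (sql_name, convert) in CSV_TO_SQL.items():
        raw = csv_row.get(csv_name)
        # Missing or empty cells become NULL.
        sql_row[sql_name] = None if raw in (None, "") else convert(raw)
    return sql_row

def insert_row(cursor, table, sql_row):
    """Insert by column name, so table column order does not matter.

    Identifiers come from the trusted mapping above, never from user input.
    """
    columns = list(sql_row)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    cursor.execute(sql, [sql_row[c] for c in columns])

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Column order deliberately differs from the mapping order.
cur.execute(
    "CREATE TABLE covid_hosp_demo ("
    "previous_day_admission_adult_covid_confirmed REAL, "
    "inpatient_beds REAL, date INTEGER, state TEXT)"
)
insert_row(cur, "covid_hosp_demo", to_sql_row({
    "state": "AK",
    "date": "2021-01-17",
    "inpatient_beds": "1715.0",
    "previous_day_admission_adult_covid_confirmed": "5.0",
}))
result = cur.execute(
    "SELECT state, date, inpatient_beds, "
    "previous_day_admission_adult_covid_confirmed FROM covid_hosp_demo"
).fetchone()
print(result)  # ('AK', 20210117, 1715.0, 5.0)
```

Keeping the converter next to the SQL name means adding a column to the file format touches one mapping entry instead of several parallel lists.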
