covid_hosp is inserting data under the wrong columns

Andrew found the following problem in the covid_hosp endpoint:

Alaska has a fairly light caseload. However, you wouldn't know it from the Epidata response:

```
In [5]: Epidata.covid_hosp("ak", Epidata.range(20210117,20210117))["epidata"][0]["previous_day_admission_adult_covid_confirmed"]
Out[5]: 1715
```

Indeed, when we check the source file from HHS for the same date, we see a much smaller number:

```
In [24]: df = pd.read_csv("/home/andrew/Downloads/reported_hospital_utilization_timeseries_20210118_1604.csv")
In [25]: df.loc[(df.state=="AK") & (df.date=="2021-01-17"), "previous_day_admission_adult_covid_confirmed"]
Out[25]: 
53    5.0
Name: previous_day_admission_adult_covid_confirmed, dtype: float64
```

Suspiciously, the value 6 columns away (`inpatient_beds`) matches the 1715 figure:

```
hospital_onset_covid                                                                   9.0
hospital_onset_covid_coverage                                                           24
inpatient_beds                                                                      1715.0
inpatient_beds_coverage                                                                 24
inpatient_beds_used                                                                  895.0
inpatient_beds_used_coverage                                                            24
inpatient_beds_used_covid                                                             58.0
inpatient_beds_used_covid_coverage                                                      24
previous_day_admission_adult_covid_confirmed                                           5.0
previous_day_admission_adult_covid_confirmed_coverage                                   24
previous_day_admission_adult_covid_suspected                                           1.0
```

It seems this problem was introduced when we added the 6 critical staffing columns to the database on 13 January 2021 to comply with HHS data file format version 2.4.
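As a rough illustration of how a shift like this can arise, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the real one. The table `hosp` and its abbreviated columns (`admissions`, `staffing_shortage`) are hypothetical, and the assumption is that the new column sits in the middle of the CSV but ends up last in the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, heavily abbreviated stand-in for the real table.
conn.execute("CREATE TABLE hosp (state TEXT, inpatient_beds REAL, admissions REAL)")

# Migration without positioning information: the new column lands at the end.
conn.execute("ALTER TABLE hosp ADD COLUMN staffing_shortage REAL")

# Assumed new file layout: the added column appears in the middle of the CSV.
csv_row = {
    "state": "AK",
    "staffing_shortage": 2.0,
    "inpatient_beds": 1715.0,
    "admissions": 5.0,
}

# Positional insert in CSV order: values are matched to table columns by
# position, not by name, so everything after `state` is shifted.
conn.execute("INSERT INTO hosp VALUES (?, ?, ?, ?)", tuple(csv_row.values()))

stored = conn.execute(
    "SELECT inpatient_beds, admissions, staffing_shortage FROM hosp"
).fetchone()
print(stored)  # (2.0, 1715.0, 5.0): admissions now holds the inpatient_beds value
```

In this toy case the admissions column reports 1715 instead of 5, which is the same symptom as in the Alaska response above.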

----
**Recommended implementation:**

The current code assumes a fixed column order in the database. However, this is not compatible with the `ALTER TABLE ... ADD COLUMN` migration we did on 13 January, which did not include positioning information, so the new columns do not sit where the code expects them. There are two ways to fix this:

1. do another migration to put the columns in the expected order
2. make the code robust to varying column order

We'd like to go with (2). 

To that end, we will:
* convert the `ORDERED_CSV_COLUMNS` definition in `state_timeseries/database.py` to a mapping from csv column header to a tuple of sql column name and data type
* use the mapping to convert csv-named columns to database-named columns
* modify the database insertion routines to accept column names
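The three steps above could look roughly like the following sketch. Everything here is an assumption made for illustration: the name `CSV_TO_SQL`, the helper functions, the table name `covid_hosp`, the `%s` placeholder style, and the integer date format are hypothetical, and only four columns are shown.

```python
def to_int(value):
    """Parse CSV integers that may be written as floats, e.g. "5.0"."""
    return int(float(value))


def to_date_int(value):
    """Convert "2021-01-17" to 20210117 (assumed storage format)."""
    return int(value.replace("-", ""))


# Hypothetical mapping: csv column header -> (sql column name, data type).
# The real mapping would cover every column in the file.
CSV_TO_SQL = {
    "state": ("state", str),
    "date": ("date", to_date_int),
    "inpatient_beds": ("inpatient_beds", to_int),
    "previous_day_admission_adult_covid_confirmed": (
        "previous_day_admission_adult_covid_confirmed",
        to_int,
    ),
}


def build_insert(table, csv_row, mapping=CSV_TO_SQL):
    """Return (sql, params) for an INSERT that names every column explicitly."""
    names, params = [], []
    for csv_name, (sql_name, cast) in mapping.items():
        raw = csv_row.get(csv_name)
        names.append(sql_name)
        # Missing or empty CSV cells become NULL.
        params.append(None if raw in (None, "") else cast(raw))
    columns = ", ".join(f"`{name}`" for name in names)
    placeholders = ", ".join(["%s"] * len(names))
    return f"INSERT INTO `{table}` ({columns}) VALUES ({placeholders})", params


sql, params = build_insert(
    "covid_hosp",
    {
        "state": "AK",
        "date": "2021-01-17",
        "inpatient_beds": "1715.0",
        "previous_day_admission_adult_covid_confirmed": "5.0",
    },
)
```

Because the statement lists its columns by name, the physical column order in the table no longer matters, and a future `ADD COLUMN` migration only requires a new entry in the mapping.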
