DSEW CPR documentation does not indicate actual data source #911

Closed
capnrefsmmat opened this issue May 23, 2022 · 3 comments · Fixed by #912

@capnrefsmmat
Contributor

The DSEW CPR documentation says:

For more information, see the official description and data dictionary at healthdata.gov for “COVID-19 Community Profile Report”.

But that page just provides a PDF of slides of the report. It does not provide the data or data dictionary. I had to search HealthData.gov to find:

They report a different set of columns, so it'd be useful for the documentation to be explicit about where the data comes from and which data dictionary we should be looking at. Also, some of the signals are marked as "not available below state level"; does that mean they come from a different dataset from the two above, since one is county-level and one is national? Are we pulling from separate state-level reports?

@krivard
Contributor

krivard commented May 23, 2022

The part of the healthdata.gov page that you are looking for is here:

[image: screenshot of the relevant part of the healthdata.gov page]

The data for this indicator is made available as attachments. Each issue includes a PDF in the style of the preview (slides about the data) and a Microsoft Excel file which contains the actual data.

The data sources and methods are described in the Data Notes tab in each Excel file. Unfortunately, we have reason to doubt the accuracy of those notes: they claim the hospital admissions figures come from HHS, but the CPR figures do not match the HHS ones. Neither we nor others have been able to get a response on why this might be so.
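
For anyone who wants to check a discrepancy like this themselves, here is a minimal sketch of the comparison. It assumes both sources have already been loaded into dictionaries keyed by (geo, date); the function name `find_mismatches`, the `tolerance` parameter, and all values are hypothetical and made up for illustration, not real CPR or HHS figures.

```python
def find_mismatches(cpr, hhs, tolerance=0.0):
    """Return entries present in both sources whose values differ.

    cpr, hhs: dicts mapping (geo, date) -> admissions value.
    tolerance: largest absolute difference still treated as a match.
    """
    mismatches = {}
    # Only compare keys that both sources report.
    for key in cpr.keys() & hhs.keys():
        if abs(cpr[key] - hhs[key]) > tolerance:
            mismatches[key] = (cpr[key], hhs[key])
    return mismatches


# Made-up illustrative values, not real data.
cpr_values = {("pa", "2022-05-20"): 100.0, ("ny", "2022-05-20"): 80.0}
hhs_values = {("pa", "2022-05-20"): 100.0, ("ny", "2022-05-20"): 95.0}

mismatches = find_mismatches(cpr_values, hhs_values)
```

Here only the ("ny", "2022-05-20") entry would be flagged, since the "pa" values agree.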

It's true there is no data dictionary provided; I will remove that clause from the link text. I will also clarify the instructions for how to locate the data on the healthdata.gov page.

@krivard
Contributor

krivard commented May 23, 2022

Regarding the sourcing for different geo-levels -- here is the language we currently use to explain this:

In the summary:

County, MSA, state, and HHS-level values are pulled directly from CPR when available; nation-level values are aggregated up from the state level.

In the Estimation section:

The confirmed_admissions_covid_1d_7dav signal mirrors the Confirmed COVID-19 admissions - last 7 days CPR field for all geographic resolutions except nation. Nation-level admissions are calculated by summing state-level values.

The doses_admin_7dav and booster_doses_admin_7dav signals mirror the Doses administered - last 7 days and Booster doses administered - last 7 days CPR fields for all geographic resolutions except nation. Nation-level doses are calculated by summing state-level values.
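
To make the nation-level rule concrete, here is a rough sketch of summing state-level values per date. It is not the actual indicator code: the function name `aggregate_to_nation`, the row layout, and the values are hypothetical, and skipping missing state values is an assumption, since the text above does not say how gaps are handled.

```python
from collections import defaultdict


def aggregate_to_nation(state_rows):
    """Sum state-level values per date to get nation-level values.

    state_rows: iterable of (date, state, value) tuples.
    Rows whose value is None are skipped (an assumption, see lead-in).
    """
    totals = defaultdict(float)
    for date, _state, value in state_rows:
        if value is None:
            continue
        totals[date] += value
    return dict(totals)


# Made-up illustrative values, not real CPR data.
rows = [
    ("2022-05-20", "pa", 10.0),
    ("2022-05-20", "ny", 15.0),
    ("2022-05-21", "pa", 12.0),
    ("2022-05-21", "ny", None),
]
nation = aggregate_to_nation(rows)
```

With these rows, the nation-level value for 2022-05-20 is the sum of both states, while 2022-05-21 reflects only the one state that reported.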

If that is not sufficient, can you suggest some language that would help make it clearer?

@capnrefsmmat
Contributor Author

I was confused about the geographic level because I found the county-level CPR dataset, so I assumed we were pulling from that. Having not looked at the Excel files you're actually using, I don't know what's in them. So supposing I hadn't found the county-level dataset, I'd be less confused by the geographic levels.
