New indicator: test positivity and volume from DSEW Community Profile Report #1363
Merged
23 commits:

- `431d8c1` Initial add of community profile indicator (krivard)
- `ff3b7da` Changes to support all known backissues (krivard)
- `af692a6` Documentation (krivard)
- `07559f3` [dsew] Add suggested dynamic validator params (krivard)
- `ac57ecc` Fixes from code review (krivard)
- `e363c06` Actually fix rate aggregation this time (krivard)
- `691e3e3` remove whitespace (krivard)
- `ecd4e4c` initial add hospital admissions (nmdefries)
- `f93c2ed` simplify date keys logic (nmdefries)
- `2ad52eb` fix state to nation calc (nmdefries)
- `cfbb8a9` linting and tests (nmdefries)
- `691727e` automate tests (nmdefries)
- `3564a39` keep most recent publish date (nmdefries)
- `9e2ab62` fix rate weight calculation when multiple days present (nmdefries)
- `1546af6` add production params (nmdefries)
- `671ae9c` make weight normalization more robust to number states per day (nmdefries)
- `0090c01` prevent errors from missing hosp admissions before early jan 2021 (nmdefries)
- `f07792a` Merge pull request #1466 from cmu-delphi/ndefries/community_profile_hosp (krivard)
- `d7cc1be` Merge branch 'main' into krivard/community_profile (nmdefries)
- `481d875` [cpr] fix bugs in export date handling (krivard)
- `7c41663` Merge branch 'krivard/community_profile' of github.com:cmu-delphi/cov… (krivard)
- `a58388c` Update Makefile for new ci (krivard)
- `2be28c5` Update package description (krivard)
`@@ -0,0 +1,32 @@` (new file)

```json
{
  "common": {
    "export_dir": "./receiving",
    "log_filename": "dsew_cpr.log"
  },
  "indicator": {
    "input_cache": "./input_cache",
    "reports": "new"
  },
  "validation": {
    "common": {
      "data_source": "dsew_cpr",
      "span_length": 14,
      "min_expected_lag": {"all": "5"},
      "max_expected_lag": {"all": "9"},
      "dry_run": true,
      "suppressed_errors": []
    },
    "static": {
      "minimum_sample_size": 0,
      "missing_se_allowed": true,
      "missing_sample_size_allowed": true
    },
    "dynamic": {
      "ref_window_size": 7,
      "smoothed_signals": [
        "naats_total_7dav",
        "naats_positivity_7dav"
      ]
    }
  }
}
```
`@@ -0,0 +1,22 @@` (new file)

```ini
[MESSAGES CONTROL]

disable=logging-format-interpolation,
        too-many-locals,
        too-many-arguments,
        # Allow pytest functions to be part of a class.
        no-self-use,
        # Allow pytest classes to have one test.
        too-few-public-methods

[BASIC]

# Allow arbitrarily short-named variables.
variable-rgx=[a-z_][a-z0-9_]*
argument-rgx=[a-z_][a-z0-9_]*
attr-rgx=[a-z_][a-z0-9_]*

[DESIGN]

# Don't complain about pytest "unused" arguments.
ignored-argument-names=(_.*|run_as_module)
```
`@@ -0,0 +1,133 @@` (new file)

```markdown
# Dataset layout

The Data Strategy and Execution Workgroup (DSEW) publishes a Community Profile
Report each weekday, comprising a pair of files: an Excel workbook (.xlsx) and a
PDF which shows select metrics from the workbook as time series charts and
choropleth maps. These files are listed as attachments on the healthdata.gov
site:

https://healthdata.gov/Health/COVID-19-Community-Profile-Report/gqxm-d9w9

Each Excel file attachment has a filename. The filename contains a date,
presumably the publish date. The attachment also has an alphanumeric
assetId. Both the filename and the assetId are required for downloading the
file. It is not known whether this means that DSEW may upload updated versions
of a particular file at later times. The attachment does not explicitly list an
upload timestamp. To be safe, we cache our downloads using both the assetId and
the filename.

# Workbook layout

Each Excel file is a workbook with multiple sheets. The exemplar file used in
writing this indicator is "Community Profile Report 20211102.xlsx". The sheets
include:

- User Notes: Instructions for using the workbook
- Overview: US National figures for the last 5 weeks, plus monthly peaks back to
  April 2020
- Regions*: Figures for FEMA regions (double-checked: they match HHS regions
  except that FEMA 2 does not include Palau while HHS 2 does)
- States*: Figures for US states and territories
- CBSAs*: Figures for US Core-Based Statistical Areas
- Counties*: Figures for US counties
- Weekly Transmission Categories: Lists of high, substantial, and moderate
  transmission states and territories
- National Peaks: Monthly national peaks back to April 2020
- National Historic: Daily national figures back to January 22 2020
- Data Notes: Source and methods information for all metrics
- Color Thresholds: Color-coding is used extensively in all sheets; these are
  the keys

The starred sheets above have nearly identical column layouts, and together
cover the county, MSA, state, and HHS geographical levels used in
covidcast. Rather than aggregate them ourselves and risk a mismatch, this
indicator lifts these geographical aggregations directly from the corresponding
sheets of the workbook.

GeoMapper _is_ used to generate national figures from state figures, due to
architectural differences between the starred sheets and the Overview sheet. If
we discover that our nation-level figures differ too much from those listed in
the Overview sheet, we can add dedicated parsing for the Overview sheet and
remove GeoMapper from this indicator altogether.

# Sheet layout

## Headers

Each starred sheet has two rows of headers. The first row uses merged cells to
group several columns together under a single "overheader". This overheader
often includes the reference period for that group of columns, such as:

- CASES/DEATHS: LAST WEEK (October 26-November 1)
- TESTING: LAST WEEK (October 24-30, Test Volume October 20-26)
- TESTING: PREVIOUS WEEK (October 17-23, Test Volume October 13-19)

Overheaders have changed periodically since the first report. For example, the
"TESTING: LAST WEEK" overheader above has also appeared as "VIRAL (RT-PCR) LAB
TESTING: LAST WEEK", with and without a separate reference date for Test
Volume. All known overheader forms are checked in test_pull.py.

The second row contains a header for each column. The headers uniquely identify
each column included in the sheet. Column headers include spaces, and typically
specify both the metric and the reference period over which it was calculated,
such as:

- Total NAATs - last 7 days (may be an underestimate due to delayed reporting)
- NAAT positivity rate - previous 7 days (may be an underestimate due to delayed
  reporting)

Column headers have also changed periodically since the first report. For
example, the "Total NAATs - last 7 days" header above has also appeared as
"Total RT-PCR diagnostic tests - last 7 days".

## Contents

Each starred sheet contains test positivity and total test volume figures for
two reference periods, "last [week]" and "previous [week]". In some reports, the
reference periods for test positivity and total test volume are the same; in
others, they are different, such that the report contains figures for four
distinct reference periods, two for each metric we extract.

# Time series conversions and parsing notes

## Reference date

The reference period in the overheader never includes the year. We guess the
reference year by picking the same year as the publish date (i.e., the date
extracted from the filename), and if the reference month is greater than the
publish month, subtract 1 from the reference year. This adequately covers the
December-January boundary.

We select as reference date the end date of the reference period for each
metric. Reference periods are always 7 days, so this indicator produces
seven-day averages. We divide the total testing volume by seven and leave the
test positivity alone, since it is already a rate over the reference period.

## Geo ID

The Counties sheet lists FIPS codes numerically, such that FIPS codes with a
leading zero have only four digits. We fix this by zero-filling to five
characters.

MSAs are a subset of CBSAs, so we keep only the CBSAs with type
"Metropolitan".

Most of the starred sheets have the geo id as the first non-index column. The
Regions sheet has no such column, so we generate the HHS ids from the index
column instead.

## Combining multiple reports

Each report file generates two reference dates for each metric, up to four
reference dates total. Since it's not clear whether new versions of past files
are ever made available, the default mode (params.indicator.reports="new")
fetches any files that are not already in the input cache, then combines the
results into a single data frame before exporting. This will generate correct
behavior should (for instance) a previously downloaded file get a new assetId.

For the initial run on an empty input cache, and for runs configured to process
a range of reports (using params.indicator.reports=YYYY-mm-dd--YYYY-mm-dd), this
indicator makes no distinction between figures that came from different
reports. That may not be what you want. If the covidcast issue date needs to
match the date on the report filename, then the indicator must instead be run
repeatedly, with equal start and end dates, keeping the output of each run
separate.
```
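The reference-date conventions in the details file (month/day ranges in the overheader, year guessed from the publish date, end date used as the reference date, weekly volume divided by seven) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the indicator's own parser: the regex, the helper names `guess_year`, `parse_periods`, and `weekly_total_to_7dav`, and the choice to apply the year rule separately to each end of a period that spans two months are all hypothetical.

```python
import re
from datetime import date
from typing import List, Tuple

MONTHS = {name: i for i, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Hypothetical pattern: matches "October 24-30" and "October 26-November 1".
# Assumes a plain hyphen separator, as in the overheader examples.
MONTH = "(" + "|".join(MONTHS) + ")"
PERIOD_RE = re.compile(MONTH + r" (\d{1,2})-(?:" + MONTH + r" )?(\d{1,2})")


def guess_year(ref_month: int, publish: date) -> int:
    """Use the publish year, minus one if the reference month is later in the year."""
    return publish.year - 1 if ref_month > publish.month else publish.year


def parse_periods(overheader: str, publish: date) -> List[Tuple[date, date]]:
    """Return (start, end) for each reference period found in an overheader."""
    periods = []
    for start_month, start_day, end_month, end_day in PERIOD_RE.findall(overheader):
        end_month = end_month or start_month  # same-month range, e.g. "October 24-30"
        start = date(guess_year(MONTHS[start_month], publish),
                     MONTHS[start_month], int(start_day))
        end = date(guess_year(MONTHS[end_month], publish),
                   MONTHS[end_month], int(end_day))
        periods.append((start, end))
    return periods


def weekly_total_to_7dav(total: float) -> float:
    """Convert a 7-day total (e.g. test volume) into a 7-day daily average."""
    return total / 7


# Example: one of the overheaders from the details file, with an assumed
# publish date equal to the exemplar file's date (2021-11-02).
publish_date = date(2021, 11, 2)
positivity_period, volume_period = parse_periods(
    "TESTING: LAST WEEK (October 24-30, Test Volume October 20-26)", publish_date)
# Reference dates are the period end dates.
print(positivity_period[1], volume_period[1])  # 2021-10-30 2021-10-26
```

Because the year rule is applied to each endpoint, a period such as "December 27-January 2" in a report published in early January starts in the previous year and ends in the publish year.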
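The Geo ID notes in the details file describe two small table fixes: zero-filling county FIPS codes and keeping only metropolitan CBSAs. A pandas sketch follows; the column names `fips` and `type` and the helper names are hypothetical (the real sheet headers may differ), and it assumes the FIPS column has no missing values.

```python
import pandas as pd


def normalize_county_fips(counties: pd.DataFrame, col: str = "fips") -> pd.DataFrame:
    """Zero-fill numeric county FIPS codes to five characters (1001 -> "01001")."""
    out = counties.copy()
    # Cast to int first so float-typed codes such as 1001.0 lose the ".0" suffix.
    out[col] = out[col].astype(int).astype(str).str.zfill(5)
    return out


def select_msas(cbsas: pd.DataFrame, type_col: str = "type") -> pd.DataFrame:
    """Keep only CBSAs whose type is "Metropolitan", i.e. the MSAs."""
    return cbsas[cbsas[type_col] == "Metropolitan"].reset_index(drop=True)


counties = pd.DataFrame({"fips": [1001, 36061]})
print(normalize_county_fips(counties)["fips"].tolist())  # ['01001', '36061']
```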
`@@ -0,0 +1,30 @@` (new file)

```make
.PHONY: venv install install-ci lint test clean

dir = $(shell find ./delphi_* -name __init__.py | grep -o 'delphi_[_[:alnum:]]*')

venv:
	python3.8 -m venv env

install: venv
	. env/bin/activate; \
	pip install wheel ; \
	pip install -e ../_delphi_utils_python ;\
	pip install -e .

install-ci: venv
	. env/bin/activate; \
	pip install wheel ; \
	pip install ../_delphi_utils_python ;\
	pip install .

lint:
	. env/bin/activate; pylint $(dir)
	. env/bin/activate; pydocstyle $(dir)

test:
	. env/bin/activate ;\
	(cd tests && ../env/bin/pytest --cov=$(dir) --cov-report=term-missing)

clean:
	rm -rf env
	rm -f params.json
```
`@@ -0,0 +1,84 @@` (new file)

````markdown
# COVID-19 Community Profile Report

The Data Strategy and Execution Workgroup (DSEW) publishes a Community Profile
Report each weekday at this location:

https://healthdata.gov/Health/COVID-19-Community-Profile-Report/gqxm-d9w9

This indicator extracts COVID-19 test figures from these reports.

Indicator-specific parameters:

* `input_cache`: a directory where Excel (.xlsx) files downloaded from
  healthdata.gov will be stored for posterity. Each file is 3.3 MB in size, so
  we expect this directory to require ~1GB of disk space for each year of
  operation.
* `reports`: {new | all | YYYY-mm-dd--YYYY-mm-dd} a string indicating which
  reports to export. The default, "new", downloads and exports only reports not
  already found in the input cache. The "all" setting exports data for all
  available reports, downloading them to the input cache if necessary. The date
  range setting refers to the date listed in the filename for the report,
  presumably the publish date. Only reports named with a date within the
  specified range (inclusive) will be downloaded to the input cache if necessary
  and exported.
* `export_start_date`: a YYYY-mm-dd string indicating the first date to export.
* `export_end_date`: a YYYY-mm-dd string indicating the final date to export.

## Running the Indicator

The indicator is run by directly executing the Python module contained in this
directory. The safest way to do this is to create a virtual environment,
install the common DELPHI tools, and then install the module and its
dependencies. To do this, run the following command from this directory:

```
make install
```

This command will install the package in editable mode, so you can make changes that
will automatically propagate to the installed package.

All of the user-changeable parameters are stored in `params.json`. To execute
the module and produce the output datasets (by default, in `receiving`), run
the following:

```
env/bin/python -m delphi_dsew_community_profile
```

If you want to enter the virtual environment in your shell,
you can run `source env/bin/activate`. Run `deactivate` to leave the virtual environment.

Once you are finished, you can remove the virtual environment and
params file with the following:

```
make clean
```

## Testing the code

To run static tests of the code style, run the following command:

```
make lint
```

Unit tests are also included in the module. To execute these, run the following
command from this directory:

```
make test
```

To run individual tests, run the following:

```
(cd tests && ../env/bin/pytest <your_test>.py --cov=delphi_dsew_community_profile --cov-report=term-missing)
```

The output will show the number of unit tests that passed and failed, along
with the percentage of code covered by the tests.

None of the linting or unit tests should fail, and the code lines that are not
covered by unit tests should be few and should not include critical
sub-routines.
````
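As a rough check on the README's `input_cache` estimate, assuming one report of about 3.3 MB every weekday and roughly 260 weekdays per year:

$$
3.3\ \text{MB/report} \times 260\ \text{reports/year} \approx 858\ \text{MB/year} \approx 1\ \text{GB/year}
$$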
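The README's `reports` setting and the details file's "Combining multiple reports" notes together describe which reports get processed and how their figures end up in one data frame. The sketch below illustrates both steps under loud assumptions: reports are identified here only by publish date (the indicator itself caches by assetId and filename), the column names `geo_id`, `timestamp`, `signal`, and `publish_date` are hypothetical, and the rule of keeping the row from the most recent publish date mirrors the title of commit `3564a39` rather than any documented behavior.

```python
import re
from datetime import date
from typing import Iterable, List, Optional, Tuple

import pandas as pd

RANGE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})--(\d{4}-\d{2}-\d{2})$")


def parse_reports(value: str) -> Tuple[str, Optional[Tuple[date, date]]]:
    """Interpret the `reports` setting: "new", "all", or "YYYY-mm-dd--YYYY-mm-dd"."""
    if value in ("new", "all"):
        return value, None
    match = RANGE_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized reports setting: {value!r}")
    start, end = (date.fromisoformat(s) for s in match.groups())
    if start > end:
        raise ValueError(f"empty report range: {value!r}")
    return "range", (start, end)


def select_reports(value: str, available: Iterable[date],
                   cached: Iterable[date]) -> List[date]:
    """Pick which reports (identified by publish date) to process."""
    mode, bounds = parse_reports(value)
    ordered = sorted(available)
    if mode == "new":
        cached_set = set(cached)
        return [d for d in ordered if d not in cached_set]
    if mode == "all":
        return ordered
    start, end = bounds
    return [d for d in ordered if start <= d <= end]  # inclusive range


def combine_reports(frames: List[pd.DataFrame]) -> pd.DataFrame:
    """Stack per-report frames; where several reports cover the same geo,
    reference date, and signal, keep the row from the latest publish date."""
    combined = pd.concat(frames, ignore_index=True)
    combined = combined.sort_values("publish_date", kind="mergesort")  # stable sort
    return (combined
            .drop_duplicates(subset=["geo_id", "timestamp", "signal"], keep="last")
            .reset_index(drop=True))
```

Deduplicating only matters when reports overlap, for example when one report's "previous week" covers the same reference period as an earlier report's "last week".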
`@@ -0,0 +1,38 @@` (new file)

```markdown
## Code Review (Python)

A code review of this module should include a careful look at the code and the
output. To assist in the process, but certainly not in place of it, please
check the following items.

**Documentation**

- [ ] the README.md file template is filled out and currently accurate; it is
possible to load and test the code using only the instructions given
- [ ] minimal docstrings (one line describing what the function does) are
included for all functions; full docstrings describing the inputs and expected
outputs should be given for non-trivial functions

**Structure**

- [ ] code should pass lint checks (`make lint`)
- [ ] any required metadata files are checked into the repository and placed
within the directory `static`
- [ ] any intermediate files that are created and stored by the module should
be placed in the directory `cache`
- [ ] final expected output files to be uploaded to the API are placed in the
`receiving` directory; output files should not be committed to the repository
- [ ] all options and API keys are passed through the file `params.json`
- [ ] template parameter file (`params.json.template`) is checked into the
code; no personal (e.g., usernames) or private (e.g., API keys) information is
included in this template file

**Testing**

- [ ] module can be installed in a new virtual environment (`make install`)
- [ ] reasonably high level of unit test coverage covering all of the main logic
of the code (e.g., missing coverage for raised errors that do not currently seem
possible to reach is okay; missing coverage for options that will be needed is
not)
- [ ] all unit tests run without errors (`make test`)
- [ ] indicator directory has been added to GitHub CI
(`covidcast-indicators/.github/workflows/python-ci.yml`)
```
Empty file.
`dsew_community_profile/delphi_dsew_community_profile/__init__.py` (13 additions, 0 deletions)
`@@ -0,0 +1,13 @@`

```python
# -*- coding: utf-8 -*-
"""Module to pull and clean indicators from the XXXXX source.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import run

__version__ = "0.1.0"
```
`dsew_community_profile/delphi_dsew_community_profile/__main__.py` (12 additions, 0 deletions)
`@@ -0,0 +1,12 @@`

```python
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m delphi_dsew_community_profile`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from delphi_utils import read_params
from .run import run_module  # pragma: no cover

run_module(read_params())  # pragma: no cover
```
Have you thought about what the threshold for "differs too much" is? Since we have all the historical data, we can probably decide now whether or not to add dedicated national parsing so we don't have to do any backfilling later on.