Indicator cdc vaccines in progress #1312

Open
wants to merge 98 commits into
base: main
Changes from 23 commits
Commits
81bd84a
First pass of the CDC Indicator
Ananya-Joshi Sep 8, 2021
e459efc
added explicit dictionary creation
Ananya-Joshi Sep 9, 2021
f9a6329
added os import
Ananya-Joshi Sep 9, 2021
22303fa
Minor changes for the linter - tests pass locally
Ananya-Joshi Sep 9, 2021
93a9982
minor changes
Ananya-Joshi Sep 10, 2021
014da6d
Update cdc_vaccines/delphi_cdc_vaccines/__main__.py
Ananya-Joshi Sep 10, 2021
33fce0c
Update cdc_vaccines/delphi_cdc_vaccines/constants.py
Ananya-Joshi Sep 10, 2021
cfc4a3d
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
76843dd
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
09382c7
Update cdc_vaccines/params.json.template
Ananya-Joshi Sep 10, 2021
7f903bd
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
d05d5e2
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 10, 2021
8618e55
Update cdc_vaccines/delphi_cdc_vaccines/pull.py
Ananya-Joshi Sep 10, 2021
e537b35
changes to the json file
Ananya-Joshi Sep 10, 2021
6054dcd
changed the signal name generation
Ananya-Joshi Sep 11, 2021
c847431
committed constants
Ananya-Joshi Sep 11, 2021
f4399d6
Modified run.py to have the right NaN codes
Ananya-Joshi Sep 17, 2021
e5db427
Update cdc_vaccines/README.md
Ananya-Joshi Sep 13, 2021
8f13b6a
Added appropriate NaN codes
Ananya-Joshi Sep 19, 2021
ded675f
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 21, 2021
85d4717
added back appropriate nan codes
Ananya-Joshi Sep 21, 2021
211c317
changes to run.py
Ananya-Joshi Sep 21, 2021
0df9b81
adding test_run changes with new col names
Ananya-Joshi Oct 11, 2021
7f99565
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
1bd6fac
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
05c3cbf
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
a14f23c
lint nit
Ananya-Joshi Oct 12, 2021
17352cf
Modifying for the changes in the base csv file from the CDC
Ananya-Joshi Oct 12, 2021
3d66880
Changes to the CDC Files and respective changes to tests
Ananya-Joshi Oct 12, 2021
3184ed2
First pass of the CDC Indicator
Ananya-Joshi Sep 8, 2021
69071be
added explicit dictionary creation
Ananya-Joshi Sep 9, 2021
b63e46d
added os import
Ananya-Joshi Sep 9, 2021
fe67af3
Minor changes for the linter - tests pass locally
Ananya-Joshi Sep 9, 2021
9284835
minor changes
Ananya-Joshi Sep 10, 2021
15eb876
Update cdc_vaccines/delphi_cdc_vaccines/__main__.py
Ananya-Joshi Sep 10, 2021
7a23d2f
Update cdc_vaccines/delphi_cdc_vaccines/constants.py
Ananya-Joshi Sep 10, 2021
0ee8c24
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
1997668
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
754bba3
Update cdc_vaccines/params.json.template
Ananya-Joshi Sep 10, 2021
33069b5
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
e524821
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 10, 2021
8d1c2d2
Update cdc_vaccines/delphi_cdc_vaccines/pull.py
Ananya-Joshi Sep 10, 2021
0093537
changes to the json file
Ananya-Joshi Sep 10, 2021
253392c
changed the signal name generation
Ananya-Joshi Sep 11, 2021
30def22
committed constants
Ananya-Joshi Sep 11, 2021
f1edd0f
Modified run.py to have the right NaN codes
Ananya-Joshi Sep 17, 2021
6495845
Update cdc_vaccines/README.md
Ananya-Joshi Sep 13, 2021
e1ee433
Added appropriate NaN codes
Ananya-Joshi Sep 19, 2021
4aeb263
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 21, 2021
5a9e67b
added back appropriate nan codes
Ananya-Joshi Sep 21, 2021
899f17e
changes to run.py
Ananya-Joshi Sep 21, 2021
ada5f10
adding test_run changes with new col names
Ananya-Joshi Oct 11, 2021
90034b3
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
6616178
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
0dc2b6b
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
32067a7
lint nit
Ananya-Joshi Oct 12, 2021
387949b
Modifying for the changes in the base csv file from the CDC
Ananya-Joshi Oct 12, 2021
62e343a
Changes to the CDC Files and respective changes to tests
Ananya-Joshi Oct 12, 2021
6c8dbd5
Added an export start and end date
Ananya-Joshi Dec 21, 2021
853413d
First pass of the CDC Indicator
Ananya-Joshi Sep 8, 2021
3b622d7
added explicit dictionary creation
Ananya-Joshi Sep 9, 2021
68fe6f5
added os import
Ananya-Joshi Sep 9, 2021
24afea1
Minor changes for the linter - tests pass locally
Ananya-Joshi Sep 9, 2021
62410e1
minor changes
Ananya-Joshi Sep 10, 2021
0e6f6a7
Update cdc_vaccines/delphi_cdc_vaccines/__main__.py
Ananya-Joshi Sep 10, 2021
c2144e5
Update cdc_vaccines/delphi_cdc_vaccines/constants.py
Ananya-Joshi Sep 10, 2021
00c5967
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
cfed683
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
c341984
Update cdc_vaccines/params.json.template
Ananya-Joshi Sep 10, 2021
921dc40
Update cdc_vaccines/README.md
Ananya-Joshi Sep 10, 2021
1566359
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 10, 2021
39f9df5
Update cdc_vaccines/delphi_cdc_vaccines/pull.py
Ananya-Joshi Sep 10, 2021
d42771c
changes to the json file
Ananya-Joshi Sep 10, 2021
4b776da
changed the signal name generation
Ananya-Joshi Sep 11, 2021
17a54dd
committed constants
Ananya-Joshi Sep 11, 2021
f162cf6
Modified run.py to have the right NaN codes
Ananya-Joshi Sep 17, 2021
242456c
Update cdc_vaccines/README.md
Ananya-Joshi Sep 13, 2021
a5b28ff
Added appropriate NaN codes
Ananya-Joshi Sep 19, 2021
991c34c
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Sep 21, 2021
e3d4217
added back appropriate nan codes
Ananya-Joshi Sep 21, 2021
ab53ede
changes to run.py
Ananya-Joshi Sep 21, 2021
b226bd5
adding test_run changes with new col names
Ananya-Joshi Oct 11, 2021
de8da7a
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
cc8d5ac
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
5d1376c
Update cdc_vaccines/delphi_cdc_vaccines/run.py
Ananya-Joshi Oct 12, 2021
e7949cc
lint nit
Ananya-Joshi Oct 12, 2021
82225c1
Modifying for the changes in the base csv file from the CDC
Ananya-Joshi Oct 12, 2021
6adcd92
Changes to the CDC Files and respective changes to tests
Ananya-Joshi Oct 12, 2021
cd4d12d
slight changes to start and end export date
Ananya-Joshi Dec 21, 2021
d627442
further small changes
Ananya-Joshi Dec 21, 2021
228fc59
cleaning up from rebase, still lint error
Ananya-Joshi Dec 21, 2021
23380db
Cdc vaccines: add basic nancodes
dshemetov Sep 29, 2021
3febaf7
Changed test to account for export start and end date
Ananya-Joshi Dec 22, 2021
b74eb5f
Added tests forexport_start_date & export_end_date
Ananya-Joshi Dec 22, 2021
3da0a85
Update cdc_vaccines/delphi_cdc_vaccines/pull.py
Ananya-Joshi Jan 14, 2022
768a589
Merge branch 'main' into indicator_cdc_vaccines_fixed
Ananya-Joshi Jan 14, 2022
ba89018
Fix test_pull issue
dshemetov Jan 14, 2022
0bf112e
Fix test_pull
dshemetov Jan 14, 2022
2 changes: 1 addition & 1 deletion .github/workflows/python-ci.yml
@@ -16,7 +16,7 @@ jobs:
     if: github.event.pull_request.draft == false
     strategy:
       matrix:
-        packages: [_delphi_utils_python, changehc, claims_hosp, combo_cases_and_deaths, covid_act_now, doctor_visits, google_symptoms, hhs_hosp, hhs_facilities, jhu, nchs_mortality, nowcast, quidel, quidel_covidtest, safegraph_patterns, sir_complainsalot, usafacts]
+        packages: [_delphi_utils_python, changehc, claims_hosp, combo_cases_and_deaths, covid_act_now, doctor_visits, google_symptoms, hhs_hosp, hhs_facilities, jhu, nchs_mortality, nowcast, quidel, quidel_covidtest, safegraph_patterns, sir_complainsalot, usafacts, cdc_vaccines]
     defaults:
       run:
         working-directory: ${{ matrix.packages }}
22 changes: 22 additions & 0 deletions cdc_vaccines/.pylintrc
@@ -0,0 +1,22 @@

[MESSAGES CONTROL]

disable=logging-format-interpolation,
        too-many-locals,
        too-many-arguments,
        # Allow pytest functions to be part of a class.
        no-self-use,
        # Allow pytest classes to have one test.
        too-few-public-methods

[BASIC]

# Allow arbitrarily short-named variables.
variable-rgx=[a-z_][a-z0-9_]*
argument-rgx=[a-z_][a-z0-9_]*
attr-rgx=[a-z_][a-z0-9_]*

[DESIGN]

# Don't complain about pytest "unused" arguments.
ignored-argument-names=(_.*|run_as_module)
29 changes: 29 additions & 0 deletions cdc_vaccines/Makefile
@@ -0,0 +1,29 @@
.PHONY: venv lint test clean

dir = $(shell find ./delphi_* -name __init__.py | grep -o 'delphi_[_[:alnum:]]*')

venv:
	python3.8 -m venv env

install: venv
	. env/bin/activate; \
	pip install wheel ; \
	pip install -e ../_delphi_utils_python ;\
	pip install -e .

lint:
	. env/bin/activate; pylint $(dir)
	. env/bin/activate; pydocstyle $(dir)

test:
	. env/bin/activate ;\
	(cd tests && ../env/bin/pytest --cov=$(dir) --cov-report=term-missing)

clean:
	rm -rf env
	rm -f params.json

run:
	env/bin/python -m $(dir)
	env/bin/python -m delphi_utils.validator --dry_run
	env/bin/python -m delphi_utils.archive
69 changes: 69 additions & 0 deletions cdc_vaccines/README.md
@@ -0,0 +1,69 @@
# CDC Vaccinations

This indicator provides the official vaccination counts in the US. We export the county-level
daily vaccination rates data as-is, and publish the result as a COVIDcast signal.
We also aggregate the data to the MSA, HRR, State, HHS Region, and Nation levels.
For detailed information, see the file DETAILS.md contained in this directory.

Note that individuals could be vaccinated outside of the US. Additionally,
there is no county level data for counties in Texas and Hawaii. Each state has some vaccination counts assigned to "unknown county". Some vaccination counts are assigned to "unknown state, unknown county".
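
As a rough illustration of the aggregation described above, the sketch below rolls county-level counts up to the state level with plain pandas. It is not the indicator's actual code path, which relies on the Delphi geo-mapping utilities; the helper name `aggregate_to_state`, the column name `value`, and the toy numbers are assumptions made for this example. It relies only on the fact that the first two digits of a five-digit county FIPS code identify the state, so an "unknown county" pseudo-code of the form XX000 (as built in `pull.py`) also rolls up into state XX.

```python
import pandas as pd


def aggregate_to_state(county_df: pd.DataFrame) -> pd.DataFrame:
    """Sum county-level values into state-level values per day.

    Hypothetical helper: assumes columns `fips` (5-character string),
    `timestamp`, and `value`.
    """
    out = county_df.copy()
    # The first two characters of a county FIPS code are the state FIPS code.
    out["state_code"] = out["fips"].str[:2]
    return (
        out.groupby(["state_code", "timestamp"], as_index=False)["value"]
        .sum()
    )


toy = pd.DataFrame({
    # "42000" stands in for an "unknown county" row of state 42.
    "fips": ["42003", "42101", "42000", "06037"],
    "timestamp": pd.to_datetime(["2021-09-01"] * 4),
    "value": [10, 20, 5, 7],  # made-up counts
})
state_totals = aggregate_to_state(toy)
print(state_totals)
```

In this toy input, the three rows whose FIPS starts with `42` collapse into a single state row, including the unallocated `42000` row.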


## Running the Indicator

The indicator is run by directly executing the Python module contained in this
directory. The safest way to do this is to create a virtual environment,
install the common DELPHI tools, and then install the module and its
dependencies. To do this, run the following command from this directory:

```
make install
```

This command will install the package in editable mode, so you can make changes that
will automatically propagate to the installed package.

All of the user-changeable parameters are stored in `params.json`. To execute
the module and produce the output datasets (by default, in `receiving`), run
the following:

```
env/bin/python -m delphi_cdc_vaccines
```

If you want to enter the virtual environment in your shell,
you can run `source env/bin/activate`. Run `deactivate` to leave the virtual environment.

Once you are finished, you can remove the virtual environment and
params file with the following:

```
make clean
```

## Testing the code

To run static tests of the code style, run the following command:

```
make lint
```

Unit tests are also included in the module. To execute these, run the following
command from this directory:

```
make test
```

To run individual tests, run the following:

```
(cd tests && ../env/bin/pytest test_run.py --cov=delphi_cdc_vaccines --cov-report=term-missing)
```

The output will show the number of unit tests that passed and failed, along
with the percentage of code covered by the tests.

No lint check or unit test should fail, and the number of code lines not covered by unit tests should be small and
should not include critical subroutines.
38 changes: 38 additions & 0 deletions cdc_vaccines/REVIEW.md
@@ -0,0 +1,38 @@
## Code Review (Python)

A code review of this module should include a careful look at the code and the
output. To assist in the process, but certainly not in place of it, please
check the following items.

**Documentation**

- [ ] the README.md file template is filled out and currently accurate; it is
possible to load and test the code using only the instructions given
- [ ] minimal docstrings (one line describing what the function does) are
included for all functions; full docstrings describing the inputs and expected
outputs should be given for non-trivial functions

**Structure**

- [ ] code should pass lint checks (`make lint`)
- [ ] any required metadata files are checked into the repository and placed
within the directory `static`
- [ ] any intermediate files that are created and stored by the module should
be placed in the directory `cache`
- [ ] final expected output files to be uploaded to the API are placed in the
`receiving` directory; output files should not be committed to the repository
- [ ] all options and API keys are passed through the file `params.json`
- [ ] template parameter file (`params.json.template`) is checked into the
code; no personal (i.e., usernames) or private (i.e., API keys) information is
included in this template file

**Testing**

- [ ] module can be installed in a new virtual environment (`make install`)
- [ ] reasonably high level of unit test coverage covering all of the main logic
of the code (e.g., missing coverage for raised errors that do not currently seem
possible to reach are okay; missing coverage for options that will be needed are
not)
- [ ] all unit tests run without errors (`make test`)
- [ ] indicator directory has been added to GitHub CI
(`covidcast-indicators/.github/workflows/python-ci.yml`)
Empty file added cdc_vaccines/cache/.gitignore
Empty file.
13 changes: 13 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/__init__.py
@@ -0,0 +1,13 @@
# -*- coding: utf-8 -*-
"""Module to pull and clean indicators from the CDC source.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import
from . import pull
from . import run

__version__ = "0.1.0"
12 changes: 12 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/__main__.py
@@ -0,0 +1,12 @@
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m delphi_cdc_vaccines`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from delphi_utils import read_params
from .run import run_module # pragma: no cover

run_module(read_params()) # pragma: no cover
33 changes: 33 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/constants.py
@@ -0,0 +1,33 @@
"""Registry for variations."""

from itertools import product
from delphi_utils import Smoother


CUMULATIVE = 'cumulative'
INCIDENCE = 'incidence'
FREQUENCY = [CUMULATIVE, INCIDENCE]
STATUS = ["tot", "part"]
AGE = ["", "_12P", "_18P", "_65P"]

SIGNALS = [f"{frequency}_counts_{status}_vaccine{age}" for
    frequency, status, age in product(FREQUENCY, STATUS, AGE)]
DIFFERENCE_MAPPING = {
    f"{INCIDENCE}_counts_{status}_vaccine{age}": f"{CUMULATIVE}_counts_{status}_vaccine{age}"
    for status, age in product(STATUS, AGE)
}
SIGNALS = list(DIFFERENCE_MAPPING.keys()) + list(DIFFERENCE_MAPPING.values())


GEOS = [
    "nation",
    "state",
    "hrr",
    "hhs",
    "msa"
]

SMOOTHERS = [
    (Smoother("identity", impute_method=None), ""),
    (Smoother("moving_average", window_length=7), "_7dav"),
]
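
To make the naming scheme in `constants.py` easier to review, here is a small standalone sketch that expands the same constants and prints a few of the resulting names. The lowercase names `difference_mapping`, `signals`, `SUFFIXES`, and `sensor_names` are introduced only for this illustration, and the rule that an exported name is the base signal name followed by the smoother suffix is an assumption, not something shown in this diff.

```python
from itertools import product

CUMULATIVE = "cumulative"
INCIDENCE = "incidence"
STATUS = ["tot", "part"]
AGE = ["", "_12P", "_18P", "_65P"]

# Each incidence signal maps to the cumulative signal it is differenced from.
difference_mapping = {
    f"{INCIDENCE}_counts_{status}_vaccine{age}": f"{CUMULATIVE}_counts_{status}_vaccine{age}"
    for status, age in product(STATUS, AGE)
}
signals = list(difference_mapping.keys()) + list(difference_mapping.values())

# Assumption for illustration: the exported name is the base signal name
# followed by the smoother suffix ("" or "_7dav").
SUFFIXES = ["", "_7dav"]
sensor_names = [signal + suffix for signal, suffix in product(signals, SUFFIXES)]

print(len(signals))     # 16
print(signals[0])       # incidence_counts_tot_vaccine
print(sensor_names[1])  # incidence_counts_tot_vaccine_7dav
```

Two statuses times four age groups give eight incidence signals and eight matching cumulative signals, hence 16 base names.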
136 changes: 136 additions & 0 deletions cdc_vaccines/delphi_cdc_vaccines/pull.py
@@ -0,0 +1,136 @@
# -*- coding: utf-8 -*-
"""Functions for pulling data from the CDC data website for vaccines."""
import hashlib
from logging import Logger
from delphi_utils.geomap import GeoMapper
import numpy as np
import pandas as pd
from .constants import SIGNALS, DIFFERENCE_MAPPING



def pull_cdcvacc_data(base_url: str, logger: Logger) -> pd.DataFrame:
    """Pull the latest data from the CDC on vaccines and conform it into a dataset.

    The output dataset has:
    - Each row corresponds to (County, Date), denoted (FIPS, timestamp)
    - Each row additionally has columns that correspond to the counts or
      cumulative counts of vaccination status (fully vaccinated,
      partially vaccinated) of various age groups (all, 12+, 18+, 65+)
      from December 13th 2020 until the latest date

    Note that the raw dataset gives the `cumulative` metrics, from which
    we compute `incidence` by taking first differences. Hence, `incidence`
    may be negative. This is wholly dependent on the quality of the raw
    dataset.

    We filter the data such that we only keep rows with valid FIPS, or "FIPS"
    codes defined under the exceptions of the README. The current exceptions
    include:
    - UNK: statewise unallocated (recoded to XX000)
    Parameters
    ----------
    base_url: str
        Base URL for pulling the CDC Vaccination Data
    logger: Logger
        Logger used to record metadata about the retrieved data
    Returns
    -------
    pd.DataFrame
        Dataframe as described above.
    """
    # Columns to drop from the data frame.
    drop_columns = [
        "date",
        "recip_state",
        "series_complete_pop_pct",
        "mmwr_week",
        "recip_county",
        "state_id"
    ]

    # Read data
    df = pd.read_csv(base_url)
    logger.info("data retrieved from source",
                num_rows=df.shape[0],
                num_cols=df.shape[1],
                min_date=min(df['Date']),
                max_date=max(df['Date']),
                checksum=hashlib.sha256(pd.util.hash_pandas_object(df).values).hexdigest())
    df.columns = [i.lower() for i in df.columns]

    df['recip_state'] = df['recip_state'].str.lower()
    drop_columns.extend([x for x in df.columns if ("pct" in x) or ("svi" in x)])
    drop_columns = list(set(drop_columns))
    df = GeoMapper().add_geocode(df, "state_id", "state_code",
                                 from_col="recip_state", new_col="state_id", dropna=False)
    df['state_id'] = df['state_id'].fillna('0').astype(int)
    # Change FIPS from UNK to XX000 for statewise unallocated counts
    unassigned_index = (df["fips"] == "UNK")
    df.loc[unassigned_index, "fips"] = df["state_id"].loc[unassigned_index].values * 1000

    # Conform FIPS
    df["fips"] = df["fips"].apply(lambda x: f"{int(x):05d}")
    df["timestamp"] = pd.to_datetime(df["date"])
    # Drop unnecessary columns (state is pre-encoded in fips)
    try:
        df.drop(drop_columns, axis=1, inplace=True)
    except KeyError as e:
        raise ValueError(
            "Tried to drop non-existent columns. The dataset "
            "schema may have changed. Please investigate and "
            "amend drop_columns."
        ) from e
    # Rename the remaining columns to the signal names
    df.columns = ["fips",
                  "cumulative_counts_tot_vaccine",
                  "cumulative_counts_tot_vaccine_12P",
                  "cumulative_counts_tot_vaccine_18P",
                  "cumulative_counts_tot_vaccine_65P",
                  "cumulative_counts_part_vaccine",
                  "cumulative_counts_part_vaccine_12P",
                  "cumulative_counts_part_vaccine_18P",
                  "cumulative_counts_part_vaccine_65P",
                  "timestamp"]
    df_dummy = df.loc[(df["fips"] != '00000') & (df["timestamp"] == min(df["timestamp"]))].copy()
    # Handle fips 00000 separately
    df_oth = df.loc[((df["fips"] == '00000') &
                     (df["timestamp"] == min(df[df['fips'] == '00000']['timestamp'])))].copy()
    df_dummy = pd.concat([df_dummy, df_oth])
    df_dummy.loc[:, "timestamp"] = df_dummy.loc[:, "timestamp"] - pd.Timedelta(days=1)
    df_dummy.loc[:, ["cumulative_counts_tot_vaccine",
                     "cumulative_counts_tot_vaccine_12P",
                     "cumulative_counts_tot_vaccine_18P",
                     "cumulative_counts_tot_vaccine_65P",
                     "cumulative_counts_part_vaccine",
                     "cumulative_counts_part_vaccine_12P",
                     "cumulative_counts_part_vaccine_18P",
                     "cumulative_counts_part_vaccine_65P",
                     ]] = 0

    df = pd.concat([df_dummy, df])
    # Obtain new_counts
    df.sort_values(["fips", "timestamp"], inplace=True)
    for to, from_d in DIFFERENCE_MAPPING.items():
        df[to] = df[from_d].diff()

    rem_list = [x for x in list(df.columns) if x not in ['timestamp', 'fips']]
    # Handle edge cases where we diffed across fips
    mask = df["fips"] != df["fips"].shift(1)
    df.loc[mask, rem_list] = np.nan
    df.reset_index(inplace=True, drop=True)
    # Final sanity checks
    unique_days = df["timestamp"].unique()
    min_timestamp = min(unique_days)
    max_timestamp = max(unique_days)
    n_days = (max_timestamp - min_timestamp) / np.timedelta64(1, "D") + 1
    if n_days != len(unique_days):
        raise ValueError(
            f"Not every day between {min_timestamp} and "
            f"{max_timestamp} is represented."
        )
    return df.loc[
        df["timestamp"] >= min(df["timestamp"]),
        # Reorder
        ["fips", "timestamp"] + SIGNALS,
    ].reset_index(drop=True)
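
The first-difference step in `pull_cdcvacc_data` can be hard to follow in the full function, so the toy sketch below isolates it. It differences a cumulative column over a frame sorted by FIPS and date, then blanks the first row of each FIPS, where the difference would otherwise span two counties. The column names `cumulative` and `incidence` and all values are made up for this example, and the sketch blanks only the differenced column, whereas `pull.py` applies the mask to every value column.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "fips": ["01001", "01001", "01001", "01003", "01003"],
    "timestamp": pd.to_datetime(
        ["2021-01-01", "2021-01-02", "2021-01-03", "2021-01-01", "2021-01-02"]
    ),
    "cumulative": [0, 5, 4, 0, 2],  # made-up values; note the dip from 5 to 4
})

toy = toy.sort_values(["fips", "timestamp"]).reset_index(drop=True)

# Difference the whole column, then blank out the first row of every FIPS,
# where the difference would otherwise span two different counties.
toy["incidence"] = toy["cumulative"].diff()
first_row_of_fips = toy["fips"] != toy["fips"].shift(1)
toy.loc[first_row_of_fips, "incidence"] = np.nan

# Equivalent formulation: difference within each FIPS group.
grouped = toy.groupby("fips")["cumulative"].diff()

print(toy)
```

The dip from 5 to 4 yields a negative incidence value, which is the behavior the docstring warns about when the raw cumulative series is revised downward. The grouped form gives the same result without a separate masking step.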