Merged

Changes from all commits
34 commits
022c1e0
first implimentation
aysim319 Nov 14, 2024
541b0aa
figuring out metric/sensor to use
aysim319 Nov 15, 2024
1c79068
first take
aysim319 Nov 18, 2024
9a9780b
working on test
aysim319 Nov 19, 2024
24a17cd
added preliminary data source
aysim319 Nov 20, 2024
dde6685
adding indicator for gitaction
aysim319 Nov 20, 2024
f5ad4c9
lint
aysim319 Nov 20, 2024
aedf0e9
replace with setup.py
aysim319 Nov 20, 2024
25c24f1
more lint
aysim319 Nov 20, 2024
4a4e169
fixed date range for test
aysim319 Nov 20, 2024
3cfcdf6
lint
aysim319 Nov 20, 2024
b818ca8
Update DETAILS.md
aysim319 Nov 20, 2024
eca8fe0
fix output data
aysim319 Nov 20, 2024
711ace3
analysis in progress
aysim319 Nov 21, 2024
48a8a76
lint and suggestions
aysim319 Nov 21, 2024
8117096
more analysis
aysim319 Nov 22, 2024
411694c
add hhs geo aggregate
aysim319 Nov 25, 2024
f42c2e9
more analysis
aysim319 Nov 25, 2024
7da9e5a
update DETAILS.md
aysim319 Nov 25, 2024
1b4c277
Update nhsn/params.json.template
aysim319 Nov 25, 2024
84f34fa
Update nhsn/params.json.template
aysim319 Nov 25, 2024
1dc6343
Merge remote-tracking branch 'origin/nhsn_indicator' into nhsn_indicator
aysim319 Nov 25, 2024
3b5fbee
cleaning up anaylsis
aysim319 Nov 25, 2024
e97c3cc
rename geo_id column name
aysim319 Nov 26, 2024
26d69f3
suggested / needed to deploy
aysim319 Nov 27, 2024
678822a
adding default locations for deployment
aysim319 Nov 27, 2024
840ebb7
fix geo aggregation for hhs
aysim319 Dec 2, 2024
de601d6
Update nhsn/params.json.template
aysim319 Dec 2, 2024
928a5c7
lint
aysim319 Dec 3, 2024
3ba877e
needed to add hhs in to geo for tests
aysim319 Dec 5, 2024
162152e
fixed and added more plots
aysim319 Dec 6, 2024
5f26ff1
cleaning up notebook and adding details
aysim319 Dec 6, 2024
0000c73
new signal name
aysim319 Dec 10, 2024
de9dc95
needed to update the type dict also
aysim319 Dec 10, 2024
2 changes: 2 additions & 0 deletions .github/workflows/python-ci.yml
@@ -39,6 +39,8 @@ jobs:
dir: "delphi_quidel_covidtest"
- package: "sir_complainsalot"
dir: "delphi_sir_complainsalot"
- package: "nhsn"
dir: "delphi_nhsn"
defaults:
run:
working-directory: ${{ matrix.package }}
2 changes: 1 addition & 1 deletion Jenkinsfile
@@ -10,7 +10,7 @@
- TODO: #527 Get this list automatically from python-ci.yml at runtime.
*/

def indicator_list = ['backfill_corrections', 'changehc', 'claims_hosp', 'google_symptoms', 'hhs_hosp', 'nchs_mortality', 'quidel_covidtest', 'sir_complainsalot', 'doctor_visits', 'nwss_wastewater', 'nssp']
def indicator_list = ['backfill_corrections', 'changehc', 'claims_hosp', 'google_symptoms', 'hhs_hosp', 'nchs_mortality', 'quidel_covidtest', 'sir_complainsalot', 'doctor_visits', 'nwss_wastewater', 'nssp', 'nhsn']
def build_package_main = [:]
def build_package_prod = [:]
def deploy_staging = [:]
30 changes: 30 additions & 0 deletions ansible/templates/nhsn-params-prod.json.j2
@@ -0,0 +1,30 @@
{
"common": {
"export_dir": "/common/covidcast/receiving/nhsn",
"backup_dir": "./raw_data_backups",
"log_filename": "/var/log/indicators/nhsn.log",
"log_exceptions": false
},
"indicator": {
"wip_signal": true,
"static_file_dir": "./static",
"socrata_token": "{{ nhsn_token }}"
},
"validation": {
"common": {
"data_source": "nhsn",
"api_credentials": "{{ validation_api_key }}",
"span_length": 15,
"min_expected_lag": {"all": "7"},
"max_expected_lag": {"all": "13"},
"dry_run": true,
"suppressed_errors": []
},
"static": {
"minimum_sample_size": 0,
"missing_se_allowed": true,
"missing_sample_size_allowed": true
},
"dynamic": {}
}
}
4 changes: 4 additions & 0 deletions ansible/templates/sir_complainsalot-params-prod.json.j2
@@ -45,6 +45,10 @@
"nssp": {
"max_age":19,
"maintainers": []
},
"nhsn": {
"max_age":19,
"maintainers": []
}
}
}
3 changes: 3 additions & 0 deletions ansible/vars.yaml
@@ -59,6 +59,9 @@ nwss_wastewater_token: "{{ vault_cdc_socrata_token }}"
# nssp
nssp_token: "{{ vault_cdc_socrata_token }}"

# nhsn
nhsn_token: "{{ vault_cdc_socrata_token }}"

# SirCAL
sir_complainsalot_api_key: "{{ vault_sir_complainsalot_api_key }}"
sir_complainsalot_slack_token: "{{ vault_sir_complainsalot_slack_token }}"
30 changes: 30 additions & 0 deletions nhsn/DETAILS.md
@@ -0,0 +1,30 @@
# NHSN data

We import the NHSN Weekly Hospital Respiratory Data.

We pull data from two sources for NHSN. Both come from the same underlying collection, but they are published on different cadences, and the secondary source reports preliminary data for the previous reporting week (see the fetch sketch after the links below).

Primary source: https://data.cdc.gov/Public-Health-Surveillance/Weekly-Hospital-Respiratory-Data-HRD-Metrics-by-Ju/ua7e-t2fy/about_data
Secondary (preliminary source): https://data.cdc.gov/Public-Health-Surveillance/Weekly-Hospital-Respiratory-Data-HRD-Metrics-by-Ju/mpgq-jmmr/about_data
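
As a quick illustration (not part of the indicator code), both datasets can be fetched with sodapy; the dataset ids below are the ones used in `pull.py`, while the app-token placeholder is hypothetical:

```python
# Sketch: fetch a few sample rows from each NHSN dataset via the Socrata API.
# "MY_APP_TOKEN" is a hypothetical placeholder for a real Socrata app token.
from sodapy import Socrata

client = Socrata("data.cdc.gov", "MY_APP_TOKEN")
primary = client.get("ua7e-t2fy", limit=5)  # weekly hospital respiratory data
prelim = client.get("mpgq-jmmr", limit=5)   # preliminary version of the same data
```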

## Geographical Levels
* `state`: reported using the two-letter postal code
* `nation`: currently reported as `us` only
* `hhs`: aggregated from state-level data using the delphi_utils GeoMapper (see the sketch below)
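
The `hhs` level is built by summing state values into HHS regions. A minimal sketch of that step, assuming the `GeoMapper.replace_geocode` API from delphi_utils (the sample frame is illustrative only):

```python
# Sketch: aggregate state-level admissions to HHS regions with GeoMapper.
# The input values below are made up for illustration.
import pandas as pd
from delphi_utils import GeoMapper

df = pd.DataFrame({
    "geo_id": ["pa", "nj", "ny"],
    "timestamp": pd.to_datetime(["2024-11-16"] * 3),
    "confirmed_admissions_covid_ew": [120.0, 80.0, 300.0],
})

gmpr = GeoMapper()
# Maps two-letter state ids to HHS region codes and sums the data column
# within each (region, date) group.
hhs_df = gmpr.replace_geocode(df, "state_id", "hhs", from_col="geo_id", date_col="timestamp")
```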

## Metrics
* `confirmed_admissions_covid_ew`: total number of confirmed COVID-19 admissions
* `confirmed_admissions_flu_ew`: total number of confirmed influenza admissions
* `confirmed_admissions_covid_ew_prelim`: total number of confirmed COVID-19 admissions from the preliminary source
* `confirmed_admissions_flu_ew_prelim`: total number of confirmed influenza admissions from the preliminary source

## Additional Notes
The HHS and NHSN datasets cover equivalent data on hospital admissions for COVID-19 and flu.
As a general trend, the two sources match well.
However, there are differences for some states: GA (until 2023), LA, NV, PR (late 2020 to early 2021), and TN all have HHS values substantially lower than NHSN.

Some states show a spike in NHSN or HHS that is absent from the other source, and these spikes do not occur at the same time_values across states.

More details regarding the analysis are available in [analysis.ipynb](notebook/analysis.ipynb)
(may require installing additional packages to run)
32 changes: 32 additions & 0 deletions nhsn/Makefile
@@ -0,0 +1,32 @@
.PHONY: venv install install-ci lint format test clean

dir = $(shell find ./delphi_* -name __init__.py | grep -o 'delphi_[_[:alnum:]]*' | head -1)
venv:
python3.8 -m venv env

install: venv
. env/bin/activate; \
pip install wheel ; \
pip install -e ../_delphi_utils_python ;\
pip install -e .

install-ci: venv
. env/bin/activate; \
pip install wheel ; \
pip install ../_delphi_utils_python ;\
pip install .

lint:
. env/bin/activate; pylint $(dir) --rcfile=../pyproject.toml
. env/bin/activate; pydocstyle $(dir)

format:
. env/bin/activate; darker $(dir)

test:
. env/bin/activate ;\
(cd tests && ../env/bin/pytest --cov=$(dir) --cov-report=term-missing)

clean:
rm -rf env
rm -f params.json
13 changes: 13 additions & 0 deletions nhsn/delphi_nhsn/__init__.py
@@ -0,0 +1,13 @@
# -*- coding: utf-8 -*-
"""Module to pull and clean indicators from the XXXXX source.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import run

__version__ = "0.1.0"
13 changes: 13 additions & 0 deletions nhsn/delphi_nhsn/__main__.py
@@ -0,0 +1,13 @@
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m MODULE_NAME`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from delphi_utils import read_params

from .run import run_module # pragma: no cover

run_module(read_params()) # pragma: no cover
31 changes: 31 additions & 0 deletions nhsn/delphi_nhsn/constants.py
@@ -0,0 +1,31 @@
"""Registry for signal names."""

GEOS = ["state", "nation", "hhs"]

# column names from Socrata
TOTAL_ADMISSION_COVID_API = "totalconfc19newadm"
TOTAL_ADMISSION_FLU_API = "totalconfflunewadm"

SIGNALS_MAP = {
"confirmed_admissions_covid_ew": TOTAL_ADMISSION_COVID_API,
"confirmed_admissions_flu_ew": TOTAL_ADMISSION_FLU_API,
}

TYPE_DICT = {
"timestamp": "datetime64[ns]",
"geo_id": str,
"confirmed_admissions_covid_ew": float,
"confirmed_admissions_flu_ew": float,
}

# signal mapping for secondary, preliminary source
PRELIM_SIGNALS_MAP = {
"confirmed_admissions_covid_ew_prelim": TOTAL_ADMISSION_COVID_API,
"confirmed_admissions_flu_ew_prelim": TOTAL_ADMISSION_FLU_API,
}
PRELIM_TYPE_DICT = {
"timestamp": "datetime64[ns]",
"geo_id": str,
"confirmed_admissions_covid_ew_prelim": float,
"confirmed_admissions_flu_ew_prelim": float,
}
123 changes: 123 additions & 0 deletions nhsn/delphi_nhsn/pull.py
@@ -0,0 +1,123 @@
# -*- coding: utf-8 -*-
"""Functions for pulling NSSP ER data."""
import logging
from typing import Optional

import pandas as pd
from delphi_utils import create_backup_csv
from sodapy import Socrata

from .constants import PRELIM_SIGNALS_MAP, PRELIM_TYPE_DICT, SIGNALS_MAP, TYPE_DICT


def pull_data(socrata_token: str, dataset_id: str):
"""Pull data from Socrata API."""
client = Socrata("data.cdc.gov", socrata_token)
results = []
offset = 0
limit = 50000 # maximum limit allowed by SODA 2.0
while True:
page = client.get(dataset_id, limit=limit, offset=offset)
if not page:
break # exit the loop if no more results
results.extend(page)
offset += limit

df = pd.DataFrame.from_records(results)
return df


def pull_nhsn_data(socrata_token: str, backup_dir: str, custom_run: bool, logger: Optional[logging.Logger] = None):
"""Pull the latest NSSP ER visits data, and conforms it into a dataset.

The output dataset has:

- Each row corresponds to a single observation
    - Each row additionally has columns for the signals in SIGNALS_MAP

Parameters
----------
socrata_token: str
        Socrata App Token for pulling the NHSN data
backup_dir: str
Directory to which to save raw backup data
custom_run: bool
Flag indicating if the current run is a patch. If so, don't save any data to disk
logger: Optional[logging.Logger]
logger object

Returns
-------
pd.DataFrame
Dataframe as described above.
"""
# Pull data from Socrata API
df = pull_data(socrata_token, dataset_id="ua7e-t2fy")
Review thread on this line:

Contributor: It's easier to test transform logic if you separate the Extract and Transform steps. Probably fine in this case though, since we're not really transforming it.

Contributor: I believe this is a holdover from my nssp and nchs dirty code practice. If we aren't doing a full rewrite soon, I would refactor them at some point. For now, they all work (ugly).

Contributor (author): I think I was bouncing back and forth, partly over whether the other data source might need different transformations down the line, in which case it might be better to have a method defined. I did refactor pull_data, since that doesn't seem likely to change. (A sketch of the suggested Extract/Transform split appears after this file.)

keep_columns = list(TYPE_DICT.keys())

if not df.empty:
create_backup_csv(df, backup_dir, custom_run, logger=logger)

df = df.rename(columns={"weekendingdate": "timestamp", "jurisdiction": "geo_id"})

for signal, col_name in SIGNALS_MAP.items():
df[signal] = df[col_name]

df = df[keep_columns]
df["geo_id"] = df["geo_id"].str.lower()
df.loc[df["geo_id"] == "usa", "geo_id"] = "us"
df = df.astype(TYPE_DICT)
else:
df = pd.DataFrame(columns=keep_columns)

return df


def pull_preliminary_nhsn_data(
socrata_token: str, backup_dir: str, custom_run: bool, logger: Optional[logging.Logger] = None
):
"""Pull the latest NSSP ER visits data, and conforms it into a dataset.

The output dataset has:

- Each row corresponds to a single observation
    - Each row additionally has columns for the signals in PRELIM_SIGNALS_MAP

Parameters
----------
socrata_token: str
        Socrata App Token for pulling the NHSN data
backup_dir: str
Directory to which to save raw backup data
custom_run: bool
Flag indicating if the current run is a patch. If so, don't save any data to disk
logger: Optional[logging.Logger]
logger object

Returns
-------
pd.DataFrame
Dataframe as described above.
"""
# Pull data from Socrata API
df = pull_data(socrata_token, dataset_id="mpgq-jmmr")

keep_columns = list(PRELIM_TYPE_DICT.keys())

if not df.empty:
create_backup_csv(df, backup_dir, custom_run, sensor="prelim", logger=logger)

df = df.rename(columns={"weekendingdate": "timestamp", "jurisdiction": "geo_id"})

for signal, col_name in PRELIM_SIGNALS_MAP.items():
df[signal] = df[col_name]

df = df[keep_columns]
df = df.astype(PRELIM_TYPE_DICT)
df["geo_id"] = df["geo_id"].str.lower()
df.loc[df["geo_id"] == "usa", "geo_id"] = "us"
else:
df = pd.DataFrame(columns=keep_columns)

return df
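
The review thread above suggests separating the Extract and Transform steps so the transform logic can be unit-tested without network access. A minimal sketch of what that split could look like, reusing the PR's own column logic (the function name `transform_nhsn_data` is hypothetical, not part of the PR):

```python
# Hypothetical refactor sketch: a pure transform step, testable on a fixture
# DataFrame without calling the Socrata API.
import pandas as pd


def transform_nhsn_data(df: pd.DataFrame, signals_map: dict, type_dict: dict) -> pd.DataFrame:
    """Rename columns, derive signal columns, and type-cast, as in pull.py."""
    keep_columns = list(type_dict.keys())
    if df.empty:
        return pd.DataFrame(columns=keep_columns)
    df = df.rename(columns={"weekendingdate": "timestamp", "jurisdiction": "geo_id"})
    for signal, col_name in signals_map.items():
        df[signal] = df[col_name]
    df = df[keep_columns]
    df["geo_id"] = df["geo_id"].str.lower()
    df.loc[df["geo_id"] == "usa", "geo_id"] = "us"
    return df.astype(type_dict)
```

With this split, `pull_nhsn_data` would reduce to `pull_data` plus `create_backup_csv` plus the transform, and tests could exercise `transform_nhsn_data` directly on fixture data.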