Add hsanci to nssp #2162
dshemetov left a comment
Looks good to me.
nssp/delphi_nssp/run.py (outdated)
```python
elif geo == "hsanci":
    df = df[["hsa_nci_id", "val", "timestamp"]]
    df = df[df["hsa_nci_id"] != "All"]
    df = df.groupby(["hsa_nci_id", "timestamp"])['val'].min().reset_index()
```
what is this for?
The data source reports at the HSA-NCI level and duplicates the same value across the constituent counties. This picks out just one of those per key.
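A minimal sketch of the deduplication described above, wrapped in a hypothetical helper `dedup_hsanci` (not a function in the repo) and run on made-up toy data. The `county` column, IDs, dates and values are all assumptions for illustration. Because every county row carries the same HSA-NCI value, `.min()` per `(hsa_nci_id, timestamp)` simply picks that shared value, and the comment says so explicitly.

```python
import pandas as pd


def dedup_hsanci(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse county-level rows to one row per HSA-NCI region and timestamp.

    Hypothetical helper mirroring the geo == "hsanci" branch in run.py.
    """
    df = df[["hsa_nci_id", "val", "timestamp"]]
    # Drop the aggregate "All" row; it is not an HSA-NCI region.
    df = df[df["hsa_nci_id"] != "All"]
    # The source repeats the same HSA-NCI value on every constituent county row,
    # so .min() is only used to pick one of those identical values per key.
    return df.groupby(["hsa_nci_id", "timestamp"])["val"].min().reset_index()


# Hypothetical toy input: counties A and B share HSA-NCI "1", plus an "All" row.
raw = pd.DataFrame(
    {
        "county": ["A", "B", "C", "total"],
        "hsa_nci_id": ["1", "1", "2", "All"],
        "val": [3.5, 3.5, 1.2, 4.7],
        "timestamp": pd.to_datetime(["2024-01-06"] * 4),
    }
)
out = dedup_hsanci(raw)
print(out)
```

The result has one row per HSA-NCI region; the duplicated county copies and the "All" row are gone.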
Ah OK, for deduplicating. In that case, the `.min()` is misleading and could use a short explanatory comment.
Fair enough. Wrote a similar comment on the epidata-etl side.
An alternative could've been:
```python
df = df.drop_duplicates()
```
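The two approaches are close but not interchangeable. `drop_duplicates()` with no `subset` compares every column, so the county copies collapse only if their `val` is also identical, while `groupby(...).min()` always yields exactly one row per `(hsa_nci_id, timestamp)`. Passing `subset=` to `drop_duplicates` keeps the first row per key instead. The toy frame below is hypothetical and deliberately gives HSA-NCI "2" two different values to show where they diverge; when the copies agree, as described in this thread, all three produce the same rows.

```python
import pandas as pd

# Hypothetical toy frame, already restricted to the three columns used above.
df = pd.DataFrame(
    {
        "hsa_nci_id": ["1", "1", "2", "2"],
        # HSA-NCI "2" has (hypothetically) inconsistent copies.
        "val": [3.5, 3.5, 1.2, 1.3],
        "timestamp": pd.to_datetime(["2024-01-06"] * 4),
    }
)

# One row per key; the minimum wins if copies disagree.
by_min = df.groupby(["hsa_nci_id", "timestamp"])["val"].min().reset_index()
# Collapses only rows identical in all columns, so both "2" rows survive.
deduped = df.drop_duplicates()
# One row per key; the first occurrence wins.
deduped_by_key = df.drop_duplicates(subset=["hsa_nci_id", "timestamp"])

print(len(by_min), len(deduped), len(deduped_by_key))  # prints "2 3 2"
```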
Description
Add `hsa-nci` geo level to nssp. Ran through the extensive existing unit tests for nssp, ran the indicators and looked at the CSV files. Result looks good.