-
Notifications
You must be signed in to change notification settings - Fork 67
Add hsa_nci geo resolution + Adjust csv regex patterns #1690
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
melange396
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice job! Your approach is a great way of handling this!
There are a couple other things we will need to take care of:
- The following is essentially dead code now; we should remove it at the least, but potentially replace its functionality: (an easy way would be to add "
delphi-epidata/src/acquisition/covidcast/csv_importer.py
Lines 194 to 197 in ff823c6
if geo_type not in CsvImporter.GEOGRAPHIC_RESOLUTIONS: logger.warning(event='invalid geo_type', detail=geo_type, file=path) yield (path, None) continue (or geo_type)" to the log message where it will now error out). - The new geo_type and some basic validation code should also be added to the section at: ... According to https://seer.cancer.gov/seerstat/variables/countyattribs/hsa.html, valid codes should be 1-3 digit numbers, or the special code of "1022" (which NSSP doesnt appear to use, FWIW).
delphi-epidata/src/acquisition/covidcast/csv_importer.py
Lines 306 to 346 in ff823c6
if geo_type in ('hrr', 'msa', 'dma', 'hhs'): # these particular ids are prone to be written as ints -- and floats try: geo_id = str(CsvImporter.floaty_int(geo_id)) except ValueError: # expected a number, but got a string return (None, 'geo_id') # sanity check geo_id with respect to geo_type if geo_type == 'county': if len(geo_id) != 5 or not '01000' <= geo_id <= '80000': return (None, 'geo_id') elif geo_type == 'hrr': if not 1 <= int(geo_id) <= 500: return (None, 'geo_id') elif geo_type == 'msa': if len(geo_id) != 5 or not '10000' <= geo_id <= '99999': return (None, 'geo_id') elif geo_type == 'dma': if not 450 <= int(geo_id) <= 950: return (None, 'geo_id') elif geo_type == 'state': # note that geo_id is lowercase if len(geo_id) != 2 or not 'aa' <= geo_id <= 'zz': return (None, 'geo_id') elif geo_type == 'hhs': if not 1 <= int(geo_id) <= 10: return (None, 'geo_id') elif geo_type == 'nation': # geo_id is lowercase if len(geo_id) != 2 or not 'aa' <= geo_id <= 'zz': return (None, 'geo_id') else: return (None, 'geo_type') - I DO NOT think we should add the new geo_type to the part of the server code that does input validation, as its done with the older geomapper code which does not support this definition of HSA (it uses the incompatible Dartmouth version, see doc/urlref/crosswalk). However, we should mention in a comment there that we are intentionally sidestepping it for that reason:
delphi-epidata/src/server/_params.py
Lines 61 to 69 in fc450d5
# TODO: keep this translator in sync with CsvImporter.GEOGRAPHIC_RESOLUTIONS in acquisition/covidcast/ and with GeoMapper geo_type_translator = { "county": "fips", "state": "state_id", "zip": "zip", "hrr": "hrr", "hhs": "hhs", "msa": "msa", "nation": "nation"
melange396
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two comment clarifications, and lets add an integration test to bring the whole thing together... Then i think this is done!
for https://github.com/cmu-delphi/delphi-epidata/blob/dev/integrations/server/test_covidcast.py :
def test_hsa_nci(self):
row = CovidcastTestRow.make_default_row(geo_type='hsa_nci', geo_value='99')
self._insert_rows(rows)
response = self.request_based_on_row(row)
expected = [row.as_api_row_dict()]
self.assertEqual(response, {
'result': 1,
'epidata': expected,
'message': 'success',
})Co-authored-by: george <george.haff@gmail.com>
Co-authored-by: george <george.haff@gmail.com>
|
melange396
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome!



Addresses issue(s) https://github.com/cmu-delphi/epidata-etl/issues/394
Summary:
hsa_nciandhsacases to unit test to make sure the new regex is working as intended.Prerequisites:
devbranchdev