Fecal_Coliform/E_coli replace '#' outside of replace_unit_by_dict()

replace_unit_by_dict() matches on the entire unit string. Because '#' is hard to deal with, it is now replaced the same way regardless of where it appears in the unit string. There is a current TODO to determine why doing string replacements before the dict replacement (generally preferable, as it standardizes units first) was problematic.
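
To illustrate the distinction described above, here is a minimal sketch in plain Python of whole-string versus substring replacement. The helper names `replace_by_dict` and `replace_by_str`, the sample units, and the mapping are hypothetical stand-ins. They are not the actual `wqp` methods or the real contents of `domains.UNITS_REPLACE`.

```python
def replace_by_dict(units, mapping):
    """Whole-string replacement: a unit changes only if it equals a key exactly."""
    return [mapping.get(unit, unit) for unit in units]


def replace_by_str(units, old, new):
    """Substring replacement: every occurrence of `old` changes, wherever it sits."""
    return [unit.replace(old, new) for unit in units]


# Hypothetical mapping (assumption; not the real domains.UNITS_REPLACE entries)
mapping = {'CFU': 'CFU/100ml', 'MPN/100 ml': 'MPN/100ml'}
units = ['#/100ml', 'CFU', 'MPN/100 ml', '#']

# Step 1: swap the special character wherever it appears in the unit string
after_str = replace_by_str(units, '#', 'CFU')

# Step 2: fix the remaining known problems by exact match on the whole string
after_dict = replace_by_dict(after_str, mapping)

# For comparison: the dict alone leaves '#/100ml' and '#' untouched,
# because neither is a key in the mapping
dict_only = replace_by_dict(units, mapping)
```

In this sketch the substring pass handles '#' without needing a dict entry for every unit string that might contain it, which is the motivation given in the message above.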
USEPA · Aug 1, 2023 · a18a70d
1 parent c5e5f43
Showing 1 changed file with 3 additions and 1 deletion.
diff --git a/harmonize_wq/harmonize.py b/harmonize_wq/harmonize.py
@@ -1040,7 +1040,9 @@ def harmonize_generic(df_in, char_val, units_out=None, errors='raise',
     elif out_col in ['Fecal_Coliform', 'E_coli']:
         # NOTE: Ecoli ['cfu/100ml', 'MPN/100ml', '#/100ml']
         # NOTE: feca ['CFU', 'MPN/100ml', 'cfu/100ml', 'MPN/100 ml', '#/100ml']
-        # Replace known unit problems ('#' count; assume CFU/MPN is /100ml)
+        # Replace known special character in unit ('#' count assumed as CFU)
+        wqp.replace_unit_by_str('#', 'CFU')
+        # Replace known unit problems (e.g., assume CFU/MPN is /100ml)
         wqp.replace_unit_by_dict(domains.UNITS_REPLACE[out_col])
         #TODO: figure out why the above must be done before replace_unit_by_str
         # Replace all instances in results column