Relationship Inference From The EHR (RIFTEHR) is an automated algorithm for identifying relatedness between patients in an institution's Electronic Health Records.
Original Code Authors: Fernanda Polubriaginof and Nicholas Tatonetti
https://github.com/tatonetti-lab/riftehr
2nd Version: Farhad Ghamsari
https://github.com/FarhadGSRX/Pythonic_RIFTEHR
This Version: Colby Witherup Wood (https://github.com/aGitHasNoName) and Dan Turner (https://github.com/turnerdan)
Publication info coming soon.
Remember to always respect patient privacy.
2nd ver.: Fully Python, no dependencies on SQL or Julia
2nd ver.: Much, much faster, thanks to vectorization of functions
Changes in the current version:
Significant changes are in Step 2. New Step 2 replaces old Steps 2, 3, and 4.
- Conflict checks (spouse age, generation age, flipped relationship, mismatched provided relationships) are now fully integrated into the inference step at every stage to prevent inferred matches based on conflicts
- Families are now defined by networkx before inferences are made and the inference step is applied within families instead of across the entire dataset
- Two-person families now skip the inference step
- Lookup dictionaries are used in place of if/else statements whenever possible
- List/dictionary comprehensions are used in place of for loops whenever possible
- The inference step is now about 60 times faster than the 2nd version due to all changes listed above
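The family-grouping idea above can be illustrated with a small, dependency-free sketch. It is not the repository's code: the README says families are defined with networkx, and this example swaps in a plain breadth-first search so that it runs without extra packages (`networkx.connected_components` on an undirected graph yields the same grouping). The function name `group_into_families` and the sample MRNs are hypothetical.

```python
from collections import defaultdict, deque

def group_into_families(pairs):
    """Group MRNs into families: connected components of an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, families = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        family, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            family.add(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        families.append(family)
    return families

# Hypothetical matched pairs (patient MRN, emergency contact MRN).
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
families = group_into_families(pairs)

# Two-person families have nothing further to infer, so they are skipped.
needs_inference = [f for f in families if len(f) > 2]
```

Restricting inference to each family keeps the search space small, which is consistent with the speed-up described above.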
The Patient Demographics Table is a comma-delimited file with the following headers. Each value describes the patient:
- MRN, FirstName, LastName, PhoneNumber, Zipcode
The Emergency Contacts Table is a comma-delimited file with the following headers. MRN_1 is the patient's MRN and links each row to the Patient Demographics Table. The remaining values describe the patient's emergency contact (EC). EC_Relationship is the relationship of the EC to the patient: if EC_Relationship is Parent, then the EC is the patient's parent.
- MRN_1, EC_LastName, EC_FirstName, EC_PhoneNumber, EC_Zipcode, EC_Relationship
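As a quick sanity check before running the pipeline, the two input files can be read and their headers verified. This is a minimal sketch using only the standard library, not the repository's own loader. The helper `read_table` and the sample rows are hypothetical, and in-memory strings stand in for the real files.

```python
import csv
import io

PATIENT_HEADERS = ["MRN", "FirstName", "LastName", "PhoneNumber", "Zipcode"]
EC_HEADERS = ["MRN_1", "EC_LastName", "EC_FirstName",
              "EC_PhoneNumber", "EC_Zipcode", "EC_Relationship"]

def read_table(handle, expected_headers):
    """Read a comma-delimited table and fail early if a header is missing."""
    reader = csv.DictReader(handle)
    missing = [h for h in expected_headers if h not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Made-up rows for illustration only; never commit real patient data.
patients_csv = io.StringIO(
    "MRN,FirstName,LastName,PhoneNumber,Zipcode\n"
    "1001,Ana,Example,5550100,60601\n"
    "1002,Ben,Example,5550100,60601\n"
)
ec_csv = io.StringIO(
    "MRN_1,EC_LastName,EC_FirstName,EC_PhoneNumber,EC_Zipcode,EC_Relationship\n"
    "1001,Example,Ben,5550100,60601,Parent\n"
)

patients = read_table(patients_csv, PATIENT_HEADERS)
contacts = read_table(ec_csv, EC_HEADERS)
```

Here the single emergency contact row says that Ben Example is the parent of patient 1001.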
- Go to Step 0 > preprocess.py > process_phones(): Remove any additional phone numbers that are recurrent in your data set. For example, our team had to remove the Northwestern University main line because it was a common placeholder for emergency contacts' phone numbers.
- See relation_map.csv. The input_relation column contains emergency contact relationships as they appear in your dataset, and the output_relations column is what they should map to, as required by the RIFTEHR program.
- In Step 1 > match_in_batches.py > find_matches(): This version searches for matches based on (First, Last, Zip, Phone), (First, Last, Phone), (First, Phone, Zip), (Last, Phone, Zip), (First, Phone), (Last, Phone), (Phone, Zip), (Phone), and (First, Last, Zip), in that order. Code for other, less specific matches is included but commented out. See paper for more explanation.
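The ordered search can be pictured as a cascade that tries the most specific field combination first and stops at the first one that produces a hit. The sketch below only illustrates that ordering and is not the code in match_in_batches.py. The function `find_first_match` is hypothetical, the dictionary-based rows are an assumption, and how the real program treats ambiguous hits (several patients sharing a key) is not specified here.

```python
# Field combinations in the order listed above, most specific first.
MATCH_ORDER = [
    ("FirstName", "LastName", "Zipcode", "PhoneNumber"),
    ("FirstName", "LastName", "PhoneNumber"),
    ("FirstName", "PhoneNumber", "Zipcode"),
    ("LastName", "PhoneNumber", "Zipcode"),
    ("FirstName", "PhoneNumber"),
    ("LastName", "PhoneNumber"),
    ("PhoneNumber", "Zipcode"),
    ("PhoneNumber",),
    ("FirstName", "LastName", "Zipcode"),
]

def find_first_match(contact, patients):
    """Return (fields, MRNs) for the first combination with any hit."""
    for fields in MATCH_ORDER:
        key = tuple(contact["EC_" + f] for f in fields)
        if not all(key):
            continue  # assumption: blank values never count as a match
        hits = [p["MRN"] for p in patients
                if tuple(p[f] for f in fields) == key]
        if hits:
            return fields, hits
    return None, []

# Made-up rows for illustration only.
patients = [
    {"MRN": "1001", "FirstName": "ANA", "LastName": "EXAMPLE",
     "PhoneNumber": "5550100", "Zipcode": "60601"},
    {"MRN": "1002", "FirstName": "BEN", "LastName": "EXAMPLE",
     "PhoneNumber": "5550100", "Zipcode": "60601"},
]
exact = {"EC_FirstName": "BEN", "EC_LastName": "EXAMPLE",
         "EC_PhoneNumber": "5550100", "EC_Zipcode": "60601"}
phone_only = {"EC_FirstName": "CARL", "EC_LastName": "OTHER",
              "EC_PhoneNumber": "5550100", "EC_Zipcode": "60601"}

exact_result = find_first_match(exact, patients)
loose_result = find_first_match(phone_only, patients)
```

The second contact shares only a phone number and zip code with the patients, so it falls through to the (Phone, Zip) combination and returns both MRNs, which shows why the less specific combinations need more care downstream.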
Should you have any questions, comments, or suggestions, please don't hesitate to reach out:
- Ver. 2: fghamsari@tulane.edu
- Current Ver.: colby.witherup@northwestern.edu