ResHistGen

NCI software to generate residential histories from address data

As part of the National Cancer Institute’s residential history pilot project, Westat created "ResHistGen", a set of open-source SAS programs that will help researchers and others reconcile data from commercial vendors and generate residential histories of study participants.

The ResHistGen package includes the following contents:

Documentation

  • “ResHist Generation Process Notes.pdf” – the main written documentation (a good place to start)
  • “MatchPro Address Match Process.pdf” – how to run Match*Pro for the address matching step
  • “ResHist Output_File Data_Dictionary.xlsx” or “ResHist Output_File Data_Dictionary.pdf” – a data dictionary for the final ResHist output file
  • “V3.0 Change Summary.pdf” – a summary of changes since the last release on GitHub
  • “Manual Address Comparison Guidelines.pdf” – a reference for conducting manual address comparisons to determine whether two addresses are the same
  • “README.md” – a basic introduction for GitHub

Template and demonstration files

  • “~FolderTemplate” Folder – a set of starter files and folders including the SAS programs and Match*Pro files
  • “DemoCanReg” Folder – a worked example based on sample files from a cancer registry
  • “DemoCohort” Folder – a worked example based on sample files from a cohort study

The steps below for using the ResHistGen programs to create residential histories of research subjects can be performed by staff at the cancer registry, members of the research team, or staff at a third-party contractor. Because individual patient identifiers are needed for this process, it is essential that the researcher follow established procedures to protect the privacy of human subjects.

  1. Submit subject names and identifiers for relevant cases or study participants to the vendor (LexisNexis).
  2. Geocode the addresses received from the vendor. All U.S. cancer registries have access to the North American Association of Central Cancer Registries (NAACCR) geocoder, but any batch geocoder can be used.
  3. Collect and geocode additional addresses for each patient/participant from the cancer registry (address at diagnosis, current address) or cohort study (address at baseline, follow-up address(es), current address). These addresses are referred to as "event" addresses.
  4. Collect additional patient/participant data for each subject. See "ResHist Generation Process Notes.pdf" for data items and file format.
  5. Make a copy of the "~FolderTemplate" folder and name it appropriately for this registry or study.
  6. Put copies of the vendor address file, the event address file (cancer registry or cohort study addresses), and the patient/participant file in the "Input_Files" subfolder.
  7. Run through the process to generate residential histories as described in "ResHist Generation Process Notes.pdf".
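
Steps 5 and 6 above are plain file operations, so they could be scripted outside SAS. The following is a minimal Python sketch, not part of the ResHistGen package. The function name, the template location, the new folder name, and the input file names are all hypothetical parameters; only the "Input_Files" subfolder name comes from the steps above.

```python
import shutil
from pathlib import Path


def set_up_study_folder(template_dir, study_dir, input_files):
    """Copy the template folder and place copies of the input files in Input_Files.

    All three arguments are hypothetical, caller-supplied paths.
    Returns the list of destination paths that were written.
    """
    template_dir = Path(template_dir)
    study_dir = Path(study_dir)

    # Refuse to overwrite an existing registry or study folder.
    if study_dir.exists():
        raise FileExistsError(f"{study_dir} already exists; choose a new name")

    # Step 5: copy the whole template tree under the new name.
    shutil.copytree(template_dir, study_dir)

    # Step 6: copy (not move) each input file into the Input_Files subfolder.
    input_dir = study_dir / "Input_Files"
    input_dir.mkdir(exist_ok=True)
    copied = []
    for src in input_files:
        dest = input_dir / Path(src).name
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

The sketch copies rather than moves the input files, so the originals stay where they were. Because these files contain individual identifiers, the destination folder should sit on storage that meets the study's privacy requirements.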

The current release of these programs is Version 3.0. For a summary of changes since the previous release, see "V3.0 Change Summary.pdf".

In the ResHistGen programs, local file locations are specified in the first few lines of each program to facilitate portability. The programs have been written to avoid any data conversion or divide-by-zero warning messages, so if such messages occur, there is probably an unexpected problem somewhere. The code also tests for other unexpected conditions and writes a message containing three pound signs ("###") to the log whenever one is encountered. Any such messages should be investigated and resolved.
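
One way to check a saved SAS log for the messages described above is a small scan like the Python sketch below. It is an illustration only and is not shipped with ResHistGen. The function names are hypothetical, and the two text patterns for divide-by-zero and data conversion are assumptions based on the wording SAS commonly uses in its log (for example, "Division by zero detected" and "values have been converted to"). Only the "###" marker comes from the description above.

```python
import re
from pathlib import Path

# Patterns to flag. "###" is the marker described above; the other two are
# assumed wordings of typical SAS log lines and may need adjusting.
PATTERNS = {
    "unexpected_condition": re.compile(r"###"),
    "divide_by_zero": re.compile(r"division by zero", re.IGNORECASE),
    "data_conversion": re.compile(r"values have been converted to", re.IGNORECASE),
}


def scan_log_text(text):
    """Return (line_number, label, line) for every log line matching a pattern."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, label, line.strip()))
    return findings


def scan_log_file(path):
    """Read a saved log file (hypothetical path) and scan its contents."""
    return scan_log_text(Path(path).read_text(errors="replace"))
```

Because a SAS log usually echoes the program source, a statement that writes a "###" message may itself be flagged even when the condition never occurred. Each hit should therefore be read in context before it is treated as a problem.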

The ResHistGen programs are released under the GNU General Public License. For questions, limited support is available by email at NCI.ResidentialHistory@westat.com. Any enhancements that you make may also be shared via this email address. If found to be beneficial, they will be included in a future release. By the terms of the license, you may distribute your changes on your own provided you include a prominent notice that you have modified the original.

If you publish results based on these programs, please include the following citation: ResHistGen Residential History Generation Programs, Version 3.0 - June 2024; Surveillance Research Program, National Cancer Institute.
