
Data Quality Methods and Tools to Support CTSA Hub Data Sharing


Electronic Health Record (EHR) data must be tested for data quality before they are shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Many CTSA institutions have harmonized their EHR data to the Observational Medical Outcomes Partnership (OMOP) common data model. However, no publicly available tool with a standard operating procedure (SOP) exists to easily run data quality tests and visualize their results, particularly across institutions. This project will launch a publicly available data quality testing tool and SOP that can be configured for any database environment and applied to any number (N) of OMOP datasets.

Project description

EHR data must be tested for data quality before they are shared for research. Data quality is typically measured in three categories: Conformance, Completeness, and Plausibility (Kahn et al., 2016 eGEMS). Harmonized datasets must conform to an established standard format and vocabulary before any analysis can be done. They must meet a minimum threshold of completeness (i.e., what percentage of values are null or empty). They must also demonstrate an acceptable level of plausibility (i.e., whether the data make sense given what is expected, and whether they are believable and credible). To date, most data sharing networks have developed internal protocols and tools to manage data harmonization, but no publicly available tool with a standard operating procedure exists to easily assess and visualize data quality tests across institutions. As a result, data quality is addressed inconsistently, and often only by high-level analytic teams where such teams are available.
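To make the three categories concrete, here is a minimal Python sketch of one check per category, run over a handful of in-memory rows. It is not the DQe-c implementation. The functions `completeness`, `conformance`, and `plausibility`, the sample rows, the allowed gender concept set, and the 1900 to 2025 birth-year range are all hypothetical illustrations. The column names follow the OMOP CDM PERSON table, and 8507 and 8532 are assumed to be the OMOP standard gender concepts for male and female.

```python
from typing import Any, Optional

Row = dict[str, Any]

# Illustrative assumption: allowed OMOP standard gender concepts (8507 = MALE, 8532 = FEMALE).
ALLOWED_GENDER_CONCEPTS = {8507, 8532}


def _present(value: Any) -> bool:
    """Treat None and empty strings as missing values."""
    return value is not None and value != ""


def completeness(rows: list[Row], column: str) -> float:
    """Completeness: fraction of rows with a non-missing value in `column`."""
    if not rows:
        return 0.0
    return sum(_present(r.get(column)) for r in rows) / len(rows)


def conformance(rows: list[Row], column: str, allowed: set) -> Optional[float]:
    """Conformance: fraction of non-missing values drawn from an allowed vocabulary."""
    values = [r[column] for r in rows if _present(r.get(column))]
    if not values:
        return None  # nothing to evaluate
    return sum(v in allowed for v in values) / len(values)


def plausibility(rows: list[Row], column: str, low: float, high: float) -> Optional[float]:
    """Plausibility: fraction of non-missing values inside an expected [low, high] range."""
    values = [r[column] for r in rows if _present(r.get(column))]
    if not values:
        return None  # nothing to evaluate
    return sum(low <= v <= high for v in values) / len(values)


# Hypothetical PERSON-like rows; a real tool would query these from a database.
SAMPLE_ROWS: list[Row] = [
    {"person_id": 1, "gender_concept_id": 8507, "year_of_birth": 1980},
    {"person_id": 2, "gender_concept_id": 8532, "year_of_birth": 1875},  # implausible year
    {"person_id": 3, "gender_concept_id": 0, "year_of_birth": None},     # unmapped code, missing year
    {"person_id": 4, "gender_concept_id": None, "year_of_birth": 2001},  # missing gender
]

print("completeness(year_of_birth):", completeness(SAMPLE_ROWS, "year_of_birth"))
print("conformance(gender_concept_id):",
      conformance(SAMPLE_ROWS, "gender_concept_id", ALLOWED_GENDER_CONCEPTS))
print("plausibility(year_of_birth, 1900-2025):",
      plausibility(SAMPLE_ROWS, "year_of_birth", 1900, 2025))
```

Conformance and plausibility are computed only over non-missing values here, so that a missing field is counted once (as a completeness failure) rather than penalized in every category. That separation is a design choice for this sketch, not a rule taken from the project.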

Alignment to program objectives

TODO see here

Contact person

  • Point person: Kari Stephens (@kstephen0909), UW
  • Program director: Sean Mooney (@sdmooney)

Leads

  • Kari Stephens (@kstephen0909), UW
  • Adam Wilcox (@abwilcox), UW

Team members

Team members can be found here

Repositories

Original DQe-c tool
https://github.com/data2health/DQe-c

Ongoing re-engineering of the DQe-c tool
https://github.com/data2health/DQe-c-v2

Deliverables

  • Data quality testing tool (DQe-c) available to CTSA hubs and affiliates
  • Data quality testing tool standard operating procedures and documentation supporting local configuration
  • List of recommended minimum-level data quality tests to support data sharing assurance
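The idea of a recommended minimum set of tests applied to several datasets can be sketched as a list of metrics, each with a lowest acceptable score, evaluated per site. This is a hypothetical illustration, not the project's published test list: `MinimumTest`, `non_null_fraction`, `in_range_fraction`, `run_tests`, the site names, and the 0.95 and 0.99 thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable

Rows = list[dict[str, Any]]


@dataclass(frozen=True)
class MinimumTest:
    """A hypothetical minimum-level test: a metric and the lowest acceptable score."""
    name: str
    category: str  # "Conformance", "Completeness", or "Plausibility"
    metric: Callable[[Rows], float]
    threshold: float


def non_null_fraction(column: str) -> Callable[[Rows], float]:
    """Build a completeness metric: share of rows where `column` is not None."""
    def metric(rows: Rows) -> float:
        if not rows:
            return 0.0
        return sum(r.get(column) is not None for r in rows) / len(rows)
    return metric


def in_range_fraction(column: str, low: float, high: float) -> Callable[[Rows], float]:
    """Build a plausibility metric: share of non-null values within [low, high]."""
    def metric(rows: Rows) -> float:
        values = [r[column] for r in rows if r.get(column) is not None]
        if not values:
            return 0.0
        return sum(low <= v <= high for v in values) / len(values)
    return metric


def run_tests(datasets: dict[str, Rows], tests: list[MinimumTest]) -> dict[str, dict[str, bool]]:
    """Return, for each site, whether each test met its threshold."""
    return {
        site: {t.name: t.metric(rows) >= t.threshold for t in tests}
        for site, rows in datasets.items()
    }


# Hypothetical thresholds and site data for illustration only.
TESTS = [
    MinimumTest("year_of_birth present", "Completeness", non_null_fraction("year_of_birth"), 0.95),
    MinimumTest("year_of_birth plausible", "Plausibility",
                in_range_fraction("year_of_birth", 1900, 2025), 0.99),
]

SITES: dict[str, Rows] = {
    "site_a": [{"year_of_birth": 1980}, {"year_of_birth": 1990}],
    "site_b": [{"year_of_birth": 1850}, {"year_of_birth": None}],
}

report = run_tests(SITES, TESTS)
for site, results in report.items():
    print(site, results)
```

Keeping each test as data (name, category, metric, threshold) rather than hard-coding checks means the same list can be reused unchanged against every site, which is the kind of cross-institution comparison the project describes.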

Milestones

View the project milestones here

Evaluation

View the evaluation component here

Education

View the education component here.

Get involved

View the engagement component here

Working documents

Team collaborative working folder can be found here

Slack channel

The #data-quality channel is accessible to participants who have been onboarded.
