Extract, Transform, Load (ETL) processes that write gene mutation data and phenotype data to the OMOP CDM.
The processes are started by MAINJob.ktj, which calls both MAINJob_DD.kjb and FILL-SOURCE_TO_CONCEPT_MAP.kjb.
In the Genotype directory, the file "20220504_Mappings.xlsx" contains all semantic mappings of the genotype elements to OMOP concept_ids and the relevant tables. The OMOP_tables_used file shows which OMOP tables the data were written to.
MAINJob_DD.kjb runs PERSON.ktr first and then CONDITION_OCCURENCE, MEASUREMENT, and OBSERVATION. It also fills the SOURCE_TO_CONCEPT_MAP table with the required concepts listed in the SOURCE_TO_CONCEPT_MAP.csv file and truncates the remaining tables if necessary.
For the OMOP Person table, year_of_birth is calculated by subtracting the given age from the year 2023. Patients with Age = NA are filtered out because OMOP does not accept a missing year_of_birth.
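The rule above can be sketched in Python as follows. This is only an illustration of the calculation, not the Kettle transformation itself; the input keys `patient_id` and `Age` and the function names are assumptions made for the example.

```python
from typing import Optional

REFERENCE_YEAR = 2023  # reference year stated in the text above


def year_of_birth(age: Optional[str]) -> Optional[int]:
    """Return REFERENCE_YEAR minus age, or None when the age is missing ("NA")."""
    if age is None or age.strip().upper() in {"", "NA"}:
        return None
    return REFERENCE_YEAR - int(float(age))


def build_person_rows(patients):
    """Keep only patients with a usable age and derive year_of_birth for them."""
    rows = []
    for patient in patients:  # "patient_id" and "Age" are hypothetical column names
        yob = year_of_birth(patient.get("Age"))
        if yob is None:
            continue  # patients with Age = NA are filtered out
        rows.append({"person_id": patient["patient_id"], "year_of_birth": yob})
    return rows
```

For example, a patient with Age = 40 gets year_of_birth = 1983, while a patient with Age = NA produces no Person row.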
The type_concept_id field is mandatory in Observation and Measurement; type_concept_id = 32810 is used as a constant for both.
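A minimal sketch of how the constant could be applied to each row, assuming rows are handled as Python dictionaries. The helper name `with_type_concept` is hypothetical; the column names `measurement_type_concept_id` and `observation_type_concept_id` are those of the OMOP CDM tables.

```python
TYPE_CONCEPT_ID = 32810  # constant named in the text above


def with_type_concept(row: dict, table: str) -> dict:
    """Return a copy of row with <table>_type_concept_id set to the constant."""
    if table not in {"measurement", "observation"}:
        raise ValueError(f"unsupported table: {table}")
    out = dict(row)  # copy so the caller's row is left unchanged
    out[f"{table}_type_concept_id"] = TYPE_CONCEPT_ID
    return out
```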
This transformation fills the SOURCE_TO_CONCEPT_MAP table with a set of concept_ids from the Human Phenotype Ontology (HPO). These concepts can be used to transform phenotypic data elements to OMOP. We had no data elements with which to test this transformation in this study.
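Once the table is filled, looking up a phenotype code could work roughly as sketched below. The sample rows are illustrative only: the `target_concept_id` values are placeholders rather than real OMOP concepts, and the `source_vocabulary_id` value "HPO" is an assumption. Only three of the SOURCE_TO_CONCEPT_MAP columns are shown, and unmapped codes fall back to concept_id 0, the usual OMOP convention.

```python
import csv
import io

# Illustrative rows only; the target_concept_id values are placeholders.
SAMPLE_CSV = """source_code,source_vocabulary_id,target_concept_id
HP:0001250,HPO,2000000001
HP:0001251,HPO,2000000002
"""


def load_map(handle) -> dict:
    """Build a source_code -> target_concept_id lookup from a CSV file handle."""
    reader = csv.DictReader(handle)
    return {row["source_code"]: int(row["target_concept_id"]) for row in reader}


def map_phenotype(code: str, lookup: dict) -> int:
    """Return the mapped concept_id, or 0 when the code has no mapping."""
    return lookup.get(code, 0)


lookup = load_map(io.StringIO(SAMPLE_CSV))
```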
Here, we outline how to adapt the RD-CDM pipeline for use across various data focus areas for rare diseases.
Specify the modules required to explore the research hypothesis. Identify key elements within these modules that address the research question.
Communicate the necessary modules and elements with medical experts to assess their availability at each study site. Engage with data providers to ensure data can be automatically retrieved, possibly in standard formats like FHIR, and using standard terminologies such as SNOMED. Consult legal authorities to address the sensitive nature of medical data, ensuring adherence to ethical standards, data security, and privacy protection.
Collaborate with stakeholders to finalize a comprehensive or selective list of data elements and reach consensus on the included diagnostic elements.
Align individual data items with the RD-CDM's modules to standardize data across all study sites.
Implement ETL processes, for instance, converting FHIR to OMOP format, to standardize data in modules like "Person", "Diagnosis", "Laboratory findings", "Procedure", and "Medications".
Use direct ETL processes from CSV to OMOP CDM for the "Genotype" and "Phenotype" modules.
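To illustrate the FHIR to OMOP step for the "Person" module, the sketch below maps a FHIR Patient resource, given as a dictionary, to an OMOP PERSON row. It is an assumption-laden example rather than the pipeline's own implementation: the function name is hypothetical, race and ethnicity are set to 0 (no matching concept) because the text does not cover them, and `person_id` is passed in separately because FHIR ids are strings while OMOP expects an integer.

```python
# Standard OMOP gender concepts; any other FHIR gender value maps to 0.
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def fhir_patient_to_person(patient: dict, person_id: int) -> dict:
    """Map a FHIR Patient resource (as a dict) to an OMOP PERSON row."""
    birth_date = patient.get("birthDate")
    if not birth_date:
        raise ValueError("birthDate is needed because year_of_birth is mandatory")
    # FHIR dates may be YYYY, YYYY-MM, or YYYY-MM-DD.
    parts = birth_date.split("-")
    return {
        "person_id": person_id,
        "gender_concept_id": GENDER_CONCEPTS.get(patient.get("gender"), 0),
        "year_of_birth": int(parts[0]),
        "month_of_birth": int(parts[1]) if len(parts) > 1 else None,
        "day_of_birth": int(parts[2]) if len(parts) > 2 else None,
        "race_concept_id": 0,       # assumption: not covered by the text
        "ethnicity_concept_id": 0,  # assumption: not covered by the text
        "person_source_value": patient.get("id"),
    }
```

The "Genotype" and "Phenotype" modules skip this step and are loaded directly from CSV.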