
Extract Transform Load (ETL) processes to write gene mutation data and phenotype data to OMOP CDM


NajiaAhmadi/ETL-Genotype-Phenotype-to-OMOP


ETL-Genotype-Phenotype-to-OMOP

Extract, Transform, Load (ETL) processes that write gene mutation data and phenotype data to the OMOP CDM.

The processes are started via MAINJob.ktj, which calls both MAINJob_DD.kjb and FILL-SOURCE_TO_CONCEPT_MAP.kjb.

Genotype

Mapping Table

Within the Genotype directory, the file "20220504_Mappings.xlsx" contains all the semantic mappings of the genotype elements to OMOP Concept_ids and the relevant tables. Additionally, the OMOP_tables_used file shows which OMOP tables were used to write the data.

Dataset-A-Transformations

MAINJob_DD.kjb loads PERSON.ktr first and then runs CONDITION_OCCURENCE, MEASUREMENT, and OBSERVATION. It additionally fills the SOURCE_TO_CONCEPT_MAP table with the necessary concepts listed in the SOURCE_TO_CONCEPT_MAP.csv file and truncates the remaining tables if necessary.

For the Person table in OMOP, Year_of_birth is calculated by subtracting the given age from the year 2023. Patients with Age = NA are filtered out because OMOP does not accept a Person record without a year of birth.

type_concept_id is mandatory in the Observation and Measurement tables; type_concept_id = 32810 is used as a constant for both.
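The age handling described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the Kettle transformation itself: the column names `patient_id` and `Age` and the helper `to_person_rows` are assumptions made for the example.

```python
import csv
import io

REFERENCE_YEAR = 2023  # reference year stated in this README


def to_person_rows(records):
    """Derive year_of_birth from age and drop records without a usable age."""
    rows = []
    for rec in records:
        age = (rec.get("Age") or "").strip()
        if age in ("", "NA"):
            # No age means no year_of_birth, so the record is filtered out.
            continue
        rows.append({
            "person_source_value": rec["patient_id"],  # hypothetical source column
            "year_of_birth": REFERENCE_YEAR - int(age),
        })
    return rows


# Tiny in-memory CSV with made-up values, standing in for the source file.
sample = io.StringIO("patient_id,Age\nP1,30\nP2,NA\nP3,7\n")
person_rows = to_person_rows(csv.DictReader(sample))
```

With this sample input, the record for `P2` is dropped and the remaining two records receive a `year_of_birth` derived from 2023.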
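As a rough sketch of how such a constant could be applied outside of Kettle, the snippet below adds the type concept to a row before it is written. The field names `measurement_type_concept_id` and `observation_type_concept_id` come from OMOP CDM v5.x; the helper `with_type_concept` and the sample row are hypothetical, and the CDM version targeted by this repository is not stated here.

```python
TYPE_CONCEPT_ID = 32810  # constant named in this README

# OMOP CDM v5.x column that holds the type concept in each table.
TYPE_FIELD = {
    "measurement": "measurement_type_concept_id",
    "observation": "observation_type_concept_id",
}


def with_type_concept(row, table):
    """Return a copy of the row with the mandatory type concept filled in."""
    return {**row, TYPE_FIELD[table]: TYPE_CONCEPT_ID}


# Made-up row, only to show the shape of the result.
measurement_row = with_type_concept({"person_id": 1}, "measurement")
observation_row = with_type_concept({"person_id": 1}, "observation")
```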

Phenotype

These transformation processes fill the SOURCE_TO_CONCEPT_MAP table with a set of concept_ids from the Human Phenotype Ontology (HPO). These concepts can be used to transform phenotypic data elements to OMOP. We did not have data elements to test this transformation in this study.
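To illustrate how a filled SOURCE_TO_CONCEPT_MAP could be used for phenotypic data, here is a minimal lookup sketch. It assumes a CSV export with the standard OMOP columns `source_code`, `source_vocabulary_id`, and `target_concept_id`; the codes and concept_ids in the sample are placeholders, not real HPO mappings, and the function names are hypothetical.

```python
import csv
import io


def load_concept_map(csv_file, vocabulary_id):
    """Build a source_code -> target_concept_id lookup for one vocabulary."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        if row["source_vocabulary_id"] == vocabulary_id:
            lookup[row["source_code"]] = int(row["target_concept_id"])
    return lookup


def map_phenotype(source_code, lookup):
    """Return the mapped concept_id, or 0 (OMOP 'No matching concept')."""
    return lookup.get(source_code, 0)


# Placeholder rows; neither the codes nor the concept_ids are real.
sample = io.StringIO(
    "source_code,source_vocabulary_id,target_concept_id\n"
    "HP:EXAMPLE1,HPO,2000000001\n"
    "HP:EXAMPLE2,HPO,2000000002\n"
    "X1,OTHER,2000000003\n"
)
hpo_lookup = load_concept_map(sample, "HPO")
mapped = map_phenotype("HP:EXAMPLE1", hpo_lookup)
unmapped = map_phenotype("HP:UNKNOWN", hpo_lookup)
```

Falling back to concept_id 0 keeps unmapped phenotype codes visible in the target table instead of silently dropping them.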

Customize your Rare Diseases Common Data Model

Here, we outline how to adapt the RD-CDM pipeline for use across various data focus areas for rare diseases.

1. Use Case Definition

Specify the modules required to explore the research hypothesis. Identify key elements within these modules that address the research question.

2. Stakeholder Engagement

Communicate the necessary modules and elements with medical experts to assess their availability at each study site. Engage with data providers to ensure data can be automatically retrieved, possibly in standard formats like FHIR, and using standard terminologies such as SNOMED. Consult legal authorities to address the sensitive nature of medical data, ensuring adherence to ethical standards, data security, and privacy protection.

3. Diagnostic Entity Compilation

Collaborate with stakeholders to finalize a comprehensive or selective list of data elements and reach consensus on the included diagnostic elements.

4. Mapping Use Case-Specific Entities

Align individual data items with the RD-CDM's modules to standardize data across all study sites.

5. Data Transformation

Implement ETL processes, for instance, converting FHIR to OMOP format, to standardize data in modules like "Person", "Diagnosis", "Laboratory findings", "Procedure", and "Medications".
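As one possible shape for such a conversion, the sketch below maps a FHIR Patient resource (as a parsed JSON dict) to a row for the OMOP Person table. It is not part of this repository: the function `fhir_patient_to_person` and the sample resource are assumptions for illustration. The FHIR fields `id`, `gender`, and `birthDate` and the OMOP gender concepts 8507 (male) and 8532 (female) are standard.

```python
# OMOP standard gender concepts; anything else falls back to 0.
GENDER_CONCEPTS = {"male": 8507, "female": 8532}


def fhir_patient_to_person(patient):
    """Map a FHIR Patient dict to an OMOP Person row, or None if unusable."""
    birth_date = patient.get("birthDate")
    if not birth_date:
        # year_of_birth is required in Person, so skip patients without it.
        return None
    gender = patient.get("gender")
    return {
        "person_source_value": patient.get("id"),
        "gender_concept_id": GENDER_CONCEPTS.get(gender, 0),
        "gender_source_value": gender,
        # FHIR dates start with the four-digit year (YYYY, YYYY-MM, YYYY-MM-DD).
        "year_of_birth": int(birth_date[:4]),
    }


# Made-up resources for demonstration only.
person = fhir_patient_to_person(
    {"resourceType": "Patient", "id": "example-1", "gender": "female", "birthDate": "1990-05"}
)
skipped = fhir_patient_to_person({"resourceType": "Patient", "id": "example-2"})
```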

6. Handling Genotypic and Phenotypic Data

Use direct ETL processes from CSV to OMOP CDM for the "Genotype" and "Phenotype" modules.
