# Example of the OMOP CDM Data

## Small Background

> Disclaimer: Section co-written with ChatGPT

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is a standardized database schema designed to make healthcare data from electronic health records, health information exchanges, and electronic medical records easier to process and compare.
The schema is documented at https://ohdsi.github.io/CommonDataModel/cdm54.html, and its Entity-Relationship Diagram looks like this:

![](./assets/cdm54.png)

Some of the most important tables in the OMOP CDM are as follows:

1. `person`: Contains a unique identifier for each individual, along with demographic information.
2. `condition_occurrence`: Records diagnoses or health conditions, along with their start and end dates.
3. `procedure_occurrence`: Documents procedures performed on patients, including surgical and diagnostic interventions.
4. `drug_exposure`: Contains information on drug prescriptions, including drug names, dosages, and start/end dates.

These tables are linked through shared identifier columns (such as `person_id` and the various `_occurrence_id` columns), so records about the same patient can be combined across tables for analysis.
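As a minimal sketch of how these shared identifiers are used, the example below joins `person` to `condition_occurrence` on `person_id`. It builds a small in-memory SQLite database so it can run on its own. The rows and concept IDs are made up for illustration and are not taken from Eunomia; only the table and column names follow the OMOP CDM, and the tables are reduced to a few columns.

```julia
using SQLite, DBInterface, DataFrames

# In-memory database with two reduced toy tables (hypothetical rows, not Eunomia data)
db = SQLite.DB()
DBInterface.execute(db, "CREATE TABLE person (person_id INTEGER PRIMARY KEY, year_of_birth INTEGER)")
DBInterface.execute(db, """
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    )
    """)
DBInterface.execute(db, "INSERT INTO person VALUES (1, 1970), (2, 1985)")
DBInterface.execute(db, """
    INSERT INTO condition_occurrence VALUES
        (10, 1, 1001, '2020-01-05'),
        (11, 1, 1002, '2021-03-12'),
        (12, 2, 1001, '2019-07-30')
    """)

# Link each condition record back to its patient through person_id
sql = """
    SELECT p.person_id, p.year_of_birth, c.condition_concept_id, c.condition_start_date
    FROM person AS p
    JOIN condition_occurrence AS c ON c.person_id = p.person_id
    ORDER BY p.person_id, c.condition_start_date;
    """
joined = DBInterface.execute(db, sql) |> DataFrame
```

The same join pattern should carry over to `procedure_occurrence` and `drug_exposure`, since both also carry a `person_id` column.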

## Dependency Set-Up

In [1]:
using DrWatson
@quickactivate "CompositionalMLStudy"

using DataFrames

import DBInterface:
    execute

import DrWatson:
    datadir

import SQLite:
    DB

## Setting Constants

In [2]:
# OMOP CDM Data Directory
OMOPCDM_DIR = datadir("exp_raw", "OMOPCDM")

# OMOP CDM Example Data 
DATABASE_FILE = "eunomia.sqlite"

"eunomia.sqlite"

## Basic Exploration of OMOP CDM Data

### Creating Connection to SQLite Database

In [3]:
conn = DB(joinpath(OMOPCDM_DIR, DATABASE_FILE))

SQLite.DB("/home/thecedarprince/Projects/CompositionalMLStudy/data/exp_raw/OMOPCDM/eunomia.sqlite")
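One thing to be aware of: opening a path that does not exist makes SQLite create a new, empty database file there, so a typo in the path only surfaces later as a "no such table" error. The sketch below guards against that. `open_existing_db` is a hypothetical helper name, not part of SQLite.jl.

```julia
using SQLite

# Hypothetical helper: refuse to open a path that does not exist yet,
# because SQLite.DB would otherwise create a new, empty database file there.
function open_existing_db(path::AbstractString)
    isfile(path) || error("SQLite database not found at: $path")
    return SQLite.DB(path)
end
```

With the constants defined earlier, it could be used as `conn = open_existing_db(joinpath(OMOPCDM_DIR, DATABASE_FILE))`.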

### Examining Data

List out all tables from the OMOP CDM sample database:

In [4]:
sql =
    """
    SELECT name AS TABLE_NAME 
    FROM sqlite_master 
    WHERE type = 'table' 
    ORDER BY name;
    """

execute(conn, sql) |> DataFrame

| Row | TABLE_NAME (String) |
|-----|----------------------|
| 1 | CARE_SITE |
| 2 | CDM_SOURCE |
| 3 | COHORT |
| 4 | COHORT_ATTRIBUTE |
| 5 | CONCEPT |
| 6 | CONCEPT_ANCESTOR |
| 7 | CONCEPT_CLASS |
| 8 | CONCEPT_RELATIONSHIP |
| 9 | CONCEPT_SYNONYM |
| 10 | CONDITION_ERA |
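Once the table names are known, a natural next step is to look at the columns of a single table. A minimal sketch, using SQLite's `PRAGMA table_info`, is shown below. It runs against a toy in-memory `person` table with only three of the real columns, so the result is illustrative rather than the actual Eunomia schema.

```julia
using SQLite, DBInterface, DataFrames

# Toy stand-in for the person table (reduced to three columns for illustration)
db = SQLite.DB()
DBInterface.execute(db, """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER
    )
    """)

# PRAGMA table_info returns one row per column: cid, name, type, notnull, dflt_value, pk
columns = DBInterface.execute(db, "PRAGMA table_info(person);") |> DataFrame

# Keep only the column name and its declared type
columns[:, [:name, :type]]
```

Against the sample database, the same `PRAGMA` statement could be passed to `execute(conn, ...)` in place of the toy `db`.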


Get the unique patient IDs from the OMOP CDM sample database:

In [5]:
sql =
    """
    SELECT person_id 
    FROM person;
    """

execute(conn, sql) |> DataFrame

| Row | person_id (Float64) |
|-----|---------------------|
| 1 | 6.0 |
| 2 | 123.0 |
| 3 | 129.0 |
| 4 | 16.0 |
| 5 | 65.0 |
| 6 | 74.0 |
| 7 | 42.0 |
| 8 | 187.0 |
| 9 | 18.0 |
| 10 | 111.0 |
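Note that the IDs come back as `Float64`, presumably because of how the column is declared in this SQLite file. If integer IDs are preferred for joins or lookups, one option is to convert them on the Julia side. The sketch below uses a toy in-memory table declared with a `REAL` ID column to reproduce the situation; the three rows are hypothetical.

```julia
using SQLite, DBInterface, DataFrames

# Toy table whose ID column is declared REAL, so values are read back as Float64
db = SQLite.DB()
DBInterface.execute(db, "CREATE TABLE person (person_id REAL)")
DBInterface.execute(db, "INSERT INTO person VALUES (6.0), (123.0), (129.0)")

df = DBInterface.execute(db, "SELECT person_id FROM person ORDER BY person_id;") |> DataFrame

# Convert to Int; this throws an InexactError if any ID is not a whole number
df.person_id = Int.(df.person_id)
```

This conversion assumes no ID is missing; a `missing` value would make `Int.(...)` fail, in which case the missing rows would need to be handled first.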
