# Introduction

![logo_OHDSI](img/logo_OHDSI.png)

## Definition OMOP - CDM

OMOP = Observational Medical Outcomes Partnership  
CDM = Common Data Model

Open Source Community for "Real World Data" Analysis

Two levels of standardization:
- Relational model: SQL
- Mapping of local concepts to standard concepts (https://github.com/MIT-LCP/mimic-omop/tree/master/extras/concept)
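
As a minimal sketch of the second level, the mapping can be pictured as a lookup from a local value to a standard `concept_id`, with `0` used when no mapping exists. The dictionary `LOCAL_TO_STANDARD` and the function `to_standard_concept_id` are hypothetical names introduced only for illustration; the pairs come from the discharge example in this notebook, and the real mapping lives in the concept files linked above.

```python
# Hypothetical lookup table: local (source) value -> standard concept_id.
# The pairs are taken from the discharge example in this notebook.
LOCAL_TO_STANDARD = {
    "HOME": 8536,
    "HOME HEALTH CARE": 8536,
    "SNF": 8863,
    "DEAD/EXPIRED": 4216643,
}


def to_standard_concept_id(source_value):
    """Return the standard concept_id for a local value, or 0 when unmapped."""
    return LOCAL_TO_STANDARD.get(source_value.strip().upper(), 0)


# "UNKNOWN WARD" is an invented value used to show the unmapped case.
mapped = [to_standard_concept_id(v) for v in ["Home", "SNF", "UNKNOWN WARD"]]
print(mapped)
```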

## Websites

- https://github.com/OHDSI/CommonDataModel/wiki
- https://github.com/MIT-LCP/mimic-omop

## Overall structure

<img src="img/structure_OMOP_1.png" alt="structure_OMOP" width="700"/>

### Table Concept

<img src="img/concept_table.png" alt="concept_table" width="600"/>

## Tools

https://www.ohdsi.org/analytic-tools/
- Achilles: visualization tool
- Atlas : vocabulary search, cohort definition

# Code

In [0]:
import psycopg2
import sys
import pprint
import pandas as pd
import random

In [0]:
# Database connection settings
user = 'apa'
dbname = 'mimic'
schema_name = 'omop'
host = 'localhost'

# Connect to the database
con = psycopg2.connect(dbname=dbname, user=user, host=host)

query_schema = 'set search_path to ' + schema_name + ';'

## Number of patients = PERSON

In [0]:
query = query_schema + """
SELECT count(*) 
FROM person
"""
nb_patients = pd.read_sql_query(query, con)
nb_patients

Unnamed: 0,count
0,46520


## Number of admissions = VISIT_OCCURRENCE

In [0]:
query = query_schema + """
SELECT count(*) 
FROM visit_occurrence
"""
nb_adm = pd.read_sql_query(query, con)
nb_adm

Unnamed: 0,count
0,58976


## Number of ICU stays = VISIT_DETAIL
- Note: every table has a *_type_concept_id column, which makes it possible to filter records.

In [0]:
query = query_schema + """
SELECT COUNT(distinct visit_detail_id) AS num_totalstays_count
FROM visit_detail
WHERE TRUE
AND visit_detail_concept_id = 581382                -- concept.concept_name = 'Inpatient Intensive Care Facility'
AND visit_detail_type_concept_id = 2000000006       -- concept.concept_name = 'Ward and physical location'
"""
nb_icu = pd.read_sql_query(query, con)
nb_icu

Unnamed: 0,num_totalstays_count
0,71575
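
To make the effect of the `*_type_concept_id` filter concrete, here is a minimal sketch that runs without the MIMIC database. It assumes an in-memory SQLite table as a stand-in for the PostgreSQL `omop.visit_detail` table; the four rows and the type id `999` are invented, and `con_demo` is a hypothetical name.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL omop schema (assumption).
con_demo = sqlite3.connect(":memory:")
con_demo.execute(
    "CREATE TABLE visit_detail ("
    "visit_detail_id INTEGER, "
    "visit_detail_concept_id INTEGER, "
    "visit_detail_type_concept_id INTEGER)"
)
con_demo.executemany(
    "INSERT INTO visit_detail VALUES (?, ?, ?)",
    [
        (1, 581382, 2000000006),  # ICU, 'Ward and physical location'
        (2, 581382, 2000000006),  # ICU, 'Ward and physical location'
        (3, 581382, 999),         # ICU, invented other record type
        (4, 8717, 2000000006),    # not an ICU stay
    ],
)

# Without the type filter, every ICU row is counted.
all_icu = con_demo.execute(
    "SELECT COUNT(DISTINCT visit_detail_id) FROM visit_detail "
    "WHERE visit_detail_concept_id = 581382"
).fetchone()[0]

# With the type filter, only rows of the chosen record type remain.
ward_icu = con_demo.execute(
    "SELECT COUNT(DISTINCT visit_detail_id) FROM visit_detail "
    "WHERE visit_detail_concept_id = 581382 "
    "AND visit_detail_type_concept_id = 2000000006"
).fetchone()[0]

print(all_icu, ward_icu)
```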


## CONCEPT

In [0]:
# if standard_concept = 'S', the concept is standard,
# so it belongs in the *_concept_id column and not in *_source_concept_id
query = query_schema + """
SELECT *
FROM concept
LIMIT 1
"""
concept_table = pd.read_sql_query(query, con)
print(concept_table.T)

                                              0
concept_id                              2103359
concept_name           Excision of rib, partial
domain_id                             Procedure
vocabulary_id                              CPT4
concept_class_id                           CPT4
standard_concept                              S
concept_code                              21600
valid_start_date                     1970-01-01
valid_end_date                       2099-12-31
invalid_reason                             None
search_concept    'excis':1 'partial':4 'rib':3


### Example: CONCEPT and DEATH tables

In [0]:
query = query_schema + """
SELECT distinct c.concept_name, d.death_type_concept_id  as concept_id
FROM death d
JOIN concept c ON d.death_type_concept_id = c.concept_id
"""
death = pd.read_sql_query(query, con)
death

Unnamed: 0,concept_name,concept_id
0,"EHR record patient status ""Deceased""",38003569
1,US Social Security Death Master File record,261


## Discharge from hospital (example of standardization)

In [0]:
## Items with the same meaning share the same code (e.g. Home = 8536)
query = query_schema + """
SELECT distinct v.discharge_to_source_value, c.concept_name, v.discharge_to_concept_id  as concept_id
FROM visit_occurrence v
JOIN concept c ON v.discharge_to_concept_id = c.concept_id
"""
discharge = pd.read_sql_query(query, con)
discharge

Unnamed: 0,discharge_to_source_value,concept_name,concept_id
0,HOME HEALTH CARE,Home,8536
1,DISC-TRAN CANCER/CHLDRN H,Inpatient Hospital,8717
2,HOSPICE-MEDICAL FACILITY,Hospice,8546
3,LONG TERM CARE HOSPITAL,Inpatient Long-term Care,8970
4,HOME,Home,8536
5,ICF,Skilled Nursing Facility,8863
6,SHORT TERM HOSPITAL,Skilled Nursing Facility,8863
7,REHAB/DISTINCT PART HOSP,Skilled Nursing Facility,8863
8,SNF,Skilled Nursing Facility,8863
9,DEAD/EXPIRED,Patient died,4216643
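
The result above can be regrouped by standard concept to show which source values were merged. This is a small sketch that copies the ten rows of the output into a Python list, so no database connection is needed; the names `rows` and `by_concept` are introduced here for illustration.

```python
from collections import defaultdict

# (discharge_to_source_value, concept_name, concept_id), copied from the output above.
rows = [
    ("HOME HEALTH CARE", "Home", 8536),
    ("DISC-TRAN CANCER/CHLDRN H", "Inpatient Hospital", 8717),
    ("HOSPICE-MEDICAL FACILITY", "Hospice", 8546),
    ("LONG TERM CARE HOSPITAL", "Inpatient Long-term Care", 8970),
    ("HOME", "Home", 8536),
    ("ICF", "Skilled Nursing Facility", 8863),
    ("SHORT TERM HOSPITAL", "Skilled Nursing Facility", 8863),
    ("REHAB/DISTINCT PART HOSP", "Skilled Nursing Facility", 8863),
    ("SNF", "Skilled Nursing Facility", 8863),
    ("DEAD/EXPIRED", "Patient died", 4216643),
]

# Group the local source values under each standard concept.
by_concept = defaultdict(list)
for source_value, concept_name, concept_id in rows:
    by_concept[(concept_id, concept_name)].append(source_value)

for (concept_id, concept_name), sources in sorted(by_concept.items()):
    print(concept_id, concept_name, sources)
```

Ten local values collapse into six standard concepts, which is what allows a single filter such as `discharge_to_concept_id = 8536` to cover every way "home" was written in the source data.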


## Deaths in hospital

In [0]:
query = query_schema + """
SELECT count(distinct visit_occurrence_id) AS dead_hospital_count
FROM visit_occurrence
WHERE TRUE
AND discharge_to_concept_id = 4216643                   -- concept.concept_name = 'Patient died'
"""
death_hospital = pd.read_sql_query(query, con)
death_hospital

Unnamed: 0,dead_hospital_count
0,5815


## MEASUREMENT
- *_concept_id vs *_source_concept_id (*_concept_id is 0 when the value is not mapped)
- *_type_concept_id


- Example code is available in the README of each SQL table: https://github.com/MIT-LCP/mimic-omop/tree/master/etl/StandardizedClinicalDataTables/MEASUREMENT

In [0]:
## two columns of interest: measurement_concept_id and measurement_source_concept_id
query = query_schema + """
SELECT column_name, data_type 
FROM information_schema.columns 
WHERE table_name = 'measurement' 
ORDER BY ordinal_position"""
meas = pd.read_sql_query(query, con)
meas

Unnamed: 0,column_name,data_type
0,measurement_id,integer
1,person_id,integer
2,measurement_concept_id,integer
3,measurement_date,date
4,measurement_datetime,timestamp without time zone
5,measurement_type_concept_id,integer
6,operator_concept_id,integer
7,value_as_number,numeric
8,value_as_concept_id,integer
9,unit_concept_id,integer


In [0]:
# measurement_concept_id = 0 means the mapping has not been done yet (80% is done)
query = query_schema + """
SELECT *
FROM measurement
LIMIT 2 
"""
meas_result = pd.read_sql_query(query, con)
print(meas_result.T)

                                                 0                    1
measurement_id                          1293892407           1293894295
person_id                                 62099621             62099621
measurement_concept_id                           0                    0
measurement_date                        2191-03-04           2191-03-04
measurement_datetime           2191-03-04 08:00:00  2191-03-04 04:00:00
measurement_type_concept_id               44818701             44818701
operator_concept_id                        4172703              4172703
value_as_number                                  0                  NaN
value_as_concept_id                           None                 None
unit_concept_id                               None                 None
range_low                                     None                 None
range_high                                    None                 None
provider_id                                  94489              

In [0]:
query = query_schema + """
SELECT c.concept_name, c.concept_id, COUNT(1)
FROM measurement m
JOIN concept c ON measurement_type_concept_id = concept_id
GROUP BY concept_name, concept_id 
ORDER BY COUNT(1) DESC;
"""
type_column = pd.read_sql_query(query, con)
type_column

Unnamed: 0,concept_name,concept_id,count
0,From physical examination,44818701,320690365
1,Labs - Chemistry,2000000011,18486124
2,Labs - Hemato,2000000009,14870291
3,Labs - Blood Gaz,2000000010,6149218
4,Output Event,2000000003,4349218
5,Derived value,45754907,1045012
6,Labs - Culture Organisms,2000000007,363506
7,Labs - Culture Sensitivity,2000000008,267350
8,Lab result,44818702,5032
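
To put these counts in proportion, the following sketch computes the share of each record type. The numbers are copied from the output above; the dictionary name `measurement_type_counts` is introduced here for illustration.

```python
# Row counts per measurement_type_concept_id, copied from the output above.
measurement_type_counts = {
    "From physical examination": 320690365,
    "Labs - Chemistry": 18486124,
    "Labs - Hemato": 14870291,
    "Labs - Blood Gaz": 6149218,
    "Output Event": 4349218,
    "Derived value": 1045012,
    "Labs - Culture Organisms": 363506,
    "Labs - Culture Sensitivity": 267350,
    "Lab result": 5032,
}

total_rows = sum(measurement_type_counts.values())

# Fraction of all measurement rows that each record type represents.
shares = {
    name: count / total_rows for name, count in measurement_type_counts.items()
}

for name, share in shares.items():
    print(f"{name:<28} {share:6.1%}")
```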


## FACT_RELATIONSHIP

In [0]:
query = query_schema + """
SELECT *
FROM fact_relationship
WHERE fact_id_1 != fact_id_2 
LIMIT 4
"""
fact_relation = pd.read_sql_query(query, con)
print(fact_relation.T)

                                0         1         2         3
domain_concept_id_1            36        36        36        36
fact_id_1                78746625  78746626  78746627  78746628
domain_concept_id_2            21        21        21        21
fact_id_2                40575878  40698537  40698579  40556640
relationship_concept_id  44818854  44818854  44818854  44818854


### Example: FACT_RELATIONSHIP, antibiotics and organisms

In [0]:
## fact_id_1 is always the organism and fact_id_2 the antibiotic tested

query = query_schema + """
SELECT m.measurement_source_value, m.value_as_concept_id, resistance.concept_name
FROM measurement m
JOIN concept resistance ON value_as_concept_id = concept_id
JOIN fact_relationship ON m.measurement_id =  fact_id_2
JOIN
(
    SELECT measurement_id AS id_is_staph
    FROM measurement m
    WHERE TRUE
    AND measurement_type_concept_id = 2000000007          -- concept.concept_name = 'Labs - Culture Organisms'
    AND value_as_concept_id = 4149419                     -- concept.concept_name = 'staph aureus coag +'
    AND measurement_concept_id = 46235217                 -- concept.concept_name = 'Bacteria identified in Blood product unit.autologous by Culture';
    LIMIT 10
) staph ON id_is_staph = fact_id_1
WHERE m.measurement_type_concept_id = 2000000008
LIMIT 10
"""
fact_relation_ex = pd.read_sql_query(query, con)
print(fact_relation_ex)

  measurement_source_value  value_as_concept_id concept_name
0               GENTAMICIN              4038110  Susceptible
1             ERYTHROMYCIN              4148441    Resistant
2                OXACILLIN              4148441    Resistant
3               VANCOMYCIN              4038110  Susceptible
4              CLINDAMYCIN              4148441    Resistant
5             LEVOFLOXACIN              4148441    Resistant
6                 RIFAMPIN              4038110  Susceptible
7               PENICILLIN              4148441    Resistant
8             TETRACYCLINE              4038110  Susceptible
9             PENICILLIN G              4148441    Resistant
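
The linkage used in this query can be reproduced on toy data. The sketch below assumes an in-memory SQLite database as a stand-in for the PostgreSQL `omop` schema and keeps only the columns needed for the join. The rows, the organism id `111`, and the name `con_toy` are invented for illustration; the concept ids for organisms, susceptibility, and resistance are the ones shown above.

```python
import sqlite3

con_toy = sqlite3.connect(":memory:")  # stand-in for the omop schema (assumption)
con_toy.executescript(
    """
    CREATE TABLE measurement (
        measurement_id INTEGER,
        measurement_type_concept_id INTEGER,
        measurement_source_value TEXT,
        value_as_concept_id INTEGER
    );
    CREATE TABLE fact_relationship (fact_id_1 INTEGER, fact_id_2 INTEGER);
    """
)
con_toy.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        (1, 2000000007, "BLOOD CULTURE", 4149419),  # organism: staph aureus coag +
        (2, 2000000008, "OXACILLIN", 4148441),      # sensitivity: Resistant
        (3, 2000000008, "VANCOMYCIN", 4038110),     # sensitivity: Susceptible
        (4, 2000000007, "BLOOD CULTURE", 111),      # invented other organism
        (5, 2000000008, "GENTAMICIN", 4038110),     # sensitivity of organism 4
    ],
)
# fact_id_1 = organism row, fact_id_2 = antibiotic row tested against it.
con_toy.executemany(
    "INSERT INTO fact_relationship VALUES (?, ?)", [(1, 2), (1, 3), (4, 5)]
)

staph_results = con_toy.execute(
    """
    SELECT atb.measurement_source_value, atb.value_as_concept_id
    FROM measurement org
    JOIN fact_relationship fr ON fr.fact_id_1 = org.measurement_id
    JOIN measurement atb ON atb.measurement_id = fr.fact_id_2
    WHERE org.measurement_type_concept_id = 2000000007
      AND org.value_as_concept_id = 4149419
      AND atb.measurement_type_concept_id = 2000000008
    ORDER BY atb.measurement_id
    """
).fetchall()
print(staph_results)
```

Only the two antibiotics linked to the staph aureus row are returned; the gentamicin result belongs to the other organism and is excluded by the join.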


## DRUG_EXPOSURE

In [0]:
# prescription written = prescriptions table 
# inpatient administration = inputevents_mv / cv 
query = query_schema + """
SELECT concept_name, drug_type_concept_id, count(1)
FROM drug_exposure
JOIN concept ON drug_type_concept_id = concept_id
GROUP BY concept_name, drug_type_concept_id
ORDER BY count(1) desc
"""
drug_type = pd.read_sql_query(query, con)
drug_type

Unnamed: 0,concept_name,drug_type_concept_id,count
0,Inpatient administration,38000180,20778301
1,Prescription written,38000177,4156457


In [0]:
# as used in prescriptions table (= prescribed medications)
# 0 = unmapped items (30% of the mapping is done)
query = query_schema + """
SELECT concept_id, concept_name, count(1)
FROM drug_exposure
JOIN CONCEPT  ON drug_concept_id = concept_id
WHERE drug_type_concept_id = 38000177                         -- concept.concept_name = 'Prescription written'
GROUP BY concept_id, concept_name
ORDER BY count(1) desc limit 10
"""
rxnorm = pd.read_sql_query(query, con)
rxnorm

Unnamed: 0,concept_id,concept_name,count
0,0,No matching concept,1022421
1,19079524,Sodium Chloride 9 MG/ML Injectable Solution,252309
2,19076324,Glucose 50 MG/ML Injectable Solution,155485
3,40141424,Sodium Chloride Prefilled Syringe,84994
4,46275280,50 ML Magnesium Sulfate 40 MG/ML Injection,55530
5,43011850,"heparin sodium, porcine 5000 UNT/ML Injectable...",53680
6,40167213,Metoprolol Tartrate 25 MG Oral Tablet,53658
7,1127433,Acetaminophen 325 MG Oral Tablet,53309
8,40167196,5 ML Metoprolol Tartrate 1 MG/ML Injection,49663
9,19135374,Calcium Chloride 0.0014 MEQ/ML / Potassium Chl...,40820
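
The share of prescription rows that still have no standard concept can be derived from the two outputs above. This is a small arithmetic sketch using the displayed counts. It measures rows, not distinct drugs, so it need not match the mapping progress quoted in the code comment.

```python
# Counts copied from the outputs above.
prescription_rows = 4156457  # drug_type_concept_id = 38000177 ('Prescription written')
unmapped_rows = 1022421      # drug_concept_id = 0 ('No matching concept')

# Fraction of prescription rows without a standard concept.
unmapped_share = unmapped_rows / prescription_rows
mapped_share = 1 - unmapped_share

print(f"unmapped: {unmapped_share:.1%}, mapped: {mapped_share:.1%}")
```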
