# OMOP Data Exploration and Analysis with PostgreSQL

This Jupyter notebook includes an example of querying and analyzing OMOP (Observational Medical Outcomes Partnership) data stored in a PostgreSQL database. The notebook covers the essential steps, from establishing a connection to the database to executing SQL queries and visualizing key insights.

https://www.ohdsi.org/data-standardization/

### Importing required Libraries

- **psycopg2**: establishes the connection between Python and the PostgreSQL database.
- **pandas**: handles, manipulates, and analyzes the query results.
- **sqlio** (`pandas.io.sql`): executes SQL queries and reads the results directly into a pandas DataFrame.

In [1]:
import pandas as pd
import pandas.io.sql as sqlio
import psycopg2 as ps
import warnings

warnings.filterwarnings('ignore')  # Hides the pandas warning about non-SQLAlchemy connections; note that this suppresses all other warnings too. To be improved.
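The blanket filter above silences every warning in the notebook. A narrower alternative is sketched below: it ignores only the `UserWarning` that pandas emits when it is handed a raw DBAPI2 connection. The message prefix is an assumption based on recent pandas releases and may need adjusting for other versions; the helper name `silence_pandas_dbapi_warning` is hypothetical.

```python
import warnings

# Assumed start of the pandas warning text; adjust if your pandas version differs.
PANDAS_DBAPI_WARNING = r"pandas only supports SQLAlchemy"


def silence_pandas_dbapi_warning():
    """Ignore only the pandas DBAPI2 UserWarning, leaving other warnings visible."""
    warnings.filterwarnings(
        "ignore",
        message=PANDAS_DBAPI_WARNING,  # regex matched against the start of the message
        category=UserWarning,
    )


silence_pandas_dbapi_warning()
```

With this in place, unrelated warnings (for example deprecation notices) still appear in the notebook output.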

### Database connection and setup

Replace the database name, username, password, host, and port below with the values for your OMOP database.

In [2]:
conn = ps.connect(dbname="ohdsi",
                   user = "ohdsi_admin_user",
                   password = "admin1",
                   host = "omop-db-postgress",
                   port = "5432")
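To avoid hardcoding credentials in the notebook, the connection settings could instead be read from environment variables. The sketch below is one way to do that; the variable names (`OMOP_DB_NAME`, `OMOP_DB_USER`, `OMOP_DB_PASSWORD`, `OMOP_DB_HOST`, `OMOP_DB_PORT`) and the helper `connection_params_from_env` are hypothetical, and the defaults simply mirror the values used above.

```python
import os


def connection_params_from_env(env=os.environ):
    """Build keyword arguments for psycopg2.connect from environment variables.

    The variable names are illustrative assumptions, not an OMOP or psycopg2 convention.
    """
    return {
        "dbname": env.get("OMOP_DB_NAME", "ohdsi"),
        "user": env.get("OMOP_DB_USER", "ohdsi_admin_user"),
        "password": env["OMOP_DB_PASSWORD"],  # no default: raises KeyError if unset
        "host": env.get("OMOP_DB_HOST", "omop-db-postgress"),
        "port": env.get("OMOP_DB_PORT", "5432"),
    }


# Usage in the notebook (requires a reachable database):
# conn = ps.connect(**connection_params_from_env())
```

Leaving the password without a default makes a missing setting fail immediately instead of silently falling back to a shared credential.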

## Initial Analysis - Query Counts from OMOP DB
### Number of observation records

In [4]:
df = sqlio.read_sql_query("SELECT COUNT(*) FROM cds_cdm.observation", conn)
df.head()

Unnamed: 0,count
0,33403


### Previous source value for each visit detail (LAG window function)

In [4]:
df = sqlio.read_sql_query('''
SELECT visit_detail_id, lag(visit_detail_source_value, 1) 
OVER ( PARTITION BY person_id, visit_occurrence_id 
ORDER BY visit_detail_start_datetime) 
AS source_value
FROM cds_cdm.visit_detail''', conn)
df.head(10)

Unnamed: 0,visit_detail_id,source_value
0,136740,
1,136744,410e1fce-eb97-4e4b-8ce4-8a70e3f59c15
2,136344,
3,136390,
4,136385,fd645af5-a101-4342-955e-464d81289480
5,138974,
6,138967,fd645af5-a101-4342-955e-464d81289480
7,136550,
8,136557,d721cb87-080f-40b9-8979-ec9b1835abc8
9,136605,
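The empty values in the output above are expected: `LAG(..., 1)` returns the value from the previous row within the same partition, so the first visit detail of each `person_id` / `visit_occurrence_id` group has no predecessor. The sketch below reproduces this behavior on toy data. It uses an in-memory SQLite database as a stand-in for PostgreSQL (an assumption made so the example runs without a server; it needs SQLite 3.25 or newer for window functions), and the rows and the helper name `previous_source_values` are made up for illustration.

```python
import sqlite3


def previous_source_values(rows):
    """Return (visit_detail_id, previous source value) pairs computed with LAG."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE visit_detail ("
        "visit_detail_id INTEGER, person_id INTEGER, visit_occurrence_id INTEGER, "
        "visit_detail_start_datetime TEXT, visit_detail_source_value TEXT)"
    )
    con.executemany("INSERT INTO visit_detail VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT visit_detail_id,
               LAG(visit_detail_source_value, 1) OVER (
                   PARTITION BY person_id, visit_occurrence_id
                   ORDER BY visit_detail_start_datetime
               ) AS source_value
        FROM visit_detail
        ORDER BY visit_detail_id
        """
    ).fetchall()
    con.close()
    return result


# Toy rows: two details in one visit, one detail in another visit.
toy_rows = [
    (1, 10, 100, "2022-09-09 14:00:00", "ward-a"),
    (2, 10, 100, "2022-09-09 18:00:00", "ward-b"),
    (3, 11, 200, "2022-09-10 09:00:00", "ward-c"),
]
print(previous_source_values(toy_rows))
```

Only the second detail of visit 100 has a predecessor, so it is the only row that receives a non-null `source_value`.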


### Visit details for a single FHIR encounter

In [5]:
df = sqlio.read_sql_query("SELECT * FROM cds_cdm.visit_detail where fhir_logical_id = 'enc-9e6b3982-bf03-4cee-a692-befd1b74f96d'", conn)
df

Unnamed: 0,visit_detail_id,person_id,visit_detail_concept_id,visit_detail_start_date,visit_detail_start_datetime,visit_detail_end_date,visit_detail_end_datetime,visit_detail_type_concept_id,provider_id,care_site_id,...,visit_detail_source_concept_id,admitting_source_value,admitting_source_concept_id,discharge_to_source_value,discharge_to_concept_id,preceding_visit_detail_id,visit_detail_parent_id,visit_occurrence_id,fhir_identifier,fhir_logical_id
0,136381,112133,9201,2022-09-09,2022-09-09 14:00:45,2022-09-09,2022-09-09 23:59:59,32817,,,...,,,,fd645af5-a101-4342-955e-464d81289480,,,,153488,,enc-9e6b3982-bf03-4cee-a692-befd1b74f96d
1,136386,112133,9201,2022-09-09,2022-09-09 14:00:45,2022-09-09,2022-09-09 23:59:59,32817,,,...,,1729ca87-6458-46ce-9164-98cc596f05a2,,,,136381.0,,153488,,enc-9e6b3982-bf03-4cee-a692-befd1b74f96d


### Number of Procedure

### Observations without a mapped source concept (observation_source_concept_id = 0)

In [13]:
df = sqlio.read_sql_query("SELECT * FROM cds_cdm.observation WHERE observation_source_concept_id = 0", conn)
df.head()

Unnamed: 0,observation_id,person_id,observation_concept_id,observation_date,observation_datetime,observation_type_concept_id,value_as_number,value_as_string,value_as_concept_id,qualifier_concept_id,...,visit_occurrence_id,visit_detail_id,observation_source_value,observation_source_concept_id,unit_source_value,qualifier_source_value,fhir_identifier,fhir_logical_id,value_as_boolean,value_as_datetime
0,287436,112081,3002314,2022-09-05,2022-09-05 20:51:21,32817,,,,,...,153670.0,,8665-2,0,,,,obs-3f7abf01-820e-478b-b95f-56ede7c21bf7,,2022-09-01
1,287440,112082,3002314,2022-09-05,2022-09-05 20:53:05,32817,,,,,...,153756.0,,8665-2,0,,,,obs-f63152d3-bda8-41d2-8934-873f7806a64f,,2022-09-05
2,287443,112082,4152200,2022-09-05,2022-09-05 20:53:05,32817,,ANC.B8.DE112,60000255.0,,...,153756.0,,271692001,0,,,,obs-9652f4a6-6af8-492c-b78a-590e538121fc,,NaT
3,287445,112086,3002314,2022-09-07,2022-09-07 12:18:12,32817,,,,,...,153688.0,,8665-2,0,,,,obs-ed99cf9b-a5b1-4b1c-a5e7-5fde8a4445d4,,2022-09-01
4,287456,112089,3002314,2022-09-06,2022-09-06 19:00:07,32817,,,,,...,153684.0,,8665-2,0,,,,obs-638b7f8e-a622-46dd-822c-9d8488429f26,,2022-09-01


In [20]:
# Export the source-to-concept mappings to a CSV file
df = sqlio.read_sql_query("SELECT * FROM cds_cdm.source_to_concept_map", conn)
df.to_csv('data.csv')

### Describe height measurements

In [None]:
df = sqlio.read_sql_query("SELECT * FROM cds_cdm.measurement where measurement_source_concept_id=3036277", conn)
df["value_as_number"].describe()

## Encounter visit longitudinal Analysis

In [None]:
df = sqlio.read_sql_query("SELECT * FROM cds_cdm.visit_occurrence", conn)
df.head()

In [None]:
df = sqlio.read_sql_query("SELECT person_id, COUNT(person_id) AS count FROM cds_cdm.visit_occurrence GROUP BY person_id", conn)
df["count"].describe()

### What did the person with the most visits come in for?

In [None]:
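# Use the per-person counts directly: describe().max() returns the largest
# summary statistic, which can be the number of rows rather than the top count.
# idxmax() picks the first person if several share the maximum.
person_id_max_visit = df.loc[df["count"].idxmax(), "person_id"]
print(person_id_max_visit)
df_new = sqlio.read_sql_query(f"SELECT * FROM cds_cdm.procedure_occurrence where person_id={person_id_max_visit}", conn)
# Look up the concept of the first recorded procedure for this person.
procedure_concept_id = df_new['procedure_concept_id'].iat[0]
procedure_reason = sqlio.read_sql_query(f"SELECT * FROM cds_cdm.concept where concept_id={procedure_concept_id}", conn)
print(procedure_reason)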
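The person with the most visits can also be found in SQL, so that only one row is returned to the notebook. The sketch below illustrates the idea on toy data with an in-memory SQLite database standing in for PostgreSQL (an assumption made so the example runs without a server); the helper name `person_with_most_visits` and the rows are hypothetical. Ties are broken by the smallest `person_id` so the result is deterministic.

```python
import sqlite3


def person_with_most_visits(visit_rows):
    """Return (person_id, visit_count) for the person with the most visits, or None."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE visit_occurrence (visit_occurrence_id INTEGER, person_id INTEGER)"
    )
    con.executemany("INSERT INTO visit_occurrence VALUES (?, ?)", visit_rows)
    row = con.execute(
        """
        SELECT person_id, COUNT(*) AS visit_count
        FROM visit_occurrence
        GROUP BY person_id
        ORDER BY visit_count DESC, person_id ASC
        LIMIT 1
        """
    ).fetchone()
    con.close()
    return row


# Toy data: person 10 has two visits, person 11 has one.
toy_visits = [(1, 10), (2, 10), (3, 11)]
print(person_with_most_visits(toy_visits))
```

The same `GROUP BY ... ORDER BY ... LIMIT 1` pattern should carry over to the `cds_cdm.visit_occurrence` table, though it has not been run against that schema here.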

## Procedure specific longitudinal Analysis

### Enter the OMOP concept ID for a procedure, e.g., mammography for breast cancer
#### Search example
https://athena.ohdsi.org/search-terms/terms?conceptClass=Procedure&page=1&pageSize=15&query=Mammography&boosts 

In [None]:
omop_code = 4324693  # Concept ID for mammography. Change this to the concept ID you are interested in.
df = sqlio.read_sql_query(f"SELECT * FROM cds_cdm.procedure_occurrence where procedure_concept_id={omop_code}", conn)
df.head()

In [None]:
df = sqlio.read_sql_query(f"SELECT person_id, COUNT(person_id) AS count FROM cds_cdm.procedure_occurrence where procedure_concept_id={omop_code} GROUP BY person_id", conn)
df
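The queries above interpolate `omop_code` into the SQL text with an f-string. Passing the value as a query parameter is generally safer, because the driver handles quoting. The sketch below shows the pattern with an in-memory SQLite database and toy rows (assumptions made so the example runs without a server); the helper name `count_procedures_per_person` is hypothetical. SQLite uses `?` as its placeholder, whereas psycopg2 uses `%s`, and `read_sql_query` accepts the values through its `params` argument.

```python
import sqlite3


def count_procedures_per_person(con, concept_id):
    """Count occurrences of one procedure concept per person using a bound parameter."""
    query = (
        "SELECT person_id, COUNT(person_id) AS count "
        "FROM procedure_occurrence "
        "WHERE procedure_concept_id = ? "  # placeholder is filled in by the driver
        "GROUP BY person_id "
        "ORDER BY person_id"
    )
    return con.execute(query, (concept_id,)).fetchall()


# Toy table with three mammography rows and one unrelated procedure.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE procedure_occurrence (person_id INTEGER, procedure_concept_id INTEGER)"
)
con.executemany(
    "INSERT INTO procedure_occurrence VALUES (?, ?)",
    [(1, 4324693), (1, 4324693), (2, 4324693), (3, 999)],
)
print(count_procedures_per_person(con, 4324693))
```

Against PostgreSQL, the equivalent notebook call would look roughly like `sqlio.read_sql_query("... WHERE procedure_concept_id = %s ...", conn, params=(omop_code,))`.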