# OMOP Data Exploration and Analysis with PostgreSQL

This Jupyter notebook provides an example of querying and analyzing OMOP (Observational Medical Outcomes Partnership) data stored in a PostgreSQL database. It covers the essential steps, from establishing a connection to the database to executing SQL queries and visualizing key insights.

https://www.ohdsi.org/data-standardization/

### Importing required libraries

- **psycopg2**: establishes a connection between Python and a PostgreSQL database.
- **pandas**: handles, manipulates, and analyzes the data.
- **sqlio** (`pandas.io.sql`): executes SQL queries and reads the results directly into a pandas DataFrame.

In [32]:
import pandas as pd
import pandas.io.sql as sqlio
import psycopg2 as ps
import warnings

warnings.filterwarnings('ignore')  # Suppress the warning pandas raises about sqlio and SQLAlchemy. TODO: narrow this filter.
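
The blanket filter above also hides unrelated warnings. A narrower alternative is sketched here, assuming the warning in question is the `UserWarning` pandas raises when it is handed a raw DBAPI connection such as psycopg2's. The helper name `silence_pandas_dbapi_warning` is hypothetical, and the message prefix is an assumption that should be checked against the wording in your pandas version.

```python
import warnings

# Assumed start of the pandas warning text; adjust if your version words it differently.
PANDAS_DBAPI_WARNING = "pandas only supports SQLAlchemy connectable"


def silence_pandas_dbapi_warning():
    """Ignore only UserWarnings whose message starts with the prefix above."""
    warnings.filterwarnings(
        "ignore",
        message=PANDAS_DBAPI_WARNING,  # regex matched against the start of the message
        category=UserWarning,
    )


silence_pandas_dbapi_warning()
```

Other warnings, including deprecation notices from pandas itself, remain visible with this approach.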

### Database connection and setup

Change the username and password below to match your OMOP database.

In [33]:
conn = ps.connect(
    dbname="ohdsi",
    user="ohdsi_admin_user",
    password="admin1",
    host="omop-db-postgress",
    port="5432",
)
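
To keep credentials out of the notebook, the connection settings could be read from environment variables instead. The sketch below is one possible approach, not part of the original setup: the helper `omop_connection_params` and the `OMOP_DB_*` variable names are hypothetical, and the defaults simply mirror the values used in the cell above.

```python
import os


def omop_connection_params(env=None):
    """Build keyword arguments for psycopg2.connect from environment variables.

    The OMOP_DB_* names are illustrative assumptions, not an established convention.
    """
    env = os.environ if env is None else env
    return {
        "dbname": env.get("OMOP_DB_NAME", "ohdsi"),
        "user": env.get("OMOP_DB_USER", "ohdsi_admin_user"),
        "password": env["OMOP_DB_PASSWORD"],  # no default: fail loudly if unset
        "host": env.get("OMOP_DB_HOST", "omop-db-postgress"),
        "port": env.get("OMOP_DB_PORT", "5432"),
    }
```

With the password exported in the environment, the connection would then be opened as `conn = ps.connect(**omop_connection_params())`.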

### Number of persons

In [34]:
df = sqlio.read_sql_query("SELECT COUNT(*) FROM cds_cdm.person", conn)
df.head()

Unnamed: 0,count
0,47


### Number of observations

In [35]:
df = sqlio.read_sql_query("SELECT COUNT(*) FROM cds_cdm.observation", conn)
df.head()

Unnamed: 0,count
0,4035


### Number of deaths

In [36]:
df = sqlio.read_sql_query("SELECT COUNT(*) FROM cds_cdm.death", conn)
df.head()

Unnamed: 0,count
0,13


### Number of procedures

In [37]:
df = sqlio.read_sql_query("SELECT COUNT(*) FROM cds_cdm.procedure_occurrence", conn)
df.head()

Unnamed: 0,count
0,3881


### Number of measurements (e.g., height, weight)

In [38]:
df = sqlio.read_sql_query("SELECT COUNT(*) FROM cds_cdm.measurement", conn)
df.head()

Unnamed: 0,count
0,44525
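
The five counts above each take a separate round trip to the database. As a rough sketch, they could be combined into one query with `UNION ALL`. The helper `build_count_query` and the output column names `table_name` and `row_count` are hypothetical; the table and schema names are those used in the cells above.

```python
# Tables counted in the preceding cells.
COUNT_TABLES = ["person", "observation", "death", "procedure_occurrence", "measurement"]


def build_count_query(tables=COUNT_TABLES, schema="cds_cdm"):
    """Return one SQL statement that yields a (table_name, row_count) row per table."""
    for name in [schema, *tables]:
        # Identifiers cannot be bound as query parameters, so validate them here.
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    parts = [
        f"SELECT '{table}' AS table_name, COUNT(*) AS row_count FROM {schema}.{table}"
        for table in tables
    ]
    return "\nUNION ALL\n".join(parts)
```

With the notebook's connection, `sqlio.read_sql_query(build_count_query(), conn)` would return all counts in a single DataFrame.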


### Describe height measurements

In [39]:
df = sqlio.read_sql_query("SELECT * FROM cds_cdm.measurement WHERE measurement_source_concept_id = 3036277", conn)
df["value_as_number"].describe()

count    947.000000
mean     160.307075
std       25.719697
min       51.200000
25%      163.200000
50%      170.300000
75%      175.400000
max      186.000000
Name: value_as_number, dtype: float64
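
To reuse the query above for other measurements, the concept id can be passed as a bound parameter rather than written into the SQL string. The sketch below assumes psycopg2's named placeholder style (`%(name)s`); the helper `measurement_query` is hypothetical, and it selects only `value_as_number` because that is the column being described.

```python
HEIGHT_CONCEPT_ID = 3036277  # concept id used in the height query above


def measurement_query(concept_id, schema="cds_cdm"):
    """Return (sql, params) for fetching the values of one measurement concept."""
    if not schema.isidentifier():
        raise ValueError(f"unsafe schema name: {schema!r}")
    sql = (
        f"SELECT value_as_number FROM {schema}.measurement "
        "WHERE measurement_source_concept_id = %(concept_id)s"
    )
    # The driver substitutes the value, so it never becomes part of the SQL text.
    return sql, {"concept_id": int(concept_id)}


sql, params = measurement_query(HEIGHT_CONCEPT_ID)
```

With the notebook's connection, the call would be `sqlio.read_sql_query(sql, conn, params=params)`, followed by `df["value_as_number"].describe()` as before.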

In [42]:
### Longitudinal

Unnamed: 0,person_id,birth_datetime,care_site_id,day_of_birth,ethnicity_concept_id,ethnicity_source_concept_id,ethnicity_source_value,fhir_identifier,fhir_logical_id,gender_concept_id,gender_source_concept_id,gender_source_value,location_id,month_of_birth,person_source_value,provider_id,race_concept_id,race_source_concept_id,race_source_value,year_of_birth
0,95,,,20,0,,,pat-e87ea803-bdcd-994e-3f01-108763f1bec4,pat-173541,8532,,female,,12,e87ea803-bdcd-994e-3f01-108763f1bec4,,4218674,,,2012
1,96,,,13,0,,,pat-1e542b3e-c624-ee67-8a90-0091e8e79004,pat-173827,8507,,male,,1,1e542b3e-c624-ee67-8a90-0091e8e79004,,4218674,,,1981
2,97,,,16,0,,,pat-62ab9fae-baa5-5d46-b217-607ccf34500c,pat-174072,8507,,male,,8,62ab9fae-baa5-5d46-b217-607ccf34500c,,4218674,,,1985
3,98,,,26,0,,,pat-d157fe1c-e3bd-09d6-20ad-b275f1dd8bf8,pat-174239,8507,,male,,1,d157fe1c-e3bd-09d6-20ad-b275f1dd8bf8,,4218674,,,1975
4,99,,,19,0,,,pat-697480bf-e17b-2a7d-c550-d50349dee474,pat-175146,8507,,male,,1,697480bf-e17b-2a7d-c550-d50349dee474,,4218674,,,1945
