# **NOTEBOOK FOR UNDERSTANDING `SYNTHEA` DATA**

***I WILL LOOK INTO THE TABLES, FIND THE NECESSARY ONES, AND TRY TO UNDERSTAND THE RELATIONSHIPS BETWEEN THE TABLES***

## **Creating Connections with the database**

In [9]:
import os
from pathlib import Path

def get_project_root(project_name="ranojoy_data_analytics_projects"):
    """
    Find the project root directory by folder name.
    1. If the working directory is the project or inside it, walk up to it.
    2. Otherwise, search below the working directory for the project folder.
    """
    current_path = Path.cwd()

    # CASE 1: Standard usage (running from inside the repo).
    # Compare whole path components so that a partial name match cannot
    # send the upward walk past the filesystem root.
    for candidate in (current_path, *current_path.parents):
        if candidate.name == project_name:
            return candidate

    # CASE 2: Running from 'Machine' or another parent folder.
    # rglob searches recursively with no depth limit, so this can be
    # slow when the working directory holds many files.
    for path in current_path.rglob(project_name):
        if path.is_dir():
            return path

    raise FileNotFoundError(f"Could not find project: {project_name}")

# --- USAGE ---
repo_root = get_project_root()

# Build path to your data folder
# This will work on your PC AND on anyone else's PC
data_path = repo_root / "About Synthea - Why Patients Readmit More"
print(data_path)

C:\Users\Rano's PC\Machine\github_repo_cloned\ranojoy_data_analytics_projects\About Synthea - Why Patients Readmit More
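
`Path.rglob` has no depth limit of its own, so keeping the downward search within a few levels needs a hand-written bounded search. A minimal sketch follows; the helper name `find_dir_limited` and its `max_depth` parameter are illustrative and not part of this notebook.

```python
from pathlib import Path


def find_dir_limited(start: Path, name: str, max_depth: int = 4) -> Path:
    """Breadth-first search for a directory called `name`,
    looking at most `max_depth` levels below `start`."""
    level = [start]
    for _ in range(max_depth):
        next_level = []
        for parent in level:
            try:
                children = [p for p in parent.iterdir() if p.is_dir()]
            except OSError:
                continue  # skip folders that cannot be listed
            for child in children:
                if child.name == name:
                    return child
                next_level.append(child)
        level = next_level
    raise FileNotFoundError(
        f"Could not find {name!r} within {max_depth} levels of {start}"
    )
```

Searching level by level returns the shallowest match first, which is usually the cloned repository rather than a nested copy with the same name.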


In [10]:
%load_ext autoreload
%autoreload 2
import sys
# Make the project folder importable so that `src.functions` resolves
sys.path.append(str(data_path))
import pandas as pd
import numpy as np
from sqlalchemy import create_engine,text

import src.functions as F

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [11]:
engine = F.get_engine()

In [12]:
engine

Engine(mysql+pymysql://root:***@localhost:3306/synthea_medical_dataset)
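
The body of `F.get_engine()` is not shown in this notebook. As a hedged sketch, a connection URL of the form printed above could be assembled from environment variables so that the password stays out of the source. The function name `build_mysql_url` and the variable names `SYNTHEA_DB_USER` and `SYNTHEA_DB_PASSWORD` are assumptions made for illustration.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(user, password, host="localhost", port=3306,
                    database="synthea_medical_dataset"):
    """Assemble a SQLAlchemy-style MySQL URL, percent-encoding the credentials."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Hypothetical environment variable names; adjust to the local setup
url = build_mysql_url(
    user=os.environ.get("SYNTHEA_DB_USER", "root"),
    password=os.environ.get("SYNTHEA_DB_PASSWORD", ""),
)
# engine = sqlalchemy.create_engine(url)  # needs sqlalchemy and pymysql installed
```

Percent-encoding matters when the password contains characters such as `@` or `/`, which would otherwise be read as part of the URL structure.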

## **Checking all the tables**

In [13]:
tables = F.sql("show tables in synthea_medical_dataset",engine)
tables = [c[0] for c in tables.values]

In [18]:
tables = ['allergies',
 'careplans',
 'conditions',
 'devices',
 'encounters',
 'imaging_studies',
 'immunizations',
 'medications',
 'observations',
 'organizations',
 'patients',
 'payer_transitions',
 'payers',
 'procedures',
 'providers',
 'supplies']

In [19]:
pd.set_option('display.max_columns',None)

In [20]:
for name in tables:
    print(f'Table: {name}')
    query = f''' select * from {name}
            limit 3;
        '''
    display(F.sql(query,engine))
    print("_"*50)

Table: allergies


Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION
0,1982-10-25,,76982e06-f8b8-4509-9ca3-65a99c8650fe,b896bf40-8b72-42b7-b205-142ee3a56b55,300916003,Latex allergy
1,1982-10-25,,76982e06-f8b8-4509-9ca3-65a99c8650fe,b896bf40-8b72-42b7-b205-142ee3a56b55,300913006,Shellfish allergy
2,2002-01-25,,71ba0469-f0cc-4177-ac70-ea07cb01c8b8,7be1a590-4239-4826-9872-031327f3c368,419474003,Allergy to mould


__________________________________________________
Table: careplans


Unnamed: 0,Id,START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,REASONCODE,REASONDESCRIPTION
0,d2500b8c-e830-433a-8b9d-368d30741520,2010-01-23,2012-01-23,034e9e3b-2def-4559-bb2a-7850888ae060,d0c40d10-8d87-447e-836e-99d26ad52ea5,53950000,Respiratory therapy,10509002.0,Acute bronchitis (disorder)
1,07d9ddd8-dfa1-4e43-9bfe-39f63f4ace15,2011-05-13,2011-08-02,10339b10-3cd1-4ac3-ac13-ec26728cb592,e1ab4933-07a1-49f0-b4bd-05500919061d,53950000,Respiratory therapy,10509002.0,Acute bronchitis (disorder)
2,a3bb6e99-3b99-44b3-974c-e230b4511b5c,2011-12-31,2012-11-30,f5dcd418-09fe-4a2f-baa0-3da800bd8c3a,16300c56-a035-4126-a656-68c093da6dfc,53950000,Respiratory therapy,10509002.0,Acute bronchitis (disorder)


__________________________________________________
Table: conditions


Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION
0,2001-05-01,,1d604da9-9a81-4ba9-80c2-de3375d59b40,8f104aa7-4ca9-4473-885a-bba2437df588,40055000,Chronic sinusitis (disorder)
1,2011-08-09,2011-08-16,8d4c4326-e9de-4f45-9a4c-f8c36bff89ae,9d35ec9f-352a-4629-92ef-38eae38437e7,444814009,Viral sinusitis (disorder)
2,2011-11-16,2011-11-26,8d4c4326-e9de-4f45-9a4c-f8c36bff89ae,ae7555a9-eaff-4c09-98a7-21bc6ed1b1fd,195662009,Acute viral pharyngitis (disorder)


__________________________________________________
Table: devices


Unnamed: 0,START,STOP,PATIENT,ENCOUNTER,CODE,DESCRIPTION,UDI
0,2001-07-04T08:42:44Z,,d49f748f-928d-40e8-92c8-73e4c5679711,2500b8bd-dc98-44ef-a252-22dc4f81d61b,72506001,Implantable defibrillator device (physical ob...,(01)67677988606464(11)010613(17)260628(10)2882...
1,2010-10-24T05:28:51Z,,e05dd037-e45f-434a-946d-5c9076a2f596,83cda5b1-4653-4327-a81c-14c4af434e95,72506001,Implantable defibrillator device (physical ob...,(01)20180595258346(11)101003(17)351018(10)1391...
2,2017-06-04T11:11:47Z,,0f5646bc-a156-4ec0-9252-5b592e3d3184,7302a9ce-5845-456d-af2e-bca7a1fc387f,72506001,Implantable defibrillator device (physical ob...,(01)87988924478445(11)170514(17)420529(10)4143...


__________________________________________________
Table: encounters


Unnamed: 0,Id,start,stop,PATIENT,ORGANIZATION,PROVIDER,PAYER,ENCOUNTERCLASS,CODE,DESCRIPTION,BASE_ENCOUNTER_COST,TOTAL_CLAIM_COST,PAYER_COVERAGE,REASONCODE,REASONDESCRIPTION,duration
0,d0c40d10-8d87-447e-836e-99d26ad52ea5,2010-01-23 17:45:28,2010-01-23 18:10:28,034e9e3b-2def-4559-bb2a-7850888ae060,e002090d-4e92-300e-b41e-7d1f21dee4c6,e6283e46-fd81-3611-9459-0edb1c3da357,6e2f1a2d-27bd-3701-8d08-dae202c58632,ambulatory,185345009,Encounter for symptom,129.16,129.16,54.16,10509002.0,Acute bronchitis (disorder),25.0
1,e88bc3a9-007c-405e-aabc-792a38f4aa2b,2012-01-23 17:45:28,2012-01-23 18:00:28,034e9e3b-2def-4559-bb2a-7850888ae060,772ee193-bb9f-30eb-9939-21e86c8e4da5,6f1d59a7-a5bd-3cf9-9671-5bad2f351c28,6e2f1a2d-27bd-3701-8d08-dae202c58632,wellness,162673000,General examination of patient (procedure),129.16,129.16,129.16,,,15.0
2,8f104aa7-4ca9-4473-885a-bba2437df588,2001-05-01 15:02:18,2001-05-01 15:17:18,1d604da9-9a81-4ba9-80c2-de3375d59b40,5d4b9df1-93ae-3bc9-b680-03249990e558,af01a385-31d3-3c77-8fdb-2867fe88df2f,b1c428d6-4f07-31e0-90f0-68ffa6ff8c76,ambulatory,185345009,Encounter for symptom,129.16,129.16,0.0,36971009.0,Sinusitis (disorder),15.0


__________________________________________________
Table: imaging_studies


Unnamed: 0,Id,DATE,PATIENT,ENCOUNTER,BODYSITE_CODE,BODYSITE_DESCRIPTION,MODALITY_CODE,MODALITY_DESCRIPTION,SOP_CODE,SOP_DESCRIPTION
0,d3e49b38-7634-4416-879d-7bc68bf3e7df,2014-07-08T15:35:36Z,b58731cc-2d8b-4c2d-b327-4cab771af3ef,3a36836d-da25-4e73-808b-972b669b7e4e,40983000,Arm,DX,Digital Radiography,1.2.840.10008.5.1.4.1.1.1.1,Digital X-Ray Image Storage
1,46baf530-4941-40ab-8219-685a08fd9086,2014-01-22T18:58:37Z,2ffe9369-24e4-414b-8973-258fad09313a,33b71e4b-0690-4fe9-897a-dc3b2ff9215c,40983000,Arm,DX,Digital Radiography,1.2.840.10008.5.1.4.1.1.1.1,Digital X-Ray Image Storage
2,b8fb8a6e-a2f5-46c9-8b3f-a35aa982efcd,2001-12-01T02:08:27Z,86b97fc7-ae8f-4e0d-8e66-db68f36e7a76,e42d1046-568d-46c2-b0a5-d910b2f3bd1d,8205005,Wrist,DX,Digital Radiography,1.2.840.10008.5.1.4.1.1.1.1,Digital X-Ray Image Storage


__________________________________________________
Table: immunizations


Unnamed: 0,DATE,PATIENT,ENCOUNTER,CODE,DESCRIPTION,BASE_COST
0,2010-07-27T12:58:08Z,10339b10-3cd1-4ac3-ac13-ec26728cb592,dae2b7cb-1316-4b78-954f-fa610a6c6d0e,140,Influenza seasonal injectable preservative ...,140.52
1,2010-11-20T03:04:34Z,f5dcd418-09fe-4a2f-baa0-3da800bd8c3a,7ff86631-0378-4bfc-92ce-1edd697eb18e,140,Influenza seasonal injectable preservative ...,140.52
2,2012-01-23T17:45:28Z,034e9e3b-2def-4559-bb2a-7850888ae060,e88bc3a9-007c-405e-aabc-792a38f4aa2b,140,Influenza seasonal injectable preservative ...,140.52


__________________________________________________
Table: medications


Unnamed: 0,START,STOP,PATIENT,PAYER,ENCOUNTER,CODE,DESCRIPTION,BASE_COST,PAYER_COVERAGE,DISPENSES,TOTALCOST,REASONCODE,REASONDESCRIPTION
0,2010-05-05T00:26:23Z,2011-04-30T00:26:23Z,8d4c4326-e9de-4f45-9a4c-f8c36bff89ae,b1c428d6-4f07-31e0-90f0-68ffa6ff8c76,1e0d6b0e-1711-4a25-99f9-b1c700c9b260,389221,Etonogestrel 68 MG Drug Implant,677.08,0.0,12,8124.96,,
1,2011-04-30T00:26:23Z,2012-04-24T00:26:23Z,8d4c4326-e9de-4f45-9a4c-f8c36bff89ae,b1c428d6-4f07-31e0-90f0-68ffa6ff8c76,6aa37300-d1b4-48e7-a2f8-5e0f70f48f38,389221,Etonogestrel 68 MG Drug Implant,624.09,0.0,12,7489.08,,
2,2012-04-24T00:26:23Z,2013-04-19T00:26:23Z,8d4c4326-e9de-4f45-9a4c-f8c36bff89ae,b1c428d6-4f07-31e0-90f0-68ffa6ff8c76,7253a9f9-6f6d-429a-926a-7b1d424eae3f,748856,Yaz 28 Day Pack,43.32,0.0,12,519.84,,


__________________________________________________
Table: observations


Unnamed: 0,DATE,PATIENT,ENCOUNTER,CODE,DESCRIPTION,VALUE,UNITS,TYPE
0,2012-01-23T17:45:28Z,034e9e3b-2def-4559-bb2a-7850888ae060,e88bc3a9-007c-405e-aabc-792a38f4aa2b,8302-2,Body Height,193.3,cm,numeric
1,2012-01-23T17:45:28Z,034e9e3b-2def-4559-bb2a-7850888ae060,e88bc3a9-007c-405e-aabc-792a38f4aa2b,72514-3,Pain severity - 0-10 verbal numeric rating [Sc...,2.0,{score},numeric
2,2012-01-23T17:45:28Z,034e9e3b-2def-4559-bb2a-7850888ae060,e88bc3a9-007c-405e-aabc-792a38f4aa2b,29463-7,Body Weight,87.8,kg,numeric


__________________________________________________
Table: organizations


Unnamed: 0,Id,NAME,ADDRESS,CITY,STATE,ZIP,LAT,LON,PHONE,REVENUE,UTILIZATION
0,ef58ea08-d883-3957-8300-150554edc8fb,HEALTHALLIANCE HOSPITALS INC,60 HOSPITAL ROAD,LEOMINSTER,MA,1453,42.520838,-71.770876,9784662000,198002.28,1557
1,69176529-fd1f-3b3f-abce-a0a3626769eb,MOUNT AUBURN HOSPITAL,330 MOUNT AUBURN STREET,CAMBRIDGE,MA,2138,42.375967,-71.118275,6174923500,288569.17,2296
2,5e765f2b-e908-3888-9fc7-df2cb87beb58,STURDY MEMORIAL HOSPITAL,211 PARK STREET,ATTLEBORO,MA,2703,41.931653,-71.294503,5082225200,287355.03,2287


__________________________________________________
Table: patients


Unnamed: 0,Id,BIRTHDATE,DEATHDATE,SSN,DRIVERS,PASSPORT,PREFIX,FIRST,LAST,SUFFIX,MAIDEN,MARITAL,RACE,ETHNICITY,GENDER,BIRTHPLACE,ADDRESS,CITY,STATE,COUNTY,ZIP,LAT,LON,HEALTHCARE_EXPENSES,HEALTHCARE_COVERAGE
0,1d604da9-9a81-4ba9-80c2-de3375d59b40,1989-05-25,,999-76-6866,S99984236,X19277260X,Mr.,José Eduardo181,Gómez206,,,M,white,hispanic,M,Marigot Saint Andrew Parish DM,427 Balistreri Way Unit 19,Chicopee,Massachusetts,Hampden County,1013.0,42.228354,-72.562951,271227.08,1334.88
1,034e9e3b-2def-4559-bb2a-7850888ae060,1983-11-14,,999-73-5361,S99962402,X88275464X,Mr.,Milo271,Feil794,,,M,white,nonhispanic,M,Danvers Massachusetts US,422 Farrell Path Unit 69,Somerville,Massachusetts,Middlesex County,2143.0,42.360697,-71.126531,793946.01,3204.49
2,10339b10-3cd1-4ac3-ac13-ec26728cb592,1992-06-02,,999-27-3385,S99972682,X73754411X,Mr.,Jayson808,Fadel536,,,M,white,nonhispanic,M,Springfield Massachusetts US,1056 Harris Lane Suite 70,Chicopee,Massachusetts,Hampden County,1020.0,42.181642,-72.608842,574111.9,2606.4


__________________________________________________
Table: payer_transitions


Unnamed: 0,PATIENT,START_YEAR,END_YEAR,PAYER,OWNERSHIP
0,1d604da9-9a81-4ba9-80c2-de3375d59b40,1989,1998,b1c428d6-4f07-31e0-90f0-68ffa6ff8c76,Guardian
1,034e9e3b-2def-4559-bb2a-7850888ae060,1983,2001,6e2f1a2d-27bd-3701-8d08-dae202c58632,Guardian
2,8d4c4326-e9de-4f45-9a4c-f8c36bff89ae,1978,1995,7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a,Guardian


__________________________________________________
Table: payers


Unnamed: 0,Id,NAME,ADDRESS,CITY,STATE_HEADQUARTERED,ZIP,PHONE,AMOUNT_COVERED,AMOUNT_UNCOVERED,REVENUE,COVERED_ENCOUNTERS,UNCOVERED_ENCOUNTERS,COVERED_MEDICATIONS,UNCOVERED_MEDICATIONS,COVERED_PROCEDURES,UNCOVERED_PROCEDURES,COVERED_IMMUNIZATIONS,UNCOVERED_IMMUNIZATIONS,UNIQUE_CUSTOMERS,QOLS_AVG,MEMBER_MONTHS
0,b3221cfc-24fb-339e-823d-bc4136cbc4ed,Dual Eligible,7500 Security Blvd,Baltimore,MD,21244.0,1-877-267-2323,141676.87,119449.83,1305000.0,907,0,556,0,280,0,223,0,25,0.36281,3348
1,7caa7254-5050-3b5e-9eae-bd5ea30e809c,Medicare,7500 Security Blvd,Baltimore,MD,21244.0,1-800-633-4227,4756720.13,890394.37,7122250.0,11072,0,13471,0,7383,0,2557,0,198,0.786223,29760
2,7c4411ce-02f1-39b5-b9ec-dfbea9ad3c1a,Medicaid,7500 Security Blvd,Baltimore,MD,21244.0,1-877-267-2323,3636994.03,1814967.93,20365500.0,26878,0,6382,0,47611,0,7022,0,507,0.812137,84276


__________________________________________________
Table: procedures


Unnamed: 0,DATE,PATIENT,ENCOUNTER,CODE,DESCRIPTION,BASE_COST,REASONCODE,REASONDESCRIPTION
0,2011-04-30T00:26:23Z,8d4c4326-e9de-4f45-9a4c-f8c36bff89ae,6aa37300-d1b4-48e7-a2f8-5e0f70f48f38,169553002,Insertion of subcutaneous contraceptive,14896.56,,
1,2010-07-27T12:58:08Z,10339b10-3cd1-4ac3-ac13-ec26728cb592,dae2b7cb-1316-4b78-954f-fa610a6c6d0e,430193006,Medication Reconciliation (procedure),726.51,,
2,2010-11-20T03:04:34Z,f5dcd418-09fe-4a2f-baa0-3da800bd8c3a,7ff86631-0378-4bfc-92ce-1edd697eb18e,430193006,Medication Reconciliation (procedure),788.5,,


__________________________________________________
Table: providers


Unnamed: 0,Id,ORGANIZATION,NAME,GENDER,SPECIALITY,ADDRESS,CITY,STATE,ZIP,LAT,LON,UTILIZATION
0,3421aa75-dec7-378d-a9e0-0bc764e4cb0d,ef58ea08-d883-3957-8300-150554edc8fb,Tomas436 Sauer652,M,GENERAL PRACTICE,60 HOSPITAL ROAD,LEOMINSTER,MA,1453,42.520838,-71.770876,1557
1,c9b3c857-2e24-320c-a79a-87b8a60de63c,69176529-fd1f-3b3f-abce-a0a3626769eb,Suzette512 Monahan736,F,GENERAL PRACTICE,330 MOUNT AUBURN STREET,CAMBRIDGE,MA,2138,42.375967,-71.118275,2296
2,0359f968-d1a6-30eb-b1cc-e6cc0b4d3513,5e765f2b-e908-3888-9fc7-df2cb87beb58,Gaynell126 Streich926,F,GENERAL PRACTICE,211 PARK STREET,ATTLEBORO,MA,2703,41.931653,-71.294503,2287


__________________________________________________
Table: supplies


Unnamed: 0,DATE,PATIENT,ENCOUNTER,CODE,DESCRIPTION,QUANTITY


__________________________________________________
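
The previews suggest that columns such as `PATIENT` and `ENCOUNTER` hold the `Id` values of the `patients` and `encounters` tables. One way to test such a relationship is to count child rows whose key has no match in the parent table. The sketch below uses an in-memory SQLite database with placeholder ids as a stand-in for the MySQL data; the helper `orphan_count` is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the ids are shortened placeholders
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table patients (Id text primary key);
    create table encounters (Id text primary key, PATIENT text);
    insert into patients values ('p1'), ('p2');
    insert into encounters values ('e1', 'p1'), ('e2', 'p1'), ('e3', 'p9');
""")


def orphan_count(conn, child, child_col, parent, parent_col="Id"):
    """Count rows in `child` whose `child_col` value has no match in `parent`."""
    query = f"""
        select count(*)
        from {child} c
        left join {parent} p on c.{child_col} = p.{parent_col}
        where c.{child_col} is not null and p.{parent_col} is null
    """
    return conn.execute(query).fetchone()[0]


# 'e3' points at patient 'p9', which does not exist in patients
print(orphan_count(conn, "encounters", "PATIENT", "patients"))  # prints 1
```

A count of zero on the real tables would support treating the column as a foreign key. The query uses only a plain left join, so it should carry over to MySQL largely unchanged.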


### **Understanding this data**

- ***After checking the tables, I can see that `encounters` is the central table: each row records one patient visit, and most of the other tables point back to it through an `ENCOUNTER` column***

- ***For our analysis of patient readmission, we do not need to use all the tables***

- ***The important tables are `encounters`, `allergies`, `careplans`, `conditions`, `medications`, `procedures`, `patients`, and `providers`***

- ***I will not use the `organizations` or `payers` tables, as they describe facilities and insurers rather than patients or visits (`encounters` links to them only through its `ORGANIZATION` and `PAYER` columns)***

- ***These tables need to be cleaned, with backups kept first, and then combined into a new dataframe built from aggregated columns that fit my needs***
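
As a rough illustration of what "aggregated columns" could mean here, the sketch below counts conditions and medications per encounter and attaches the counts to `encounters`. It runs against an in-memory SQLite stand-in with placeholder rows. The column names `n_conditions` and `n_medications` are assumptions, not the final design.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; rows are placeholders
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table encounters (Id text primary key, PATIENT text);
    create table conditions (ENCOUNTER text, CODE text);
    create table medications (ENCOUNTER text, CODE text);
    insert into encounters values ('e1', 'p1'), ('e2', 'p1');
    insert into conditions values ('e1', '40055000'), ('e1', '444814009');
    insert into medications values ('e2', '389221');
""")

# Aggregate each child table first, then join, so that rows are not multiplied
query = """
    select e.Id,
           e.PATIENT,
           coalesce(c.n_conditions, 0)  as n_conditions,
           coalesce(m.n_medications, 0) as n_medications
    from encounters e
    left join (select ENCOUNTER, count(*) as n_conditions
               from conditions group by ENCOUNTER) c on c.ENCOUNTER = e.Id
    left join (select ENCOUNTER, count(*) as n_medications
               from medications group by ENCOUNTER) m on m.ENCOUNTER = e.Id
    order by e.Id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('e1', 'p1', 2, 0), ('e2', 'p1', 0, 1)]
```

Joining the raw child tables directly would repeat an encounter once per matching condition and medication pair, which inflates the counts. Aggregating inside subqueries avoids that.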

In [16]:
necessary_tables = ['encounters','allergies','careplans',
                    'conditions','medications',
                    'procedures','patients','providers']

## **Creating the backup tables of necessary tables**

In [8]:
import time


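start = time.time()
for table in necessary_tables:
    # Copy the table definition (columns and indexes) into an empty backup table
    query = f'''
        create table backup_{table} like {table}
    '''
    F.run(query)
    # Copy every row from the original table into the backup
    query = f'''
    insert into backup_{table} select * from {table}
    '''
    F.run(query)
end = time.time()
print(f"Time Taken: {round((end - start),2)}")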

Time Taken: 10.01
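
Running the backup cell a second time fails, because MySQL refuses to create a table that already exists. One possible way to make it repeatable is to build the statements in a small helper that validates the table name and drops any earlier backup first. This is only a sketch: `backup_statements` is a hypothetical helper, and dropping the old backup discards the earlier snapshot, which may not always be wanted.

```python
import re

# Accept only plain identifiers, since table names are interpolated into SQL
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def backup_statements(table, prefix="backup_"):
    """Return the SQL statements that rebuild `prefix + table` from scratch."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"Unexpected table name: {table!r}")
    backup = f"{prefix}{table}"
    return [
        f"drop table if exists {backup}",
        f"create table {backup} like {table}",
        f"insert into {backup} select * from {table}",
    ]


for statement in backup_statements("patients"):
    print(statement)
    # F.run(statement)  # in the notebook, assuming F.run takes one SQL string
```

Comparing `select count(*)` on each table and its backup afterwards is a cheap way to confirm that the copy is complete.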
