# EHR Data Profiler
## Run the next cell to import pandas and the EHR data analysis functions:

In [16]:
import pandas as pd
import matplotlib.pyplot as plt
from lib.ehr_dp_lib import *
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_rows', 500)
pd.set_option('display.width', 1000)

## The following cells in this notebook are auto-generated from the data in the 'Data' folder
## For each table, a pandas dataframe is loaded from the matching CSV file

### Below is a list of the EHR Data Profiler functions, arguments, and descriptions:


- **missingness( *dataframe name* )**: Returns a dataframe of the number of null values per column.


- **catbar( *dataframe name, column name, graph=(True or False)*)**: \[Generated on *categorical* data type only\] Returns a dataframe of counts for each category in the specified column of the dataframe. When the `graph` argument is set to `True`, returns a bar graph.


- **numstats( *dataframe name, column name* )**: \[Generated on *number* data type only\] Returns a dataframe of descriptive statistics (e.g., mean, max, min, median, quartiles) for the column data.


- **dateline( *dataframe name, column name* )**: \[Generated on *date* data type only\] Returns a line graph of the frequency of specific dates along an x-axis of time.


- **flow_stats( *flowsheet dataframe* )**: \[Generated only if Flowsheet_Vitals.csv table in Data folder\] Returns a dataframe of descriptive statistics for common vital sign types (e.g., Height, Weight, Temperature, SpO2, Pulse, BMI, Respirations).


- **lab_stats( *lab dataframe, top=(10 or greater)* )**: \[Generated only if Labs.csv table in Data folder\] Returns a dataframe of descriptive statistics for top lab procedures in dataset. The `top` argument can be adjusted to capture more lab procedures.
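The library source is not shown in this notebook, so the following is only a plain-pandas sketch of the kind of summaries the descriptions above suggest for `missingness`, `catbar`, and `numstats`. The toy dataframe and the variable names are made up for illustration; the real functions may format their output differently.

```python
import pandas as pd

# Toy data: column names mirror the demographics table, values are invented.
toy_df = pd.DataFrame({
    'AGE': [60.0, 53.0, None, 36.0],
    'SEX': ['Male', 'Female', 'Female', None],
})

# Null count per column (the kind of table missingness() is described as returning).
null_counts = toy_df.isna().sum().rename('null_count').to_frame()

# Count of rows per category (what catbar() is described as tabulating).
sex_counts = toy_df['SEX'].value_counts()

# Descriptive statistics for a numeric column (as described for numstats()).
age_stats = toy_df['AGE'].describe()

print(null_counts)
print(sex_counts)
print(age_stats)
```

Note that `value_counts()` and `describe()` skip null values by default, which is why a separate null count is useful alongside them.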

## Using TEXT_SEARCH

### Another included function is `text_search`. It searches a specific column of a dataframe for text and returns only the rows that contain that text.

- **text_search( *dataframe name, column name, text to search, ignore case=(True by default can also be set to False)* )**


### Example:
If you wanted to search Patient Demographics data for patients whose 'ETHNICITY' contains the text 'latino' using text_search:
`text_search(patient_demographics_df, 'ETHNICITY', 'latino')`

Result:
![latino_search.PNG](lib/latino_search.PNG)
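For readers who want to see the idea in plain pandas, here is a minimal sketch of a case-insensitive substring filter. `text_search_sketch` is a hypothetical stand-in, not the library's implementation, and the toy rows are invented.

```python
import pandas as pd

def text_search_sketch(df, column, text, ignore_case=True):
    """Return the rows of df whose `column` contains `text` as a literal substring."""
    mask = df[column].str.contains(
        text, case=not ignore_case, regex=False, na=False  # nulls never match
    )
    return df[mask]

toy_df = pd.DataFrame({
    'IP_PATIENT_ID': ['A', 'B', 'C', 'D'],
    'ETHNICITY': ['Hispanic or Latino', 'Not Hispanic or Latino', 'Unknown', None],
})

matches = text_search_sketch(toy_df, 'ETHNICITY', 'latino')
print(matches)
```

In this sketch a substring search for 'latino' also matches 'Not Hispanic or Latino', so a more specific search string (for example 'Hispanic or Latino' combined with a check that the value does not start with 'Not') may be needed when the two groups must be separated.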


## Combining TEXT_SEARCH with other functions:
### You can also combine functions to get a specific analytical calculation.

### Example:
Suppose you wanted counts of the `SEX` categories (e.g., Male, Female) for the patients in the previous 'latino' subset. First, assign the result of `text_search` to a new variable, in this case `latino_pats`, then pass that variable to `catbar`:

`latino_pats = text_search(patient_demographics_df, 'ETHNICITY', 'latino')`

`catbar(latino_pats, 'SEX', graph=True)`

Result:
![latino_gender_search.PNG](lib/latino_gender_search.PNG)
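The same filter-then-count pattern can be sketched in plain pandas. This is an illustration on invented rows, not the library's code: the filter uses `str.contains` and the counts come from `value_counts`.

```python
import pandas as pd

toy_df = pd.DataFrame({
    'ETHNICITY': ['Hispanic or Latino', 'Hispanic or Latino', 'Unknown', 'Hispanic or Latino'],
    'SEX': ['Female', 'Male', 'Male', 'Female'],
})

# Step 1: keep rows whose ETHNICITY contains 'latino', ignoring case.
is_match = toy_df['ETHNICITY'].str.contains('latino', case=False, regex=False, na=False)
subset = toy_df[is_match]

# Step 2: count the SEX categories within that subset only.
sex_counts = subset['SEX'].value_counts()
print(sex_counts)
```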


## Run the following block to describe the tables in your Data folder:

In [None]:
describe_tables()

## PATIENT_DEMOGRAPHICS

In [52]:
patient_demographics_df = pd.read_csv('Data/Patient_Demographics.csv')
patient_demographics_df

Unnamed: 0,IP_PATIENT_ID,AGE,SEX,RACE,ETHNICITY,VITAL_STATUS,LANGUAGE,MARITAL_STATUS,SEXUAL_ORIENTATION,RELIGION,ADI_NATRANK,ADI_STATERNK,EDUCATION,INCOME,SVI_SOCIO_ECON,SVI_HCOMP,SVI_MINO_LANG,SVI_HTYPE_TRANS,SVI_TOTAL
0,IPPAT_101101099917108,60.0,Male,White or Caucasian,Unknown,Not Known Deceased,English,Single,,Unknown,9.0,4.0,SHRINE|EDU:30-40,SHRINE|INC:100k-150k,0.3697,0.5022,0.7505,0.8394,0.6433
1,IPPAT_101101099942813,101.0,Female,Unknown,Unknown,Not Known Deceased,Unknown,Unknown,,Unknown,,,,,,,,,
2,IPPAT_101101099967579,53.0,Male,Unknown,Unknown,Not Known Deceased,English,Single,,Christian,,,SHRINE|EDU:50-60,SHRINE|INC:100k-150k,,,,,
3,IPPAT_101101099971777,107.0,Female,White or Caucasian,Unknown,Not Known Deceased,English,Widowed,,Methodist,8.0,3.0,SHRINE|EDU:40-50,SHRINE|INC:100k-150k,0.084,0.3945,0.7175,0.3115,0.2707
4,IPPAT_101101099983912,60.0,Female,White or Caucasian,Unknown,Not Known Deceased,English,Single,,Jewish,,,,,,,,,
5,IPPAT_101101099986815,64.0,Male,White or Caucasian,Unknown,Not Known Deceased,English,Married,,Protestant,9.0,4.0,SHRINE|EDU:50-60,SHRINE|INC:75k-100k,0.5095,0.0165,0.8446,0.9413,0.5644
6,IPPAT_101101099989818,60.0,Male,Other,Not Hispanic or Latino,Not Known Deceased,English,Single,Straight (not lesbian or gay),,5.0,1.0,SHRINE|EDU:60-70,SHRINE|INC:100k-150k,0.0758,0.0119,0.1684,0.4847,0.0412
7,IPPAT_101101100000423,54.0,Male,White or Caucasian,Unknown,Not Known Deceased,Unknown,Single,,Unknown,,,SHRINE|EDU:50-60,SHRINE|INC:-,-999.0,0.0011,0.5527,-999.0,-999.0
8,IPPAT_101101100002433,77.0,Female,White or Caucasian,Unknown,Not Known Deceased,Unknown,Married,,Jehovah's Witness,,,SHRINE|EDU:40-50,SHRINE|INC:150k-200k,,,,,
9,IPPAT_101101100017191,36.0,Female,Black or African American,Unknown,Not Known Deceased,English,Single,,Baptist,2.0,1.0,SHRINE|EDU:60-70,SHRINE|INC:100k-150k,0.4355,0.0687,0.7109,0.4703,0.3846


In [65]:
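# Collect the patient IDs of rows that text_search keeps when 'Unknown' LANGUAGE values are excluded
ids = text_search(patient_demographics_df, 'LANGUAGE', 'Unknown', exclusion=True, return_type='ids')
# encounters_df must already be loaded (see the ENCOUNTERS section) before running this line
filter_by_ids(encounters_df, ids)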

Unnamed: 0,IP_PATIENT_ID,IP_ENC_ID,EPIC_ENCOUNTER_TYPE,IP_VISIT_TYPE,ENCOUNTER_DATE,ENCOUNTER_AGE,ADMIT_DATE,DISCHARGE_DATE,HOSP_DISCHARGE_DISPOSITION,ED_DISPOSITION,EPIC_DEPARTMENT_ID,EPIC_DEPARTMENT_NAME,DEPARTMENT_SPECIALTY,LOCATION
0,IPPAT_101101099989818,IPENC_101102059243852,Appointment,Other Ambulatory Visit,11/24/2015 00:00,53,,,,,10303.0,US IMG RRH,Radiology,RONALD REAGAN UCLA MEDICAL CENTER
1,IPPAT_101101099989818,IPENC_101102061139381,Hospital Encounter,Ambulatory Visit,10/24/2015 00:00,53,10/24/2015 00:00,10/24/2015 23:59,Home or Self Care,,10505.0,UCLA IMAGE LIB TELERAD,Radiology,RONALD REAGAN UCLA MEDICAL CENTER
2,IPPAT_101101099989818,IPENC_101102061380314,Appointment,Other Ambulatory Visit,11/17/2015 00:00,53,,,,,60145.0,NUC MED CARD IMG MP1,Nuclear Medicine,WW CARDIACNUCLEAR IMAGING
3,IPPAT_101101099989818,IPENC_101102062013335,Ancillary Orders,Other,10/26/2015 00:00,53,,,,,60413.0,SURG PANCREAS MP2,"Surgery, Pancreatic",WW PLI AND GEN SURG SUITE
4,IPPAT_101101099989818,IPENC_101102062037888,Orders Only,Other,10/26/2015 00:00,53,,,,,60413.0,SURG PANCREAS MP2,"Surgery, Pancreatic",WW PLI AND GEN SURG SUITE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11202,IPPAT_101101115185525,IPENC_101102217943761,Telephone,Other Ambulatory Visit,12/05/2022 00:00,60,,,,,60171.0,MSS PULMONOLOGY MP2,"Medicine, Pulmonary Disease",WW MED SPECIALTY SUITES
11203,IPPAT_101101115185525,IPENC_101102218053199,History,Other,12/02/2022 00:00,60,,,,,10201.0,RR PERIOPERATIVE AREA,Other Specialty,RONALD REAGAN UCLA MEDICAL CENTER
11204,IPPAT_101101115185525,IPENC_101102218244342,BPA,Other,12/02/2022 00:00,60,,,,,10202.0,RR PTU,Other Specialty,RONALD REAGAN UCLA MEDICAL CENTER
11205,IPPAT_101101115185525,IPENC_101102218464086,Appointment,Other,12/02/2022 00:00,60,,,,,10208.0,RR CATH,"Medicine, Interventional Cardiology",RONALD REAGAN UCLA MEDICAL CENTER
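A rough plain-pandas equivalent of this two-step pattern (select patient IDs in one table, then filter another table by them) might look like the sketch below. The toy tables are invented, and treating a null `LANGUAGE` as "not Unknown" is an assumption; the library's `exclusion` option may handle nulls differently.

```python
import pandas as pd

patients = pd.DataFrame({
    'IP_PATIENT_ID': ['P1', 'P2', 'P3'],
    'LANGUAGE': ['English', 'Unknown', 'Spanish'],
})
encounters = pd.DataFrame({
    'IP_PATIENT_ID': ['P1', 'P1', 'P2', 'P3'],
    'IP_ENC_ID': ['E1', 'E2', 'E3', 'E4'],
})

# Patients whose LANGUAGE does not contain 'Unknown' (the exclusion case).
is_unknown = patients['LANGUAGE'].str.contains('Unknown', case=False, regex=False, na=False)
keep_ids = patients.loc[~is_unknown, 'IP_PATIENT_ID'].unique()

# Keep only the encounters that belong to those patients.
kept = encounters[encounters['IP_PATIENT_ID'].isin(keep_ids)]
print(kept)
```

Using `isin` avoids a merge when only the ID column is needed from the first table.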


In [68]:
pat_df = patient_demographics_df

t1_groups = ['AGE', 'RACE', 'ETHNICITY', 'LANGUAGE', 'EDUCATION', 'INCOME']
age_bins = [0, 18, 25, 35, 45, 55, 65, float('inf')]
age_labels = ['<18', '18-24', '25-34', '35-44', '45-54', '55-64', '65+']
output = ''

for grp in t1_groups:
    if grp == 'AGE':
        # Bin ages into left-closed intervals, e.g. 18 <= AGE < 25 -> '18-24'.
        # pd.cut returns a new Series, so pat_df itself is not modified.
        values = pd.cut(pat_df['AGE'], bins=age_bins, labels=age_labels, right=False)
    else:
        values = pat_df[grp]

    # Percentage of non-null rows in each category
    perc = (values.value_counts(normalize=True) * 100).round(1)

    output += f'\n{grp}\n-----------------\n'
    for category, pct in perc.items():
        output += f'{category} => {pct}\n'

print(output)

In [None]:
missingness(patient_demographics_df)

In [None]:
catbar(patient_demographics_df, 'LANGUAGE', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(patient_demographics_df, 'SEX', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(patient_demographics_df, 'MARITAL_STATUS', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(patient_demographics_df, 'ETHNICITY', graph=False) ## Set graph=True for Bar graph

In [None]:
numstats(patient_demographics_df, 'AGE')

In [None]:
catbar(patient_demographics_df, 'RELIGION', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(patient_demographics_df, 'RACE', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(patient_demographics_df, 'SEXUAL_ORIENTATION', graph=False) ## Set graph=True for Bar graph

## ENCOUNTERS

In [2]:
encounters_df = pd.read_csv('Data/Encounters.csv')
encounters_df

Unnamed: 0,IP_PATIENT_ID,IP_ENC_ID,EPIC_ENCOUNTER_TYPE,IP_VISIT_TYPE,ENCOUNTER_DATE,ENCOUNTER_AGE,ADMIT_DATE,DISCHARGE_DATE,HOSP_DISCHARGE_DISPOSITION,ED_DISPOSITION,EPIC_DEPARTMENT_ID,EPIC_DEPARTMENT_NAME,DEPARTMENT_SPECIALTY,LOCATION
0,IPPAT_101101099989818,IPENC_101102059243852,Appointment,Other Ambulatory Visit,11/24/2015 00:00,53,,,,,10303.0,US IMG RRH,Radiology,RONALD REAGAN UCLA MEDICAL CENTER
1,IPPAT_101101099989818,IPENC_101102061139381,Hospital Encounter,Ambulatory Visit,10/24/2015 00:00,53,10/24/2015 00:00,10/24/2015 23:59,Home or Self Care,,10505.0,UCLA IMAGE LIB TELERAD,Radiology,RONALD REAGAN UCLA MEDICAL CENTER
2,IPPAT_101101099989818,IPENC_101102061380314,Appointment,Other Ambulatory Visit,11/17/2015 00:00,53,,,,,60145.0,NUC MED CARD IMG MP1,Nuclear Medicine,WW CARDIACNUCLEAR IMAGING
3,IPPAT_101101099989818,IPENC_101102062013335,Ancillary Orders,Other,10/26/2015 00:00,53,,,,,60413.0,SURG PANCREAS MP2,"Surgery, Pancreatic",WW PLI AND GEN SURG SUITE
4,IPPAT_101101099989818,IPENC_101102062037888,Orders Only,Other,10/26/2015 00:00,53,,,,,60413.0,SURG PANCREAS MP2,"Surgery, Pancreatic",WW PLI AND GEN SURG SUITE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11202,IPPAT_101101115185525,IPENC_101102217943761,Telephone,Other Ambulatory Visit,12/05/2022 00:00,60,,,,,60171.0,MSS PULMONOLOGY MP2,"Medicine, Pulmonary Disease",WW MED SPECIALTY SUITES
11203,IPPAT_101101115185525,IPENC_101102218053199,History,Other,12/02/2022 00:00,60,,,,,10201.0,RR PERIOPERATIVE AREA,Other Specialty,RONALD REAGAN UCLA MEDICAL CENTER
11204,IPPAT_101101115185525,IPENC_101102218244342,BPA,Other,12/02/2022 00:00,60,,,,,10202.0,RR PTU,Other Specialty,RONALD REAGAN UCLA MEDICAL CENTER
11205,IPPAT_101101115185525,IPENC_101102218464086,Appointment,Other,12/02/2022 00:00,60,,,,,10208.0,RR CATH,"Medicine, Interventional Cardiology",RONALD REAGAN UCLA MEDICAL CENTER


In [None]:
missingness(encounters_df)

In [3]:
occurrence_stats(encounters_df, 'IP_ENC_ID')

Unnamed: 0,Patients w/ Occurrence,Occurrence Min,Occurrence Max,Occurrence Mean
0,200,1,1231,56.035
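The column names in this output suggest a per-patient summary. One way such a table could be computed in plain pandas is sketched below; counting distinct IDs per patient (rather than rows) is an assumption, and the toy data is invented.

```python
import pandas as pd

toy_enc = pd.DataFrame({
    'IP_PATIENT_ID': ['P1', 'P1', 'P1', 'P2', 'P3', 'P3'],
    'IP_ENC_ID': ['E1', 'E2', 'E3', 'E4', 'E5', 'E6'],
})

# Number of distinct encounter IDs recorded for each patient.
per_patient = toy_enc.groupby('IP_PATIENT_ID')['IP_ENC_ID'].nunique()

# One-row summary with the same column names as the output above.
summary = pd.DataFrame([{
    'Patients w/ Occurrence': int(per_patient.size),
    'Occurrence Min': int(per_patient.min()),
    'Occurrence Max': int(per_patient.max()),
    'Occurrence Mean': float(per_patient.mean()),
}])
print(summary)
```

A large gap between the mean and the maximum, as in the output above, usually points to a few patients with very many records.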


In [None]:
dateline(encounters_df, 'ENCOUNTER_DATE')
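The date columns in these tables are stored as strings such as `11/24/2015 00:00`. As a sketch of the counting step behind a date-frequency line graph (not the library's implementation), the strings can be parsed and tallied per day like this, using invented rows:

```python
import pandas as pd

toy_enc = pd.DataFrame({
    'ENCOUNTER_DATE': ['11/24/2015 00:00', '10/26/2015 00:00', '10/26/2015 00:00', None],
})

# Parse MM/DD/YYYY HH:MM strings; values that cannot be parsed become NaT.
dates = pd.to_datetime(toy_enc['ENCOUNTER_DATE'], format='%m/%d/%Y %H:%M', errors='coerce')

# Rows per calendar day in chronological order (NaT is dropped by value_counts).
per_day = dates.dt.normalize().value_counts().sort_index()
print(per_day)
```

Calling `per_day.plot()` would then draw the counts over time, assuming matplotlib is available as in the import cell.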

In [None]:
numstats(encounters_df, 'ENCOUNTER_AGE')

In [None]:
catbar(encounters_df, 'EPIC_ENCOUNTER_TYPE', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounters_df, 'IP_VISIT_TYPE', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounters_df, 'EPIC_DEPARTMENT_NAME', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounters_df, 'HOSP_DISCHARGE_DISPOSITION', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounters_df, 'ED_DISPOSITION', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounters_df, 'DEPARTMENT_SPECIALTY', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounters_df, 'LOCATION', graph=False) ## Set graph=True for Bar graph

## ENCOUNTER_DIAGNOSES

In [8]:
encounter_diagnoses_df = pd.read_csv('Data/Encounter_Diagnoses.csv')
encounter_diagnoses_df

Unnamed: 0,IP_PATIENT_ID,IP_ENC_ID,DIAGNOSIS_DATE,ICD_TYPE,ICD_CODE,ICD_DESCRIPTION,PRIMARY_DIAGNOSIS_FLAG,ADMISSION_DIAGNOSIS_FLAG,PRESENT_ON_ADMISSION,HOSPITAL_FINAL_DIAGNOSIS
0,IPPAT_101101099989818,IPENC_101102061139381,10/24/2015 00:00,9,V65.8,Other reasons for seeking consultation,S,0,0,0
1,IPPAT_101101099989818,IPENC_101102061380314,11/17/2015 00:00,9,V72.84,"Preoperative examination, unspecified",S,0,0,0
2,IPPAT_101101099989818,IPENC_101102062013335,10/26/2015 00:00,9,199.1,Other malignant neoplasm without specification of site (HCC/RAF),S,0,0,0
3,IPPAT_101101099989818,IPENC_101102062013335,10/26/2015 00:00,9,780.99,Other general symptoms(780.99),P,0,0,0
4,IPPAT_101101099989818,IPENC_101102062037888,10/26/2015 00:00,9,577.8,Other specified disease of pancreas,P,0,0,0
...,...,...,...,...,...,...,...,...,...,...
9121,IPPAT_101101115185525,IPENC_101102217454482,12/02/2022 00:00,10,J84.10,"Pulmonary fibrosis, unspecified (HCC/RAF)",S,0,0,0
9122,IPPAT_101101115185525,IPENC_101102217618188,11/26/2022 00:00,10,Z01.812,Encounter for preprocedural laboratory examination,P,0,0,0
9123,IPPAT_101101115185525,IPENC_101102217618188,11/26/2022 00:00,10,Z20.822,Contact with and (suspected) exposure to COVID-19,P,0,0,0
9124,IPPAT_101101115185525,IPENC_101102217740318,11/30/2022 00:00,10,Z01.812,Encounter for preprocedural laboratory examination,S,0,0,0


In [None]:
missingness(encounter_diagnoses_df)

In [9]:
occurrence_stats(encounter_diagnoses_df, 'IP_ENC_ID')

Unnamed: 0,Patients w/ Occurrence,Occurrence Min,Occurrence Max,Occurrence Mean
0,158,1,1130,57.759494


In [None]:
dateline(encounter_diagnoses_df, 'DIAGNOSIS_DATE')

In [None]:
catbar(encounter_diagnoses_df, 'PRESENT_ON_ADMISSION', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounter_diagnoses_df, 'ADMISSION_DIAGNOSIS_FLAG', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounter_diagnoses_df, 'HOSPITAL_FINAL_DIAGNOSIS', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(encounter_diagnoses_df, 'PRIMARY_DIAGNOSIS_FLAG', graph=False) ## Set graph=True for Bar graph

## PROCEDURES

In [None]:
procedures_df = pd.read_csv('Data/Procedures.csv')
procedures_df

In [None]:
missingness(procedures_df)

In [None]:
occurrence_stats(procedures_df, 'IP_ENC_ID')

In [None]:
dateline(procedures_df, 'PROCEDURE_DATE')

In [None]:
catbar(procedures_df, 'PROCEDURE_DESCRIPTION', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(procedures_df, 'PROCEDURE_CODE', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(procedures_df, 'PROCEDURE_TYPE', graph=False) ## Set graph=True for Bar graph

## FLOWSHEET_VITALS

In [None]:
flowsheet_vitals_df = pd.read_csv('Data/Flowsheet_Vitals.csv')
flowsheet_vitals_df

In [None]:
missingness(flowsheet_vitals_df)

In [None]:
occurrence_stats(flowsheet_vitals_df, 'IP_ENC_ID')

In [None]:
dateline(flowsheet_vitals_df, 'VITAL_SIGN_TAKEN_TIME')

In [None]:
catbar(flowsheet_vitals_df, 'VITAL_SIGN_TYPE', graph=False) ## Set graph=True for Bar graph

In [None]:
flow_stats(flowsheet_vitals_df)

## LABS

In [17]:
labs_df = pd.read_csv('Data/Labs.csv')
labs_df

Unnamed: 0,IP_PATIENT_ID,IP_ENC_ID,IP_ORDER_PROC_ID,COMPONENT_ID,COMPONENT_NAME,SPECIMEN_TAKEN_TIME,PROCEDURE_ID,PROCEDURE_CODE,PROCEDURE_DESCRIPTION,ORDER_TIME,RESULT_TIME,RESULT_TEXT,RESULT_NUM,REFERENCE_UNIT,LOINC
0,IPPAT_101101099989818,IPENC_101102063273061,IPLAB_101101203937868,8000016,WHITE BLOOD CELL COUNT,11/17/2015 13:50,1698,LAB294,CBC & PLATELET CT,11/17/2015 13:41,11/17/2015 15:57,6.86,6.86,x10E3/uL,6690-2
1,IPPAT_101101099989818,IPENC_101102063273061,IPLAB_101101203937868,8000019,RED BLOOD CELL COUNT,11/17/2015 13:50,1698,LAB294,CBC & PLATELET CT,11/17/2015 13:41,11/17/2015 15:57,4.47,4.47,x10E6/uL,789-8
2,IPPAT_101101099989818,IPENC_101102063273061,IPLAB_101101203937868,8000020,HEMOGLOBIN,11/17/2015 13:50,1698,LAB294,CBC & PLATELET CT,11/17/2015 13:41,11/17/2015 15:57,13.6,13.60,g/dL,718-7
3,IPPAT_101101099989818,IPENC_101102063273061,IPLAB_101101203937868,8000021,HEMATOCRIT,11/17/2015 13:50,1698,LAB294,CBC & PLATELET CT,11/17/2015 13:41,11/17/2015 15:57,41.8,41.80,%,4544-3
4,IPPAT_101101099989818,IPENC_101102063273061,IPLAB_101101203937868,8000023,MEAN CORPUSCULAR VOLUME,11/17/2015 13:50,1698,LAB294,CBC & PLATELET CT,11/17/2015 13:41,11/17/2015 15:57,93.5,93.50,fL,787-2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28697,IPPAT_101101115185525,IPENC_101102217454482,IPLAB_101101579711885,3000380,UREA NITROGEN,12/02/2022 06:42,678,LAB15,BASIC METABOLIC PANEL,12/02/2022 06:29,12/02/2022 07:42,13,13.00,mg/dL,3094-0
28698,IPPAT_101101115185525,IPENC_101102217454482,IPLAB_101101579711885,3000381,CALCIUM,12/02/2022 06:42,678,LAB15,BASIC METABOLIC PANEL,12/02/2022 06:29,12/02/2022 07:42,9.5,9.50,mg/dL,17861-6
28699,IPPAT_101101115185525,IPENC_101102217454482,IPLAB_101101579711885,3006266,ESTIMATED GFR 2021 CKD-EPI,12/02/2022 06:42,678,LAB15,BASIC METABOLIC PANEL,12/02/2022 06:29,12/02/2022 07:42,81,81.00,mL/min/1.73m2,
28700,IPPAT_101101115185525,IPENC_101102217454482,IPLAB_101101579711886,3000121,PROTHROMBIN TIME,12/02/2022 06:42,1750,LAB320,PROTHROMBIN TIME PANEL,12/02/2022 06:29,12/02/2022 06:57,13.5,13.50,seconds,5902-2


In [26]:
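# Count the RESULT_TEXT values of rows where RESULT_NUM is 9999999; rename_axis/reset_index(name=...) names the columns the same way across pandas versions
labs_df.loc[labs_df['RESULT_NUM'] == 9999999, 'RESULT_TEXT'].value_counts().rename_axis('LAB_CATEGORY').reset_index(name='COUNTS')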

Unnamed: 0,LAB_CATEGORY,COUNTS
0,Negative,751
1,See Comment,273
2,...,241
3,Reference Intervals,164
4,Nonreactive,148
5,NEGATIVE,48
6,Yellow,47
7,Performed,45
8,1+,38
9,PERFORMED,35
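The counts above list 'Negative' and 'NEGATIVE' (and 'Performed' and 'PERFORMED') as separate categories. If those should be counted together, one option is to normalize the text before counting. The sketch below uses invented rows and assumes that trimming whitespace and folding case is an acceptable normalization for this column.

```python
import pandas as pd

toy_labs = pd.DataFrame({
    'RESULT_NUM': [9999999, 9999999, 9999999, 6.86, 9999999],
    'RESULT_TEXT': ['Negative', 'NEGATIVE', 'Performed', '6.86', ' negative '],
})

# Text values of the rows flagged with the 9999999 placeholder.
text_only = toy_labs.loc[toy_labs['RESULT_NUM'] == 9999999, 'RESULT_TEXT']

# Trim whitespace and fold case so spelling variants fall into one category.
normalized = text_only.str.strip().str.casefold()
counts = normalized.value_counts().rename_axis('LAB_CATEGORY').reset_index(name='COUNTS')
print(counts)
```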


In [None]:
missingness(labs_df)

In [5]:
occurrence_stats(labs_df, 'IP_ORDER_PROC_ID')

Unnamed: 0,Patients w/ Occurrence,Occurrence Min,Occurrence Max,Occurrence Mean
0,103,1,6087,278.660194


In [None]:
dateline(labs_df, 'ORDER_TIME')

In [None]:
catbar(labs_df, 'PROCEDURE_CODE', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(labs_df, 'COMPONENT_NAME', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(labs_df, 'PROCEDURE_DESCRIPTION', graph=False) ## Set graph=True for Bar graph

In [None]:
lab_stats(labs_df, top=10)
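As an illustration of the "top N, then describe" idea, here is a plain-pandas sketch on invented rows. Grouping by `COMPONENT_NAME` and dropping rows where `RESULT_NUM` equals 9999999 are both assumptions; `lab_stats` may group by a different column or treat placeholder values differently.

```python
import pandas as pd

toy_labs = pd.DataFrame({
    'COMPONENT_NAME': ['HEMOGLOBIN', 'HEMOGLOBIN', 'CALCIUM', 'CALCIUM', 'CALCIUM', 'UREA NITROGEN'],
    'RESULT_NUM': [13.6, 12.4, 9.5, 9999999, 9.1, 13.0],
})

top = 2  # number of most frequent lab components to summarize

# Drop placeholder values so they do not distort the statistics.
numeric = toy_labs[toy_labs['RESULT_NUM'] != 9999999]

# Names of the `top` most frequent components among the remaining rows.
top_names = numeric['COMPONENT_NAME'].value_counts().head(top).index

# Descriptive statistics of RESULT_NUM for each of those components.
stats = (
    numeric[numeric['COMPONENT_NAME'].isin(top_names)]
    .groupby('COMPONENT_NAME')['RESULT_NUM']
    .describe()
)
print(stats)
```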

## MEDICATIONS

In [6]:
medications_df = pd.read_csv('Data/Medications.csv')
medications_df

Unnamed: 0,IP_PATIENT_ID,IP_ENC_ID,IP_ORDER_MED_ID,ORDER_DATE,START_DATE,END_DATE,EPIC_MEDICATION_ID,EPIC_MEDICATION_NAME,MEDISPAN_GENERIC_NAME,MEDISPAN_CLASS_NAME,QUANTITY,REFILLS,SIG,TAKEN_TIME,FREQUENCY
0,IPPAT_101101099989818,IPENC_101102062817610,IPMED_101101201495208,10/26/2015 00:00,10/12/2015 00:00,12/15/2015 00:00,27694,OMEPRAZOLE 20 MG PO CPDR,Omeprazole Cap Delayed Release 20 MG,Ulcer Drugs,,2.0,,,
1,IPPAT_101101099989818,IPENC_101102062817610,IPMED_101101201495209,10/26/2015 00:00,,01/24/2016 00:00,127013,TRIAMCINOLONE ACETONIDE 55 MCG/ACT NA AERO,Triamcinolone Acetonide Nasal Aerosol Suspension 55 MCG/ACT,Decongestants,,,Spray 2 sprays by nasal route daily.,,Daily
2,IPPAT_101101099989818,IPENC_101102062817610,IPMED_101101201495210,10/26/2015 00:00,,,123655,MULTI-VITAMINS PO TABS,Multiple Vitamin Tab,Multivitamins,,,Take 1 tablet by mouth daily.,,Daily
3,IPPAT_101101099989818,IPENC_101102062817610,IPMED_101101201495211,10/26/2015 00:00,,12/06/2015 00:00,66603,ST JOHNS WORT PO,St Johns Wort,Medi-Span Reserved Or Unknown(95),,,Take by mouth.,,
4,IPPAT_101101099989818,IPENC_101102062817610,IPMED_101101201495212,10/26/2015 00:00,,,102,ACETAMINOPHEN 500 MG PO TABS,Acetaminophen Tab 500 MG,Analgesics-Nonnarcotic,,,Take 500 mg by mouth every six (6) hours as needed for Pain.,,Every 6 hours PRN
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
12542,IPPAT_101101115185525,IPENC_101102215929781,IPMED_101101579711863,11/17/2022 00:00,,,27694,OMEPRAZOLE 20 MG PO CPDR,Omeprazole Cap Delayed Release 20 MG,Ulcer Drugs,,,Take 20 mg by mouth daily.,,Daily
12543,IPPAT_101101115185525,IPENC_101102215929781,IPMED_101101579711864,11/17/2022 00:00,,,8654,VITAMIN B-12 1000 MCG PO TABS,Cyanocobalamin Tab 1000 MCG,Hematopoietic Agents,,,"Take 1,000 mcg by mouth every thirty (30) days.",,Every 30 days
12544,IPPAT_101101115185525,IPENC_101102215929781,IPMED_101101579711865,11/17/2022 00:00,,,17837,ALBUTEROL SULFATE HFA 108 (90 BASE) MCG/ACT IN AERS,Albuterol Sulfate Inhal Aero 108 MCG/ACT (90MCG Base Equiv),Antiasthmatic,,,Inhale 2 puffs every six (6) hours as needed.,,Every 6 hours PRN
12545,IPPAT_101101115185525,IPENC_101102215929781,IPMED_101101579711866,11/17/2022 00:00,,,28645,ATORVASTATIN CALCIUM 80 MG PO TABS,Atorvastatin Calcium Tab 80 MG (Base Equivalent),Antihyperlipidemic,,,Take 80 mg by mouth daily.,,Daily


In [None]:
missingness(medications_df)

In [7]:
occurrence_stats(medications_df, 'IP_ORDER_MED_ID')

Unnamed: 0,Patients w/ Occurrence,Occurrence Min,Occurrence Max,Occurrence Mean
0,125,1,4038,100.376


In [None]:
dateline(medications_df, 'ORDER_DATE')

In [None]:
catbar(medications_df, 'EPIC_MEDICATION_NAME', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(medications_df, 'MEDISPAN_GENERIC_NAME', graph=False) ## Set graph=True for Bar graph

In [None]:
catbar(medications_df, 'MEDISPAN_CLASS_NAME', graph=False) ## Set graph=True for Bar graph