## Determine Data Subset for Focus Question 2

The purpose of this analysis is to identify the data subset necessary to explore possible analytic solutions for **Focus Question 2**. This analysis does not create the dataset itself; it is an attempt to discover which dataset joins and columns will be used for the actual analysis.

The Entity Relationship Diagram (ERD) model below displays the relationships between the Patients, Procedures, and Medications datasets.

Dataset Assumptions:
1. The datasets provided were built as views queried from the original base tables.
2. The PatientId and FRDPersonnelID columns are unique to each patient and each personnel member in the base tables, but neither is a primary key in the provided datasets, and a given value may appear more than once in any of them.
3. PatientId and FRDPersonnelID in the Procedures and Medications datasets will be treated as foreign keys to the Patients dataset, even though in reality they would not be if we were using all the base tables in the data warehouse.
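
Assumption 2 can be checked directly once a dataset is loaded. The sketch below uses a small hand-made frame rather than the real workbook; the shortened personnel IDs and the second patient are hypothetical values chosen only to illustrate the check.

```python
import pandas as pd

# Toy stand-in for the Patients view (hypothetical, shortened IDs)
patients = pd.DataFrame({
    "PatientId": [479862, 479862, 479862, 479863],
    "FRDPersonnelID": ["A1", "B2", "C3", "A1"],
})

# Neither column is unique on its own in a view like this
patient_id_unique = patients["PatientId"].is_unique
personnel_id_unique = patients["FRDPersonnelID"].is_unique

# The (PatientId, FRDPersonnelID) pair is the candidate join key;
# check whether any pair is repeated
pair_is_unique = not patients.duplicated(subset=["PatientId", "FRDPersonnelID"]).any()

print(patient_id_unique, personnel_id_unique, pair_is_unique)
```

If the pair turns out not to be unique in the full Patients dataset, a join on both columns would multiply rows, so this is worth confirming before building any derivative dataset.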

![](img/ERD_Model.jpg)

Restating Focus Question 2 as a user story:
Determine the data subset needed to explore what relationship, if any, exists between
1. EMS procedures performed on an individual patient by a provider with calculated tenure and
2. medications given to an individual patient by a provider with calculated tenure.

Dataset Observations:
- Retrieving the EMS procedures performed will require most or all rows from the Procedures dataset.
- Retrieving the EMS medications given will require most or all rows from the Medications dataset.
- To complete the required view, the rows from the Procedures and Medications datasets will need to be joined with the Patients dataset.
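
One way to see how many Procedures rows actually find a partner in Patients is a left join with pandas' `indicator` option. This is a minimal sketch on toy data; the shortened IDs and the unmatched second procedure row are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the Patients and Procedures views (hypothetical values)
patients = pd.DataFrame({
    "PatientId": [479862, 479862, 479862],
    "FRDPersonnelID": ["A1", "B2", "C3"],
    "PatientOutcome": ["Treated & Transported"] * 3,
})
procedures = pd.DataFrame({
    "PatientId": [479862, 479999],
    "FRDPersonnelID": ["B2", "Z9"],
    "Procedure_Performed_Code": [392230005, 392230005],
})

# Keep every procedure row and record whether a matching patient row exists
checked = procedures.merge(
    patients,
    on=["PatientId", "FRDPersonnelID"],
    how="left",
    indicator=True,
)

n_matched = int((checked["_merge"] == "both").sum())
n_unmatched = int((checked["_merge"] == "left_only").sum())
print(n_matched, n_unmatched)
```

A nonzero unmatched count on the real data would mean an inner join silently drops some procedures, which matters when deciding between `how='inner'` and `how='left'`.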

At a minimum, the following columns are needed to begin exploring Focus Question 2:
- EMS procedures performed on an individual patient by a provider with calculated tenure
  - Patients.PatientId
  - Patients.FRDPersonnelID
  - Patients.PatientOutcome
  - Patients.DispatchTime
  - Patients.FRDPersonnelStartDate
  - Calculated value for the provider's tenure in months and years
  - Procedures.PatientId
  - Procedures.Procedure_Performed_Code
  - Procedures.Procedure_Performed_Description
  - Procedures.FRDPersonnelID

- Medications given to an individual patient by a provider with calculated tenure
  - Patients.PatientId
  - Patients.FRDPersonnelID
  - Patients.PatientOutcome
  - Patients.DispatchTime
  - Patients.FRDPersonnelStartDate
  - Calculated value for the provider's tenure in months and years
  - Medications.PatientId
  - Medications.Medication_Given_RXCUI_Code
  - Medications.Medication_Given_Description
  - Medications.FRDPersonnelID
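
The calculated tenure value is not defined in the datasets, so the following is only one possible sketch: it counts completed calendar months between the provider's start date and the dispatch time. The helper `completed_months` and the column names `TenureMonths` and `TenureYears` are hypothetical; the dates are taken from the PatientId 479862 example rows, with shortened personnel IDs.

```python
import pandas as pd

# Dates from the example Patients rows; personnel IDs shortened (hypothetical)
patients = pd.DataFrame({
    "FRDPersonnelID": ["A1", "C3", "D4"],
    "DispatchTime": pd.to_datetime(["2018-01-01 00:44:31"] * 3),
    "FRDPersonnelStartDate": pd.to_datetime(
        ["2006-12-11", "2012-09-24", "2008-03-03"]
    ),
})

def completed_months(start, end):
    """Whole calendar months elapsed from start to end (time of day ignored)."""
    months = (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)
    # Subtract one month when the day of month has not yet been reached
    return months - (end.dt.day < start.dt.day).astype(int)

patients["TenureMonths"] = completed_months(
    patients["FRDPersonnelStartDate"], patients["DispatchTime"]
)
patients["TenureYears"] = patients["TenureMonths"] // 12

print(patients[["FRDPersonnelID", "TenureMonths", "TenureYears"]])
```

Computing tenure at dispatch time, rather than at the time of analysis, keeps the value tied to the provider's experience when the call occurred. Other definitions (for example, fractional years) are equally plausible and would need to be agreed on for the actual analysis.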

In [1]:
# Import libraries
import pandas as pd
import numpy as np

In [2]:
# Read all sheets into a dictionary of DataFrames keyed by sheet name. Only the first thousand rows
# of each sheet are imported because that is enough to pull an example from the original, unsorted datasets.

# Note: in the provided workbook 20210214-ems-raw-v03.xlsx, the Medications sheet column
# Personnel_Performer_ID_Internal was manually renamed to FRDPersonnelID.

all_dfs = pd.read_excel(r'./data/20210214-ems-raw-v03.xlsx',
                        sheet_name=None, 
                        nrows=1000,
                        na_values=['NA'])

In [3]:
# Display dictionary keys created
all_dfs.keys()

dict_keys(['Patients', 'Procedures', 'Medications'])

In [4]:
# Display Patients dataset row and column count
all_dfs['Patients'].shape

(1000, 12)

In [5]:
# Display Procedures dataset row and column count
all_dfs['Procedures'].shape

(1000, 4)

In [6]:
# Display Medications dataset row and column count
all_dfs['Medications'].shape

(1000, 4)

In [7]:
# Select the example rows from the Patients dataset where PatientId equals 479862 and display them
df_pat_ex = all_dfs['Patients'].loc[(all_dfs['Patients']['PatientId'] == 479862)]
df_pat_ex

Unnamed: 0,PatientId,FRDPersonnelID,Shift,UnitId,FireStation,Battalion,PatientOutcome,PatientGender,CrewMemberRoles,DispatchTime,FRDPersonnelGender,FRDPersonnelStartDate
9,479862,F8D4C99E-9E01-E211-B5F5-78E7D18CFD3C,A - Shift,M437,37,405,Treated & Transported,Female,"Other Patient Caregiver-At Scene,Other Patient...",2018-01-01 00:44:31,Female,2006-12-11
10,479862,32D8C99E-9E01-E211-B5F5-78E7D18CFD3C,A - Shift,M437,37,405,Treated & Transported,Female,"Other Patient Caregiver-At Scene,Other Patient...",2018-01-01 00:44:31,Male,2006-12-11
11,479862,1D18E8FC-EE92-E211-A596-78E7D18C3D20,A - Shift,M437,37,405,Treated & Transported,Female,"Primary Patient Caregiver-At Scene,Primary Pat...",2018-01-01 00:44:31,Female,2012-09-24
12,479862,CED8C99E-9E01-E211-B5F5-78E7D18CFD3C,A - Shift,M437,37,405,Treated & Transported,Female,"Driver-Response,Driver-Transport",2018-01-01 00:44:31,Male,2008-03-03


In [8]:
# Select the example rows from the Procedures dataset where PatientId equals 479862 and display them
df_proc_ex = all_dfs['Procedures'].loc[(all_dfs['Procedures']['PatientId'] == 479862)]
df_proc_ex

Unnamed: 0,PatientId,Procedure_Performed_Code,Procedure_Performed_Description,FRDPersonnelID
953,479862,392230005,IV Start - Extremity Vein (arm or leg),32D8C99E-9E01-E211-B5F5-78E7D18CFD3C


In [9]:
# Select the example rows from the Medications dataset where PatientId equals 479862 and display them
df_med_ex = all_dfs['Medications'].loc[(all_dfs['Medications']['PatientId'] == 479862)]
df_med_ex

Unnamed: 0,PatientId,Medication_Given_RXCUI_Code,Medication_Given_Description,FRDPersonnelID
419,479862,7806,Oxygen,1D18E8FC-EE92-E211-A596-78E7D18C3D20


In [10]:
# Inner join on Patients and Procedures example and display results
df_proc_ex = df_pat_ex.merge(df_proc_ex,
                             on=['PatientId', 'FRDPersonnelID'],
                             how='inner')
df_proc_ex

Unnamed: 0,PatientId,FRDPersonnelID,Shift,UnitId,FireStation,Battalion,PatientOutcome,PatientGender,CrewMemberRoles,DispatchTime,FRDPersonnelGender,FRDPersonnelStartDate,Procedure_Performed_Code,Procedure_Performed_Description
0,479862,32D8C99E-9E01-E211-B5F5-78E7D18CFD3C,A - Shift,M437,37,405,Treated & Transported,Female,"Other Patient Caregiver-At Scene,Other Patient...",2018-01-01 00:44:31,Male,2006-12-11,392230005,IV Start - Extremity Vein (arm or leg)


In [11]:
# Inner join on Patients and Medications example and display results
df_med_ex = df_pat_ex.merge(df_med_ex,
                            on=['PatientId', 'FRDPersonnelID'],
                            how='inner')
df_med_ex

Unnamed: 0,PatientId,FRDPersonnelID,Shift,UnitId,FireStation,Battalion,PatientOutcome,PatientGender,CrewMemberRoles,DispatchTime,FRDPersonnelGender,FRDPersonnelStartDate,Medication_Given_RXCUI_Code,Medication_Given_Description
0,479862,1D18E8FC-EE92-E211-A596-78E7D18C3D20,A - Shift,M437,37,405,Treated & Transported,Female,"Primary Patient Caregiver-At Scene,Primary Pat...",2018-01-01 00:44:31,Female,2012-09-24,7806,Oxygen


**Conclusion:** For best results, we recommend creating at least two derivative datasets: one merging Patients with Procedures and a second merging Patients with Medications, so that the gender of both patients and providers is available alongside each procedure or medication. The datasets could be pared down by removing a few unnecessary columns, but none of them has many columns to begin with.
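
As a rough sketch of that recommendation, the two derivative datasets could be built with one shared helper. Everything here runs on toy frames with shortened, hypothetical personnel IDs; the helper `build_derivative`, the `keep` column list, and the `validate="one_to_many"` check (which assumes each PatientId and FRDPersonnelID pair appears once in Patients) are assumptions, not part of the analysis above.

```python
import pandas as pd

# Toy stand-ins for the three views (hypothetical, shortened IDs)
patients = pd.DataFrame({
    "PatientId": [479862, 479862],
    "FRDPersonnelID": ["B2", "C3"],
    "PatientOutcome": ["Treated & Transported"] * 2,
    "PatientGender": ["Female", "Female"],
    "FRDPersonnelGender": ["Male", "Female"],
    "Shift": ["A - Shift", "A - Shift"],
})
procedures = pd.DataFrame({
    "PatientId": [479862],
    "FRDPersonnelID": ["B2"],
    "Procedure_Performed_Code": [392230005],
    "Procedure_Performed_Description": ["IV Start - Extremity Vein (arm or leg)"],
})
medications = pd.DataFrame({
    "PatientId": [479862],
    "FRDPersonnelID": ["C3"],
    "Medication_Given_RXCUI_Code": [7806],
    "Medication_Given_Description": ["Oxygen"],
})

keys = ["PatientId", "FRDPersonnelID"]
# Patients columns to carry forward; unneeded ones such as Shift are dropped
keep = keys + ["PatientOutcome", "PatientGender", "FRDPersonnelGender"]

def build_derivative(events):
    """Inner-join the trimmed Patients frame with one event dataset."""
    # validate raises MergeError if a key pair is duplicated in Patients
    return patients[keep].merge(events, on=keys, how="inner", validate="one_to_many")

df_pat_proc = build_derivative(procedures)
df_pat_med = build_derivative(medications)

print(df_pat_proc.columns.tolist())
print(df_pat_med.columns.tolist())
```

Keeping the two derivative datasets separate avoids the row multiplication that would result from joining Procedures and Medications to each other through Patients.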