# Data Ingestion

## Imports

In [344]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

## Data Loading

In [345]:
data = pd.read_csv('data.csv', encoding='latin-1', sep=';')
pd.set_option('display.max_columns', None)

print("Data contains ", data.shape[0]," rows and ", data.shape[1]," columns")

Data contains  1267  rows and  24  columns


## Head

In [346]:
data.head()

Unnamed: 0,Group,Sex,Age,Patients number per hour,Arrival mode,Injury,Chief_complain,Mental,Pain,NRS_pain,SBP,DBP,HR,RR,BT,Saturation,KTAS_RN,Diagnosis in ED,Disposition,KTAS_expert,Error_group,Length of stay_min,KTAS duration_min,mistriage
0,2,2,71,3,3,2,right ocular pain,1,1,2,160,100,84,18,36.6,100.0,2,Corneal abrasion,1,4,2,86,500,1
1,1,1,56,12,3,2,right forearm burn,1,1,2,137,75,60,20,36.5,,4,"Burn of hand, firts degree dorsum",1,5,4,64,395,1
2,2,1,68,8,2,2,"arm pain, Lt",1,1,2,130,80,102,20,36.6,98.0,4,"Fracture of surgical neck of humerus, closed",2,5,4,862,100,1
3,1,2,71,8,1,1,ascites tapping,1,1,3,139,94,88,20,36.5,,4,Alcoholic liver cirrhosis with ascites,1,5,6,108,983,1
4,1,2,58,4,3,1,"distension, abd",1,1,3,91,67,93,18,36.5,,4,Ascites,1,5,8,109,660,1


# Exploratory Data Analysis (EDA)

## Data Types




### Group
**Description:** Group categorization.

**Type:** Categorical Nominal

### Sex
**Description:** Patient's sex.

**Categories:**
- 1 (Female)
- 2 (Male)

**Type:** Categorical Nominal

### Age 
**Description:** Patient's age.

**Type:** Numerical Discrete

### Patients number per hour
**Description:** Number of patients in the Emergency Department per hour.

**Type:** Numerical Discrete

### Arrival mode
**Description:** How patients arrive at the Emergency Department.

**Categories:**
- 1 (Walking)
- 2 (Public Ambulance)
- 3 (Private Vehicle)
- 4 (Private Ambulance)
- 5 (Public Transport)
- 6 (Wheelchair)
- 7 (Other)

**Type:** Categorical Nominal 

### Injury 
**Description:** Whether the patient is injured or not.

**Categories:**
- 1 (No)
- 2 (Yes)

**Type:** Categorical Nominal

### Chief_complain
**Description:** The patient's chief complaint, recorded as free text.

**Type:** Categorical Nominal 

### Mental
**Description:** The mental state of the patient.

**Categories:**
- 1 (Alert)
- 2 (Verbal Response)
- 3 (Pain Response)
- 4 (Unresponsive)

**Type:** Categorical Nominal

### Pain
**Description:** Whether the patient has pain.

**Categories:**
- 1 (Yes)
- 0 (No)

**Type:** Binary

### NRS_pain
**Description:** Nurse's assessment of the patient's pain on the Numeric Rating Scale (NRS).

**Type:** Numeric

### SBP
**Description:** Systolic Blood Pressure.

**Type:** Numeric

### DBP
**Description:** Diastolic Blood Pressure.

**Type:** Numeric

### HR
**Description:** Heart Rate.

**Type:** Numeric

### RR
**Description:** Respiratory Rate.

**Type:** Numeric

### BT
**Description:** Body Temperature.

**Type:** Numeric

In [347]:
data.dtypes

Group                        int64
Sex                          int64
Age                          int64
Patients number per hour     int64
Arrival mode                 int64
Injury                       int64
Chief_complain              object
Mental                       int64
Pain                         int64
NRS_pain                    object
SBP                         object
DBP                         object
HR                          object
RR                          object
BT                          object
Saturation                  object
KTAS_RN                      int64
Diagnosis in ED             object
Disposition                  int64
KTAS_expert                  int64
Error_group                  int64
Length of stay_min           int64
KTAS duration_min           object
mistriage                    int64
dtype: object
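
Several measurement columns (SBP, DBP, HR, RR, BT, Saturation) are reported as `object` rather than a numeric dtype, which usually means some cells hold non-numeric text. A minimal sketch for listing those tokens follows; the helper name `non_numeric_tokens` and the toy frame `demo` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd

def non_numeric_tokens(series: pd.Series) -> list:
    """Return the distinct non-null values that cannot be parsed as numbers."""
    values = series.dropna().astype(str)
    parsed = pd.to_numeric(values, errors="coerce")
    return sorted(values[parsed.isna()].unique())

# Toy frame standing in for the real data (values are illustrative only).
demo = pd.DataFrame({
    "SBP": ["160", "137", "??"],
    "BT": ["36.6", "36.5", "36.6"],
})

# Map each column to the list of tokens that block numeric parsing.
tokens = {col: non_numeric_tokens(demo[col]) for col in demo.columns}
print(tokens)
```

Running the same helper over the `object` columns of `data` would show which placeholders need to be replaced before the columns can be treated as numeric.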

In [348]:
data.describe().T


Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Group,1267.0,1.456985,0.498343,1.0,1.0,1.0,2.0,2.0
Sex,1267.0,1.521705,0.499726,1.0,1.0,2.0,2.0,2.0
Age,1267.0,54.423836,19.725033,16.0,37.0,57.0,71.0,96.0
Patients number per hour,1267.0,7.519337,3.160563,1.0,5.0,7.0,10.0,17.0
Arrival mode,1267.0,2.820837,0.807904,1.0,2.0,3.0,3.0,7.0
Injury,1267.0,1.192581,0.394482,1.0,1.0,1.0,1.0,2.0
Mental,1267.0,1.105762,0.447768,1.0,1.0,1.0,1.0,4.0
Pain,1267.0,0.563536,0.496143,0.0,0.0,1.0,1.0,1.0
KTAS_RN,1267.0,3.335438,0.885391,1.0,3.0,3.0,4.0,5.0
Disposition,1267.0,1.609313,1.157983,1.0,1.0,1.0,2.0,7.0


## Null Values

In [349]:
missing_values = data.isnull().sum()
missing_values = missing_values[missing_values > 0]

print("Columns with missing values:")
print(missing_values)




Columns with missing values:
Saturation         688
Diagnosis in ED      2
dtype: int64
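
Raw counts are easier to judge as a share of all rows: 688 missing `Saturation` values out of 1267 rows is roughly 54%. The sketch below shows one way to compute that share per column; the toy frame `demo` and the name `missing_share` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
demo = pd.DataFrame({
    "Saturation": [100.0, np.nan, 98.0, np.nan],
    "Diagnosis in ED": ["Ascites", "Corneal abrasion", None, "Ascites"],
    "Age": [71, 56, 68, 58],
})

# isnull().mean() gives the fraction of missing cells per column.
missing_share = demo.isnull().mean().mul(100).round(1)

# Keep only columns with at least one missing value, largest share first.
missing_share = missing_share[missing_share > 0].sort_values(ascending=False)
print(missing_share)
```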


## Unique Values

In [350]:
for column in data.columns:
  print('Column ', column, ' has ', data[column].nunique(), ' unique values')

Column  Group  has  2  unique values
Column  Sex  has  2  unique values
Column  Age  has  81  unique values
Column  Patients number per hour  has  16  unique values
Column  Arrival mode  has  7  unique values
Column  Injury  has  2  unique values
Column  Chief_complain  has  417  unique values
Column  Mental  has  4  unique values
Column  Pain  has  2  unique values
Column  NRS_pain  has  11  unique values
Column  SBP  has  127  unique values
Column  DBP  has  83  unique values
Column  HR  has  94  unique values
Column  RR  has  11  unique values
Column  BT  has  46  unique values
Column  Saturation  has  22  unique values
Column  KTAS_RN  has  5  unique values
Column  Diagnosis in ED  has  583  unique values
Column  Disposition  has  7  unique values
Column  KTAS_expert  has  5  unique values
Column  Error_group  has  10  unique values
Column  Length of stay_min  has  716  unique values
Column  KTAS duration_min  has  392  unique values
Column  mistriage  has  3  unique values


# Data Preprocessing

## Feature Engineering

### Replacing "??" and "#BOÞ!" with null values

In [351]:
cols_to_clean = ["SBP", "DBP", "HR", "RR", "BT", "Saturation"]
data[cols_to_clean] = data[cols_to_clean].replace("??", np.nan)

# Count how many "??" remain in the selected columns after cleaning
print("\nAfter cleaning:")
for col in cols_to_clean:
    print(f"{col}: {data[col].value_counts().get('??', 0)}")


After cleaning:
SBP: 0
DBP: 0
HR: 0
RR: 0
BT: 0
Saturation: 0
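
Replacing the placeholder removes the "??" strings, but the remaining values are likely still stored as text, so the columns keep the `object` dtype. A hedged sketch of a follow-up conversion with `pd.to_numeric` is shown below on a toy frame; `demo` and `cols` are illustrative names. Note that `errors="coerce"` turns every unparseable token into `NaN`, not only "??", so it is worth inspecting the tokens first.

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
demo = pd.DataFrame({
    "SBP": ["160", "??", "130"],
    "BT": ["36.6", "36.5", "??"],
})
cols = ["SBP", "BT"]

# Convert each column to a numeric dtype; unparseable cells become NaN.
demo[cols] = demo[cols].apply(pd.to_numeric, errors="coerce")

print(demo.dtypes)
print(demo.isnull().sum())
```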


### Recoding binary columns

In [352]:
data.rename(columns={'Sex': 'Female'}, inplace=True)
data["Female"] = data["Female"].replace(2,0)

data.rename(columns={'Injury': 'Injured'}, inplace=True)
data["Injured"] = data["Injured"].replace({1: 0, 2: 1})

### Replacing numeric codes with their category labels

In [353]:
group_map = ['Local ED 3th Degree', 'Regional ED 4tg Degree']
arrival_mode_map = ['Walking', 'Public Ambulance', 'Private Vehicle', 'Private Ambulance', 'Public Transport', 'Wheelchair', 'Other']
mental_map = ['Alert', 'Verbal Response', 'Pain Response', 'Unresponsive']
disposition_map = ['Discharge', 'Admission to Ward', 'Admission to ICU', 'Discharge', 'Transfer', 'Death', 'Surgery']
error_group_map = ['Vital Sign', 'Physical Exam', 'Psychatric', 'Pain', 'Mental', 'Underlying Disease', 'Medical Records of other ED', 'On set', 'Other']
mistriage_map = ['Correct','Over Triage', 'Under Triage']


data['Group'] = data['Group'].replace([1,2], group_map)
data['Arrival mode'] = data['Arrival mode'].replace([1,2,3,4,5,6,7], arrival_mode_map)
data['Mental'] = data['Mental'].replace([1,2,3,4], mental_map)
data['Disposition'] = data['Disposition'].replace([1,2,3,4,5,6,7], disposition_map)
data['Error_group'] = data['Error_group'].replace([1,2,3,4,5,6,7,8,9], error_group_map)
data['mistriage'] = data['mistriage'].replace([0,1,2], mistriage_map)
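
The list-based `replace` leaves any code without a label untouched; `Error_group` has 10 unique values but only codes 1 to 9 are mapped, so code 0 passes through unchanged. A minimal sketch of a dictionary-based alternative that surfaces unmapped codes is shown below. The label "No Error" for code 0 is a hypothetical choice for illustration, not something the dataset documentation in this notebook states.

```python
import pandas as pd

# Toy series standing in for the real column (values are illustrative only).
codes = pd.Series([0, 1, 4, 9], name="Error_group")

# Subset of the notebook's labels, keyed by their numeric code.
labels = {1: "Vital Sign", 4: "Pain", 9: "Other"}

# map() returns NaN for any code missing from the dictionary,
# which makes unmapped codes easy to list.
mapped = codes.map(labels)
unmapped = sorted(codes[mapped.isna()].unique())
print("Unmapped codes:", unmapped)

# Hypothetical label for code 0; adjust once its meaning is confirmed.
mapped = mapped.fillna("No Error")
print(mapped.tolist())
```

Handling the leftover code explicitly avoids a mixed column of integers and strings.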

## Feature Imputation

In [354]:
from sklearn.impute import SimpleImputer

# List every column that still has missing values
missing_values = [column for column in data.columns if data[column].isnull().sum() > 0]
for column in missing_values:
  print('Column ', column, ' has ', data[column].isnull().sum(), ' missing values')

# Impute missing Saturation values with the column median
imp = SimpleImputer(missing_values=np.nan, strategy='median')
data['Saturation'] = imp.fit_transform(data[['Saturation']]).ravel()

# Check whether any null values remain
data.isnull().sum()

Column  SBP  has  25  missing values
Column  DBP  has  29  missing values
Column  HR  has  20  missing values
Column  RR  has  22  missing values
Column  BT  has  18  missing values
Column  Saturation  has  697  missing values
Column  Diagnosis in ED  has  2  missing values


Group                        0
Female                       0
Age                          0
Patients number per hour     0
Arrival mode                 0
Injured                      0
Chief_complain               0
Mental                       0
Pain                         0
NRS_pain                     0
SBP                         25
DBP                         29
HR                          20
RR                          22
BT                          18
Saturation                   0
KTAS_RN                      0
Diagnosis in ED              2
Disposition                  0
KTAS_expert                  0
Error_group                  0
Length of stay_min           0
KTAS duration_min            0
mistriage                    0
dtype: int64
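
The summary above shows that SBP, DBP, HR, RR, and BT still contain missing values after `Saturation` is imputed. One possible way to treat them the same way is sketched below with plain pandas, which is equivalent in effect to a median strategy; the toy frame `demo` and the choice of median for these columns are assumptions, not the notebook's stated method.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
demo = pd.DataFrame({
    "SBP": ["160", np.nan, "130", "139"],
    "HR": ["84", "60", np.nan, "88"],
})
vital_cols = ["SBP", "HR"]

# Ensure the columns are numeric before computing medians.
demo[vital_cols] = demo[vital_cols].apply(pd.to_numeric, errors="coerce")

# Fill each column's gaps with that column's own median.
demo[vital_cols] = demo[vital_cols].fillna(demo[vital_cols].median())
print(demo)
```

If the imputed features later feed a model, fitting the medians on the training split only would avoid leaking information from the test split.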

## Feature Encoding

In [360]:
one_hot_encoded_data = pd.get_dummies(
    data,
    columns=['Group', 'Mental', 'Arrival mode', 'Disposition', 'Error_group', 'mistriage'],
    dtype='int64',
)
one_hot_encoded_data.head()

Unnamed: 0,Female,Age,Patients number per hour,Injured,Chief_complain,Pain,NRS_pain,SBP,DBP,HR,RR,BT,Saturation,KTAS_RN,Diagnosis in ED,KTAS_expert,Length of stay_min,KTAS duration_min,Group_Local ED 3th Degree,Group_Regional ED 4tg Degree,Mental_Alert,Mental_Pain Response,Mental_Unresponsive,Mental_Verbal Response,Arrival mode_Other,Arrival mode_Private Ambulance,Arrival mode_Private Vehicle,Arrival mode_Public Ambulance,Arrival mode_Public Transport,Arrival mode_Walking,Arrival mode_Wheelchair,Disposition_Admission to ICU,Disposition_Admission to Ward,Disposition_Death,Disposition_Discharge,Disposition_Surgery,Disposition_Transfer,Error_group_0,Error_group_Medical Records of other ED,Error_group_Mental,Error_group_On set,Error_group_Other,Error_group_Pain,Error_group_Physical Exam,Error_group_Psychatric,Error_group_Underlying Disease,Error_group_Vital Sign,mistriage_Correct,mistriage_Over Triage,mistriage_Under Triage
0,0,71,3,1,right ocular pain,1,2,160,100,84,18,36.6,100.0,2,Corneal abrasion,4,86,500,0,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0
1,1,56,12,1,right forearm burn,1,2,137,75,60,20,36.5,98.0,4,"Burn of hand, firts degree dorsum",5,64,395,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0
2,1,68,8,1,"arm pain, Lt",1,2,130,80,102,20,36.6,98.0,4,"Fracture of surgical neck of humerus, closed",5,862,100,0,1,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0
3,0,71,8,0,ascites tapping,1,3,139,94,88,20,36.5,98.0,4,Alcoholic liver cirrhosis with ascites,5,108,983,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0
4,0,58,4,0,"distension, abd",1,3,91,67,93,18,36.5,98.0,4,Ascites,5,109,660,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0
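
The encoding above keeps one indicator column per category, so the indicators of each feature always sum to 1. For models that are sensitive to such redundancy (for example, unregularized linear models), dropping one level per feature is a common option. The sketch below illustrates `drop_first=True` on a toy frame; `demo` and `encoded` are illustrative names, and whether to drop a level depends on the model used later.

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
demo = pd.DataFrame({
    "Mental": ["Alert", "Verbal Response", "Alert", "Unresponsive"],
    "Age": [71, 56, 68, 58],
})

# drop_first=True removes the first category in sorted order ("Alert"),
# which then acts as the implicit reference level.
encoded = pd.get_dummies(demo, columns=["Mental"], drop_first=True, dtype="int64")
print(encoded.columns.tolist())
```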


## Feature Normalization