# Exploratory Data Analysis (EDA)

## Imports

In [71]:
###########
# Imports #
###########
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

## Data Loading

In [72]:
data = pd.read_csv('data.csv', encoding='latin-1', sep=';')
pd.set_option('display.max_columns', None)

## Replace #BOÞ! in NRS_pain with NaN
data['NRS_pain'] = data['NRS_pain'].replace('#BOÞ!', np.nan)

print("Data contains ", data.shape[0]," rows and ", data.shape[1]," columns")

Data contains  1267  rows and  24  columns
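
The placeholder replacement above could also be done while parsing the file. The following is a minimal sketch, assuming `#BOÞ!` is the only sentinel string in `NRS_pain`; the inline CSV is made-up sample data standing in for `data.csv`, and the name `sample` is hypothetical.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for data.csv (values are made up).
sample_csv = "Pain;NRS_pain;SBP\n1;2;160\n0;#BOÞ!;137\n1;3;130\n"

# na_values maps the listed strings to NaN during parsing, per column.
sample = pd.read_csv(
    io.StringIO(sample_csv),
    sep=';',
    na_values={'NRS_pain': ['#BOÞ!']},
)

print(sample['NRS_pain'].isna().sum())  # number of placeholders turned into NaN
print(sample['NRS_pain'].dtype)         # the column is parsed as numeric
```

Handling the sentinel at load time lets pandas infer a numeric dtype for the column directly, rather than leaving it as `object`.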


## Data Types




### Group
**Description:** Group categorization.

**Type:** Categorical Nominal

### Sex
**Description:** Patient's sex.

**Categories:**
- 1 (Female)
- 2 (Male)

**Type:** Categorical Nominal

### Age 
**Description:** Patient's age.

**Type:** Numerical Discrete

### Patients number per hour
**Description:** Number of patients in the Emergency Department per hour.

**Type:** Numerical Discrete

### Arrival mode
**Description:** How patients arrive at the Emergency Department.

**Categories:**
- 1 (Walking)
- 2 (Public Ambulance)
- 3 (Private Vehicle)
- 4 (Private Ambulance)
- 5, 6, 7 (Other)

**Type:** Categorical Nominal 

### Injury 
**Description:** Whether the patient is injured or not.

**Categories:**
- 1 (No)
- 2 (Yes)

**Type:** Categorical Nominal

### Chief_complain
**Description:** The patient's complaint.

**Type:** Categorical Nominal 

### Mental
**Description:** The mental state of the patient.

**Categories:**
- 1 (Alert)
- 2 (Verbal Response)
- 3 (Pain Response)
- 4 (Unresponsive)

**Type:** Categorical Ordinal

### Pain
**Description:** Whether the patient has pain.

**Categories:**
- 1 (Yes)
- 0 (No)

**Type:** Binary

### NRS_pain
**Description:** Nurse's assessment of the patient's pain on a numeric rating scale (NRS).

**Type:** Numeric

### SBP
**Description:** Systolic Blood Pressure.

**Type:** Numeric

### DBP
**Description:** Diastolic Blood Pressure.

**Type:** Numeric

### HR
**Description:** Heart Rate.

**Type:** Numeric

### RR
**Description:** Respiratory Rate.

**Type:** Numeric

### BT
**Description:** Body Temperature.

**Type:** Numeric
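
The category codes listed above can be attached to the integer columns as readable labels. This is a minimal sketch on made-up rows, not the notebook's own preprocessing; the `demo` frame and the `*_labels` dictionaries are hypothetical names, and treating `Mental` as an ordered category is an assumption of this sketch.

```python
import pandas as pd

# Code-to-label maps copied from the category lists above.
sex_labels = {1: 'Female', 2: 'Male'}
injury_labels = {1: 'No', 2: 'Yes'}
mental_labels = {1: 'Alert', 2: 'Verbal Response', 3: 'Pain Response', 4: 'Unresponsive'}

# Made-up rows for illustration only.
demo = pd.DataFrame({'Sex': [2, 1, 1], 'Injury': [2, 2, 1], 'Mental': [1, 1, 3]})

labelled = demo.assign(
    Sex=demo['Sex'].map(sex_labels).astype('category'),
    Injury=demo['Injury'].map(injury_labels).astype('category'),
    # Assumption: the four mental states are kept in the listed order.
    Mental=pd.Categorical(
        demo['Mental'].map(mental_labels),
        categories=list(mental_labels.values()),
        ordered=True,
    ),
)

print(labelled)
print(labelled.dtypes)
```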

In [73]:
data.dtypes

Group                        int64
Sex                          int64
Age                          int64
Patients number per hour     int64
Arrival mode                 int64
Injury                       int64
Chief_complain              object
Mental                       int64
Pain                         int64
NRS_pain                    object
SBP                         object
DBP                         object
HR                          object
RR                          object
BT                          object
Saturation                  object
KTAS_RN                      int64
Diagnosis in ED             object
Disposition                  int64
KTAS_expert                  int64
Error_group                  int64
Length of stay_min           int64
KTAS duration_min           object
mistriage                    int64
dtype: object
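
Several columns described as numeric (`SBP`, `DBP`, `HR`, `RR`, `BT`) are reported as `object`, which usually means at least one value could not be parsed as a number. A minimal sketch of coercing such columns, assuming the offending entries are placeholder strings (shown here as `??`); the `vitals` frame and its values are made up for illustration.

```python
import pandas as pd

# Made-up values; '??' mimics a non-numeric placeholder that forces object dtype.
vitals = pd.DataFrame({'SBP': ['160', '137', '??'], 'BT': ['36.6', '36.5', '36.6']})

numeric_cols = ['SBP', 'BT']
for col in numeric_cols:
    # errors='coerce' turns anything unparseable into NaN instead of raising.
    vitals[col] = pd.to_numeric(vitals[col], errors='coerce')

print(vitals.dtypes)
print(vitals.isna().sum())
```

Coercion converts placeholders into missing values, so the null counts should be re-checked after this step.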

## Head

In [74]:
################
# First 5 rows #
################
data.head()

Unnamed: 0,Group,Sex,Age,Patients number per hour,Arrival mode,Injury,Chief_complain,Mental,Pain,NRS_pain,SBP,DBP,HR,RR,BT,Saturation,KTAS_RN,Diagnosis in ED,Disposition,KTAS_expert,Error_group,Length of stay_min,KTAS duration_min,mistriage
0,2,2,71,3,3,2,right ocular pain,1,1,2,160,100,84,18,36.6,100.0,2,Corneal abrasion,1,4,2,86,500,1
1,1,1,56,12,3,2,right forearm burn,1,1,2,137,75,60,20,36.5,,4,"Burn of hand, firts degree dorsum",1,5,4,64,395,1
2,2,1,68,8,2,2,"arm pain, Lt",1,1,2,130,80,102,20,36.6,98.0,4,"Fracture of surgical neck of humerus, closed",2,5,4,862,100,1
3,1,2,71,8,1,1,ascites tapping,1,1,3,139,94,88,20,36.5,,4,Alcoholic liver cirrhosis with ascites,1,5,6,108,983,1
4,1,2,58,4,3,1,"distension, abd",1,1,3,91,67,93,18,36.5,,4,Ascites,1,5,8,109,660,1


## Null Values

In [75]:
###############
# Null values #
###############

missing_values = data.isnull().sum()
missing_values = missing_values[missing_values > 0]

print("Columns with missing values:")
print(missing_values)




Columns with missing values:
NRS_pain           556
Saturation         688
Diagnosis in ED      2
dtype: int64
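
Raw counts are easier to judge as a share of all rows. The sketch below reuses the counts and the row total printed above; `missing_counts` and `missing_pct` are hypothetical names.

```python
import pandas as pd

# Counts and row total copied from the outputs above.
n_rows = 1267
missing_counts = pd.Series({'NRS_pain': 556, 'Saturation': 688, 'Diagnosis in ED': 2})

# Percentage of rows with a missing value, per column.
missing_pct = (missing_counts / n_rows * 100).round(1)
print(missing_pct.sort_values(ascending=False))
```

On the full DataFrame, `data.isnull().mean() * 100` computes the same percentages for every column in one step.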


## Unique Values

In [76]:
for column in data.columns:
  print('Column ', column, ' has ', data[column].nunique(), ' unique values')

Column  Group  has  2  unique values
Column  Sex  has  2  unique values
Column  Age  has  81  unique values
Column  Patients number per hour  has  16  unique values
Column  Arrival mode  has  7  unique values
Column  Injury  has  2  unique values
Column  Chief_complain  has  417  unique values
Column  Mental  has  4  unique values
Column  Pain  has  2  unique values
Column  NRS_pain  has  10  unique values
Column  SBP  has  127  unique values
Column  DBP  has  83  unique values
Column  HR  has  94  unique values
Column  RR  has  11  unique values
Column  BT  has  46  unique values
Column  Saturation  has  22  unique values
Column  KTAS_RN  has  5  unique values
Column  Diagnosis in ED  has  583  unique values
Column  Disposition  has  7  unique values
Column  KTAS_expert  has  5  unique values
Column  Error_group  has  10  unique values
Column  Length of stay_min  has  716  unique values
Column  KTAS duration_min  has  392  unique values
Column  mistriage  has  3  unique values
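
For columns with only a few unique values, the level frequencies are often more informative than the count of levels. A minimal sketch on made-up data; the `demo` frame and the `max_levels` threshold are hypothetical choices.

```python
import pandas as pd

# Made-up frame for illustration only.
demo = pd.DataFrame({
    'Sex': [1, 2, 2, 1],
    'Age': [71, 56, 68, 58],
    'mistriage': [0, 1, 2, 0],
})

max_levels = 3  # hypothetical cut-off for "low cardinality"
unique_counts = demo.nunique()
low_card = unique_counts[unique_counts <= max_levels].index.tolist()

for col in low_card:
    # dropna=False keeps missing values visible as their own level.
    print(demo[col].value_counts(dropna=False).sort_index())
```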


# Data Preprocessing

## Feature Imputation

In [77]:
## Check all missing values
missing_values = [column for column in data.columns if data[column].isnull().sum() > 0]
for column in missing_values:
  print('Column ', column, ' has ', data[column].isnull().sum(), ' missing values')


from sklearn.impute import SimpleImputer

## Saturation is read as object and contains non-numeric entries such as '??';
## coerce them to NaN so that the median strategy can run
data['Saturation'] = pd.to_numeric(data['Saturation'], errors='coerce')

imp = SimpleImputer(missing_values=np.nan, strategy='median')
data['Saturation'] = imp.fit_transform(data[['Saturation']]).ravel()


## Check whether any null values remain
data.isnull().sum()

Column  NRS_pain  has  556  missing values
Column  Saturation  has  688  missing values
Column  Diagnosis in ED  has  2  missing values


## Feature Encoding

## Feature Normalization

## Feature Engineering