<a href="https://colab.research.google.com/github/annabavaresco/pstrentino/blob/main/Emergency_room_Trentino.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine learning model

The model we use to predict wait times in the emergency rooms is a sequential neural network. Since wait times are expected to vary with the triage level, we build a separate model for each triage color.

In this notebook we retrieve historical wait-time data stored in a MySQL database hosted on Amazon, convert it into 5 pandas dataframes (one for each triage color), and build the neural networks.

In [None]:
import pandas as pd
import datetime as dt
import numpy as np
import random
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from tensorflow import keras
from datetime import datetime, timedelta

In [None]:
pip install mysql-connector-python

Collecting mysql-connector-python
  Downloading mysql_connector_python-8.0.26-cp37-cp37m-manylinux1_x86_64.whl (30.9 MB)
Installing collected packages: mysql-connector-python
Successfully installed mysql-connector-python-8.0.26


In [None]:
import mysql.connector

In [None]:
def retrieve_data():
  '''
    Creates a connection with the db hosting our data and converts it into a pandas dataframe.
  '''
  connection = mysql.connector.connect(
      host = 'emergencyroom.ci8zphg60wmc.us-east-2.rds.amazonaws.com',
      port =  3306,
      user = 'admin',
      database = 'prova',
      password = 'emr00mtr3nt036'
    )

  connection.autocommit = True
  data = pd.read_sql('SELECT * FROM prova.ER_PATIENTS', con=connection)

  connection.close()

  return data
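
The connection settings can also be kept out of the notebook. The following is a minimal sketch, assuming the settings are exported as environment variables; the `ER_DB_*` variable names and the `connection_settings` helper are hypothetical and not part of the original code.

```python
import os

def connection_settings(env=os.environ):
    """Collect MySQL connection settings from environment variables.

    The ER_DB_* names are illustrative assumptions, not names used by the project.
    """
    return {
        'host': env['ER_DB_HOST'],
        'port': int(env.get('ER_DB_PORT', '3306')),  # default MySQL port
        'user': env['ER_DB_USER'],
        'database': env['ER_DB_NAME'],
        'password': env['ER_DB_PASSWORD'],
    }

# Placeholder values, used only to show the shape of the result.
example_env = {
    'ER_DB_HOST': 'db.example.org',
    'ER_DB_USER': 'reader',
    'ER_DB_NAME': 'prova',
    'ER_DB_PASSWORD': 'not-a-real-password',
}
settings = connection_settings(example_env)
print(sorted(settings))
```

The resulting dictionary could then be passed on as `mysql.connector.connect(**settings)` inside `retrieve_data`.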


Let's have a look at the data retrieved from the database. Here is the meaning of each column:


*   color (`colore`): the color associated with the triage
*   hospital (`ospedale`): the code identifying the specific emergency room. There are 11 different emergency rooms in Trentino
*   start (`inizio`): timestamp referring to the moment when the patient entered the waiting room
*   end (`fine`): timestamp referring to the moment when the patient left the waiting room
*   duration (`durata`): how long the patient waited, in the format hh:mm:ss
*   others (`altri`): number of patients with the same triage color present at the moment of arrival
*   more_severe (`più_gravi`): number of patients with a higher level of priority present at the moment of arrival
*   less_severe (`meno_gravi`): number of patients with a lower level of priority present at the moment of arrival


In [None]:
df = retrieve_data()
df.head()

Unnamed: 0,colore,ospedale,inizio,fine,durata,altri,più_gravi,meno_gravi
0,bianco,001-PS-PSC,2021-05-05 07:30:00,2021-05-05 08:40:00,01:10:00,0,0,0
1,verde,001-PS-PSC,2021-05-05 08:40:00,2021-05-05 08:50:00,00:10:00,0,0,0
2,verde,001-PS-PSC,2021-05-05 09:00:00,2021-05-05 09:10:00,00:10:00,0,0,0
3,bianco,001-PS-PSC,2021-05-05 09:10:00,2021-05-05 09:40:00,00:30:00,0,0,0
4,bianco,001-PS-PSC,2021-05-05 10:30:00,2021-05-05 11:30:00,01:00:00,0,0,0
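
As a quick sanity check on the rows above, `durata` should equal `fine` minus `inizio`. The sketch below recomputes it for three of the displayed rows using only the standard library; the `rows` list is copied by hand from the table and the variable names are illustrative.

```python
from datetime import datetime

# (inizio, fine, durata) copied from the rows displayed above
rows = [
    ('2021-05-05 07:30:00', '2021-05-05 08:40:00', '01:10:00'),
    ('2021-05-05 08:40:00', '2021-05-05 08:50:00', '00:10:00'),
    ('2021-05-05 09:10:00', '2021-05-05 09:40:00', '00:30:00'),
]

fmt = '%Y-%m-%d %H:%M:%S'
computed = []
for inizio, fine, durata in rows:
    elapsed = datetime.strptime(fine, fmt) - datetime.strptime(inizio, fmt)
    # str(timedelta) drops the leading zero of the hour, so pad it back
    computed.append(str(elapsed).zfill(8))

matches = [c == durata for c, (_, _, durata) in zip(computed, rows)]
print(computed, matches)
```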


The following functions are meant for data preparation. Here is the list of changes which will be performed on the dataset shown above:

1. Splitting into 5 different dataframes based on triage color
2. Removing observations for which the value of duration can be considered as an outlier depending on the triage color
3. Converting values of the "hospital" column into integers
4. Converting values of the "duration" column into integers representing the number of minutes
5. Removing the "color" column as the observations have already been put into different dataframes according to the triage color. 
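
Step 4 can be illustrated on its own. The snippet below is an alternative sketch that goes through `datetime.timedelta`; the `duration_to_minutes` helper is hypothetical and is not the function used in this notebook.

```python
from datetime import timedelta

def duration_to_minutes(text):
    """Parse 'hh:mm:ss' into whole minutes (illustrative helper, not from the notebook)."""
    hours, minutes, seconds = (int(part) for part in text.split(':'))
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    # Floor division drops any leftover seconds
    return int(delta.total_seconds() // 60)

sample_minutes = [duration_to_minutes(s) for s in ['01:10:00', '00:10:00', '00:30:00']]
print(sample_minutes)
```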

In [None]:
def convert_to_mins(last: str) -> int:
  '''
    Converts a string in the format hh:mm:ss into an integer representing the number of minutes.
  '''
  # Only hours and minutes are used; the seconds field is ignored.
  hrs, mins = (int(s) for s in last.split(':')[:2])
  return hrs * 60 + mins

def process_df(data):
  '''
    Takes as input the data retrieved from the database and outputs 5 different datasets with processed data.
  '''
  data.columns = ['colore', 'ospedale', 'inizio', 'fine', 'durata', 'altri', 'più_gravi', 'meno_gravi']
  # Map each emergency room code to an integer identifier.
  hosp_dict = {'001-PS-PSC': 1,
               '001-PS-PSG': 2,
               '001-PS-PSO': 3,
               '001-PS-PS': 4,
               '001-PS-PSP': 5,
               '006-PS-PS': 6,
               '007-PS-PS': 7,
               '010-PS-PS': 8,
               '004-PS-PS': 9,
               '014-PS-PS': 10,
               '005-PS-PS': 11
              }

  data['osp_code'] = data['ospedale'].apply(lambda x: hosp_dict[x])
  data['weekday'] = data['inizio'].apply(lambda x: x.weekday())
  data['durata'] = data['durata'].apply(lambda x: convert_to_mins(str(x)))
  # Map each 10-minute slot of the day ('00:00:00', '00:10:00', ...) to an index from 1 to 144.
  timeslot_dict = {}
  ind = 1
  for n in range(24):
    for i in range(6):
      timeslot_dict[f'{n:02d}:{i}0:00'] = ind
      ind += 1

  data['timeslot'] = data['inizio'].apply(lambda x: timeslot_dict[x.strftime("%H:%M:%S")])
  data = data.loc[:, ['colore', 'durata', 'altri', 'più_gravi', 'weekday', 'timeslot', 'osp_code']]
  ndf_bianco = data.loc[(data['colore'] == "bianco") & (data['durata']<800),:]
  ndf_verde = data.loc[(data['colore'] == "verde") & (data['durata']<120),:]
  ndf_azzurro = data.loc[(data['colore'] == "azzurro") & (data['durata']<80),:]
  ndf_arancio = data.loc[(data['colore'] == "arancio") & (data['durata']<30),:]
  ndf_rosso = data.loc[(data['colore'] == "rosso")& (data['durata']<5),:]
  
  return ndf_bianco, ndf_verde, ndf_azzurro, ndf_arancio, ndf_rosso
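
The `timeslot` feature built in `process_df` numbers the 10-minute slots of the day from 1 to 144. The same index can be obtained with plain arithmetic; the sketch below uses a hypothetical `timeslot_index` helper and checks it against a dictionary built the same way as above.

```python
def timeslot_index(hour, minute):
    """Index of the 10-minute slot containing hour:minute, counting from 1 (illustrative helper)."""
    return hour * 6 + minute // 10 + 1

# Rebuild the lookup table the same way process_df does
timeslot_dict = {}
ind = 1
for n in range(24):
    for i in range(6):
        timeslot_dict[f'{n:02d}:{i}0:00'] = ind
        ind += 1

# Compare both approaches on every slot boundary
agree = all(
    timeslot_dict[f'{h:02d}:{m:02d}:00'] == timeslot_index(h, m)
    for h in range(24)
    for m in range(0, 60, 10)
)
print(len(timeslot_dict), agree)
```

One difference worth noting: the arithmetic version also accepts timestamps that do not fall exactly on a 10-minute boundary, whereas the dictionary lookup raises a `KeyError` for them.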

These are the datasets which are going to be used to build the models.

In [None]:
df_white, df_green, df_blue, df_orange, df_red = process_df(df)
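
Each of these datasets keeps only the observations whose wait is below a color-specific cutoff. The sketch below restates the cutoffs used in `process_df` as a dictionary; the `MAX_WAIT_MINUTES` name, the `keep_observation` helper, and the example rows are illustrative assumptions.

```python
# Cutoffs in minutes, taken from the filters in process_df
MAX_WAIT_MINUTES = {
    'bianco': 800,
    'verde': 120,
    'azzurro': 80,
    'arancio': 30,
    'rosso': 5,
}

def keep_observation(color, minutes):
    """True when the wait is strictly below the cutoff for its triage color."""
    return minutes < MAX_WAIT_MINUTES[color]

# Made-up (color, minutes) pairs, used only to show the filter
rows = [('bianco', 70), ('verde', 10), ('verde', 150), ('rosso', 5)]
kept = [row for row in rows if keep_observation(*row)]
print(kept)
```

The comparison is strict, so a wait exactly equal to the cutoff is treated as an outlier.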

## White model

In [None]:
workingCopyDataset = df_white

X = workingCopyDataset.loc[:,['weekday', 'timeslot', 'osp_code', 'altri', 'più_gravi']]
y = workingCopyDataset['durata']


trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.1, random_state=0)

inputVariables = 5
model = keras.models.Sequential()
model.add(keras.layers.Dense(12, input_dim=inputVariables, kernel_initializer='normal', activation='relu'))
model.add(keras.layers.Dense(8, activation='relu'))
model.add(keras.layers.Dense(1, activation='linear'))
model.summary()

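# Accuracy is a classification metric; for this regression task we track the mean absolute error instead.
model.compile(loss='mse', optimizer='adam', metrics=['mae'])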

numberOfEpochs = 500
batchSize = 256
history = model.fit(trainX, trainy, epochs=numberOfEpochs, batch_size=batchSize, 
                    verbose=False, validation_split=0.2)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 12)                72        
_________________________________________________________________
dense_4 (Dense)              (None, 8)                 104       
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 9         
Total params: 185
Trainable params: 185
Non-trainable params: 0
_________________________________________________________________
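
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A short sketch of that arithmetic, with an illustrative `dense_params` helper:

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus biases (n_units) of a fully connected layer."""
    return n_inputs * n_units + n_units

# 5 input variables, then layers of 12, 8 and 1 units
layer_sizes = [5, 12, 8, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```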


In [None]:
testy_pred = model.predict(testX)
myMse = mean_squared_error(testy, testy_pred)
print(f'The mean squared error WHITE I get with the neural network is {myMse} squared minutes.')
#model.save("models/model_WHITE")

The mean squared error WHITE I get with the neural network is 397.2164665087391 squared minutes.


## Green model

In [None]:
workingCopyDataset = df_green

X = workingCopyDataset.loc[:,['weekday', 'timeslot', 'osp_code', 'altri', 'più_gravi']]
y = workingCopyDataset['durata']


trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.1, random_state=0)

inputVariables = 5
model = keras.models.Sequential()
model.add(keras.layers.Dense(12, input_dim=inputVariables, kernel_initializer='normal', activation='relu'))
model.add(keras.layers.Dense(8, activation='relu'))
model.add(keras.layers.Dense(1, activation='linear'))
model.summary()

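# Accuracy is a classification metric; for this regression task we track the mean absolute error instead.
model.compile(loss='mse', optimizer='adam', metrics=['mae'])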

numberOfEpochs = 500
batchSize = 256
history = model.fit(trainX, trainy, epochs=numberOfEpochs, batch_size=batchSize, 
                    verbose=False, validation_split=0.2)

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 12)                72        
_________________________________________________________________
dense_7 (Dense)              (None, 8)                 104       
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 9         
Total params: 185
Trainable params: 185
Non-trainable params: 0
_________________________________________________________________
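
With `test_size=0.1` in `train_test_split` and `validation_split=0.2` in `model.fit`, the data ends up divided into three parts. The sketch below works out the approximate shares with exact fractions; actual row counts can differ by one because of rounding.

```python
from fractions import Fraction

test_size = Fraction(1, 10)         # held out by train_test_split
validation_split = Fraction(2, 10)  # taken by Keras from the remaining training data

test_share = test_size
val_share = (1 - test_size) * validation_split
train_share = (1 - test_size) * (1 - validation_split)

print(float(train_share), float(val_share), float(test_share))
```

Only the training share is used to update the weights; the validation share is monitored during fitting and the test share is used for the final error.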


In [None]:
testy_pred = model.predict(testX)
myMse = mean_squared_error(testy, testy_pred)
print(f'The mean squared error GREEN I get with the neural network is {myMse} squared minutes.')
#model.save("models/model_GREEN")

## Blue model

In [None]:
workingCopyDataset = df_blue

X = workingCopyDataset.loc[:,['weekday', 'timeslot', 'osp_code', 'altri', 'più_gravi']]
y = workingCopyDataset['durata']


trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.1, random_state=0)

inputVariables = 5
model = keras.models.Sequential()
model.add(keras.layers.Dense(12, input_dim=inputVariables, kernel_initializer='normal', activation='relu'))
model.add(keras.layers.Dense(8, activation='relu'))
model.add(keras.layers.Dense(1, activation='linear'))
model.summary()

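# Accuracy is a classification metric; for this regression task we track the mean absolute error instead.
model.compile(loss='mse', optimizer='adam', metrics=['mae'])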

numberOfEpochs = 500
batchSize = 256
history = model.fit(trainX, trainy, epochs=numberOfEpochs, batch_size=batchSize, 
                    verbose=False, validation_split=0.2)

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_9 (Dense)              (None, 12)                72        
_________________________________________________________________
dense_10 (Dense)             (None, 8)                 104       
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 9         
Total params: 185
Trainable params: 185
Non-trainable params: 0
_________________________________________________________________


In [None]:
testy_pred = model.predict(testX)
myMse = mean_squared_error(testy, testy_pred)
print(f'The mean squared error BLUE I get with the neural network is {myMse} squared minutes.')
#model.save("models/model_BLUE")

The mean squared error BLUE I get with the neural network is 1214.0484641761113 squared minutes.
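
Because the mean squared error is expressed in squared minutes, its square root (RMSE) is easier to read as a typical error in minutes. A small sketch using the two values reported in this notebook; the dictionary names are illustrative.

```python
import math

# MSE values printed above for the white and blue models
mse_by_color = {
    'white': 397.2164665087391,
    'blue': 1214.0484641761113,
}

rmse_by_color = {color: math.sqrt(mse) for color, mse in mse_by_color.items()}
for color, rmse in rmse_by_color.items():
    print(f'{color}: RMSE of about {rmse:.1f} minutes')
```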


## Orange model

In [None]:
workingCopyDataset = df_orange

X = workingCopyDataset.loc[:,['weekday', 'timeslot', 'osp_code', 'altri', 'più_gravi']]
y = workingCopyDataset['durata']


trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.1, random_state=0)

inputVariables = 5
model = keras.models.Sequential()
model.add(keras.layers.Dense(12, input_dim=inputVariables, kernel_initializer='normal', activation='relu'))
model.add(keras.layers.Dense(8, activation='relu'))
model.add(keras.layers.Dense(1, activation='linear'))
model.summary()

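# Accuracy is a classification metric; for this regression task we track the mean absolute error instead.
model.compile(loss='mse', optimizer='adam', metrics=['mae'])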

numberOfEpochs = 500
batchSize = 256
history = model.fit(trainX, trainy, epochs=numberOfEpochs, batch_size=batchSize, 
                    verbose=False, validation_split=0.2)

In [None]:
testy_pred = model.predict(testX)
myMse = mean_squared_error(testy, testy_pred)
print(f'The mean squared error ORANGE I get with the neural network is {myMse} squared minutes.')
#model.save("models/model_ORANGE")

## Red model

In [None]:
workingCopyDataset = df_red

X = workingCopyDataset.loc[:,['weekday', 'timeslot', 'osp_code', 'altri', 'più_gravi']]
y = workingCopyDataset['durata']


trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.1, random_state=0)

inputVariables = 5
model = keras.models.Sequential()
model.add(keras.layers.Dense(12, input_dim=inputVariables, kernel_initializer='normal', activation='relu'))
model.add(keras.layers.Dense(8, activation='relu'))
model.add(keras.layers.Dense(1, activation='linear'))
model.summary()

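# Accuracy is a classification metric; for this regression task we track the mean absolute error instead.
model.compile(loss='mse', optimizer='adam', metrics=['mae'])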

numberOfEpochs = 500
batchSize = 256
history = model.fit(trainX, trainy, epochs=numberOfEpochs, batch_size=batchSize, 
                    verbose=False, validation_split=0.2)

In [None]:
testy_pred = model.predict(testX)
myMse = mean_squared_error(testy, testy_pred)
print(f'The mean squared error RED I get with the neural network is {myMse} squared minutes.')
#model.save("models/model_RED")