# Data Visualization - Project 2
### Student names and IDs: 
- Pedram Abdolahi Darestani - 202383919
- Javier Landin Cabrera - 202380154

## Project description
Tasks:
1- Interactive dashboard
2- A 5-10 min presentation explaining how to use the dashboard and highlighting any interesting stories within the data.

Targets:
1- Help them understand their processes and patients.
2- Allow them to explore the data.
3- Help them explore and find interesting trends or correlations in the data.

Constraints:
1- Python
2- Use Voila or Dash
3- Any visualization package
4- 4 unique visualizations
5- At least 4 interactive controls

Evaluation topics:
1- Dashboard
2- The visualizations
3- The code to create them
4- The video presentation

## Importing libraries

In [2]:
import numpy as np
import pandas as pd 
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import dash_mantine_components as dmc # this seems to be the simplest way to use dash
import dash_bootstrap_components as dbc
from dash import Dash, html, dash_table, dcc, callback, Output, Input

## Reading and preprocessing data

In [3]:
data = pd.read_csv('data.csv')
# Patient Id is only a row counter, so it carries no information
data = data.drop(columns='Patient Id')
# Every column except these three holds a timestamp
time_cols = data.columns.drop(['Gender', 'Comorbidities', 'Age'])
data[time_cols] = data[time_cols].apply(pd.to_datetime)

## Data exploration 

General dataset inspection
- record count = 5147
- feature count = 19 (one is the ID, which can be discarded since it is equal to np.arange(5147))
- features: 'Patient Id', 'Age', 'Gender', 'Comorbidities', 'Emergency Dept Time', 'Admission Time', 'Discharge Time', 'CT Scan Time', 'TPA Time', 'ICU Arrival Time', 'ICU Checkout Time', 'Neurology Ward Arrival Time', 'Occupational Therapist Visit', 'Speech Pathologist Visit', 'Physiotherapist Visit', 'Dietitian Visit', 'Social Worker Visit', 'Cardiologist Visit', 'Neurologist Visit'
- Feature data types:
    - Datetime: 15 features
    - Bool: 2 features (Gender and Comorbidities)
    - Int: 1 (Age)

- Some features have missing values. All of them are Datetime features, and a missing value can mean that the patient did not use the corresponding service (e.g., was not seen by that specialist, or did not receive that treatment or test).
- Features with missing values (9): count (% of all records)
    - TPA Time: 4578 (89%)
    - ICU Arrival Time: 2936 (57%)
    - ICU Checkout Time: 2836 (55%)
    - Occupational Therapist Visit: 1116 (22%)
    - Speech Pathologist Visit: 1063 (21%)
    - Physiotherapist Visit: 529 (10%)
    - Dietitian Visit: 3980 (77%)
    - Social Worker Visit: 930 (18%)
    - Cardiologist Visit: 4883 (95%)
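
The counts and percentages above can be reproduced with `isna()`. The following is a minimal sketch on a toy frame: the `toy` DataFrame and its values are invented for illustration, and in the notebook the same calls would presumably run on `data`.

```python
import pandas as pd

# Toy stand-in for the real dataset (values are invented for illustration)
toy = pd.DataFrame({
    'Admission Time': pd.to_datetime(['2020-01-01 10:00', '2020-01-02 11:00',
                                      '2020-01-03 12:00', '2020-01-04 13:00']),
    'TPA Time': pd.to_datetime(['2020-01-01 10:30', None, None, None]),
    'Dietitian Visit': pd.to_datetime([None, '2020-01-03 09:00',
                                       None, '2020-01-05 09:00']),
})

missing = toy.isna().sum()  # number of missing values per column
summary = pd.DataFrame({
    'missing': missing,
    'missing %': (100 * missing / len(toy)).round().astype(int),
})
# Keep only columns with at least one missing value, largest count first
summary = summary[summary['missing'] > 0].sort_values('missing', ascending=False)
print(summary)
```

Columns without gaps (here `Admission Time`) drop out of the summary, which mirrors the list of nine features above.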

Feature inspection
- Age
    - mean: 68.6
    - std: 14.1
    - min: 27
    - max: 98

- Percent of patients who used/had the features:
    - TPA Time: 11%
    - ICU Arrival Time: 43%
    - ICU Checkout Time: 45%
    - Occupational Therapist Visit: 78%
    - Speech Pathologist Visit: 79%
    - Physiotherapist Visit: 90%
    - Dietitian Visit: 23%
    - Social Worker Visit: 82%
    - Cardiologist Visit: 5%
    - Emergency Dept Time: 100%
    - Admission Time: 100%
    - Discharge Time: 100%
    - CT Scan Time: 100%
    - Neurology Ward Arrival Time: 100%
    - Neurologist Visit: 100%

In [4]:
ref = data['Emergency Dept Time']
# All the columns below have the timedelta64[ns] dtype and will be converted to hours later on
data['admission delay'] = data['Admission Time'] - data['Emergency Dept Time']
data['stay duration'] = data['Discharge Time'] - data['Admission Time']
data['ct delay'] = data['CT Scan Time'] - ref
data['tpa delay'] = data['TPA Time'] - ref
data['icu delay'] = data['ICU Arrival Time'] - ref
data['icu duration'] = data['ICU Checkout Time'] - data['ICU Arrival Time']
data['nward delay'] = data['Neurology Ward Arrival Time'] - ref
data['neurologist delay'] = data['Neurologist Visit'] - ref
data['otherapist delay'] = data['Occupational Therapist Visit'] - ref
data['spathologist delay'] = data['Speech Pathologist Visit'] - ref
data['physio delay'] = data['Physiotherapist Visit'] - ref
data['dietitian delay'] = data['Dietitian Visit'] - ref
data['sworker delay'] = data['Social Worker Visit'] - ref
data['cardio delay'] = data['Cardiologist Visit'] - ref

# TODO: check to see if the time difference compared to the ADMISSION TIME is of significance
# converting time deltas to hours (float)
tdelta_cols = data.select_dtypes(include='timedelta64[ns]').columns
data[tdelta_cols] = data[tdelta_cols]/pd.Timedelta(hours=1)
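
To illustrate what the conversion above does, here is a small sketch with invented timestamps. Dividing a timedelta column by `pd.Timedelta(hours=1)` gives plain floats in hours, and a missing timestamp becomes `NaN`, which `mean()` skips by default. The names `ref_demo` and `tpa_demo` are hypothetical stand-ins for the real columns.

```python
import pandas as pd

# Invented timestamps; the second patient has no TPA Time
ref_demo = pd.Series(pd.to_datetime(['2020-01-01 08:00', '2020-01-01 09:00',
                                     '2020-01-02 10:00']))
tpa_demo = pd.Series(pd.to_datetime(['2020-01-01 09:30', None,
                                     '2020-01-02 10:45']))

tpa_delay = tpa_demo - ref_demo                    # timedelta64[ns], NaT where TPA Time is missing
tpa_hours = tpa_delay / pd.Timedelta(hours=1)      # float hours, NaN where missing
tpa_minutes = tpa_delay / pd.Timedelta(minutes=1)  # same delays expressed in minutes

# The mean only covers patients who actually received the treatment
print(tpa_hours.mean())
```

Because of this, each mean delay describes only the patients who had the event, so delays for rare events such as TPA are averaged over far fewer records than delays for events every patient has.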

## Visualization options
### Part 1:
- Select each type of delay to see its info
    - % of patients related to that delay
    - Mean amount of delay
    -  ??
- multiple selection for comparison of each delay, sex, comorbidity, ...
- allow for time difference calculation
    - select any two features to display the time difference information for them
- allow different time step displays
    - weeks
    - days
    - hours
    - minutes
- Filtering: filter data by different values. Allowing for display of multiple values of a single feature next to each other.
    - sex
    - age groups (define some groups)
    - comorbidities
    - months of the year
    - hours of admission (morning, noon, evening, midnight, ...)
- allow for selection of colors for each column? (check possibility)
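
The filtering ideas above need a few derived grouping columns. One possible sketch uses `pd.cut`; the bin edges, the labels, and the column names `age group`, `part of day` and `month` are all assumptions made for illustration, not choices taken from the project.

```python
import pandas as pd

# Invented records for illustration
toy = pd.DataFrame({
    'Age': [34, 58, 71, 90],
    'Admission Time': pd.to_datetime(['2020-03-01 02:15', '2020-03-01 09:40',
                                      '2020-07-12 14:05', '2020-11-30 21:30']),
})

# Age groups: left-closed bins, e.g. [50, 65) is labelled '50-64' (assumed edges)
toy['age group'] = pd.cut(toy['Age'], bins=[0, 50, 65, 80, 120],
                          labels=['<50', '50-64', '65-79', '80+'], right=False)

# Part of the day, derived from the admission hour (assumed 6-hour blocks)
toy['part of day'] = pd.cut(toy['Admission Time'].dt.hour, bins=[0, 6, 12, 18, 24],
                            labels=['night', 'morning', 'afternoon', 'evening'],
                            right=False)

# Month of the year as a readable name
toy['month'] = toy['Admission Time'].dt.month_name()
print(toy)
```

Categorical columns like these can be passed directly as `color` or `facet_col` arguments in Plotly Express, which is one way to show several values of a feature side by side.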

### Part 2:
- SELECTION of records by year, month, day, gender, comorbidity
- SELECTION of intervals of time: records between year1,month1,day1  and year2,month2,day2
- SELECTION of intervals of age: records of patients in a specific age interval
- SELECTION of specific patient records: which services the patient used, which delays they had, ...
- heatmap of number of patients in each month and each week for year selection?
- histograms/KDEs for SELECTION of delay type for every month/week?
- the stacked bar plot above, but with multiselect options instead of dropdown
- number of patients in each ward at every moment (for ward capacity analysis) (hard)
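
The ward occupancy idea marked as hard above can be approached by treating every arrival as +1 and every checkout as -1, then taking a running sum over time. The following is only a sketch for the ICU, on invented timestamps. Since the missing-value counts for ICU Arrival Time and ICU Checkout Time differ, it assumes that rows lacking either timestamp are simply dropped, which may not be the right treatment for the real data.

```python
import pandas as pd

# Invented ICU stays; the third patient never entered the ICU
toy = pd.DataFrame({
    'ICU Arrival Time': pd.to_datetime(['2020-01-01 08:00', '2020-01-01 10:00',
                                        None, '2020-01-01 12:00']),
    'ICU Checkout Time': pd.to_datetime(['2020-01-01 11:00', '2020-01-01 13:00',
                                         None, '2020-01-01 14:00']),
})

# Assumption: only stays with both timestamps are counted
stays = toy.dropna(subset=['ICU Arrival Time', 'ICU Checkout Time'])

arrivals = pd.DataFrame({'time': stays['ICU Arrival Time'], 'change': 1})
departures = pd.DataFrame({'time': stays['ICU Checkout Time'], 'change': -1})
events = pd.concat([arrivals, departures])

# Net change per timestamp (sorted by time), then running total = patients in the ward
occupancy = events.groupby('time')['change'].sum().cumsum()
print(occupancy)
```

The resulting series is a step function, so a line plot with a step shape (for example `line_shape='hv'` in `px.line`) would probably represent it more faithfully than a smooth line.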

In [11]:
means = data[tdelta_cols].mean()
px.bar(
    means,
    orientation='h',
    title='Mean Wait Time For Event After Entry',
    labels={"index": "", "value": "Average wait time (hours)"},
)

## Dash attempt

In [7]:
app = Dash(__name__)

app.layout = html.Div([
    html.Div(children='Hello World'),
    dash_table.DataTable(data=data.to_dict('records'), page_size=10),
    dcc.Graph(figure=px.histogram(data, x='nward delay', y='neurologist delay', histfunc='avg'))
])

app.run(debug=True)