Assignment 3 - Visualization 2 - Darshan Panesar

2024 Monthly Average of Covid Hospitalizations and ICU Patients in Toronto

Dataset: COVID-19 cases in hospital and ICU, by Ontario Health (OH) region
Spanning 2020-2024

Link: https://data.ontario.ca/dataset/covid-19-cases-in-hospital-and-icu-by-ontario-health-region 

Note: Only part of the dataset was used for this assignment: the Toronto data, out of all the regional data. The data was first filtered in Excel for the Toronto region, and a new Excel sheet was generated containing only the Toronto rows. A second filter then kept only the 2024 values, since the full dataset is quite large and spans 2020-2024.

Plotly was used for the graph, with Lecture 5 as the reference.
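
The Excel filtering described in the note above could also be done in pandas. The following is only a sketch under assumptions: `full_df` is a hypothetical stand-in for the full regional file, the column names are taken from the `df.head()` output in this notebook, and the values and the `'EAST'` region label are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the full regional dataset; column names follow
# the df.head() output in this notebook, the values are made up.
full_df = pd.DataFrame({
    'date': ['2023-12-31T00:00:00', '2024-01-01T00:00:00',
             '2024-01-01T00:00:00', '2024-01-02T00:00:00'],
    'oh_region': ['TORONTO', 'TORONTO', 'EAST', 'TORONTO'],
    'hospitalizations': [350, 344, 120, 346],
})

# Parse the ISO-style timestamps so the year can be compared numerically
full_df['date'] = pd.to_datetime(full_df['date'])

# Keep only Toronto rows from 2024
is_toronto = full_df['oh_region'] == 'TORONTO'
is_2024 = full_df['date'].dt.year == 2024
toronto_2024 = full_df[is_toronto & is_2024].reset_index(drop=True)

print(toronto_2024)
```

Doing the filter in code keeps the selection step reproducible alongside the rest of the notebook.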

In [1]:
#Import Packages:
import pandas as pd
import numpy as np
import plotly.graph_objects as go

In [2]:
#import data file:
df = pd.read_excel('/Users/darshs/Documents/visualization-main/toronto_covid_oh_2024.xlsx')

#prints first 5 rows of data
df.head()

Unnamed: 0,_id,date,date_ymd,date_md,oh_region,hospitalizations,icu_current_covid,icu_crci_total,icu_current_covid_vented,icu_crci_total_vented,icu_former_covid,icu_former_covid_vented
0,38531,2024-01-01T00:00:00,01/01/2024,2025-01-01 00:00:00,TORONTO,344,25,35,14,18,10,4
1,38532,2024-01-02T00:00:00,01/02/2024,2025-01-02 00:00:00,TORONTO,346,23,32,14,17,9,3
2,38533,2024-01-03T00:00:00,01/03/2024,2025-01-03 00:00:00,TORONTO,349,21,28,16,18,7,2
3,38534,2024-01-04T00:00:00,01/04/2024,2025-01-04 00:00:00,TORONTO,330,20,26,13,16,6,3
4,38535,2024-01-05T00:00:00,01/05/2024,2025-01-05 00:00:00,TORONTO,298,21,25,14,16,4,2


In [3]:
#Make a variable for monthly averages for both hospitalizations and ICU patients with covid
#Reference: https://stackoverflow.com/questions/65471540/get-monthly-average-in-pandas and https://pandas.pydata.org/docs/user_guide/groupby.html

month_avg = df.groupby(pd.PeriodIndex(df['date_ymd'], freq = 'M'))[['hospitalizations', 'icu_current_covid']].mean().reset_index()


print(month_avg)

   date_ymd  hospitalizations  icu_current_covid
0   2024-01        235.290323          17.548387
1   2024-02         88.655172           7.551724
2   2024-03         40.419355           3.387097
3   2024-04         42.866667           3.166667
4   2024-05         91.612903           2.451613
5   2024-06         77.200000           2.033333
6   2024-07         82.225806           4.258065
7   2024-08        110.935484           4.225806
8   2024-09        130.400000           4.900000
9   2024-10        113.451613           5.032258
10  2024-11         91.920000           5.680000


In [4]:
#Convert the monthly period into just the month name (not necessary, but it reads better on the x axis)
#Reference: https://www.programiz.com/python-programming/datetime/strftime

month_avg['date_ymd'] = month_avg['date_ymd'].dt.strftime('%B') #'%B' is the full month name; strftime formats each period as a string

month_avg['date_ymd']


0       January
1      February
2         March
3         April
4           May
5          June
6          July
7        August
8     September
9       October
10     November
Name: date_ymd, dtype: object
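
As a small illustration of the format codes, the sketch below applies `strftime` to a hypothetical three-month period Series (not the notebook's data). `%B` gives the full month name and `%b` the abbreviated one; the names follow the active locale, which is assumed here to be an English one.

```python
import pandas as pd

# Hypothetical Series of monthly periods, similar in type to month_avg['date_ymd']
periods = pd.Series(pd.period_range('2024-01', periods=3, freq='M'))

# Full month names and abbreviated month names
full_names = periods.dt.strftime('%B')
short_names = periods.dt.strftime('%b')

print(full_names.tolist())
print(short_names.tolist())
```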

In [5]:
#Calculate the error-bar values for both variables from the original daily data, grouped by month
# Reference: https://www.statology.org/standard-error-of-mean-python/
#For groupby functions: https://pandas.pydata.org/docs/user_guide/groupby.html
#Note: the denominator below is sqrt(len(month_avg)) = sqrt(11), i.e. the number of months,
#not the number of daily observations within each month, so these values are not a per-month SEM
#(a per-month SEM would divide each month's std by the sqrt of that month's row count)

sem_hospitalizations = df.groupby(pd.PeriodIndex(df['date_ymd'], freq = 'M'))['hospitalizations'].std() / np.sqrt(len(month_avg['hospitalizations']))
sem_icu = df.groupby(pd.PeriodIndex(df['date_ymd'], freq = 'M'))['icu_current_covid'].std() / np.sqrt(len(month_avg['icu_current_covid']))

print(sem_hospitalizations)
print(sem_icu)

date_ymd
2024-01    22.162375
2024-02     6.669019
2024-03     1.690928
2024-04     2.606639
2024-05     4.502980
2024-06     3.859930
2024-07     3.033859
2024-08     7.177378
2024-09     3.825013
2024-10     6.342659
2024-11     2.154840
Freq: M, Name: hospitalizations, dtype: float64
date_ymd
2024-01    0.978058
2024-02    0.855032
2024-03    0.277187
2024-04    0.210736
2024-05    0.309826
2024-06    0.367004
2024-07    0.696170
2024-08    0.645308
2024-09    0.483910
2024-10    0.629978
2024-11    0.366391
Freq: M, Name: icu_current_covid, dtype: float64
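
For comparison, a per-month standard error of the mean divides each month's sample standard deviation by the square root of that month's number of daily observations. The following is a minimal sketch on made-up numbers (the `sample` frame, its `month` column, and its values are hypothetical), showing the manual formula next to pandas' built-in `sem()`:

```python
import numpy as np
import pandas as pd

# Made-up daily values for two months of different lengths
sample = pd.DataFrame({
    'month': ['Jan', 'Jan', 'Jan', 'Jan', 'Feb', 'Feb'],
    'hospitalizations': [2.0, 4.0, 4.0, 6.0, 10.0, 14.0],
})

grouped = sample.groupby('month', sort=False)['hospitalizations']

# SEM by hand: sample standard deviation (ddof=1) over sqrt of the group size
sem_manual = grouped.std() / np.sqrt(grouped.count())

# pandas' built-in equivalent, computed per group
sem_builtin = grouped.sem()

print(sem_manual)
print(sem_builtin)
```

Because each group is divided by its own count, months with fewer recorded days get a proportionally larger standard error.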


In [6]:
#x and y axis values
x = month_avg['date_ymd'] #Month
y1 = month_avg['hospitalizations'] #Hospitalizations with COVID-19
y2 = month_avg['icu_current_covid'] #ICU Patients with COVID-19

In [7]:
# Customizations for graph:

title_font = dict(family='Arial', size=20, color='black')
x_font = dict(family='Comic Sans MS', size=12, color='black')
y_font = dict(family='Comic Sans MS', size=12, color='black')


In [8]:
#Additional Reference used: https://plotly.com/python/creating-and-updating-figures/
#Reference for adding error bars: https://plotly.com/python/error-bars/

# Create a bar chart
covid_graph = go.Figure() #make an empty figure (go.Figure also accepts traces directly, e.g., go.Figure(data=[go.Bar(x=x, y=y1)]))

#plot type and data for the first y variable Hospitalizations with COVID-19
covid_graph.add_trace(go.Bar(x=x, y=y1, name='Hospitalizations with COVID-19', marker_color='green',
                             error_y=dict(type='data', array=sem_hospitalizations, color='black')
))

#plot type and data for the second y variable ICU Patients with COVID-19
covid_graph.add_trace(go.Bar(x=x, y=y2, name='ICU Patients with COVID-19', marker_color='red',
                              error_y=dict(type='data', array=sem_icu, color='black')
))


In [9]:
# Add titles and labels to the graph
# Additional Reference: https://plotly.com/python/figure-labels/
covid_graph.update_layout(
    title= dict(text='2024 Monthly Average of Covid Hospitalizations and ICU Patients in Toronto', font = title_font),
    xaxis_title= dict(text='Month', font = x_font),
    yaxis_title= dict(text='Average Number of Patients', font = y_font),
    legend_title_text='Legend'   
)

covid_graph.show()