# Door Andeshi: Data Visualization Workshop

## Please run the cell below if you do not have the required libraries

In [None]:
!pip install pandas
!pip install numpy
!pip install seaborn
!pip install matplotlib
!pip install plotly

## Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px

## Logistics Dataset

The following code will load the Road Traffic Accident (RTA) data for Lahore, July 2021, that was obtained from Rescue 1122. It has been preprocessed for the purpose of this exercise.

In [None]:
# loading dataset
df_rta = pd.read_csv('1122_RTA_dataset.csv')

## Understanding the Features of the Dataset

Run the following code to display the first 3 entries of the dataset.

In [None]:
df_rta.head(3)

Since not all of the columns are visible, run the following code to view the column names.

In [None]:
df_rta.columns

Some important column names are explained below:

- address: the address of the location where the incident took place
- call_received_at: the time when the call was received
- response_time: the response time stored as a datetime object
- elapsed_time: the elapsed time stored as a datetime object
- age: patient age
- gender: patient gender
- education: patient education level
- injury_type: type of injury incurred by the patient
- patient_fate: the outcome of the emergency call with regards to the patient
- accident_cause: cause of accident
- vehicles_involved: the vehicles involved in the RTA
- peak_nonpeak: whether the call was received during peak hours or non-peak hours
- patient_deal_time_mins: the time it took to deal with the patient
- mileage_km: the distance travelled by the ambulance (km)
- motorbikes/cars/pedestrians/rickshaws: the number of motorbikes/cars/pedestrians/rickshaws involved in the RTA
- lat/long: the geocoordinates of the incident in degrees
- utm_x/utm_y: the geocoordinates of the incident in meters
- severity: the severity of injury in each accident (severity increases from 1 to 5)

To inspect an individual column, replace the column name in the following code. This step is optional.

In [None]:
df_rta['injury_type']
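Printing a whole column is rarely the quickest way to understand it. The sketch below shows two common summaries, `value_counts` for categorical columns and `describe` for numeric ones. It runs on a tiny stand-in DataFrame named `demo` whose values are invented for illustration; they are not taken from the Rescue 1122 data.

```python
import pandas as pd

# Tiny stand-in frame; the values are invented for illustration only.
demo = pd.DataFrame({
    "injury_type": ["Minor", "Minor", "Fracture", "Head Injury", "Minor"],
    "age": [23, 35, 41, 19, 52],
})

# Categorical column: how often each category occurs.
injury_counts = demo["injury_type"].value_counts()
print(injury_counts)

# Numeric column: count, mean, spread and quartiles.
age_summary = demo["age"].describe()
print(age_summary)
```

The same two calls should work on `df_rta` once the column names are swapped in.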

## Exercise 1

The following code creates a heatmap of the RTA count against the hour of the day and the day of the week:

In [None]:
temp = np.zeros(shape=(24,7))

for h in df_rta['call_hour'].unique():
    for wd in df_rta['call_weekday_no'].unique():
        temp[h,wd] = len(df_rta[(df_rta['call_hour']==h) & (df_rta['call_weekday_no']==wd)].loc[:,'call_hour'])

sns.set(rc={'figure.figsize':(12,12)})
fig = plt.figure(figsize = (8,7), tight_layout=True)


# Main heatmap code
ax = sns.heatmap(temp)



##################To align x labels:####################
ax.set(xlabel='Day of Week', ylabel='Hour of Day')
ax.set_title('Road Accidents, Day of Week & Hour of Day', fontsize=20)
plt.xticks([0,1,2,3,4,5,6],['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'],rotation=45)

plt.setp( ax.xaxis.get_majorticklabels(), rotation=-45) 

# Create offset transform of 35 points in the x direction (1 point = 1/72 inch)
dx = 35/72.; dy = 0/72. 
offset = matplotlib.transforms.ScaledTranslation(dx, dy, fig.dpi_scale_trans)

# apply offset transform to all x ticklabels.
for label in ax.xaxis.get_majorticklabels():
    label.set_transform(label.get_transform() + offset)
########################################################

plt.show()

Try using different color palettes for the heatmap above. The following may help you: https://python-graph-gallery.com/92-control-color-in-seaborn-heatmaps.

Which does a better job of conveying what you want your audience to know: a single-colored heatmap or a multi-colored one?
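One way to compare palettes is to draw the same count matrix twice, side by side. The sketch below does this on synthetic data (random hours and weekdays in a stand-in frame named `demo`), so the pattern it shows is meaningless; only the visual difference between the palettes matters. It also builds the 24 x 7 count matrix with `pd.crosstab`, which is an alternative to the nested loop, not the approach used in the exercise. The palette names `Blues` and `coolwarm` are just two examples.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Synthetic stand-in for df_rta: random call hours and weekday numbers.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "call_hour": rng.integers(0, 24, size=2000),
    "call_weekday_no": rng.integers(0, 7, size=2000),
})

# Count calls per (hour, weekday) pair; reindex so all 24 x 7 cells exist
# even if some combination never occurs in the data.
counts = (
    pd.crosstab(demo["call_hour"], demo["call_weekday_no"])
    .reindex(index=range(24), columns=range(7), fill_value=0)
)

# Same data, two palettes.
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
sns.heatmap(counts, cmap="Blues", ax=axes[0])
axes[0].set_title("Sequential (single hue)")
sns.heatmap(counts, cmap="coolwarm", ax=axes[1])
axes[1].set_title("Diverging (two hues)")
```

A sequential palette maps low-to-high counts onto light-to-dark shades of one color, while a diverging palette draws attention to both extremes around a midpoint.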

## Exercise 2

We would like to see if the response time has any link to the death of patients in an RTA emergency.

We start with a scatter plot. Use the 'plt.scatter' function, with 'response_time_mins' on the x-axis and 'patient_fate' on the y-axis.

In [None]:
fig, ax = plt.subplots(figsize = (8,5))

plt.scatter()
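If the call pattern is unfamiliar, the sketch below shows a scatter plot with a numeric column on the x-axis and a categorical column on the y-axis. The frame `demo` and its columns `wait_mins` and `outcome` are hypothetical stand-ins with invented values, not columns of `df_rta`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data; names and values are invented.
demo = pd.DataFrame({
    "wait_mins": [4.0, 6.5, 3.2, 9.1, 5.5, 7.8],
    "outcome": ["A", "B", "A", "C", "B", "A"],
})

fig, ax = plt.subplots(figsize=(8, 5))

# First argument is x, second is y; string values become a categorical axis.
# alpha < 1 makes overlapping points easier to spot.
points = ax.scatter(demo["wait_mins"], demo["outcome"], alpha=0.5)
ax.set_xlabel("Wait (mins)")
ax.set_ylabel("Outcome")
```

`plt.scatter` takes the same two positional arguments and draws on the current axes.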

At first glance, it seems that response times are much longer for EMS calls that resulted in patients receiving first aid or being taken to the hospital than for calls that resulted in deaths. Let us now use our good old friends, the bar plots, to determine whether this is really the case. Run the following code to plot the bar plots.

In [None]:
rt_vs_fate = df_rta[['response_time_mins']].groupby(df_rta['patient_fate']).mean()

fig, ax = plt.subplots(figsize = (8,5))

rt_vs_fate.plot.bar(ax=ax)
plt.xticks(rotation=0)
for index, value in enumerate(rt_vs_fate['response_time_mins'].to_list()):
    plt.text(index-0.08, value+0.1, str(round(value,2)))

plt.xlabel('')
plt.ylabel('Response Time (mins)')
plt.title('Average Response Time vs Patient Fate', fontsize=17)
ax.get_legend().remove()

This is surprising; according to our bar plots, the average response time for calls that resulted in deaths is actually **greater** than for the other two outcomes. This is the opposite of our initial observation from the scatter plot!

However, this does not present the whole picture. Run the following code to plot a violin plot for the same problem. Note that the distribution for the "Dead" patient fate has bimodal tendencies, perhaps due to insufficient data.

In [None]:
sns.set(rc={'figure.figsize':(15,6)})
# Pass the DataFrame itself as `data` and refer to the columns by name
sns.violinplot(data=df_rta, x='patient_fate', y='response_time_mins')

We conclude that we do not have enough data to determine a correlation between deaths and response times. Obtaining more data on patient deaths and their corresponding response times may help us make such claims with greater confidence.
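A quick way to check whether a group is too small to trust its mean is to report the group size and spread next to the mean. The sketch below does this on a stand-in frame named `demo`; the values are invented, and the labels "First Aid" and "Hospital" are assumed category names ("Dead" is the only label mentioned in this notebook).

```python
import pandas as pd

# Invented stand-in data: one small group ("Dead") with a wide spread.
demo = pd.DataFrame({
    "patient_fate": ["First Aid"] * 5 + ["Hospital"] * 4 + ["Dead"] * 2,
    "response_time_mins": [5.0, 6.0, 7.0, 5.5, 6.5,
                           6.0, 8.0, 7.0, 7.0,
                           4.0, 12.0],
})

# Group size, mean, median and standard deviation for each outcome.
summary = (
    demo.groupby("patient_fate")["response_time_mins"]
    .agg(["count", "mean", "median", "std"])
)
print(summary)
```

In this toy example the "Dead" group has the highest mean, but with only two observations and a large standard deviation that mean says very little. Running the same aggregation on `df_rta` should show how many calls sit behind each bar.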

## Exercise 3

Let us now create an RTA hotspot map and analyze which areas of Lahore experienced the most traffic accidents in July 2021. Run the following code to first observe the geospatial distribution of our dataset. (It is made using Plotly, an interactive plotting library that we will be discussing later on.)

In [None]:
# size_max and range_color only take effect with `size`/`color` arguments, so they are omitted
fig = px.scatter_mapbox(df_rta, lat="lat", lon="long", zoom=9, mapbox_style="open-street-map")

fig.show()

For the sake of simplicity, we will create a basic hexbin plot to determine accident hotspots. Complete the following code and run it.

Try changing the number of bins and color maps and observe any differences in the outcome.

In [None]:
fig, axes = plt.subplots(figsize=(10, 10))

# Complete the following 2 lines
x = 
y = 

nbins = 50

axes.hexbin(x, y, gridsize=nbins, cmap = 'Blues')
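To see how the number of bins changes the picture, the sketch below draws the same points with a coarse and a fine grid. The coordinates are synthetic (two random clusters), so it does not fill in the blanks of the exercise; `mincnt=1` and the colorbar are optional additions that are not part of the exercise code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic coordinates: one broad cluster and one small, dense cluster.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 1500), rng.normal(4, 0.5, 500)])
y = np.concatenate([rng.normal(0, 1, 1500), rng.normal(3, 0.5, 500)])

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, nbins in zip(axes, (15, 50)):
    # gridsize is the number of hexagons along the x direction;
    # mincnt=1 leaves hexagons with no points blank.
    hb = ax.hexbin(x, y, gridsize=nbins, cmap="Blues", mincnt=1)
    ax.set_title(f"gridsize={nbins}")
    fig.colorbar(hb, ax=ax, label="Points per hexagon")
```

A coarse grid smooths the data into a few large hotspots, while a fine grid shows more local detail at the cost of noisier counts per hexagon.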

## Congratulations! You are done with the exercises.