# Visualization of accident patterns in Kiel 🚗🚗🚑🚑🚒

In this notebook we are going to take a closer look at the accidents that happened in the Kiel area in Germany from 2016 to 2022. We will use a dataset of 6,772 accident records with information on type, date, time, location, weather, traffic conditions, and other relevant factors.

The main goal of this notebook is to explore how different factors affect the occurrence and severity of accidents in Kiel over time. We will also compare the accident patterns across different weekdays and hours.

This topic is important because accidents can have serious consequences for people's health, safety, and well-being. They can also cause economic losses and environmental damage. By understanding the causes and trends of accidents in Kiel, we can identify potential risks and opportunities for improving road safety.

In this notebook we will show:

- A map of accidents in Kiel 🔎
- A plot of accidents in Kiel by type and year 📊
- A time series plot of accidents in Kiel 📈
- A bar chart of accident type per weekday 📉
- A heatmap of accidents per hour and month 🔥

Note: All diagrams are interactive, so you can get detailed information by hovering over them.

Author: Martin J. Brucker 👨‍💻

Student Number: 942815 👩‍‍🎓

Fachhochschule Kiel 🏫


```python
# Preparation for the analysis
import altair as alt
import geopandas as gpd
import pandas as pd

# Let Altair embed datasets larger than its default limit of 5,000 rows
alt.data_transformers.disable_max_rows()

# Import the data
accidentsKiel = gpd.read_file("data/accidents.geojson")
districtsKiel = gpd.read_file("data/districts.geojson")
roadsKiel = gpd.read_file("data/roads.geojson")

# Data preparation:
# combine year and full month name (e.g. "2016-January") into one datetime column
accidentsKiel["year_month"] = pd.to_datetime(
    accidentsKiel["year"].astype(str) + "-" + accidentsKiel["month"].astype(str),
    format="%Y-%B",
)
```
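The `year_month` conversion relies on `format="%Y-%B"`, which assumes that the `month` column stores full English month names (the heatmap further down sorts months the same way). A minimal check with made-up values shows what the parser produces: every timestamp falls on the first day of its month, so all accidents of one month share a single `year_month` value.

```python
import pandas as pd

# Hypothetical sample values in the assumed format: numeric year, full English month name
sample = pd.DataFrame({"year": [2016, 2016, 2022], "month": ["January", "March", "December"]})

# Same construction as in the notebook: "2016-January" -> 2016-01-01
sample["year_month"] = pd.to_datetime(
    sample["year"].astype(str) + "-" + sample["month"].astype(str),
    format="%Y-%B",
)
print(sample)
```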

# Map of Accidents in Kiel

```python
# Count accidents per unique location (each point represents an accident location)
accidentsKiel['count'] = 1
accidentsKiel['point'] = accidentsKiel.geometry
accidents_count = accidentsKiel.groupby('point').agg({'count': 'size'}).reset_index()

# Convert the counts back to a GeoDataFrame, keeping the CRS of the accident data
accidents_count = gpd.GeoDataFrame(accidents_count, geometry='point', crs=accidentsKiel.crs)

# Spatially join the points with the districts to assign each location to a district
accidents_per_district = gpd.sjoin(accidents_count, districtsKiel, how='inner', predicate='within')

# Total number of accidents per district
accidents_per_district = accidents_per_district.groupby('name').agg({'count': 'sum'}).reset_index()

# Attach the counts to the district polygons (districts without accidents get NaN)
choropleth_data = districtsKiel.merge(accidents_per_district, on='name', how='left')

# Create the choropleth map
alt.Chart(choropleth_data).mark_geoshape(
    stroke='black',
    strokeWidth=0.5
).encode(
    color=alt.Color('count:Q', scale=alt.Scale(scheme='greenblue'), title='Number of Accidents'),
    tooltip=['name:N', 'count:Q']
).properties(
    width=500,
    height=300,
    title='Accidents per District in Kiel, Germany'
).project(
    # Draw the coordinates as-is; reflectY flips the y-axis so that north points up
    type='identity',
    reflectY=True
)
```
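Because the final merge uses `how='left'`, every district polygon is kept, but a district in which no accident point falls would get `NaN` instead of `0` and could render as missing on the map. A small pandas sketch with hypothetical district names shows the effect and one way to fill the gap with `fillna`:

```python
import pandas as pd

# Hypothetical districts and per-district accident counts ("District C" has no accidents)
districts = pd.DataFrame({"name": ["District A", "District B", "District C"]})
accidents_per_district = pd.DataFrame({"name": ["District A", "District B"], "count": [5, 2]})

# A left merge keeps all districts; the one without a match gets NaN
choropleth_data = districts.merge(accidents_per_district, on="name", how="left")
print(choropleth_data["count"].isna().sum())  # 1

# Treat "no matching accident" as zero accidents
choropleth_data["count"] = choropleth_data["count"].fillna(0).astype(int)
print(choropleth_data)
```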


### Interpretation of the Map

The map presents a detailed breakdown of accidents in Kiel by district. Each district is color-coded to represent the number of accidents, with darker colors indicating higher numbers.

#### Key Observations

- **High Accident Districts**: Südfriedhof (783 accidents), Gaarden-Süd/Kronsburg (545 accidents), Gaarden-Ost (528 accidents), Schreventeich (514 accidents), Wik (420 accidents), Ravensberg (388 accidents), and Hassee (348 accidents) have the highest numbers of accidents, with Südfriedhof clearly in the lead.
- **Low Accident Districts**: Rönne (1 accident), Friedrichsort (36 accidents), Meimersdorf (38 accidents), Wellsee (76 accidents), Russee (84 accidents), and Schilksee (96 accidents) have the lowest numbers of accidents.

#### Insights

This map shows how accidents are distributed across Kiel's districts. Traffic authorities and policymakers can use it to identify specific areas requiring intervention, and the high counts in districts such as Südfriedhof suggest where measures may be most needed. Note, however, that these are absolute counts: they do not account for differences in district size, population, or traffic volume, so a high count does not necessarily mean a higher accident risk.


# Accidents in Kiel by Type and Year

The following diagram shows the number of accidents in Kiel by type and year. The type of accident is shown on the x-axis, the number of accidents on the y-axis, and each year has its own color. Accidents involving cars are by far the most frequent, and their number declines over the observed period from 2016 to 2022.

```python
# Reshape the participant columns into long format: one row per accident and type
melted_df = accidentsKiel.melt(
    id_vars='year',
    value_vars=['bike', 'car', 'pedestrian', 'motorcycle', 'truck', 'other'],
    var_name='type',
    value_name='accidentCount',
)

# Stacked bar chart: one bar per type, one color segment per year
chart = alt.Chart(melted_df).mark_bar().encode(
    x=alt.X('type:N', sort='-y', title='Type of Accident'),
    y=alt.Y('sum(accidentCount):Q', title='Count of Accidents'),
    color=alt.Color('year:N', title='Year of Accident'),
    tooltip=['type:N', 'year:N', alt.Tooltip('sum(accidentCount):Q', title='Count of Accidents')]
).properties(
    title='Accidents in Kiel by Type and Year'
).interactive()

chart
```
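The chart treats `bike`, `car`, `pedestrian`, `motorcycle`, `truck` and `other` as per-accident columns. Assuming they are 0/1 flags (an assumption; the notebook does not state their encoding), `melt` turns each accident into one row per type, and summing `accidentCount` per type and year gives the bar heights. A small sketch with made-up rows also shows a consequence: an accident flagged for both `car` and `bike` is counted once for each type, so the bar totals can exceed the number of accidents.

```python
import pandas as pd

# Three hypothetical accidents with assumed 0/1 participant flags
accidents = pd.DataFrame({
    "year": [2016, 2016, 2017],
    "car": [1, 1, 1],
    "bike": [0, 1, 0],
    "pedestrian": [0, 0, 1],
})

# Wide -> long: one row per accident and participant type
melted = accidents.melt(id_vars="year", value_vars=["car", "bike", "pedestrian"],
                        var_name="type", value_name="accidentCount")

# Bar heights as in the chart: sum of the flags per type and year
per_type_year = melted.groupby(["type", "year"])["accidentCount"].sum()
print(per_type_year)

# Summed over all types, 5 participations come from only 3 accidents
print(melted["accidentCount"].sum(), len(accidents))  # 5 3
```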


### Interpretation of the Graph

The bar graph presents a breakdown of accidents in Kiel by type and year, from 2016 to 2022. Each bar is divided into color-coded segments, one per year, and the types of accidents are car, bike, other, pedestrian, motorcycle, and truck.

#### Key Observations

- **Car Accidents Predominance**: Car accidents are significantly higher than any other type throughout all years.
- **Decline in Car Accidents**: There is a noticeable decline in car accidents over the years.
- **Stable Other Types**: Accidents involving bikes, pedestrians, motorcycles, and trucks remain relatively stable with slight variations.

#### Insights

This graph gives an overview of accident trends in Kiel by both type and year. Traffic authorities and policymakers can use it to identify specific areas requiring intervention. The notable decline in car accidents suggests either that the measures put into place are effective or that reporting or driving behavior has changed.


# Time Series

The following diagram shows a time series of the number of accidents per month, together with a smoothed trend line.

```python
# Pre-calculate the monthly counts of accidents in Kiel
monthly_accident_counts = accidentsKiel.groupby('year_month').size().reset_index(name='counts')

# Create a line chart to visualize the counts
line = alt.Chart(monthly_accident_counts).mark_line(color='blue').encode(
    x=alt.X('year_month:T', title='Year and Month'),
    y=alt.Y('counts:Q', title='Number of Accidents'),
    tooltip=['year_month:T', 'counts:Q']  # Add tooltips
)

# Create a LOESS line to show the trend (bandwidth in [0, 1] controls the amount of smoothing)
loess = line.transform_loess('year_month', 'counts', bandwidth=0.3).mark_line(color='red')

# Combine the line chart and the LOESS line
chart = line + loess

# Improve aesthetics
chart = chart.properties(
    title='Accidents in Kiel Over Time',
    width=600,
    height=400
)

chart
```
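`groupby('year_month').size()` only returns months that contain at least one accident. If a month had no recorded accidents, the line would silently connect its neighbors instead of dropping to zero. A minimal sketch with made-up dates shows how to reindex onto a complete monthly range with `pd.date_range` (`freq="MS"` means month start, matching the first-of-month timestamps in `year_month`):

```python
import pandas as pd

# Hypothetical first-of-month timestamps of individual accidents; March 2016 has none
year_month = pd.to_datetime(["2016-01-01", "2016-01-01", "2016-02-01", "2016-04-01"])
accidents = pd.DataFrame({"year_month": year_month})

# Same aggregation as in the notebook: months without accidents are missing
counts = accidents.groupby("year_month").size()

# Reindex onto every month in the range and fill the gaps with 0
full_range = pd.date_range(counts.index.min(), counts.index.max(), freq="MS")
monthly_accident_counts = (
    counts.reindex(full_range, fill_value=0)
    .rename_axis("year_month")
    .reset_index(name="counts")
)
print(monthly_accident_counts)
```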


### Interpretation of the Graph

The line graph illustrates the trend of accidents in Kiel over time, from 2016 to the end of 2022. The blue line represents the monthly number of accidents, while the red line is a LOESS-smoothed trend that makes long-term patterns easier to see.

#### Key Observations

- **Volatility**: The number of accidents each month is highly volatile, with noticeable peaks and troughs.
- **Trend Analysis**: Despite short-term fluctuations, the red trend line indicates a general decline in accidents over time.
- **Peak Accident Periods**: Notable spikes are observed around mid-2018 and early 2020.

#### Insights

This graph helps to understand accident trends in Kiel. It highlights periods with higher accident counts, helping authorities and policymakers to identify potential causes and implement preventive measures. The declining trend suggests that overall road safety might be improving or that reporting mechanisms have changed.


# Accident Type per Weekday

```python
# Calculate the total count for each weekday
total = accidentsKiel.groupby('weekday').size().reset_index(name='total')

# Calculate the count for each type1 for each weekday
type1_counts = accidentsKiel.groupby(['weekday', 'type1']).size().reset_index(name='count')

# Merge the two and compute each type's share of the weekday total
merged = pd.merge(type1_counts, total, on='weekday')
merged['percentage'] = merged['count'] / merged['total']

# Show weekdays in calendar order (assumes English weekday names in the data)
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# Create the stacked bar chart
chart = alt.Chart(merged).mark_bar().encode(
    x=alt.X('weekday:N', sort=weekday_order, title='Weekday'),
    y=alt.Y('percentage:Q', axis=alt.Axis(format='%'), title='Percentage'),
    color=alt.Color('type1:N', title='Accident Type'),
    tooltip=[alt.Tooltip('type1:N', title='Accident Type'), alt.Tooltip('percentage:Q', format='.2%', title='Percentage')]
).properties(
    title='Share of Accident Types per Weekday'
).interactive()

chart
```
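The group, merge, and divide steps compute, for every weekday, each type's share of that day's accidents. pandas can produce the same shares in one step with `SeriesGroupBy.value_counts(normalize=True)`. A sketch on made-up rows (the weekday and type labels are hypothetical) compares both routes:

```python
import pandas as pd

# Hypothetical accident rows
df = pd.DataFrame({
    "weekday": ["Monday", "Monday", "Monday", "Sunday"],
    "type1": ["A", "A", "B", "B"],
})

# Route 1: as in the notebook (counts per weekday and type divided by weekday totals)
total = df.groupby("weekday").size().reset_index(name="total")
counts = df.groupby(["weekday", "type1"]).size().reset_index(name="count")
merged = pd.merge(counts, total, on="weekday")
merged["percentage"] = merged["count"] / merged["total"]

# Route 2: normalized value counts within each weekday group
shares = df.groupby("weekday")["type1"].value_counts(normalize=True)

# Both routes give the same share for every (weekday, type1) pair
check = merged.set_index(["weekday", "type1"])["percentage"]
pd.testing.assert_series_equal(check, shares.reindex(check.index), check_names=False)
print(shares)
```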

### Interpretation of the Chart

The stacked bar chart shows the share of each type of accident on each day of the week. Each color in the bars corresponds to a specific type of accident, as indicated in the legend on the right.

#### Key Observations

- **Accident Diversity**: There is a diverse range of accidents happening every day, with collisions between vehicles and pedestrians being notably prevalent.
- **Weekend Trends**: On Saturday and Sunday, there's an observable increase in "Accident of another kind".
- **Midweek Patterns**: Collisions with another vehicle while changing lanes are more frequent during midweek.

#### Insights

This visual representation aids in quickly identifying patterns and anomalies related to accident occurrences on different weekdays. Such insights are crucial for implementing targeted safety measures to minimize specific types of accidents prevalent on particular days.


# Accidents per Hour/Month

```python
# Pass only the columns the heatmap needs as a plain DataFrame,
# so Altair does not have to serialize the Shapely geometry columns
heatmap_data = pd.DataFrame(accidentsKiel[['hour', 'month']])

# Calendar order for the month axis
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']

# Heatmap: one cell per month and hour, colored by the number of accidents
chart = alt.Chart(heatmap_data).mark_rect().encode(
    x=alt.X('hour:O', title='Hour'),
    y=alt.Y('month:O', title='Month', sort=month_order),
    color=alt.Color('count():Q', title='Number of Accidents', scale=alt.Scale(scheme='redyellowblue', reverse=True)),
    tooltip=[alt.Tooltip('month:O', title='Month'), alt.Tooltip('hour:O', title='Hour'), alt.Tooltip('count():Q', title='Number of Accidents')]
).properties(
    title='Traffic Accidents in Kiel by Month and Hour',
    width=600,
    height=400
).interactive()

# Display the chart
chart
```


### Interpretation of the Chart

The heatmap represents the frequency of traffic accidents in Kiel, categorized by month and hour of the day. The color intensity, ranging from blue (low) to red (high), indicates the number of accidents.

#### Key Observations

- **Nighttime Safety**: There is a significant reduction in accidents during the late night and early morning hours across all months.
- **April Afternoons**: In April, there is a noticeable spike in accidents around 15:00.
- **Consistency Across Months**: While there are variations, no single month stands out as having a significantly higher overall rate of accidents.
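The colors only approximate the counts. To read the exact numbers behind observations such as the April afternoon spike, the same month × hour table can be built with `pd.crosstab` and its rows reindexed onto the calendar list of months. The sketch below uses made-up rows; in the notebook, `accidentsKiel` would take their place, assuming the same `hour` and `month` columns.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical accident rows with the same column names as in the notebook
df = pd.DataFrame({
    "month": ["April", "April", "January", "April"],
    "hour": [15, 15, 8, 9],
})

# Rows = months, columns = hours, cells = number of accidents;
# reindexing puts months in calendar order and adds empty months as rows of zeros
table = pd.crosstab(df["month"], df["hour"]).reindex(MONTHS, fill_value=0)
print(table)
```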

#### Insights

The heatmap shows when traffic accidents are most likely to occur in Kiel. Law enforcement and city planners can use it to implement safety measures, manage traffic flow, and perhaps increase patrols during high-risk hours or months. Drivers can also use it to take extra care during these high-risk times.

Please note that the chart specification does not draw any cell outlines: each cell encodes only the number of accidents for one combination of month and hour, through its color and the tooltip.
