# Modelling
## Introduction
In this section we develop models guided by the insights from the exploratory data analysis. Our goal is to identify the factors that most strongly predict cyclist injury severity in San Francisco. We estimate two kinds of model: crash severity models, which describe the likelihood of a severe outcome when a crash occurs, and crash count models, which describe the frequency of crashes across the network.

Studies such as Scarano et al. (2023) use national datasets and more advanced modelling frameworks; our work applies similar count and severity models to San Francisco's TIMS bicycle crash data. A city-level analysis can capture local patterns and street conditions that broader national studies cannot reflect. Although TIMS data are pre-processed and standardized, additional cleaning and filtering were required to obtain a consistent set of San Francisco bicycle crashes suitable for modelling.

## Crash Severity Model
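The data distinguish four crash severity levels: fatal, severe injury, other visible injury, and complaint of pain. The results of severity modelling depend heavily on how many crashes fall into each level, because a level with very few observations provides little information for estimating effects specific to that level.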

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

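# display() is built into Jupyter; the explicit import keeps the code runnable as a script
from IPython.display import display

# Importing custom data cleaning functions
from tools.data_cleaning import *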

In [2]:
# Importing the data
crashes = clean_crashes("data/Crashes.csv")
parties = clean_parties("data/Parties.csv")
victims = clean_victims("data/Victims.csv")
victim_level = build_victim_level_table("data")

In [3]:
# Creating a summary table for proportions of different severity crashes

# Mapping the different severity levels to their descriptions
severity_labels = {
    1: "Fatal",
    2: "Severe injury",
    3: "Other visible injury",
    4: "Complaint of pain",
}

crashes["Crash severity"] = crashes["COLLISION_SEVERITY"].map(severity_labels)

# Counting the number of crashes for each severity level
counts = crashes["Crash severity"].value_counts()
total = counts.sum()

# Populating the table
table = pd.DataFrame(
    {
        "Crash severity": counts.index,
        "Number of events": counts.values,
        "Percent of total": (counts / total * 100).round(1),
    }
)
table.loc[len(table)] = ["Total", total, 100.0]

# Defining the table style
display(table.style.hide(axis="index").format({"Number of events": "{:,}", "Percent of total": "{:.1f}%"}))

| Crash severity | Number of events | Percent of total |
|---|---|---|
| Complaint of pain | 2277 | 45.7% |
| Other visible injury | 2212 | 44.4% |
| Severe injury | 474 | 9.5% |
| Fatal | 23 | 0.5% |
| Total | 4986 | 100.0% |
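
To make the imbalance between severity levels concrete, the sketch below computes inverse-frequency class weights and the ratio between the largest and smallest class from the counts reported above. This is only an illustration and not necessarily the weighting used in this analysis. The helper `balanced_class_weights` is a hypothetical name introduced here; its formula, total / (number of classes × class count), is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
# Crash counts per severity level, copied from the summary output above
severity_counts = {
    "Complaint of pain": 2277,
    "Other visible injury": 2212,
    "Severe injury": 474,
    "Fatal": 23,
}


def balanced_class_weights(counts):
    """Hypothetical helper: inverse-frequency weights, total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(severity_counts)

# Ratio between the most and least frequent severity level
imbalance_ratio = max(severity_counts.values()) / min(severity_counts.values())

for label, weight in weights.items():
    print(f"{label:<22} weight = {weight:6.2f}")
print(f"Largest-to-smallest class ratio: {imbalance_ratio:.0f}:1")
```

With these counts, a fatal crash would receive a weight roughly one hundred times that of a complaint-of-pain crash, which shows how little data the rarest level contributes.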


In [4]:
# Creating a summary table for the proportions of KSI and other-injury crashes
# Fatal and severe injury crashes are grouped as KSI (killed or severely injured)

severity_to_group = {
    1: "KSI",   # fatal
    2: "KSI",   # severe injury
    3: "Other injury",
    4: "Other injury",
}
crashes["severity_group"] = crashes["COLLISION_SEVERITY"].map(severity_to_group)

counts = crashes["severity_group"].value_counts().reindex(["KSI", "Other injury"], fill_value=0)
total = counts.sum()

# Populating the table
table = pd.DataFrame(
    {
        "Severity group": counts.index,
        "Number of events": counts.values,
        "Percent of total": (counts / total * 100).round(1),
    }
)
table.loc[len(table)] = ["Total", total, 100.0]

# Defining the table style
display(table.style.hide(axis="index").format({"Number of events": "{:,}", "Percent of total": "{:.1f}%"}))



| Severity group | Number of events | Percent of total |
|---|---|---|
| KSI | 497 | 10.0% |
| Other injury | 4489 | 90.0% |
| Total | 4986 | 100.0% |
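
The roughly 10/90 split between KSI and other-injury crashes matters when evaluating any binary severity model. As a minimal sketch, assuming a hypothetical baseline that labels every crash "Other injury" (not a model from this analysis), the code below shows that overall accuracy alone would look high even though no KSI crash is identified.

```python
# Counts copied from the KSI summary output above
n_ksi = 497
n_other = 4489
total = n_ksi + n_other

# Hypothetical baseline: predict "Other injury" for every crash
true_positives = 0          # KSI crashes correctly flagged (none)
false_negatives = n_ksi     # KSI crashes missed by the baseline
true_negatives = n_other    # other-injury crashes correctly labelled

accuracy = (true_positives + true_negatives) / total
ksi_recall = true_positives / (true_positives + false_negatives)

print(f"Baseline accuracy:   {accuracy:.1%}")
print(f"Baseline KSI recall: {ksi_recall:.1%}")
```

For this reason, metrics that focus on the minority class, such as recall for KSI, are usually more informative than accuracy for data with this balance.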
