# Modelling
## Introduction
In this section we develop models guided by insights from the exploratory data analysis. Our goal is to identify the factors that most strongly predict cyclist injury severity in San Francisco. We estimate both crash severity and crash count models: the former captures the likelihood of a severe outcome given that a crash occurs, and the latter captures the frequency of crashes across the network. Studies such as Scarano et al. (2023) use national datasets and more advanced modeling frameworks; our work applies similar count and severity models to San Francisco's TIMS bicycle crash data. A city-level analysis is useful because it captures local patterns and street conditions that broader national studies cannot reflect. Although TIMS data are pre-processed and standardized, additional cleaning and filtering were required to obtain a consistent set of San Francisco bicycle crashes suitable for modeling.

## Crash Severity Model
The data distinguish four crash severity levels. The results of this kind of statistical modelling depend strongly on the share of observations available for each severity level, so we first check how the crashes are distributed across outcomes.
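To illustrate why these proportions matter, the following minimal sketch uses synthetic labels (not TIMS data) in which the rare severity class makes up 0.5% of observations. The arrays `y_true` and `y_pred` and the 1,000-crash sample size are assumptions chosen purely for illustration.

```python
import numpy as np

# Synthetic labels for illustration only (not TIMS data):
# 1,000 crashes, of which 5 belong to the rare severity class (coded 1).
y_true = np.zeros(1000, dtype=int)
y_true[:5] = 1

# A trivial "model" that always predicts the majority class (coded 0).
y_pred = np.zeros_like(y_true)

# Overall accuracy: share of crashes whose label is predicted correctly.
accuracy = (y_pred == y_true).mean()

# Recall on the rare class: share of true rare cases predicted as rare.
rare_recall = (y_pred[y_true == 1] == 1).mean()

print(f"accuracy = {accuracy:.3f}, rare-class recall = {rare_recall:.3f}")
```

In this toy setting the trivial predictor is correct for 995 of 1,000 crashes yet identifies none of the rare cases, which is why overall fit alone can be misleading when one severity class dominates.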

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")

from pathlib import Path
# Import the project's custom data-cleaning functions
from tools.data_cleaning import *

In [None]:
# Load the cleaned crash, party, and victim tables from the data directory
crashes, parties, victims, victim_level = load_all_clean(Path("data"))

In [11]:
import pandas as pd

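def categorize_outcome(row):
    """Label a crash as "Fatality", "Injury", or "No injury" from its casualty counts."""
    # A fatality takes precedence over an injury; crashes with neither are "No injury"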
    if pd.notna(row["NUMBER_KILLED"]) and row["NUMBER_KILLED"] > 0:
        return "Fatality"
    if pd.notna(row["NUMBER_INJURED"]) and row["NUMBER_INJURED"] > 0:
        return "Injury"
    return "No injury"

crashes["Crash outcome"] = crashes.apply(categorize_outcome, axis=1)

counts = crashes["Crash outcome"].value_counts().reindex(
    ["No injury", "Injury", "Fatality"], fill_value=0
)
total = counts.sum()

table = pd.DataFrame(
    {
        "Crash outcome": counts.index,
        "Number of events": counts.values,
        "Percent of total": (counts / total * 100).round(1),
    }
)
table.loc[len(table)] = ["Total", total, 100.0]

table_style = (
    table.style.format({"Number of events": "{:,}", "Percent of total": "{:.1f}%"})
    .hide(axis="index")
    .set_table_styles(
        [{"selector": "th", "props": [("font-weight", "bold"), ("text-align", "left")]}]
    )
)

display(table_style)


| Crash outcome | Number of events | Percent of total |
|---|---|---|
| No injury | 0 | 0.0% |
| Injury | 4963 | 99.5% |
| Fatality | 23 | 0.5% |
| Total | 4986 | 100.0% |
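The imbalance in the outcome table above can be quantified directly from its counts. The sketch below computes the number of injury crashes per fatal crash and a set of "balanced" class weights, a common heuristic that weights each class by `n_samples / (n_classes * count)`. This is one possible way to summarise the imbalance and is not necessarily the approach used in the models that follow; the names `outcome_counts`, `imbalance_ratio`, and `class_weights` are introduced here for illustration.

```python
import pandas as pd

# Counts copied from the outcome table above; the empty "No injury"
# class is dropped because it contains no observations.
outcome_counts = pd.Series({"Injury": 4963, "Fatality": 23})

n_samples = outcome_counts.sum()
n_classes = len(outcome_counts)

# Imbalance ratio: number of injury crashes per fatal crash.
imbalance_ratio = outcome_counts["Injury"] / outcome_counts["Fatality"]

# "Balanced" heuristic: weight each class inversely to its frequency.
class_weights = n_samples / (n_classes * outcome_counts)

print(f"Injury crashes per fatal crash: {imbalance_ratio:.1f}")
print(class_weights.round(3))
```

Under this heuristic a fatal crash would be weighted roughly 108 times and an injury crash roughly 0.5 times, reflecting about 216 injury crashes for every fatality in the data.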
