<a href="https://colab.research.google.com/github/ScottishTrooper/SC3021_pROJECT/blob/main/SC3021_Deliverable1_SG_RoadSafety.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SC3021 Deliverable 1 — Singapore Road Safety: Severity & Risk Drivers  
## ASK + PREPARE

**Deliverable 1 scope:** ASK + PREPARE only.  
We define the analytical question and requirements, then explore **5 Singapore datasets** you can download as **single CSV files** and upload into Colab’s `/content/sample_data/`.

**Profiling rule (as required):** For each dataset we run:
- `df.head()`  
- `df.info()`  
- `df.describe(include="all")`

We also provide for each dataset:
- brief description + how it matches requirements  
- strengths + weaknesses (≥2 quality criteria: one positive, one problematic)  
- concluding suitability paragraph




In [None]:

import pandas as pd
import numpy as np

pd.set_option("display.max_columns", 200)
pd.set_option("display.width", 160)

# During evaluation, datasets will be uploaded to:
# /content/sample_data/


## 1) ASK — Research Question

### Research Question
> **Which measurable factors (weather, vehicle mix, and time patterns) are associated with higher accident severity in Singapore?**

### Motivation
Singapore publishes high-quality public statistics across road safety, vehicle population, and weather.  
Even when datasets are aggregated (monthly/annual), they allow:
- building severity proxies (e.g., casualty counts),
- testing relationships with rainfall,
- controlling for vehicle population and vehicle types involved.

### Stakeholders
- Singapore Police Force (Traffic Police) / public safety policy
- Land Transport Authority (LTA) / transport planning
- Emergency response agencies / demand planning


## 2) Key Terms

- **Accident severity:** seriousness of outcomes (fatal / injury categories).  
- **Severity proxy (binary):** “high-severity month” if casualties exceed a threshold; used later for logistic regression.  
- **Severity proxy (continuous):** number of casualties in a month; used later for linear regression.  
- **Exposure:** vehicle population and composition of vehicles involved.  
- **Weather stress:** rainfall as a proxy for poor driving conditions.
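
The two severity proxies defined above can be sketched as follows. This is a minimal illustration only: the casualty values are made up, and the 75th-percentile threshold is an assumption, not a choice fixed in this deliverable.

```python
import pandas as pd

# Illustrative monthly casualty counts (made-up values, not taken from DS2).
monthly = pd.DataFrame({
    "month": pd.period_range("2020-01", periods=6, freq="M"),
    "casualties": [540, 495, 410, 250, 300, 520],
})

# Continuous proxy: the monthly casualty count itself.
# Binary proxy: 1 if the month exceeds a threshold
# (here the 75th percentile, which is an assumption).
threshold = monthly["casualties"].quantile(0.75)
monthly["high_severity"] = (monthly["casualties"] > threshold).astype(int)

print(monthly)
```

A percentile threshold keeps the "high-severity" class at a predictable size; a fixed count could be used instead if a policy-relevant cutoff exists.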


## 3) Hypotheses (to test in later deliverables)

- **H₁ (Weather):** Higher-rainfall months are associated with higher accident casualty counts.  
- **H₂ (Vehicle mix):** Higher involvement of certain vehicle types aligns with higher severity outcomes.  
- **H₃ (Temporal):** Accident severity exhibits trend/seasonality across time.

Deliverable 1 does not test hypotheses yet; it verifies dataset suitability to test them later.


# REQUIREMENT ANALYSIS
The objective of this project is to identify measurable factors associated with higher traffic accident severity in Singapore.

To support this objective, the datasets must collectively provide a proxy for accident severity, temporal information for alignment, and explanatory variables such as weather and vehicle composition.

Severity is represented using injury severity categories and monthly casualty counts, enabling both continuous and binary severity measures.

All datasets must include a time dimension (monthly or annual) to allow trend analysis and integration.

Weather data (rainfall) is required to test hypotheses relating adverse conditions to accident severity.
Vehicle-related datasets are required to capture exposure and vehicle mix effects.

Exposure control variables (vehicle population) are necessary to normalize severity outcomes and avoid misleading trends driven by traffic growth.

Datasets must be integrable primarily through common time keys (month or year).


As a non-functional requirement, all datasets must be publicly accessible, downloadable as single CSV files, and readable with `pd.read_csv()`.



## 4) Data Requirements (PREPARE)

To address the research question, the selected datasets should collectively provide:

**Outcome (severity)**
- accident severity categories OR casualty counts (monthly/annual)

**Explanatory features**
- weather: rainfall (monthly)
- vehicle mix: vehicles involved by type (annual) + vehicle population (monthly) to normalize exposure

**Integration feasibility**
- common time keys: month/year (preferred)
- consistent Singapore-wide scope
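
Integration on a common time key might look like the sketch below. The two small tables and their values are hypothetical stand-ins for a monthly outcome table and a monthly weather table; only the `month` and `total_rainfall` column names follow the datasets profiled later.

```python
import pandas as pd

# Illustrative monthly tables keyed by a shared "month" column (values are made up).
casualties = pd.DataFrame({
    "month": ["2015-01", "2015-02", "2015-03"],
    "casualties": [800, 750, 820],
})
rainfall = pd.DataFrame({
    "month": ["2015-01", "2015-02", "2015-03"],
    "total_rainfall": [79.6, 18.8, 64.8],
})

# Convert the string keys to monthly periods so both tables share one key type.
for frame in (casualties, rainfall):
    frame["month"] = pd.PeriodIndex(frame["month"], freq="M")

# validate="one_to_one" raises an error if either side has duplicate months.
merged = casualties.merge(rainfall, on="month", how="inner", validate="one_to_one")
print(merged)
```

Using `validate` turns silent row duplication during a merge into an explicit error, which is useful when several aggregated sources are combined.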


---
# 5) Candidate Datasets (all exist + downloadable as single CSV)

Download each dataset as CSV and upload it into `/content/sample_data/`, keeping the exact filenames used in the `pd.read_csv()` calls below.


## DS1 — Causes of Accidents by Severity of Injury Sustained (SPF)

**Purpose:** Severity-focused dataset that links accident causes to severity categories (aggregated).  
**Why it helps:** Supports defining a “severe” label and interpreting which causes are associated with higher severity.

The dataset is available at https://data.gov.sg/datasets/d_d085ce60a604f938aff6779ed08a106a/view




In [None]:

df1 = pd.read_csv("/content/sample_data/CausesofAccidentsbySeverityofInjurySustained.csv")
df1.head()


Unnamed: 0,year,accident_classification,road_user_group,causes_of_accident,number_of_accidents
0,2012,FATAL,"Drivers, Riders or Cyclists",Failing to Keep a Proper Lookout,59
1,2012,FATAL,"Drivers, Riders or Cyclists",Failing to Have Proper Control,50
2,2012,FATAL,"Drivers, Riders or Cyclists",Failing to Give Way to Traffic with Right of Way,9
3,2012,FATAL,"Drivers, Riders or Cyclists",Changing Lane without Due Care,6
4,2012,FATAL,"Drivers, Riders or Cyclists",Disobeying Traffic Light Signals Resulting in ...,9


In [None]:

df1.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 328 entries, 0 to 327
Data columns (total 5 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   year                     328 non-null    int64 
 1   accident_classification  328 non-null    object
 2   road_user_group          286 non-null    object
 3   causes_of_accident       328 non-null    object
 4   number_of_accidents      328 non-null    int64 
dtypes: int64(2), object(3)
memory usage: 12.9+ KB


In [None]:

df1.describe(include="all")


Unnamed: 0,year,accident_classification,road_user_group,causes_of_accident,number_of_accidents
count,328.0,328,286,328,328.0
unique,,2,2,24,
top,,FATAL,"Drivers, Riders or Cyclists",Failing to Keep a Proper Lookout,
freq,,164,154,14,
mean,2015.036585,,,,140.121951
std,2.005772,,,,403.8008
min,2012.0,,,,0.0
25%,2013.0,,,,2.0
50%,2015.0,,,,8.0
75%,2017.0,,,,56.5


### DS1 Strengths & Weaknesses
- **Strength:** Organized around *severity categories* (amenable to defining “severe” vs “non-severe”).  
- **Weakness:** “One accident may have multiple causes”, so counts can double-count accidents and must be interpreted carefully; `road_user_group` is also missing in 42 of the 328 rows.

### DS1 Suitability Conclusion
DS1 is **suitable** for severity breakdowns and defining severity proxies, but causal claims are limited due to aggregation and multi-cause counting.
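
One way DS1 could feed a severity label is to compare fatal counts with all recorded counts per cause. In this sketch the two FATAL counts follow `df1.head()`, while the `"INJURY"` label and its counts are assumptions for illustration (the profile only shows that `accident_classification` has two values, one of which is `FATAL`).

```python
import pandas as pd

# FATAL rows follow df1.head(); the "INJURY" label and its counts are assumed.
df1_sample = pd.DataFrame({
    "year": [2012] * 4,
    "accident_classification": ["FATAL", "FATAL", "INJURY", "INJURY"],
    "causes_of_accident": [
        "Failing to Keep a Proper Lookout",
        "Failing to Have Proper Control",
    ] * 2,
    "number_of_accidents": [59, 50, 2000, 900],
})

# One row per (year, cause), one column per severity class.
by_cause = df1_sample.pivot_table(
    index=["year", "causes_of_accident"],
    columns="accident_classification",
    values="number_of_accidents",
    aggfunc="sum",
    fill_value=0,
)

# Share of cause-attributed records that were fatal.
total = by_cause.sum(axis=1)
by_cause["fatal_share"] = by_cause["FATAL"] / total
print(by_cause)
```

Because one accident can be listed under several causes, these shares describe cause-attributed records rather than unique accidents.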


## DS2 — Road Traffic Accident Casualties, Monthly (SINGSTAT; Source: SPF)

**Purpose:** Core monthly outcome dataset for severity proxy (casualties).  
**Why it helps:** A strong target variable for later regression (continuous monthly casualties; binary “high casualty month”).

The dataset is available at https://data.gov.sg/datasets/d_5dec466b08a55497218daf8bafbfe96c/view



In [None]:

df2 = pd.read_csv("/content/sample_data/RoadTrafficAccidentCasualtiesMonthly.csv")
df2.head()


Unnamed: 0,DataSeries,2025Nov,2025Oct,2025Sep,2025Aug,2025Jul,2025Jun,2025May,2025Apr,2025Mar,2025Feb,2025Jan,2024Dec,2024Nov,2024Oct,2024Sep,2024Aug,2024Jul,2024Jun,2024May,2024Apr,2024Mar,2024Feb,2024Jan,2023Dec,2023Nov,2023Oct,2023Sep,2023Aug,2023Jul,2023Jun,2023May,2023Apr,2023Mar,2023Feb,2023Jan,2022Dec,2022Nov,2022Oct,2022Sep,2022Aug,2022Jul,2022Jun,2022May,2022Apr,2022Mar,2022Feb,2022Jan,2021Dec,2021Nov,2021Oct,2021Sep,2021Aug,2021Jul,2021Jun,2021May,2021Apr,2021Mar,2021Feb,2021Jan,2020Dec,2020Nov,2020Oct,2020Sep,2020Aug,2020Jul,2020Jun,2020May,2020Apr,2020Mar,2020Feb,2020Jan,2019Dec,2019Nov,2019Oct,2019Sep,2019Aug,2019Jul,2019Jun,2019May,2019Apr,2019Mar,2019Feb,2019Jan,2018Dec,2018Nov,2018Oct,2018Sep,2018Aug,2018Jul,2018Jun,2018May,2018Apr,2018Mar,2018Feb,2018Jan,2017Dec,2017Nov,2017Oct,2017Sep,...,2017Apr,2017Mar,2017Feb,2017Jan,2016Dec,2016Nov,2016Oct,2016Sep,2016Aug,2016Jul,2016Jun,2016May,2016Apr,2016Mar,2016Feb,2016Jan,2015Dec,2015Nov,2015Oct,2015Sep,2015Aug,2015Jul,2015Jun,2015May,2015Apr,2015Mar,2015Feb,2015Jan,2014Dec,2014Nov,2014Oct,2014Sep,2014Aug,2014Jul,2014Jun,2014May,2014Apr,2014Mar,2014Feb,2014Jan,2013Dec,2013Nov,2013Oct,2013Sep,2013Aug,2013Jul,2013Jun,2013May,2013Apr,2013Mar,2013Feb,2013Jan,2012Dec,2012Nov,2012Oct,2012Sep,2012Aug,2012Jul,2012Jun,2012May,2012Apr,2012Mar,2012Feb,2012Jan,2011Dec,2011Nov,2011Oct,2011Sep,2011Aug,2011Jul,2011Jun,2011May,2011Apr,2011Mar,2011Feb,2011Jan,2010Dec,2010Nov,2010Oct,2010Sep,2010Aug,2010Jul,2010Jun,2010May,2010Apr,2010Mar,2010Feb,2010Jan,2009Dec,2009Nov,2009Oct,2009Sep,2009Aug,2009Jul,2009Jun,2009May,2009Apr,2009Mar,2009Feb,2009Jan
0,Total Casualties Fatalities,13,13,12,10,11,17,18,11,12,9,12,16,15,10,10,8,11,13,7,11,12,11,18,13,11,12,7,11,11,13,10,11,11,16,10,12,3,14,15,9,10,6,5,13,11,6,4,12,10,6,6,6,9,7,12,6,14,12,7,4,6,5,4,10,3,7,2,11,11,9,11,12,15,9,10,4,9,11,6,12,13,9,8,14,8,13,9,9,11,11,9,12,6,12,10,9,15,3,11,...,10,9,11,8,13,11,15,8,12,16,11,11,13,10,8,13,11,14,14,12,12,12,16,7,15,12,14,12,10,7,11,17,9,12,11,20,13,12,18,15,12,16,9,7,13,11,17,13,10,14,19,19,19,14,15,11,15,9,10,11,15,15,17,17,12,12,19,15,19,19,16,15,16,17,15,20,14,14,13,14,20,16,17,18,19,23,7,18,24,16,13,18,14,16,10,22,14,11,12,13
1,Pedestrians,4,1,3,2,2,4,3,3,4,1,4,2,2,1,1,2,3,2,2,2,2,2,4,2,0,5,1,5,1,10,1,5,4,2,3,6,0,5,6,3,2,3,2,1,2,2,1,2,3,2,2,4,0,1,1,0,5,2,1,1,3,3,2,0,0,0,0,2,1,2,4,4,4,3,3,3,1,4,2,5,3,3,4,3,6,1,4,0,5,3,3,5,2,4,3,4,6,1,4,...,6,2,4,3,5,3,6,3,0,4,6,4,4,6,4,2,3,5,3,4,3,4,5,1,5,3,2,5,3,5,6,5,3,4,4,5,2,3,3,2,6,7,2,1,6,1,4,1,2,3,5,5,3,4,7,3,3,1,5,0,6,6,2,4,3,5,3,5,7,3,4,2,2,5,5,5,4,3,3,3,4,5,3,4,9,9,3,5,7,4,3,4,3,4,5,2,3,3,3,4
2,Personal Mobility Device Users,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,...,1,0,0,0,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na,na
3,Cyclists & Pillions,4,0,1,1,0,2,1,0,0,3,2,2,0,1,1,1,1,1,1,1,0,2,1,1,1,1,0,1,2,1,1,2,0,2,2,2,1,0,1,1,0,1,1,1,3,0,0,2,0,0,0,0,0,1,5,1,2,0,0,0,0,1,0,1,0,2,0,0,1,1,1,0,4,1,0,1,0,0,0,1,0,0,1,2,0,1,0,1,0,1,1,1,1,1,0,0,2,2,1,...,0,0,2,1,0,2,5,1,5,3,1,1,0,0,0,2,2,1,1,2,0,1,2,1,1,1,3,2,0,0,0,1,1,2,2,2,2,2,2,1,2,3,0,0,0,1,1,4,0,0,1,3,1,1,0,0,2,1,0,3,1,2,1,4,0,1,1,3,2,0,0,1,1,2,2,2,1,1,1,1,3,1,1,3,0,2,0,2,2,1,1,3,1,2,1,2,3,1,0,0
4,Motor Cyclists & Pillion Riders,5,10,5,4,8,10,11,7,5,4,6,9,10,7,5,4,6,9,4,5,8,7,11,6,9,6,6,3,6,1,7,3,6,11,4,3,1,6,3,3,6,2,2,9,5,4,3,6,3,3,1,2,9,5,6,2,5,3,5,3,3,1,2,7,2,4,2,8,5,6,6,7,6,4,7,0,8,6,4,5,10,4,3,8,2,7,5,5,6,7,4,2,3,5,7,3,4,0,5,...,2,6,4,4,7,5,3,2,6,8,3,5,9,4,3,7,4,6,7,6,7,6,7,3,7,8,8,3,6,2,4,8,3,4,5,11,7,4,11,9,4,5,5,6,7,4,10,8,4,5,10,5,11,6,7,5,8,5,2,2,6,5,11,8,6,4,9,6,9,12,9,8,9,9,7,11,4,6,6,7,11,6,10,7,9,9,4,10,12,9,7,9,10,8,4,8,4,6,8,7


In [None]:

df2.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Columns: 204 entries, DataSeries to 2009Jan
dtypes: int64(107), object(97)
memory usage: 47.9+ KB


In [None]:

df2.describe(include="all")


Unnamed: 0,DataSeries,2025Nov,2025Oct,2025Sep,2025Aug,2025Jul,2025Jun,2025May,2025Apr,2025Mar,2025Feb,2025Jan,2024Dec,2024Nov,2024Oct,2024Sep,2024Aug,2024Jul,2024Jun,2024May,2024Apr,2024Mar,2024Feb,2024Jan,2023Dec,2023Nov,2023Oct,2023Sep,2023Aug,2023Jul,2023Jun,2023May,2023Apr,2023Mar,2023Feb,2023Jan,2022Dec,2022Nov,2022Oct,2022Sep,2022Aug,2022Jul,2022Jun,2022May,2022Apr,2022Mar,2022Feb,2022Jan,2021Dec,2021Nov,2021Oct,2021Sep,2021Aug,2021Jul,2021Jun,2021May,2021Apr,2021Mar,2021Feb,2021Jan,2020Dec,2020Nov,2020Oct,2020Sep,2020Aug,2020Jul,2020Jun,2020May,2020Apr,2020Mar,2020Feb,2020Jan,2019Dec,2019Nov,2019Oct,2019Sep,2019Aug,2019Jul,2019Jun,2019May,2019Apr,2019Mar,2019Feb,2019Jan,2018Dec,2018Nov,2018Oct,2018Sep,2018Aug,2018Jul,2018Jun,2018May,2018Apr,2018Mar,2018Feb,2018Jan,2017Dec,2017Nov,2017Oct,2017Sep,...,2017Apr,2017Mar,2017Feb,2017Jan,2016Dec,2016Nov,2016Oct,2016Sep,2016Aug,2016Jul,2016Jun,2016May,2016Apr,2016Mar,2016Feb,2016Jan,2015Dec,2015Nov,2015Oct,2015Sep,2015Aug,2015Jul,2015Jun,2015May,2015Apr,2015Mar,2015Feb,2015Jan,2014Dec,2014Nov,2014Oct,2014Sep,2014Aug,2014Jul,2014Jun,2014May,2014Apr,2014Mar,2014Feb,2014Jan,2013Dec,2013Nov,2013Oct,2013Sep,2013Aug,2013Jul,2013Jun,2013May,2013Apr,2013Mar,2013Feb,2013Jan,2012Dec,2012Nov,2012Oct,2012Sep,2012Aug,2012Jul,2012Jun,2012May,2012Apr,2012Mar,2012Feb,2012Jan,2011Dec,2011Nov,2011Oct,2011Sep,2011Aug,2011Jul,2011Jun,2011May,2011Apr,2011Mar,2011Feb,2011Jan,2010Dec,2010Nov,2010Oct,2010Sep,2010Aug,2010Jul,2010Jun,2010May,2010Apr,2010Mar,2010Feb,2010Jan,2009Dec,2009Nov,2009Oct,2009Sep,2009Aug,2009Jul,2009Jun,2009May,2009Apr,2009Mar,2009Feb,2009Jan
count,30,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,...,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0
unique,16,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,17.0,18.0,16.0,13.0,15.0,17.0,17.0,16.0,16.0,16.0,17.0,17.0,18.0,16.0,16.0,17.0,16.0,17.0,17.0,14.0,17.0,17.0,18.0,17.0,16.0,16.0,16.0,17.0,16.0,15.0,16.0,16.0,16.0,17.0,16.0,15.0,13.0,19.0,17.0,17.0,17.0,16.0,16.0,17.0,16.0,17.0,17.0,17.0,16.0,17.0,17.0,17.0,17.0,16.0,17.0,17.0,16.0,18.0,17.0,16.0,17.0,17.0,16.0,16.0,19.0,17.0,17.0,17.0,17.0,17.0,15.0,16.0,16.0,18.0,17.0,17.0,18.0,16.0,17.0,17.0,17.0,17.0,16.0,16.0,17.0,17.0,15.0,18.0,17.0,15.0,18.0,16.0,18.0,16.0,18.0,18.0
top,Pedestrians,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0
freq,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,10.0,11.0,10.0,8.0,10.0,11.0,10.0,9.0,12.0,12.0,11.0,7.0,7.0,9.0,8.0,11.0,9.0,10.0,8.0,8.0,9.0,11.0,9.0,8.0,11.0,12.0,11.0,9.0,11.0,9.0,12.0,11.0,7.0,8.0,7.0,9.0,13.0,9.0,10.0,11.0,12.0,6.0,8.0,10.0,8.0,9.0,7.0,6.0,7.0,7.0,10.0,10.0,7.0,8.0,9.0,10.0,11.0,9.0,9.0,9.0,9.0,9.0,8.0,9.0,8.0,8.0,11.0,9.0,8.0,9.0,9.0,10.0,8.0,7.0,9.0,10.0,8.0,7.0,9.0,7.0,10.0,6.0,12.0,11.0,8.0,9.0,9.0,7.0,10.0,9.0,11.0,5.0,7.0,11.0,10.0,9.0
mean,,63.0,61.166667,58.866667,60.333333,55.5,54.366667,56.1,65.066667,55.8,55.333333,54.8,47.866667,58.7,52.9,52.8,58.3,57.666667,52.066667,53.866667,55.166667,58.566667,48.566667,59.666667,48.366667,49.466667,56.633333,49.733333,53.0,53.866667,52.366667,52.533333,56.766667,52.966667,47.366667,55.733333,60.033333,54.333333,51.3,50.766667,50.633333,54.666667,46.333333,49.9,46.033333,39.533333,36.533333,49.0,43.7,41.533333,35.066667,45.0,43.8,41.033333,40.133333,39.366667,43.266667,45.433333,39.5,46.4,51.0,42.333333,43.8,39.933333,36.0,38.133333,29.1,20.833333,24.1,43.233333,40.666667,54.7,58.866667,58.866667,63.166667,55.233333,51.833333,60.233333,57.0,57.4,57.133333,51.733333,51.0,61.3,55.166667,66.033333,60.3,55.8,61.033333,56.633333,59.933333,54.6,56.466667,60.133333,52.166667,62.233333,62.9,60.566667,57.433333,56.433333,...,58.366667,58.233333,49.066667,62.533333,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
std,,178.061012,170.57329,168.700864,170.844824,161.76798,155.628976,161.120291,186.940694,159.259796,159.024252,158.346064,135.547608,166.241028,154.480073,151.080339,170.389442,159.577747,148.335345,155.731142,154.724913,163.115927,140.649401,171.165242,135.823537,143.178146,163.366413,143.293028,154.897742,150.697169,146.526797,151.627205,157.091602,154.327347,135.130967,156.428222,171.477048,157.925415,148.947144,147.143065,146.55927,157.765285,134.161851,143.375165,133.55342,113.427094,105.449361,142.833397,120.282469,119.586279,104.862121,131.689813,129.182416,117.969746,115.609549,110.854202,123.24686,129.475516,114.343931,130.89443,147.477175,118.505153,125.241395,116.955321,102.507527,112.86175,87.066503,64.151267,69.77864,124.195253,117.974379,160.391794,168.76483,172.220535,179.368064,164.956772,154.765692,175.920374,166.164854,164.790693,166.93458,151.951656,150.283869,174.400876,159.125646,187.321866,170.266961,159.57344,172.562329,165.068739,172.432882,160.950753,164.915806,172.198108,149.852821,178.45712,180.330415,172.08011,161.453459,161.593292,...,169.940656,170.633037,143.314444,177.223166,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
min,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
25%,,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
50%,,2.5,1.5,2.0,2.5,2.0,2.5,2.5,1.0,4.0,1.0,2.0,2.0,1.5,1.0,2.5,2.5,2.0,1.5,2.0,1.0,1.5,2.0,2.0,2.0,1.0,1.5,1.0,1.5,2.5,1.5,2.0,2.5,1.0,2.0,2.0,1.5,1.0,2.0,2.5,2.0,2.5,1.5,1.5,1.0,2.5,0.5,2.0,1.5,2.0,1.0,1.0,1.0,1.0,1.0,2.5,2.0,3.5,2.5,1.5,0.5,2.0,2.5,0.5,1.0,1.0,1.0,0.5,1.0,2.0,0.5,1.5,1.5,4.0,2.5,2.5,1.0,1.0,3.0,1.0,1.0,2.5,2.5,1.5,2.0,2.0,2.0,4.5,3.0,1.5,2.5,2.0,1.5,2.5,2.0,1.5,1.5,3.5,1.0,2.0,...,2.0,2.0,1.5,1.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
75%,,20.75,31.0,17.0,31.25,16.0,15.25,17.75,30.75,20.25,19.75,13.0,20.5,38.0,20.5,22.75,14.5,21.75,27.5,23.5,20.0,23.0,10.75,18.75,26.0,19.75,20.75,24.5,20.0,23.25,19.0,21.0,19.75,17.0,16.75,27.5,27.0,21.75,14.0,27.0,25.0,26.25,20.0,18.25,16.5,13.0,15.5,19.5,23.0,15.5,10.75,12.0,15.0,17.5,12.75,17.0,17.0,19.75,11.5,27.0,24.5,22.0,20.75,12.75,13.5,12.0,8.5,5.75,10.25,15.0,15.0,19.25,21.0,17.0,38.0,16.0,14.75,27.75,20.5,23.75,22.75,16.25,15.25,24.25,23.75,30.75,29.5,24.0,21.5,19.25,23.75,24.25,21.75,28.0,13.0,13.25,34.5,28.5,22.25,17.0,...,20.75,13.0,16.25,35.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


### DS2 Strengths & Weaknesses
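- **Strength:** Monthly granularity aligns with rainfall; the long time span (January 2009 to November 2025) supports trend/seasonality analysis.  
- **Weakness:** Not event-level, so severity cannot be attributed to specific locations, and counts need exposure normalization. The file is also in wide format (one column per month) and uses `na` strings for unavailable values, which is why 97 of its 204 columns load as `object`.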

### DS2 Suitability Conclusion
DS2 is **highly suitable** as the primary severity outcome proxy for later linear/logistic regression after reshaping to long format, temporal alignment, and normalization.
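
Since DS2 stores one column per month, a reshape is needed before it can be joined on a time key. The sketch below uses a two-row, three-column excerpt whose values follow `df2.head()`; the output column names `month` and `casualties` and the use of `str.strip()` as a precaution are assumptions.

```python
import pandas as pd

# Small excerpt in the same wide layout as df2 (values follow df2.head()).
wide = pd.DataFrame({
    "DataSeries": ["Total Casualties Fatalities", "Personal Mobility Device Users"],
    "2017Jan": [8, 0],
    "2016Dec": [13, "na"],
    "2016Nov": [11, "na"],
})

# Wide to long: one row per (series, month).
long_df = wide.melt(id_vars="DataSeries", var_name="month", value_name="casualties")

# Tidy labels, parse "2017Jan"-style keys, and turn "na" strings into NaN.
long_df["DataSeries"] = long_df["DataSeries"].str.strip()
long_df["month"] = pd.to_datetime(long_df["month"], format="%Y%b").dt.to_period("M")
long_df["casualties"] = pd.to_numeric(long_df["casualties"], errors="coerce")

long_df = long_df.sort_values(["DataSeries", "month"]).reset_index(drop=True)
print(long_df)
```

After this step, `describe()` can be run per series, which avoids mixing total rows with sub-category rows in one summary.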


## DS3 — Vehicles Involved in Fatal and Injury Road Traffic Accidents by Type of Vehicle, Annual (SINGSTAT)

**Purpose:** Vehicle-type involvement for exposure/composition.  
**Why it helps:** Lets us test whether certain vehicle types align with more severe outcomes across years.

The dataset is available at https://data.gov.sg/datasets/d_0c4f0d2d9f99c20124cbb52ef83ad6df/view



In [None]:

df3 = pd.read_csv("/content/sample_data/VehiclesInvolvedInFatalAndInjuryRoadTrafficAccidentsByTypeOfVehicleAnnual.csv")
df3.head()


Unnamed: 0,DataSeries,2023,2022,2021,2020,2019,2018,2017,2016,2015
0,Total,13507,12346,10964,9852,14133,14062,14168,15369,14982
1,Bicycles And Power Assisted Bicycles,598,745,812,581,473,513,605,633,643
2,Motor Cycles & Scooters,4157,4102,3636,3364,4860,4748,4619,4913,4694
3,Motor Cars & Station Wagons,6409,5420,4716,4374,6643,6423,6680,7172,6930
4,Goods Vans & Pick-Ups,689,581,577,499,552,549,539,657,617


In [None]:

df3.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   DataSeries  8 non-null      object
 1   2023        8 non-null      int64 
 2   2022        8 non-null      int64 
 3   2021        8 non-null      int64 
 4   2020        8 non-null      int64 
 5   2019        8 non-null      int64 
 6   2018        8 non-null      int64 
 7   2017        8 non-null      int64 
 8   2016        8 non-null      int64 
 9   2015        8 non-null      int64 
dtypes: int64(9), object(1)
memory usage: 772.0+ bytes


In [None]:

df3.describe(include="all")


Unnamed: 0,DataSeries,2023,2022,2021,2020,2019,2018,2017,2016,2015
count,8,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0,8.0
unique,8,,,,,,,,,
top,Total,,,,,,,,,
freq,1,,,,,,,,,
mean,,3376.75,3086.5,2741.0,2463.0,3533.25,3515.5,3542.0,3842.25,3745.5
std,,4657.638971,4217.712921,3729.969207,3383.450901,4920.501535,4854.912475,4906.369737,5299.501291,5146.645482
min,,136.0,96.0,97.0,57.0,86.0,99.0,103.0,101.0,117.0
25%,,545.25,527.25,497.0,444.0,471.75,512.25,527.75,617.75,604.75
50%,,910.0,890.5,840.5,639.5,801.5,884.5,866.5,989.0,1028.0
75%,,4720.0,4431.5,3906.0,3616.5,5305.75,5166.75,5134.25,5477.75,5253.0


### DS3 Strengths & Weaknesses
- **Strength:** Explicit vehicle type categories; relevant to safety risk differences (e.g., motorcycles vs cars).  
- **Weakness:** Annual granularity complicates alignment with monthly outcomes unless we aggregate monthly outcomes to annual, and the file covers only nine years (2015 to 2023).

### DS3 Suitability Conclusion
DS3 is **suitable** as a vehicle-mix dataset, with the caveat that later integration requires temporal aggregation/alignment decisions.
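
Vehicle mix can be expressed as each type's share of all vehicles involved in a given year. The excerpt below copies three rows and two years from `df3.head()`; the column names `year`, `vehicles_involved`, and `share` are assumptions introduced for the sketch.

```python
import pandas as pd

# Excerpt in the same wide layout as df3 (values follow df3.head()).
wide = pd.DataFrame({
    "DataSeries": ["Total", "Motor Cycles & Scooters", "Motor Cars & Station Wagons"],
    "2023": [13507, 4157, 6409],
    "2022": [12346, 4102, 5420],
})

# Wide to long: one row per (vehicle type, year).
long_df = wide.melt(id_vars="DataSeries", var_name="year", value_name="vehicles_involved")
long_df["year"] = long_df["year"].astype(int)

# Separate the "Total" row so it is used as a denominator, not as a category.
totals = long_df.loc[long_df["DataSeries"] == "Total"].set_index("year")["vehicles_involved"]
types = long_df.loc[long_df["DataSeries"] != "Total"].copy()
types["share"] = types["vehicles_involved"] / types["year"].map(totals)
print(types)
```

Separating the `Total` row also matters for profiling: in the `describe()` output above, it sits in the same columns as the individual types and pulls the mean and maximum upward.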


## DS4 — Rainfall: Monthly Total (NEA)

**Purpose:** Weather stress proxy.  
**Why it helps:** Monthly rainfall to test H₁ against monthly casualty outcomes.

The dataset can be accessed at https://data.gov.sg/datasets/d_b16d06b83473fdfcc92ed9d37b66ba58/view



In [None]:

df4 = pd.read_csv("/content/sample_data/RainfallMonthlyTotal.csv")
df4.head()


Unnamed: 0,month,total_rainfall
0,1982-01,107.1
1,1982-02,27.8
2,1982-03,160.8
3,1982-04,157.0
4,1982-05,102.2


In [None]:

df4.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 528 entries, 0 to 527
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   month           528 non-null    object 
 1   total_rainfall  528 non-null    float64
dtypes: float64(1), object(1)
memory usage: 8.4+ KB


In [None]:

df4.describe(include="all")


Unnamed: 0,month,total_rainfall
count,528,528.0
unique,528,
top,2024-08,
freq,1,
mean,,180.041288
std,,114.742194
min,,0.2
25%,,96.75
50%,,160.8
75%,,239.65


### DS4 Strengths & Weaknesses
- **Strength:** Consistent meteorological measurement; monthly granularity aligns with DS2.  
- **Weakness:** Rainfall alone may not capture visibility or storm intensity, and monthly totals smooth out short, intense downpours.

### DS4 Suitability Conclusion
DS4 is **highly suitable** as a monthly weather covariate after standardizing date formats to match DS2.
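
Standardizing the DS4 date key could be done as sketched here, together with a simple check that no month is duplicated or missing. The five rows follow `df4.head()`; the variable names are assumptions.

```python
import pandas as pd

# First five rows of DS4 as shown by df4.head().
df4_sample = pd.DataFrame({
    "month": ["1982-01", "1982-02", "1982-03", "1982-04", "1982-05"],
    "total_rainfall": [107.1, 27.8, 160.8, 157.0, 102.2],
})

# "YYYY-MM" strings become monthly periods, a key type that can also be
# produced from DS2's "2017Jan"-style column names after parsing.
df4_sample["month"] = pd.PeriodIndex(df4_sample["month"], freq="M")

# Completeness check: every month between the first and last appears exactly once.
expected = pd.period_range(df4_sample["month"].min(), df4_sample["month"].max(), freq="M")
is_complete = df4_sample["month"].is_unique and len(expected) == len(df4_sample)
print(is_complete)
```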


## DS5 — Monthly Motor Vehicle Population by Vehicle Type (LTA)

**Purpose:** Exposure/control variable to normalize casualty counts and control for fleet growth.  
**Why it helps:** Enables per-vehicle casualty rates and controls for growth in the number of vehicles.

The dataset can be downloaded from https://data.gov.sg/datasets/d_2ecb009f1e1ec5a816a454944dec4022/view

**Upload as:** `MonthlyMotorVehiclePopulationbyVehicleType.csv`


In [None]:

df5 = pd.read_csv("/content/sample_data/MonthlyMotorVehiclePopulationbyVehicleType.csv")
df5.head()


Unnamed: 0,month,vehicle_type,number
0,2012-01,Cars,593555
1,2012-01,Rental Cars,13970
2,2012-01,Taxi,27059
3,2012-01,Buses,17037
4,2012-01,Goods & Other Vehicles,159854


In [None]:

df5.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 444 entries, 0 to 443
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   month         444 non-null    object
 1   vehicle_type  444 non-null    object
 2   number        444 non-null    int64 
dtypes: int64(1), object(2)
memory usage: 10.5+ KB


In [None]:

df5.describe(include="all")


Unnamed: 0,month,vehicle_type,number
count,444,444,444.0
unique,74,9,
top,2012-01,Taxi,
freq,6,74,
mean,,,160577.155405
std,,,198119.328833
min,,,13970.0
25%,,,19371.25
50%,,,104473.5
75%,,,160561.25


### DS5 Strengths & Weaknesses
- **Strength:** Monthly exposure measure; supports normalization and controls for fleet growth.  
- **Weakness:** Vehicle population ≠ usage intensity (does not measure vehicle-km traveled).

### DS5 Suitability Conclusion
DS5 is **suitable** as an exposure/control dataset, particularly for constructing per-vehicle casualty rates.
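
A per-vehicle casualty rate could be built by summing DS5 across vehicle types and dividing a monthly casualty count by that total. The five vehicle rows follow `df5.head()` (so the fleet total here is partial), and the casualty figure of 900 is hypothetical.

```python
import pandas as pd

# Rows for 2012-01 as shown by df5.head(); only the five displayed types are included.
df5_sample = pd.DataFrame({
    "month": ["2012-01"] * 5,
    "vehicle_type": ["Cars", "Rental Cars", "Taxi", "Buses", "Goods & Other Vehicles"],
    "number": [593555, 13970, 27059, 17037, 159854],
})

# Total fleet per month across vehicle types.
fleet = (
    df5_sample.groupby("month", as_index=False)["number"]
    .sum()
    .rename(columns={"number": "vehicles"})
)

# Hypothetical casualty count for the same month.
casualties = pd.DataFrame({"month": ["2012-01"], "casualties": [900]})

rates = casualties.merge(fleet, on="month", validate="one_to_one")
rates["casualties_per_10k_vehicles"] = rates["casualties"] / rates["vehicles"] * 10_000
print(rates)
```

Scaling to casualties per 10,000 vehicles keeps the rate readable; the scale factor is a presentation choice and does not affect later regression results beyond the coefficient units.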


---
## 6) Overall Selection Summary (Deliverable 1 conclusion)

**Selected datasets (5/5):**
- **DS2** monthly casualties as core outcome proxy  
- **DS4** monthly rainfall as weather covariate  
- **DS5** monthly vehicle population as exposure/control  
- **DS3** annual vehicles involved by type for vehicle-mix context (requires alignment)  
- **DS1** severity-by-cause to interpret drivers and define severity labels

**Main caveats to address later:**
- mixed temporal granularity (annual vs monthly) → decide aggregation strategy
- aggregation & multi-cause counting (DS1) → interpret carefully
- normalization needed (per vehicle population) to avoid misleading trends
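
For the mixed-granularity caveat, one option is to aggregate monthly outcomes to annual totals and keep only complete years. The monthly values below are made up, and dropping incomplete years is an assumption about how partial years would be handled.

```python
import pandas as pd

# Illustrative monthly series covering two full years (made-up values).
monthly = pd.DataFrame({
    "month": pd.period_range("2022-01", "2023-12", freq="M"),
    "casualties": [700 + i for i in range(24)],
})

# Sum to annual totals and count how many months contributed to each year.
annual = (
    monthly.assign(year=monthly["month"].dt.year)
    .groupby("year", as_index=False)
    .agg(casualties=("casualties", "sum"), months=("month", "count"))
)

# Keep only years with all 12 months so partial years do not look artificially low.
annual = annual[annual["months"] == 12]
print(annual)
```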


## 7) Next Steps Plan (for Deliverables 2+)

1. **Standardize time keys** (parse Month/Year into a common datetime column).  
2. **Cleaning & transformation**
   - handle missing values and inconsistent category labels
   - reshape wide-to-long where needed (e.g., the month columns in DS2 and the year columns in DS3)
3. **Temporal alignment**
   - align DS2 (monthly) with DS4 and DS5 directly
   - incorporate DS3 annual data by aggregating DS2 to annual OR mapping annual to months
4. **Feature engineering**
   - rainfall bins (low/medium/high)
   - per-vehicle casualty rates (casualties / vehicle population)
   - lagged rainfall features (previous month rainfall)
5. **Modeling (later deliverables)**
   - **Linear regression:** predict monthly casualties (or rate)
   - **Logistic regression:** classify “high-severity month” vs “normal month”
6. **Interpretation & reporting**
   - check assumptions, discuss limitations (aggregation, omitted variables)
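
The feature-engineering step in the plan above can be sketched on an already merged monthly table. All values are made up, and the tercile binning via `pd.qcut` is one possible reading of "low/medium/high", not a decision taken in this deliverable.

```python
import pandas as pd

# Illustrative merged monthly table (made-up values).
features = pd.DataFrame({
    "month": pd.period_range("2020-01", periods=6, freq="M"),
    "total_rainfall": [88.4, 65.0, 108.8, 188.0, 255.6, 233.8],
    "casualties": [540, 495, 410, 250, 300, 520],
    "vehicles": [970_000] * 6,
}).sort_values("month")

# Rainfall bins: terciles labelled low/medium/high.
features["rain_bin"] = pd.qcut(
    features["total_rainfall"], q=3, labels=["low", "medium", "high"]
)

# Lagged rainfall: previous month's total (assumes no missing months in the series).
features["rain_lag1"] = features["total_rainfall"].shift(1)

# Per-vehicle casualty rate.
features["casualty_rate"] = features["casualties"] / features["vehicles"]
print(features)
```

The first row has no lagged value, so models that use `rain_lag1` would need to drop or impute that month.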
