# Traffic Crash Analysis in Chicago

This notebook explores the relationship between weather, lighting conditions, time, and the severity of traffic crash injuries in Chicago. The dataset is cleaned, encoded, and analyzed using regression models to understand the impact of these factors on the total number of injuries.

In [3]:
# Imports and Loading Data
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Load the dataset
df = pd.read_csv("https://raw.githubusercontent.com/AlanK3/is204-final-group-project/refs/heads/main/data/2024ChicagoTrafficCrashes.csv")

df.head()

Unnamed: 0,CRASH_RECORD_ID,CRASH_DATE_EST_I,CRASH_DATE,POSTED_SPEED_LIMIT,TRAFFIC_CONTROL_DEVICE,DEVICE_CONDITION,WEATHER_CONDITION,LIGHTING_CONDITION,FIRST_CRASH_TYPE,TRAFFICWAY_TYPE,...,INJURIES_NON_INCAPACITATING,INJURIES_REPORTED_NOT_EVIDENT,INJURIES_NO_INDICATION,INJURIES_UNKNOWN,CRASH_HOUR,CRASH_DAY_OF_WEEK,CRASH_MONTH,LATITUDE,LONGITUDE,LOCATION
0,1ba71bdbb2e32c87c6da5f67f57e438da46506d0ac1478...,,11/19/2024 01:32:00 AM,40,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,RAIN,"DARKNESS, LIGHTED ROAD",ANGLE,FOUR WAY,...,0.0,2.0,1.0,0.0,1,3,11,41.758878,-87.585682,POINT (-87.585682440932 41.758877741252)
1,ba0a5de0700aafcbe9a70baa842602009e5f3e277f1c09...,,11/19/2024 12:56:00 AM,30,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,RAIN,DARKNESS,REAR TO SIDE,RAMP,...,0.0,0.0,3.0,0.0,0,3,11,41.834117,-87.675232,POINT (-87.675231579807 41.83411690885)
2,5e395590aa62bccdca20695731c4e61a239d2a8d5bc4a1...,,11/19/2024 12:30:00 AM,30,STOP SIGN/FLASHER,FUNCTIONING PROPERLY,RAIN,"DARKNESS, LIGHTED ROAD",REAR END,FOUR WAY,...,0.0,2.0,1.0,0.0,0,3,11,41.660502,-87.641476,POINT (-87.641475597346 41.660502294205)
3,5f2131e35aea6dda695e5f898e7bf6d2db433d03226f5d...,,11/19/2024 12:00:00 AM,25,NO CONTROLS,NO CONTROLS,RAIN,DARKNESS,PARKED MOTOR VEHICLE,ONE-WAY,...,0.0,0.0,1.0,0.0,0,3,11,41.762226,-87.614114,POINT (-87.614114000658 41.762226365253)
4,9d18dc01964e9d31ffffca1f97aa99d90c0a96b423e31f...,,11/18/2024 11:30:00 PM,30,TRAFFIC SIGNAL,FUNCTIONING PROPERLY,RAIN,"DARKNESS, LIGHTED ROAD",REAR END,NOT DIVIDED,...,0.0,0.0,3.0,0.0,23,2,11,41.87497,-87.676632,POINT (-87.67663229632 41.874969684894)


## Cleaning and Preparing Final Dataset
We keep only the relevant columns, then remove rows whose weather or lighting condition is recorded as `UNKNOWN` and rows with missing values.

In [None]:
# Select relevant columns
df = df[["INJURIES_TOTAL", "WEATHER_CONDITION", "DAMAGE", "LIGHTING_CONDITION", "POSTED_SPEED_LIMIT", "CRASH_HOUR", "MOST_SEVERE_INJURY", "INJURIES_FATAL", "INJURIES_INCAPACITATING", "INJURIES_NON_INCAPACITATING"]]

# Remove rows with unknown lighting or weather conditions
df = df[(df["LIGHTING_CONDITION"] != 'UNKNOWN') & (df["WEATHER_CONDITION"] != 'UNKNOWN')]

# Create the final dataset
fin_df = df[["WEATHER_CONDITION", "LIGHTING_CONDITION", "INJURIES_TOTAL", "CRASH_HOUR"]]

# Drop missing values
fin_df = fin_df.dropna()

fin_df.head()

Unnamed: 0,WEATHER_CONDITION,LIGHTING_CONDITION,INJURIES_TOTAL,CRASH_HOUR
0,RAIN,"DARKNESS, LIGHTED ROAD",2.0,1
1,RAIN,DARKNESS,0.0,0
2,RAIN,"DARKNESS, LIGHTED ROAD",2.0,0
3,RAIN,DARKNESS,0.0,0
4,RAIN,"DARKNESS, LIGHTED ROAD",0.0,23
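
The two cleaning steps above drop rows for different reasons, and it can help to see how many rows each one removes. The following is a minimal sketch on a toy frame; the `toy` data and the names `known` and `clean` are illustrative assumptions, not part of the notebook or the Chicago dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the crash data (made-up values, for illustration only).
toy = pd.DataFrame({
    "WEATHER_CONDITION": ["RAIN", "UNKNOWN", "CLEAR", "CLEAR"],
    "LIGHTING_CONDITION": ["DARKNESS", "DAYLIGHT", "UNKNOWN", "DAYLIGHT"],
    "INJURIES_TOTAL": [2.0, 0.0, 1.0, np.nan],
})

# Step 1: remove rows where either condition is the literal string 'UNKNOWN'.
known = toy[(toy["LIGHTING_CONDITION"] != "UNKNOWN") & (toy["WEATHER_CONDITION"] != "UNKNOWN")]

# Step 2: remove rows with any missing value.
clean = known.dropna()

print(f"start: {len(toy)}, after UNKNOWN filter: {len(known)}, after dropna: {len(clean)}")
```

Applying the same row counts to `df` and `fin_df` would show how much of the original data the final regression actually uses.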


## Encoding Categorical Variables
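We use `LabelEncoder` to map each weather and lighting category to an integer code so that the columns can be passed to a regression model. `LabelEncoder` assigns codes in sorted order of the category names, so the numeric ordering of the codes carries no physical meaning.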

In [5]:
from sklearn.preprocessing import LabelEncoder

# Encode categorical variables
weather_encoder = LabelEncoder()
lighting_encoder = LabelEncoder()

fin_df['WEATHER_CONDITION_ENCODED'] = weather_encoder.fit_transform(fin_df['WEATHER_CONDITION'])
fin_df['LIGHTING_CONDITION_ENCODED'] = lighting_encoder.fit_transform(fin_df['LIGHTING_CONDITION'])

fin_df.head()

Unnamed: 0,WEATHER_CONDITION,LIGHTING_CONDITION,INJURIES_TOTAL,CRASH_HOUR,WEATHER_CONDITION_ENCODED,LIGHTING_CONDITION_ENCODED
0,RAIN,"DARKNESS, LIGHTED ROAD",2.0,1,6,1
1,RAIN,DARKNESS,0.0,0,6,0
2,RAIN,"DARKNESS, LIGHTED ROAD",2.0,0,6,1
3,RAIN,DARKNESS,0.0,0,6,0
4,RAIN,"DARKNESS, LIGHTED ROAD",0.0,23,6,1
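
Because the integer codes are ordered alphabetically rather than by any real-world scale, a linear model treats them as if they were. One common alternative is one-hot encoding, sketched below with `pd.get_dummies` on a toy frame. This is not what the notebook does; the `toy` data and the `X_onehot` name are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for fin_df (made-up values, for illustration only).
toy = pd.DataFrame({
    "WEATHER_CONDITION": ["RAIN", "CLEAR", "SNOW", "CLEAR"],
    "CRASH_HOUR": [1, 0, 23, 12],
})

# One 0/1 indicator column per category. drop_first=True drops the first
# category (alphabetically, here CLEAR) so the indicators are not perfectly
# collinear with the regression intercept.
X_onehot = pd.get_dummies(toy, columns=["WEATHER_CONDITION"], drop_first=True, dtype=int)

print(X_onehot.columns.tolist())
```

Each remaining indicator then gets its own coefficient, interpreted relative to the dropped baseline category.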


## Regression Analysis
We fit four linear regressions of total injuries: one each on encoded weather condition, encoded lighting condition, and crash hour, and one on all three combined. Each R² is computed on the same data used to fit the model.

In [6]:
# Prepare data for regression
X_weather = fin_df[["WEATHER_CONDITION_ENCODED"]]
X_lighting = fin_df[["LIGHTING_CONDITION_ENCODED"]]
X_time = fin_df[["CRASH_HOUR"]]
X_combined = fin_df[["WEATHER_CONDITION_ENCODED", "LIGHTING_CONDITION_ENCODED", "CRASH_HOUR"]]
y = fin_df["INJURIES_TOTAL"]

# Initialize the regressor
regressor = LinearRegression()

# Weather condition regression
regressor.fit(X_weather, y)
y_pred_weather = regressor.predict(X_weather)
r2_weather = r2_score(y, y_pred_weather)

# Lighting condition regression
regressor.fit(X_lighting, y)
y_pred_lighting = regressor.predict(X_lighting)
r2_lighting = r2_score(y, y_pred_lighting)

# Time regression
regressor.fit(X_time, y)
y_pred_time = regressor.predict(X_time)
r2_time = r2_score(y, y_pred_time)

# Combined regression
regressor.fit(X_combined, y)
y_pred_combined = regressor.predict(X_combined)
r2_combined = r2_score(y, y_pred_combined)

# Create results dictionary
results = {
    'r2_weather_and_injuries': r2_weather,
    'r2_lighting_and_injuries': r2_lighting,
    'r2_time_and_injuries': r2_time,
    'r2_combined': r2_combined
}

results

{'r2_weather_and_injuries': 0.00011517118132731152,
 'r2_lighting_and_injuries': 0.0019889647066874128,
 'r2_time_and_injuries': 2.6697114307250303e-05,
 'r2_combined': 0.002093284924448491}
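
The scores above are in-sample. The notebook imports `train_test_split` but does not use it in this cell; a held-out evaluation could look roughly like the sketch below. The helper name `holdout_r2`, the 80/20 split, and the synthetic demo data are assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def holdout_r2(X, y, test_size=0.2, seed=0):
    """Fit on a training split and return R² on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return r2_score(y_test, model.predict(X_test))


# Synthetic demo: a weak linear signal buried in noise (not the crash data).
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 24, size=(200, 1)).astype(float)
y_demo = 0.05 * X_demo[:, 0] + rng.normal(0, 1, size=200)

print(holdout_r2(X_demo, y_demo))
```

With the notebook's data, the call would be `holdout_r2(X_combined, y)`. A held-out R² can be negative when the model predicts worse than the mean of the test targets.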

## Results and Conclusion
The R² values below indicate how well each factor explains the variation in total injuries.
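
For reference, the coefficient of determination computed by `r2_score` is defined as follows, where $y_i$ are the observed injury totals, $\hat{y}_i$ the model predictions, and $\bar{y}$ the mean of the observed values:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

An R² of 0 means the model does no better than always predicting $\bar{y}$, and an R² of 1 means the predictions match the observations exactly.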

### Results:
- Weather Condition and Injuries: **0.00011517118132731152**
- Lighting Condition and Injuries: **0.0019889647066874128**
- Time of Crash and Injuries: **2.6697114307250303e-05**
- Combined Factors and Injuries: **0.002093284924448491**

### Observations:
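All four R² values are below 0.003: taken together, weather, lighting, and crash hour explain about 0.2% of the variance in total injuries. The combined model (0.0021) improves only marginally on lighting condition alone (0.0020), while weather and crash hour each explain almost nothing on their own. As encoded here, these factors have very limited predictive capability for the number of injuries in a crash.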