In [1]:
import pandas as pd

url = "https://raw.githubusercontent.com/fivethirtyeight/data/master/bad-drivers/bad-drivers.csv"
df = pd.read_csv(url)

df.head()


Unnamed: 0,State,Number of drivers involved in fatal collisions per billion miles,Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding,Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted,Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents,Car Insurance Premiums ($),Losses incurred by insurance companies for collisions per insured driver ($)
0,Alabama,18.8,39,30,96,80,784.55,145.08
1,Alaska,18.1,41,25,90,94,1053.48,133.93
2,Arizona,18.6,35,28,84,96,899.47,110.35
3,Arkansas,22.4,18,26,94,95,827.34,142.39
4,California,12.0,35,28,91,89,878.41,165.63


# Bad Drivers – Risk & Insurance Impact Analysis

This notebook analyzes how driving risk and driver behavior relate to
insurance premiums and insurance company losses across U.S. states.

Data Source: FiveThirtyEight Bad Drivers dataset


In [2]:
df.info()
df.isna().sum()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51 entries, 0 to 50
Data columns (total 8 columns):
 #   Column                                                                                                  Non-Null Count  Dtype  
---  ------                                                                                                  --------------  -----  
 0   State                                                                                                   51 non-null     object 
 1   Number of drivers involved in fatal collisions per billion miles                                        51 non-null     float64
 2   Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding                                    51 non-null     int64  
 3   Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired                            51 non-null     int64  
 4   Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted                     

Unnamed: 0,0
State,0
Number of drivers involved in fatal collisions per billion miles,0
Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding,0
Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,0
Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted,0
Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents,0
Car Insurance Premiums ($),0
Losses incurred by insurance companies for collisions per insured driver ($),0


In [3]:
# Do risky states pay more for car insurance?
df[[
    "Number of drivers involved in fatal collisions per billion miles",
    "Car Insurance Premiums ($)"
]].corr()


Unnamed: 0,Number of drivers involved in fatal collisions per billion miles,Car Insurance Premiums ($)
Number of drivers involved in fatal collisions per billion miles,1.0,-0.199702
Car Insurance Premiums ($),-0.199702,1.0


### Insight

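The correlation between the fatal collision rate and insurance premiums is weak and negative (r ≈ -0.20).
States with more drivers in fatal collisions per billion miles do not tend to pay higher premiums.
This may point to mispricing of risk by insurers, but premiums depend on more than the fatal collision rate,
so this correlation alone does not establish mispricing.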
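A quick check on whether a correlation this small could plausibly be zero, assuming the usual conditions of the Pearson correlation test (independent observations, roughly bivariate-normal data) and n = 51 rows (50 states plus DC):

$$
t = r\sqrt{\frac{n-2}{1-r^2}} = -0.1997\sqrt{\frac{49}{1-0.1997^2}} \approx -0.1997 \times 7.144 \approx -1.43
$$

This is a t-statistic with n - 2 = 49 degrees of freedom. Its magnitude is below the two-sided 5% critical value of about 2.01, so the correlation is not statistically distinguishable from zero at that level. Since the 51 rows are every state rather than a random sample, this is best read as a rough gauge of how small the correlation is.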


In [4]:
# Does drunk driving increase insurance company losses?
df[[
    "Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired",
    "Losses incurred by insurance companies for collisions per insured driver ($)"
]].corr()


Unnamed: 0,Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,Losses incurred by insurance companies for collisions per insured driver ($)
Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,1.0,-0.083916
Losses incurred by insurance companies for collisions per insured driver ($),-0.083916,1.0


### Insight

The correlation between the share of fatal-collision drivers who were alcohol-impaired and insurers' collision losses per insured driver is very weak (r ≈ -0.08).
On its own, this measure of drunk driving does not explain how much insurers lose per driver.
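One assumption-light way to judge whether a correlation like this differs from zero is a permutation test: shuffle one column many times and count how often the shuffled correlation is at least as extreme as the observed one. The sketch below is a swapped-in technique, not part of the original notebook; the function name `permutation_test_corr` is hypothetical, and the 14 rows are copied from the outputs printed above (not the full 51-row dataset, and not a random sample).

```python
import numpy as np


def permutation_test_corr(x, y, n_permutations=10_000, seed=0):
    """Return (observed Pearson r, two-sided permutation p-value).

    Shuffling y breaks any pairing between x and y, so the shuffled
    correlations approximate the distribution of r under "no association".
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    extreme = 0
    for _ in range(n_permutations):
        shuffled_r = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(shuffled_r) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed pairing as one permutation,
    # which keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_permutations + 1)


# Alcohol-impaired share (%) and losses per insured driver ($) for the
# 14 states shown in this notebook's outputs (a hand-picked subset).
alcohol = [30, 25, 28, 26, 28, 42, 41, 28, 44, 28, 33, 27, 29, 29]
losses = [145.08, 133.93, 110.35, 142.39, 165.63, 109.72, 116.29,
          152.56, 85.15, 159.85, 194.78, 136.05, 150.01, 144.18]

r, p = permutation_test_corr(alcohol, losses)
print(f"r = {r:.3f}, permutation p-value = {p:.3f}")
```

On the notebook's own DataFrame, the same call would take the two full columns, e.g. `permutation_test_corr(df[alcohol_col], df[loss_col])`, where `alcohol_col` and `loss_col` hold the long column names used in the cell above.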


In [5]:
# Are drivers in high-risk states paying appropriately higher insurance premiums?
# Short aliases for the two columns compared below
df["Risk"] = df["Number of drivers involved in fatal collisions per billion miles"]
df["Price"] = df["Car Insurance Premiums ($)"]

df.sort_values("Risk", ascending=False).head()


Unnamed: 0,State,Number of drivers involved in fatal collisions per billion miles,Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding,Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted,Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents,Car Insurance Premiums ($),Losses incurred by insurance companies for collisions per insured driver ($),Risk,Price
34,North Dakota,23.9,23,42,99,86,688.75,109.72,23.9,688.75
40,South Carolina,23.9,38,41,96,81,858.97,116.29,23.9,858.97
48,West Virginia,23.8,34,28,97,87,992.61,152.56,23.8,992.61
3,Arkansas,22.4,18,26,94,95,827.34,142.39,22.4,827.34
26,Montana,21.4,39,44,84,85,816.21,85.15,21.4,816.21


In [6]:
df.sort_values("Price", ascending=False).head()


Unnamed: 0,State,Number of drivers involved in fatal collisions per billion miles,Percentage Of Drivers Involved In Fatal Collisions Who Were Speeding,Percentage Of Drivers Involved In Fatal Collisions Who Were Alcohol-Impaired,Percentage Of Drivers Involved In Fatal Collisions Who Were Not Distracted,Percentage Of Drivers Involved In Fatal Collisions Who Had Not Been Involved In Any Previous Accidents,Car Insurance Premiums ($),Losses incurred by insurance companies for collisions per insured driver ($),Risk,Price
30,New Jersey,11.2,16,28,86,78,1301.52,159.85,11.2,1301.52
18,Louisiana,20.5,35,33,73,98,1281.55,194.78,20.5,1281.55
8,District of Columbia,5.9,34,27,100,100,1273.89,136.05,5.9,1273.89
32,New York,12.3,32,29,88,80,1234.31,150.01,12.3,1234.31
9,Florida,17.9,21,29,92,94,1160.13,144.18,17.9,1160.13


### Risk vs Price Mismatch

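Two of the three states with the highest fatal collision rates, North Dakota and South Carolina
(23.9 per billion miles each), pay comparatively modest premiums ($688.75 and $858.97), while DC (5.9)
and New Jersey (11.2), which have relatively low fatal collision rates, pay among the highest premiums
($1,273.89 and $1,301.52). The pattern is not uniform: West Virginia (23.8) pays $992.61, and
Louisiana (20.5) has the second-highest premium in the dataset ($1,281.55).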
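The comparison above can be made explicit by ranking each state on risk and on price and taking the difference. This is a minimal sketch: the column names `risk_rank`, `price_rank`, and `gap` are hypothetical, and the 14 rows are copied from the outputs above, so ranks are relative to this subset rather than to all 51 states.

```python
import pandas as pd

# (state, fatal-collision drivers per billion miles, premium in $)
# copied from the outputs above; a subset, not the full dataset.
rows = [
    ("Alabama", 18.8, 784.55), ("Alaska", 18.1, 1053.48),
    ("Arizona", 18.6, 899.47), ("Arkansas", 22.4, 827.34),
    ("California", 12.0, 878.41), ("North Dakota", 23.9, 688.75),
    ("South Carolina", 23.9, 858.97), ("West Virginia", 23.8, 992.61),
    ("Montana", 21.4, 816.21), ("New Jersey", 11.2, 1301.52),
    ("Louisiana", 20.5, 1281.55), ("District of Columbia", 5.9, 1273.89),
    ("New York", 12.3, 1234.31), ("Florida", 17.9, 1160.13),
]
sample = pd.DataFrame(rows, columns=["State", "Risk", "Price"]).set_index("State")

# Rank 1 = highest fatal-collision rate / most expensive premium.
# method="min" gives tied states (North Dakota, South Carolina) the same rank.
sample["risk_rank"] = sample["Risk"].rank(ascending=False, method="min")
sample["price_rank"] = sample["Price"].rank(ascending=False, method="min")

# Positive gap: the state ranks higher on risk than on price (cheap for its risk).
# Negative gap: the state ranks higher on price than on risk.
sample["gap"] = sample["price_rank"] - sample["risk_rank"]

print(sample.sort_values("gap", ascending=False))
```

With `df` in place of `sample`, the same three lines would rank all 51 states against each other.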

This suggests that premiums across states do not closely track fatal collision rates, although
fatal collisions are only one part of the risk that insurers price.
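Because a single extreme state (such as DC) can pull a Pearson correlation around, a rank-based measure is a natural complement to the -0.20 figure. Spearman's rho is the Pearson correlation of the two columns' ranks, which this sketch computes with plain pandas (`Series.rank` and `Series.corr`). The helper name `spearman` is hypothetical, and the 14 rows are copied from the outputs above; they were selected for extreme risk or price, so the value they produce only demonstrates the mechanics and says nothing about all 51 states.

```python
import pandas as pd


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho: Pearson correlation of average ranks (ties share a rank)."""
    return x.rank().corr(y.rank())


# (state, fatal-collision drivers per billion miles, premium in $)
rows = [
    ("Alabama", 18.8, 784.55), ("Alaska", 18.1, 1053.48),
    ("Arizona", 18.6, 899.47), ("Arkansas", 22.4, 827.34),
    ("California", 12.0, 878.41), ("North Dakota", 23.9, 688.75),
    ("South Carolina", 23.9, 858.97), ("West Virginia", 23.8, 992.61),
    ("Montana", 21.4, 816.21), ("New Jersey", 11.2, 1301.52),
    ("Louisiana", 20.5, 1281.55), ("District of Columbia", 5.9, 1273.89),
    ("New York", 12.3, 1234.31), ("Florida", 17.9, 1160.13),
]
sample = pd.DataFrame(rows, columns=["State", "Risk", "Price"]).set_index("State")

rho = spearman(sample["Risk"], sample["Price"])
print(f"Spearman rho on this subset: {rho:.3f}")
```

On the notebook's own data the call would be `spearman(df["Risk"], df["Price"])`.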
