# Verifying two hypotheses at the 95% confidence level on data about accidents in the Czech Republic

In [3]:
import pandas as pd
import numpy as np
import scipy.stats as stats

In [4]:
df = pd.read_pickle("accidents.pkl.gz")

## Hypothesis 1.
Accidents on first-class and third-class roads have the same chance of causing a death. (χ² test, i.e. the chi-square test of independence)

**Null Hypothesis - H0**: The death rate is the same on first-class roads as on third-class roads. \
**Alternative Hypothesis - H1**: The death rate on first-class roads is different from the death rate on third-class roads.

In [5]:
# Keep only first-class (1) and third-class (3) roads and the two columns we need
dfl = df.loc[df["p36"].isin([1, 3]), ["p36", "p13a"]].copy()
dfl["caused"] = dfl["p13a"] > 0  # True when the accident caused at least one death
dfl["first-class"] = dfl["p36"] == 1  # True for first-class roads, False for third-class

ct = pd.crosstab(dfl["caused"], dfl["first-class"])
print(ct)
f"P-value: {stats.chi2_contingency(ct)[1]}"

first-class  False   True
caused                   
False        73352  78618
True           448    911


'P-value: 3.5395243450138555e-29'

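Our `P-value` is far below our significance threshold of 0.05 (the 95% confidence level), so we can **reject the null hypothesis (H0)** and declare that **there is** a statistically significant association between deaths and road class (first or third). In the table above, about 1.15% of accidents on first-class roads (911 of 79,529) caused a death, compared with about 0.61% on third-class roads (448 of 73,800).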
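The decision above can be reproduced directly from the printed contingency table, without loading the data set. The following is a minimal sketch under that assumption: the counts are copied from the output above, and the names `ALPHA`, `observed` and `death_rate` are introduced here for illustration only.

```python
import numpy as np
import scipy.stats as stats

ALPHA = 0.05  # significance level corresponding to the 95% confidence level

# Counts copied from the contingency table printed above.
# Rows: death caused (False, True); columns: first-class road (False, True).
observed = np.array([[73352, 78618],
                     [448, 911]])

# Share of accidents that caused a death, per road class (column-wise).
death_rate = observed[1] / observed.sum(axis=0)
print(f"third-class: {death_rate[0]:.2%}, first-class: {death_rate[1]:.2%}")

# chi2_contingency also returns the degrees of freedom and the expected
# counts under H0 (independence of death and road class).
chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.3e}")
print("reject H0" if p_value < ALPHA else "fail to reject H0")
```

Comparing `expected` with `observed` shows where the deviation from independence comes from: first-class roads have more fatal accidents than H0 would predict.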

## Hypothesis 2.
In accidents involving Škoda vehicles, the cost of damage is lower than in accidents involving Audi vehicles. (An independent two-sample t-test was chosen for this hypothesis because we are comparing two independent groups on the same continuous dependent variable.)

**Null Hypothesis - H0**: μ1 = μ2 (the mean cost of damage is the same for Škoda, μ1, and Audi, μ2) \
**Alternative Hypothesis - H1 (left-tailed)**: μ1 < μ2 (the mean cost of damage is lower with Škoda vehicles).

In [6]:
dfl = df.copy()
dfl = dfl[["p45a", "p14"]].rename(columns={"p45a": "brand", "p14": "cost of damage"})
dfl = dfl[dfl["brand"].isin([2, 39])]
dfl["brand"] = np.where(dfl["brand"] == 2, "Audi", "Skoda")

group1 = dfl[dfl["brand"] == "Skoda"]
group2 = dfl[dfl["brand"] == "Audi"]

# Left-tailed test: H1 states that the mean Škoda cost is lower than the mean Audi cost
result = stats.ttest_ind(group1["cost of damage"], group2["cost of damage"], alternative="less")
f"P-value: {result.pvalue}"

'P-value: 4.622063681663005e-121'

Our `P-value` is far below our significance threshold of 0.05, so we can **reject the null hypothesis (H0)** in favour of our left-tailed alternative hypothesis: the data indicate that the mean cost of damage is lower in accidents involving Škoda vehicles than in accidents involving Audi vehicles.
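The test above uses the default `equal_var=True` of `stats.ttest_ind`, i.e. Student's t-test, which assumes that both groups have the same variance. If that assumption is in doubt, Welch's t-test (`equal_var=False`) is a common alternative. The sketch below is not part of the original analysis: it runs on synthetic, made-up cost values (the names `skoda_cost` and `audi_cost` and all numbers are assumptions for illustration) and only shows how the call would change.

```python
import numpy as np
import scipy.stats as stats

ALPHA = 0.05  # significance level corresponding to the 95% confidence level
rng = np.random.default_rng(0)

# Synthetic damage costs, NOT taken from accidents.pkl.gz.
# The two groups deliberately differ in mean, spread and size.
skoda_cost = rng.normal(loc=300, scale=100, size=500)
audi_cost = rng.normal(loc=500, scale=200, size=200)

# equal_var=False selects Welch's t-test; alternative="less" keeps the
# left-tailed H1: mean(skoda_cost) < mean(audi_cost).
result = stats.ttest_ind(skoda_cost, audi_cost, equal_var=False, alternative="less")
print(f"t = {result.statistic:.2f}, p-value = {result.pvalue:.3e}")
print("reject H0" if result.pvalue < ALPHA else "fail to reject H0")
```

On the real data, the same change would amount to adding `equal_var=False` to the existing `stats.ttest_ind` call; whether the conclusion changes would have to be checked by rerunning the cell.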